Last but not least, XSLT 1.0/2.0 processors are non-streaming, meaning that they read the entire source document into memory, which clearly rules XSLT out.
From here: StAX2 is an experimental API that is intended to extend basic StAX specifications in a way that allows implementations to experiment with features before they end up in the actual StAX specification (if they do).
As such, it is intended to be freely implementable by all StAX implementations same way as StAX, but without going through a formal JCP process.
The ideal candidate in this case is of course StAX.
I'm not going into the SAX ↔ StAX comparison here; the fact is that StAX is able to validate against schemas (at least some parsers can) and can also write XML.
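To give an idea of what streaming with StAX looks like, here is a minimal sketch that counts elements with the pull-style `XMLStreamReader`. The class name `StaxCursorDemo`, the method `countElements` and the sample document are made up for illustration; only the `javax.xml.stream` calls come from the standard API.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCursorDemo {

    // Counts the start tags with the given local name.
    // The reader only exposes the current event, so the document is never
    // materialised as a tree in memory.
    public static int countElements(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample document, not the real input file.
        String xml = "<root><header/><content><data>a</data><data>b</data></content></root>";
        System.out.println(countElements(xml, "data"));
    }
}
```

For a real file you would pass an `InputStream` to `createXMLStreamReader` instead of a `StringReader`, so the parser can detect the encoding from the XML declaration.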
What I understand from this is that it is required for handling default namespaces.
The fact is that if it is not enabled, the default namespace is not written in any way. Setting it does not actually write it to the stream.
Our best option is to create some pre-processing tool that will first split the big file into multiple smaller chunks before they are processed by the middleware.
The XML file comes with a corresponding W3C schema, consisting of a mandatory header part followed by a content element which has 0..* data elements nested in it.
This will split after a fixed number of bytes, leaving the XML corrupt for sure. I'm not really sure, but tools such as Split don't know anything about encoding either.
At first sight one could be tempted: using XSLT 2.0 it is possible to create multiple output files from a single input file.
Currently Woodstox is the only known implementation.
The "openOutputFileAndWriteHeader" method will create an XMLStreamWriter (which is again part of the cursor API; the iterator API has XMLEventWriter) to which we can output our part of the original XML file.
Both the source files, XSD and test XML can be accessed here on GitHub. It has a Maven pom file, so you should be able to import it in your IDE of choice.
For the demo code I re-created the schema in simplified form. The header is negligible in size. A single data element repetition is also pretty small, let's say less than 50 kB.
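The splitting idea described above can be sketched roughly as follows. This is not the demo code from the repository, just a small illustration under a few assumptions: the class `XmlSplitter` and the helpers `Step`, `capture` and `finish` are names I made up, the element name `content` is taken from the simplified schema, chunks are written to strings instead of files, and attributes and namespaces are ignored for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class XmlSplitter {

    // A recorded write action that can be replayed into every chunk (hypothetical helper).
    interface Step {
        void write(XMLStreamWriter w) throws XMLStreamException;
    }

    // Turns the reader's current event into a replayable write action.
    // Assumption: attributes, namespaces, comments and CDATA are ignored in this sketch.
    static Step capture(XMLStreamReader r) {
        switch (r.getEventType()) {
            case XMLStreamConstants.START_ELEMENT: {
                String name = r.getLocalName();
                return w -> w.writeStartElement(name);
            }
            case XMLStreamConstants.CHARACTERS: {
                String text = r.getText();
                return w -> w.writeCharacters(text);
            }
            case XMLStreamConstants.END_ELEMENT:
                return XMLStreamWriter::writeEndElement;
            default:
                return w -> { };
        }
    }

    // Closes the elements that are still open (content and the root) and returns the chunk.
    static String finish(XMLStreamWriter writer, StringWriter buffer) throws XMLStreamException {
        writer.writeEndDocument();
        writer.close();
        return buffer.toString();
    }

    // Splits the document into chunks holding at most maxPerChunk children of <content>.
    public static List<String> split(String xml, int maxPerChunk) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

        // Phase 1: record everything up to and including the <content> start tag.
        List<Step> prologue = new ArrayList<>();
        while (reader.hasNext()) {
            reader.next();
            prologue.add(capture(reader));
            if (reader.isStartElement() && "content".equals(reader.getLocalName())) {
                break;
            }
        }

        // Phase 2: stream the children of <content>, starting a new chunk when one is full.
        List<String> chunks = new ArrayList<>();
        StringWriter buffer = null;
        XMLStreamWriter writer = null;
        int depth = 0;    // nesting depth below <content>
        int inChunk = 0;  // completed data elements in the current chunk
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.END_ELEMENT && depth == 0) {
                break; // reached </content>
            }
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (depth == 0 && (writer == null || inChunk == maxPerChunk)) {
                    if (writer != null) {
                        chunks.add(finish(writer, buffer));
                    }
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLStreamWriter(buffer);
                    for (Step step : prologue) {
                        step.write(writer); // replay the header into the new chunk
                    }
                    inChunk = 0;
                }
                depth++;
            }
            if (depth > 0) {
                capture(reader).write(writer);
            }
            if (event == XMLStreamConstants.END_ELEMENT && --depth == 0) {
                inChunk++;
            }
        }
        if (writer != null) {
            chunks.add(finish(writer, buffer));
        }
        reader.close();
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample document following the simplified schema.
        String xml = "<root><header><id>42</id></header><content>"
                + "<data>a</data><data>b</data><data>c</data></content></root>";
        split(xml, 2).forEach(System.out::println);
    }
}
```

Because the split happens on element boundaries rather than after a fixed number of bytes, every chunk stays well-formed and carries its own copy of the header. Only the header and the data element currently being copied are held in memory, which works here because both are small.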