Hello everyone, I am currently working on Wikipedia dump processing, similar to what was done in the Extraction Framework. I am using Jackson for XML processing (https://www.baeldung.com/jackson-xml-serialization-and-deserialization).
While the library is quite useful for deserialization, I have run into many issues when processing large XML files, mostly because the files are larger than the total allocated memory.
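For reference, this is roughly how I read the dump at the moment; a minimal sketch, where the POJO names, the element names, and the file path are just placeholders for my actual mapping classes:

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
import java.io.File;
import java.util.List;

public class DumpReader {

    // Placeholder POJOs: only the fields I actually need from each <page>.
    public static class Page {
        @JacksonXmlProperty(localName = "title")
        public String title;
    }

    public static class MediaWikiDump {
        @JacksonXmlElementWrapper(useWrapping = false)
        @JacksonXmlProperty(localName = "page")
        public List<Page> pages;
    }

    public static void main(String[] args) throws Exception {
        XmlMapper xmlMapper = new XmlMapper();
        xmlMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        // readValue deserializes the whole document in one call, so the entire
        // dump ends up on the heap -- this is where the memory problem shows up.
        MediaWikiDump dump = xmlMapper.readValue(
                new File("enwiki-latest-pages-articles.xml"), MediaWikiDump.class);

        System.out.println("pages read: " + dump.pages.size());
    }
}
```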
Do you know of any Java libraries that work well with splitting XML for processing?
The DOM parser is the easiest Java XML parser to learn. It loads the entire XML file into memory, and we can then traverse it node by node to parse the XML. The DOM parser works well for small files, but as the file size grows it becomes slow and consumes a lot of memory, as shown in the sketch below.
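A minimal sketch using the standard javax.xml.parsers DOM API; the file name and the "page"/"title" element names are just example assumptions:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.File;

public class DomExample {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // parse() builds the complete document tree in memory before returning,
        // which is exactly why DOM struggles with very large files.
        Document doc = builder.parse(new File("small-sample.xml"));
        doc.getDocumentElement().normalize();

        // Traverse the in-memory tree node by node.
        NodeList pages = doc.getElementsByTagName("page");
        for (int i = 0; i < pages.getLength(); i++) {
            Element page = (Element) pages.item(i);
            System.out.println(
                    page.getElementsByTagName("title").item(0).getTextContent());
        }
    }
}
```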