Converting a WordPress XML export file to JSON for MongoDB

TJ Dahunsi
Feb 01 2016 · 1 min
WordPress conveniently lets users export their data as an XML file to use however they like. In my case, I exported my blog so that I could import it into a MongoDB database.
Doing this takes three steps:
1. Removing unnecessary WordPress nodes.
2. Cleaning the resulting XML document.
3. Saving the new document.
To do this, I used Java's javax.xml packages, in particular XPathFactory, to build XPath expressions that match the nodes I was interested in. I then pasted the cleaned-up XML file into an online XML-to-JSON tool and imported the resulting JSON into MongoDB.
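The removal and saving steps can be sketched with nothing but the standard javax.xml APIs. This is a minimal illustration, not the original program: the class name `ExportCleaner`, its method names, the sample XML, and the XPath expression `//item/guid` are all assumptions chosen for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names here are illustrative only.
class ExportCleaner {

    // Parse an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Remove every node matched by the XPath expression and return the match count.
    static int removeNodes(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList matches = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        for (int i = 0; i < matches.getLength(); i++) {
            Node node = matches.item(i);
            Node parent = node.getParentNode();
            if (parent != null) { // attributes and the document itself have no parent
                parent.removeChild(node);
            }
        }
        return matches.getLength();
    }

    // Serialize the cleaned document back to an XML string.
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for an export file; a real export is far larger.
        String xml = "<rss><channel><item><title>Hello</title><guid>42</guid></item></channel></rss>";
        Document doc = parse(xml);
        removeNodes(doc, "//item/guid"); // assumed example of an unwanted node
        System.out.println(serialize(doc));
    }
}
```

To save to disk instead of a string, the same `transform` call accepts a `StreamResult` built from a `java.io.File`.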
Because the actual blog content sat inside CDATA sections in the XML, unsafe HTML strings had to be properly escaped before they could be stored in JSON. Fortunately, Google's Gson library, which I had used before on Android, came in handy. The following snippet summarizes the approach:
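    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    static void encodeNode(Node nodeToEncode) {
        Gson gson = new GsonBuilder().create();
        String unencodedString = nodeToEncode.getTextContent();
        // toJson returns a quoted JSON string literal with unsafe characters escaped.
        String encodedString = gson.toJson(unencodedString);
        // Strip the surrounding quotes; >= 2 so empty content ("") becomes empty too.
        String removedQuotes = encodedString.length() >= 2
                ? encodedString.substring(1, encodedString.length() - 1)
                : encodedString;
        nodeToEncode.setTextContent(removedQuotes);
        // Rename the node to "body".
        Document blogDocument = nodeToEncode.getOwnerDocument();
        blogDocument.renameNode(nodeToEncode, null, "body");
    }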
It worked like a charm. On the client side, with Angular, parsing the encoded string was just as simple. The string returned by the line below was then passed to $compile and inserted into the DOM in a directive.
    content = JSON.parse('"' + blogPost.body + '"');
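To see why stripping the quotes on the way in and adding them back on the way out round-trips cleanly, here is a dependency-free Java sketch of both halves. It is only a stand-in: `escape` is a simplified approximation of the escaping a JSON serializer applies (Gson covers more characters), `unescape` plays the role of wrapping the value in quotes and calling JSON.parse, and the class and method names are hypothetical.

```java
// Hypothetical stand-in for the Gson encode step and the JSON.parse decode step.
class JsonStringSketch {

    // Escape a raw string the way the inside of a JSON string literal is escaped.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Control and HTML-sensitive characters become four-digit Unicode escapes.
                    if (c < 0x20 || c == '<' || c == '>' || c == '&') {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Reverse the escaping, as a JSON parser does for a quoted string literal.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            char next = escaped.charAt(++i);
            switch (next) {
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case 't': sb.append('\t'); break;
                case 'b': sb.append('\b'); break;
                case 'f': sb.append('\f'); break;
                case 'u':
                    // Four hex digits follow the 'u'.
                    sb.append((char) Integer.parseInt(escaped.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    sb.append(next); // escaped quote, backslash, or slash
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p class=\"intro\">Hi</p>\n";
        String stored = escape(html);       // what would end up in the body field
        String restored = unescape(stored); // what the client recovers
        System.out.println(stored);
        System.out.println(restored.equals(html));
    }
}
```

The stored value contains no raw quotes, angle brackets, or newlines, which is what lets it sit safely inside an XML text node and later inside a JSON string.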
The complete Java program I used for the conversion can be found in the GitHub gist below.