Special Thanks:

Special thanks to Mahbub Shahriar (Project Manager, KAZ Software Ltd., Bangladesh) for finding the solution to this problem.

Problem:

Sometimes the entity references for special characters (such as &apos;, &quot; and &amp;) must appear in the output XML file exactly as they were written in the input XML file.

Solution:

For example:

input.xml


<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>'single quotation' "double quotation" Apostrophe' Apostrophe&apos; &quot;Quotation&quot; A&amp;B</def>
</abc>

We want the output to be identical to the input. That is, the entity references (&apos;, &quot;, &amp;) must not be changed, and the literal single quotation (') and double quotation (") characters must not be changed either.

If we read the input XML file (input.xml) in the following way:


SAXReader reader = new SAXReader();//DEFAULT SAXReader is used.
reader.setValidation(false);
reader.setFeature(feature, false);
Document inputDoc = reader.read("input/input.xml");

then the reader will expand &quot; to " and &apos; to ' (and &amp; to &), so when the document is written to the output file, the entity references are gone.
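The expansion is not specific to dom4j: an XML parser passes the predefined entities on as the characters they stand for. As a minimal sketch using only the JDK's built-in SAX parser (not the dom4j/Xerces setup of this post; the class EntityExpansionDemo and method readDefText are made up for illustration), the text of <def> from input.xml comes back with every reference expanded:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class EntityExpansionDemo {

    // Same content as input.xml above.
    static final String INPUT =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<abc>\n"
        + "<def>'single quotation' \"double quotation\" Apostrophe' Apostrophe&apos; &quot;Quotation&quot; A&amp;B</def>\n"
        + "</abc>\n";

    // Hypothetical helper: collects all character data reported inside <def>.
    static String readDefText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new DefaultHandler() {
                private boolean inDef;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    inDef = qName.equals("def");
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    inDef = false;
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // The parser may split text into several calls, so append each chunk.
                    if (inDef) {
                        text.append(ch, start, length);
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readDefText(INPUT));
    }
}
```

Running it prints ' and " where the file had &apos; and &quot;, and A&B where it had A&amp;B, so nothing in the parsed text distinguishes the literal quotes from the ones that came from references.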

A little trick is used in the following Java program to resolve this problem: an instance of MySAXParser, which extends Xerces' SAXParser, is passed as the argument to the SAXReader constructor instead of letting SAXReader use its default parser.

//LOOK !! Here the default parser is not used.
SAXReader reader = new SAXReader(new MySAXParser());
reader.setValidation(false);
reader.setFeature(feature, false);
Document inputDoc = reader.read("input/input.xml");


//MySAXParser.java
public class MySAXParser extends SAXParser {

    // The entity reference (e.g. "&apos;") seen just before the next characters() call.
    private String entityName;

    @Override
    public void characters(XMLString text, Augmentations augs) throws XNIException {
        if (this.entityName != null) {
            char[] charArray = this.entityName.toCharArray();

            text.setValues(charArray, 0, charArray.length);

            this.entityName = null;
        }
        super.characters(text, augs);
    }

    @Override
    public void startGeneralEntity(String name, XMLResourceIdentifier identifier, String encoding, Augmentations augs) throws XNIException {
        super.startGeneralEntity(name, identifier, encoding, augs);

        this.entityName = "&" + name + ";";
    }
}

So now what’s the trick here in MySAXParser.java?

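Normally, all text in the XML file is passed through the method

public void characters(XMLString text, Augmentations augs)

but when an entity reference such as &apos; or &quot; is encountered, the method

public void startGeneralEntity(String name, XMLResourceIdentifier identifier, String encoding, Augmentations augs)

is called first, and then characters() is called with the expanded text.

This is where the trick is done, and it is very easy: startGeneralEntity() remembers the reference as "&" + name + ";", and the next characters() call replaces the expanded text with that string before handing it to the superclass. Because escaping is turned off in the writer, the reference then reaches the output file unchanged.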
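The state handling can be tried without Xerces. The sketch below is a hypothetical model, not the Xerces XNI API: EntityPreserver and its String-based methods are made up, and it assumes (as described above) that the parser delivers each entity's expansion as a separate characters() call right after startGeneralEntity().

```java
// Hypothetical model of MySAXParser's two overrides; no Xerces types involved.
class EntityPreserver {

    // "&name;" waiting to replace the next characters() call, or null.
    private String entityName;
    private final StringBuilder out = new StringBuilder();

    // Mirrors startGeneralEntity(): remember the reference as written in the source.
    void startGeneralEntity(String name) {
        entityName = "&" + name + ";";
    }

    // Mirrors characters(): swap the expanded text for the remembered reference, once.
    void characters(String text) {
        if (entityName != null) {
            text = entityName;
            entityName = null;
        }
        out.append(text);
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        // Replays the assumed event order for: Apostrophe&apos; A&amp;B
        EntityPreserver p = new EntityPreserver();
        p.characters("Apostrophe");
        p.startGeneralEntity("apos");
        p.characters("'");
        p.characters(" A");
        p.startGeneralEntity("amp");
        p.characters("&");
        p.characters("B");
        System.out.println(p.result());
    }
}
```

Literal ' and " characters never trigger startGeneralEntity(), so they pass through untouched; only text that came from a reference is swapped back.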

JAR files:

dom4j-1.5.1.jar
jaxen-1.1-beta-8.jar
xercesImpl.jar

Source code:

//ADom4jDemoMain.java

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;

import org.dom4j.Document;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import org.dom4j.tree.FlyweightProcessingInstruction;

public class ADom4jDemoMain {

    // Xerces feature: do not load the external DTD when not validating.
    public static String feature = "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    public static void main(String[] args) {

        //This will change in the output.
        try {
            SAXReader reader = new SAXReader();//LOOK HERE: the default SAXReader is used.
            reader.setValidation(false);
            reader.setFeature(feature, false);
            Document inputDoc = reader.read("input/input.xml");
            writeDocument(inputDoc, new File("output/changed_output.xml"));
        } catch (Exception e) {
            e.printStackTrace();
        }

        //This WILL NOT change in the output.
        try {
            SAXReader reader = new SAXReader(new MySAXParser());//LOOK !! Here the default parser is not used.
            reader.setValidation(false);
            reader.setFeature(feature, false);
            Document inputDoc = reader.read("input/input.xml");
            writeDocument(inputDoc, new File("output/same_output.xml"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static synchronized void writeDocument(Document doc, File df) {
        System.out.println((df.exists() ? "Overwriting" : "Creating") + " file - " + df);
        df.getParentFile().mkdirs();
        try {
            OutputFormat format = OutputFormat.createPrettyPrint();
            format.setIndentSize(4);
            XMLWriter writer = new XMLWriter(new OutputStreamWriter(new FileOutputStream(df), "UTF-8"), format);
            FlyweightProcessingInstruction xmlDeclaration = new FlyweightProcessingInstruction("xml",
                    "version=\"1.0\" encoding=\"UTF-8\"");
            writer.setEscapeText(false);//Look: escaping is turned off in the writer.
            writer.write(xmlDeclaration);
            writer.write(doc.processingInstructions());
            writer.write(doc.getDocType());
            writer.write(doc.getRootElement());
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
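Why turn escaping off? MySAXParser puts the literal text "&apos;" into the document, and a writer that escapes element text would turn its & into &amp;. The sketch below is a hypothetical stand-in (EscapeSketch, escapeText and write are made-up helpers, not dom4j's XMLWriter), assuming element-text escaping covers &, < and >:

```java
class EscapeSketch {

    // Hypothetical element-text escaping (assumption: only &, < and > are escaped).
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Models the effect of setEscapeText(true/false) on one text node.
    static String write(String text, boolean escapeText) {
        return escapeText ? escapeText(text) : text;
    }

    public static void main(String[] args) {
        System.out.println(write("Apostrophe&apos;", true));
        System.out.println(write("Apostrophe&apos;", false));
    }
}
```

The flip side: with escaping off, the default path (changed_output.xml) writes the expanded A&B raw, which is not well-formed XML.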


//MySAXParser.java

import org.apache.xerces.parsers.SAXParser;
import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.XMLResourceIdentifier;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;

public class MySAXParser extends SAXParser {

    // The entity reference (e.g. "&apos;") seen just before the next characters() call.
    private String entityName;

    @Override
    public void characters(XMLString text, Augmentations augs) throws XNIException {
        if (this.entityName != null) {
            // Replace the expanded text (e.g. "'") with the original reference (e.g. "&apos;").
            char[] charArray = this.entityName.toCharArray();
            text.setValues(charArray, 0, charArray.length);

            this.entityName = null;
        }
        super.characters(text, augs);
    }

    @Override
    public void startGeneralEntity(String name, XMLResourceIdentifier identifier, String encoding, Augmentations augs) throws XNIException {
        super.startGeneralEntity(name, identifier, encoding, augs);

        // Remember the reference exactly as it was written in the source.
        this.entityName = "&" + name + ";";
    }
}
