Working with non-ASCII XML tags

This section describes how to handle non-ASCII tags in XML documents. A DTD is used as the example, but the same approach also applies to XML schemas.

This example shows a DTD and an XML file with non-ASCII tags. Because the system cannot natively handle non-ASCII characters in tag names, the workaround is to replace those characters with valid ASCII sequences so that the XML file can be processed.

After translation, the ASCII sequences are substituted back to their original non-ASCII values.

For example:

DTD: EXâMPLE2.dtd

<?xml encoding="ISO-8859-1"?>
<!ELEMENT EXâMPLE2 (â,B,C)+>
<!ELEMENT â (#PCDATA)>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>

XML file:

<EXâMPLE2><â>TOSHâO</â><B>KABUTO</B><C>rømår</C>
</EXâMPLE2>

To perform this substitution:

  1. Replace each â in the DTD with a two-character ASCII substitute, for example _a. Because the DTD file name also contains â, this involves both renaming the file and substituting the non-ASCII characters in its contents. You can do the substitution with an editor or with Tcl commands.
  2. Make your translation using the modified DTD. If you are translating from non-XML into XML, then use a callout at the end of the translate to substitute the ASCII sequences in the message back to their non-ASCII values.
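The substitution in step 1 can be sketched as follows. This is a minimal illustration in Python rather than the product's own Tcl; the SUBSTITUTIONS table and the to_ascii helper are hypothetical names, and the sketch assumes that the chosen ASCII sequence (_a) does not already occur in the DTD.

```python
# Hypothetical substitution table: each non-ASCII character maps to an
# ASCII sequence that must not already occur in the text being converted.
SUBSTITUTIONS = {"â": "_a"}


def to_ascii(text, table=SUBSTITUTIONS):
    """Replace every non-ASCII character listed in the table."""
    for original, substitute in table.items():
        if substitute in text:
            # Refuse to continue: the substitution could not be reversed safely.
            raise ValueError(f"{substitute!r} already occurs in the input")
        text = text.replace(original, substitute)
    return text


# Element declarations from the example DTD.
dtd = """<!ELEMENT EXâMPLE2 (â,B,C)+>
<!ELEMENT â (#PCDATA)>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>
"""

ascii_name = to_ascii("EXâMPLE2.dtd")  # new file name for the modified DTD
ascii_dtd = to_ascii(dtd)              # contents of the modified DTD

print(ascii_name)  # EX_aMPLE2.dtd
print(ascii_dtd)
```

Keeping the mapping in one table makes it straightforward to apply the reverse substitution later with the same pairs.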

In this callout, the CALL operation creates a new message in which each _a is substituted back to â. The SUPPRESS operation prevents the default message from being generated.
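The reverse substitution can be sketched in Python as follows. This only illustrates the string logic, not the product's CALL interface. The restore_tags helper and the sample message are hypothetical. Unlike a plain search and replace, this sketch restricts the substitution to tag names, so a literal _a inside element content is left alone; that restriction is a design choice of the sketch, not something the procedure above requires.

```python
import re

# Matches the start of an opening or closing tag, up to the end of its name.
TAG_NAME = re.compile(r"</?[^<>\s/]+")


def restore_tags(message, substitute="_a", original="â"):
    """Substitute the ASCII sequence back to the non-ASCII character,
    but only inside tag names."""
    return TAG_NAME.sub(
        lambda match: match.group(0).replace(substitute, original),
        message,
    )


# Hypothetical translated output; the content of <C> contains a literal "_a".
translated = "<EX_aMPLE2><_a>TOSHâO</_a><B>KABUTO</B><C>data_a</C>\n</EX_aMPLE2>"

restored = restore_tags(translated)
print(restored)
```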

If you are translating from XML, then you can use a translate pre-procedure to substitute out the non-ASCII tags before translating. You can also put the post-processing into a translate post-procedure. However, a CALL operation is preferable because it also works with the translate tester.

Using XML TRXID determination

When you use XML TRXID determination, the TRXID is extracted from the XML message before any translate procedures are run. As a result, a substitution done in a translate pre-procedure comes too late: the TRXID still contains non-ASCII values, and these cannot be routed. If you are using XML TRXID determination, then put the procedure that substitutes out the non-ASCII characters in inbound TPS processing instead. The substitution then runs before the TRXID is extracted, so the TRXID does not contain non-ASCII characters.
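The effect of the ordering can be sketched in Python as follows. How the TRXID is derived from the message is not specified here; purely for illustration, the sketch assumes it is taken from the root element name. The helpers substitute_tags and root_element_name are hypothetical.

```python
import re

# Matches the start of an opening or closing tag, up to the end of its name.
TAG_NAME = re.compile(r"</?[^<>\s/]+")


def substitute_tags(message, original="â", substitute="_a"):
    """Replace the non-ASCII character with its ASCII sequence in tag names."""
    return TAG_NAME.sub(
        lambda match: match.group(0).replace(original, substitute),
        message,
    )


def root_element_name(message):
    """Return the name of the first element (assumed TRXID source)."""
    match = re.search(r"<([^\s<>/?!]+)", message)
    return match.group(1) if match else ""


message = "<EXâMPLE2><â>TOSHâO</â><B>KABUTO</B><C>rømår</C>\n</EXâMPLE2>"

# Extracting first yields a non-ASCII identifier.
trxid_too_early = root_element_name(message)

# Substituting first (as inbound processing would) yields an ASCII identifier.
cleaned = substitute_tags(message)
trxid = root_element_name(cleaned)

print(trxid_too_early.isascii(), trxid)
```

Element content such as TOSHâO and rømår is left unchanged, because only tag names are rewritten.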