Sometimes, the Batch Checker encounters problematic character entities that can’t be rendered correctly in the input text preview for XML, SGML, or HTML documents. You can configure the Batch Checker so that it attempts to convert the entities into a form that can be rendered correctly.
To convert the entities, the Batch Checker must refer to an entity conversion file. An entity conversion file contains instructions for the Batch Checker on how to interpret special character codes and entities. The Batch Checker comes installed with several entity conversion files that convert some standard entity types.
If an entity conversion file isn’t configured, the Batch Checker renders the codes for the entities instead of the intended characters. These entity codes are then sent to the server with the text, and can cause incorrect flags. This issue is common for SGML and rarer for XML and HTML files.
Entity Conversion Files That Come Installed with the Batch Checker
The Batch Checker uses the appropriate entity conversion file based on the selected CSD profile. The following entity conversion files come installed with the Batch Checker:
- ISO_entities.txt - This file contains instructions for resolving both alphanumeric and Unicode entities.
Tip: The default CSD profile for SGML files that comes installed with the Batch Checker is configured to use this entity conversion file.
- ISO_entities-alphanumeric.txt - This file contains instructions for resolving alphanumeric entities only.
- ISO_entities-unicode.txt - This file contains instructions for resolving Unicode entities only.
The standard entity conversion files are stored in the following directory:
Using Entity Conversion with Check and Apply
The file ISO_entities.txt is sufficient for standard checks, but can’t be used in combination with the Check and Apply feature, because Check and Apply can’t resolve entities if an entity conversion file contains multiple entries for the same character.
If you use Check and Apply, you can configure entity conversion for Unicode entities or for alphanumeric entities, but not for both types. Before using Check and Apply, configure your CSDs to use one of the following entity conversion files:
- ISO_entities-alphanumeric.txt - Use this file if your documents normally contain alphanumeric entities.
- ISO_entities-unicode.txt - Use this file if your documents normally contain Unicode entities.
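If you maintain your own entity conversion file and want to use it with Check and Apply, it may help to check first whether any output character has more than one entry. The following Python sketch is not part of the Batch Checker. It assumes one conversion per row, with the entity and the output character separated by whitespace, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def find_duplicate_outputs(rows):
    """Return {output character: [entities]} for characters with several entries."""
    by_output = defaultdict(list)
    for row in rows:
        # Assumption: entity and output character are separated by whitespace.
        parts = row.split(None, 1)
        if len(parts) < 2:
            continue  # blank row, or entity-only row (hidden entity)
        entity, output = parts[0], parts[1].strip()
        by_output[output].append(entity)
    return {out: ents for out, ents in by_output.items() if len(ents) > 1}


# Hypothetical rows: two entities resolve to the same character.
sample_rows = ["&auml;\tä", "&#228;\tä", "&ouml;\tö"]
print(find_duplicate_outputs(sample_rows))  # prints {'ä': ['&auml;', '&#228;']}
```

An empty result suggests that the file has at most one entry per output character, which is the condition that Check and Apply requires.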
Creating or Editing an Entity Conversion File
To create or edit an entity conversion file, follow these steps:
- Create a new file in a text editor or open an existing entity conversion file.
- For each conversion, type the character code and the desired output character separated by a
Example: ä ä
Specify each entity conversion on a separate row.
Tip: If you don’t want to render the character for a particular entity, add a row containing the entity only. When no output character is provided, the entity is hidden during conversion.
- Save the text file in the directory <INSTALL_DIR>\BatchChecker\profiles using the same encoding as the documents which contain the specified entities (the documents you intend to check).
- (Follow this step if you have created a new entity conversion file) Configure the entity conversion file for checking.
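To illustrate how the rows of an entity conversion file behave, the following Python sketch parses a few rows and applies them to a text. It is only an approximation of the behavior described above, not the Batch Checker's own implementation. The whitespace separator, the function names, and the sample entities are assumptions made for the example.

```python
def parse_conversion_rows(rows):
    """Map each entity to its output character; entity-only rows map to ''."""
    table = {}
    for row in rows:
        # Assumption: entity and output character are separated by whitespace.
        parts = row.split(None, 1)
        if not parts:
            continue  # skip blank rows
        entity = parts[0]
        output = parts[1].strip() if len(parts) > 1 else ""
        table[entity] = output
    return table


def convert_entities(text, table):
    """Replace every known entity in text with its configured output."""
    # Replace longer entities first so a short entity never splits a longer one.
    for entity in sorted(table, key=len, reverse=True):
        text = text.replace(entity, table[entity])
    return text


# Hypothetical rows: one conversion and one entity-only row that hides the entity.
rows = ["&auml;\tä", "&shy;"]
table = parse_conversion_rows(rows)
print(convert_entities("M&auml;d&shy;chen", table))  # prints Mädchen
```

The entity-only row for `&shy;` maps to an empty string, so the entity disappears from the converted text, which mirrors the hiding behavior described in the tip above.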
Configuring an Entity Conversion File for Checking
To configure an entity conversion file for checking, follow these steps:
- Open an existing CSD profile or create a new CSD profile.
- Add or edit the following properties:
entity_conversion_encoding=UTF-8
Important: The entity_conversion_encoding property is necessary because the Batch Checker is unable to detect the encoding of the entity conversion file automatically. Use the following link for a list of the canonical names that this property accepts: http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
- Save the CSD profile and restart the Batch Checker.
- Use the CSD profile whenever you check a document that contains the
Tip: CSD profiles which use entity conversion have a heavy performance overhead. It may take up to nine times as long to analyze a document with entity conversion, depending on the size of the entity conversion file.
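Because the Batch Checker can't detect the encoding of an entity conversion file automatically, it can be useful to confirm that the file really decodes with the encoding you declare in entity_conversion_encoding. The following Python sketch is one way to do this outside the Batch Checker. The function name and the temporary sample file are hypothetical, and Python's codec names are not always identical to the Java canonical names that the property accepts.

```python
import os
import tempfile


def decodes_cleanly(path, encoding):
    """Return True if the file at path can be read in full with the given encoding."""
    try:
        with open(path, encoding=encoding) as handle:
            handle.read()
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes don't match the encoding.
        # LookupError: Python doesn't know the encoding name.
        return False
    return True


# Hypothetical sample: a conversion row saved as ISO-8859-1, not UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("&auml;\tä\n".encode("iso-8859-1"))
    sample_path = tmp.name

print(decodes_cleanly(sample_path, "utf-8"))       # prints False
print(decodes_cleanly(sample_path, "iso-8859-1"))  # prints True
os.remove(sample_path)
```

A successful decode doesn't prove that the declared encoding is the intended one, because single-byte encodings such as ISO-8859-1 accept any byte sequence. It only rules out clear mismatches, such as a Latin-1 file declared as UTF-8.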