What is the Link Checker?
The link checker is a feature which enables you to check the validity of URLs used in documentation. When you check documents with an Acrolinx plug-in, the Acrolinx Server attempts to visit the URL destination to see if the URL is still valid. URLs which are not valid are listed in the Acrolinx Scorecard as style issues.
How Does Link Checking Work?
When you run a check in a plug-in, the server detects URLs in the document according to specific rules.
The server attempts to retrieve HTTP header information from each new URL, and saves the URL in a link cache which is stored on the core server.
Each URL is saved in the link cache with a timestamp which indicates when it was checked and a status which indicates the validity of the URL.
Every 5 minutes, the core server reviews the cache to see if any links are due for rechecking. By default, a link is due for rechecking 12 hours after it was last checked. The core server rechecks any links that are older than this limit, and updates their timestamp and status information.
When you check documents, the server searches the link cache for URLs that have already been validated during previous checks. If the server finds a match in the link cache, the server retrieves the validation information for the URL. If the status indicates an invalid link, the server passes the status information back to the plug-in or Batch Checker as a style issue.
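The caching behavior described above can be sketched as follows. This is a minimal illustration, not the server's implementation: the `LinkCache` and `CacheEntry` names and the in-memory dictionary are hypothetical, and only the 12-hour default interval is taken from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Default recheck interval stated in the text (12 hours).
REFRESH_INTERVAL = timedelta(hours=12)


@dataclass
class CacheEntry:
    status: str           # for example "OK" or "PAGE_NOT_FOUND"
    checked_at: datetime  # when the link was last checked


class LinkCache:
    """Hypothetical in-memory stand-in for the server's link cache."""

    def __init__(self):
        self.entries = {}

    def record(self, url, status, now):
        """Save or update a URL with its status and a fresh timestamp."""
        self.entries[url] = CacheEntry(status, now)

    def lookup(self, url):
        """Return the cached entry, or None if the URL was never checked."""
        return self.entries.get(url)

    def due_for_recheck(self, now):
        """Return URLs whose last check is older than the refresh interval."""
        return [url for url, entry in self.entries.items()
                if now - entry.checked_at > REFRESH_INTERVAL]
```

In this sketch, a periodic job would call `due_for_recheck` (every 5 minutes, per the text) and then `record` each rechecked link with its new status.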
You can view a CSV version of the cache in the server output directory:
You can also open the CSV version of the cache in a web browser with the URL:
Enabling and Configuring the Link Checker
To enable the Link Checker, follow these steps:
Open your overlay of the core server properties file.
You can find the overlay for the core server properties file in the following location:
Add the following property:
Add one or more of the following properties:
Property Description
linkChecker.linkRepositoryPath Configure an alternative location for the CSV version of the link cache.
The default value is:
The value must be relative to the directory:
linkChecker.maxWaitTimeInMs Configure the maximum amount of time that the server can spend checking links during one check.
The default value is 1000 milliseconds.
The Batch Checker and plug-ins cannot display the check results until the link checker has validated all links or has exceeded the maximum wait time (even if the server has completed all linguistic analysis for a document).
linkChecker.refreshIntervalInMin Configure how often the server rechecks each link in the link cache.
The default value is every 720 minutes (12 hours) after the link was last checked.
linkChecker.maxCacheSize Configure the maximum number of links that the cache can contain before the oldest links are overwritten.
The default value is 20,000 links.
linkChecker.flagRedirects=false Prevent redirected links from being flagged.
The default value is true.
Save your changes and restart the core server.
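To see how overlay values combine with the documented defaults, the following sketch reads properties-style text and returns the effective settings. It is only an illustration: the `effective_settings` function is hypothetical, the parsing is simplified, and it covers only the four properties whose default values are stated above.

```python
# Defaults as documented above; values are kept as strings, as in a properties file.
DEFAULTS = {
    "linkChecker.maxWaitTimeInMs": "1000",
    "linkChecker.refreshIntervalInMin": "720",
    "linkChecker.maxCacheSize": "20000",
    "linkChecker.flagRedirects": "true",
}


def effective_settings(overlay_text):
    """Return the documented defaults, overridden by key=value lines in the overlay."""
    settings = dict(DEFAULTS)
    for raw_line in overlay_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comment lines, and lines without a key=value pair.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in settings:
            settings[key] = value.strip()
    return settings
```

For example, an overlay that contains only `linkChecker.flagRedirects=false` leaves the other three settings at their defaults.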
How the Link Checker Detects and Flags Links
The link checker is enabled for checking when you select the Style checking option in the Acrolinx plug-ins or the Acrolinx Batch Checker.
You can use the link checker to check the following objects:
Links that are written in the text of a document.
Example: Please visit our website http://www.acrolinx.com for more information.
You can check links in all XML attributes except links to namespaces or document types.
Invalid links that are written in the text of a document are flagged in the document and listed as style issues in the Acrolinx Scorecard.
If the link is redirected to an alternative destination, the originating link is flagged and the plug-in shortcut menu provides the redirected destination as a suggestion.
To prevent the link checker from flagging redirected links, you can also add the property linkChecker.flagRedirects=false to the core server properties file.
Links that appear as an attribute of an HTML or XML tag.
Please visit our <A HREF="http://www.acrolix.com"> website </A>
Please visit our website at <xref format="html" href="http://www.acrolinx.com" scope="external"></xref>
Invalid links that appear as an attribute of an HTML or XML tag are not flagged in the document, but are listed as style issues in the Scorecard. In version 2.5 or later of the Acrolinx Plug-in for Arbortext Editor, you can also correct flagged link attributes in the corrections dialog box.
Restriction: Currently, you can only check links in HTML or XML tags with version 1.2 or later of the Acrolinx Batch Checker or version 2.5 or later of the Acrolinx Plug-in for Arbortext Editor. All other Acrolinx plug-ins check links in the document text only. Additionally, links to namespaces and document types are ignored.
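As a rough illustration of collecting links from tag attributes, the sketch below uses Python's standard `html.parser` module. It is not how the Acrolinx Server extracts links; the `AttributeLinkCollector` class is hypothetical, and treating `xmlns` attributes as the namespace links to skip is an assumption.

```python
from html.parser import HTMLParser

# Prefixes that mark an attribute value as a link candidate.
LINK_PREFIXES = ("http://", "https://", "www")


class AttributeLinkCollector(HTMLParser):
    """Hypothetical collector for links that appear as tag attribute values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes attribute names in lowercase.
        for name, value in attrs:
            if value is None:
                continue
            # Assumption: namespace declarations are the xmlns attributes.
            if name == "xmlns" or name.startswith("xmlns:"):
                continue
            if value.lower().startswith(LINK_PREFIXES):
                self.links.append(value)
```

Document type declarations never reach `handle_starttag`, so links inside a DOCTYPE are ignored without extra code. After `collector.feed(markup)`, the candidates are available in `collector.links`.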
The link checker looks for any text that begins with "http://", "https://" or "www" and validates the link syntax according to the standard URL syntax guidelines. Any links which do not conform to the correct URL syntax are logged with the status 'ERROR' in the link cache.
Example of invalid link syntax: http://emptylink:nothing
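The detection and syntax check can be approximated with a short sketch. The regular expression, the function names, and the use of Python's `urllib.parse` as the syntax validator are all assumptions made for illustration; the server's actual rules may differ.

```python
import re
from urllib.parse import urlsplit

# Candidate links: text that begins with http://, https://, or www.
CANDIDATE = re.compile(r"\b(?:https?://|www)\S+", re.IGNORECASE)


def find_candidates(text):
    """Return link candidates found in running text, without trailing punctuation."""
    return [m.group().rstrip(".,;:!?)") for m in CANDIDATE.finditer(text)]


def has_valid_syntax(link):
    """Return True if the link parses as a URL with a host and a usable port."""
    url = link if "://" in link else "http://" + link
    try:
        parts = urlsplit(url)
        parts.port  # raises ValueError when the port is not a valid number
    except ValueError:
        return False
    return bool(parts.hostname)
```

With this sketch, `has_valid_syntax("http://emptylink:nothing")` returns `False` because "nothing" is not a valid port number, while `http://www.acrolinx.com` passes.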
Link Checker Statuses
The link checker saves the following status information in the link cache.
|NOT_CHECKED||The page has not yet been checked.|
|OK||Response code 200 to 210. The server has received HTTP header information from the link destination.|
|REDIRECTED||Response code 300 to 307. Could connect to the page, but the request was redirected.|
|PAGE_NOT_FOUND||Response code 404 from the web server. An UnknownHostException is also mapped to this state.|
|TIMEOUT||The request timed out.|
|ERROR||All other unknown server and client errors (response code 400-449, except 404, and 500-510). Also indicates Java exceptions during the check procedure.|
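The status table can be read as a simple mapping from the outcome of a request to a status name. The sketch below is one possible reading of it: the `classify` function and its parameters are hypothetical, and mapping response codes outside the listed ranges to ERROR is an assumption.

```python
def classify(response_code=None, timed_out=False, unknown_host=False):
    """Map the outcome of a link check to one of the status names in the table."""
    if timed_out:
        return "TIMEOUT"
    # The table maps an unknown host to the same state as a 404 response.
    if unknown_host or response_code == 404:
        return "PAGE_NOT_FOUND"
    if response_code is None:
        return "NOT_CHECKED"
    if 200 <= response_code <= 210:
        return "OK"
    if 300 <= response_code <= 307:
        return "REDIRECTED"
    # Assumption: every remaining response code is treated as ERROR.
    return "ERROR"
```

For example, `classify(301)` returns `"REDIRECTED"` and `classify(500)` returns `"ERROR"`.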