Analytics is only useful if your data is clean, especially if you use custom fields. However, it's often difficult to get people to fill out custom fields properly. Instead, you can configure Acrolinx to extract data from your documents and populate your custom fields automatically, with no human intervention necessary.
For example, suppose that you have custom fields for "product" and "department" that need to be filled out for each document that you check. You can configure Acrolinx so that whenever someone checks a document, the product and department fields are filled out automatically. This feature only works with structured documents. For example, a suitable document could be an XML document that has "product" and "department" attributes in the header section. In this case, you would write XPath expressions for your custom fields so that Acrolinx knows where to find the relevant data.
To configure automatic data mapping, follow these major steps:
Before you start collecting data, make sure that your document-level fields are configured to receive this data. This feature is only available for document-level fields because document-level data isn't that useful for any other field types.
To configure custom fields to receive data from your content, follow these steps:
- Navigate to Reporting > Analytics settings > Custom Fields and select the DOCUMENTS tab.
- In the Input Type column, select the option From Content for each relevant field.
  - If the field doesn't exist yet, click ADD FIELD and choose From Content as the Input Type.
Once you've set up your fields, you can head over to the Content Profiles section for the next step.
- Open the Content Profile for your intended document type.
For example, you might want to extract data from HTML meta tags. Let's suppose that you already have a Content Profile for HTML files called "Published HTML". Open that Content Profile and select the DOCUMENT INFORMATION tab. You should see the fields that you configured in the previous step. For example, if you configured the "Product" and "Department" fields, you should now see them on the DOCUMENT INFORMATION tab.
- In the location field, enter an XPath that defines where to find your data.
For example, suppose that you want to extract the product name from the following meta tag.
<meta name="Product.Name" content="Widget Detection API">
In this case, you would enter the following XPath.
//meta[@name="Product.Name"]/@content
Your changes take effect immediately.
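To see what the XPath above selects, you can reproduce the lookup outside Acrolinx. The following is a minimal sketch using only the Python standard library. The `MetaContentFinder` class is a hypothetical helper written for this illustration, not part of Acrolinx, and it only mimics the effect of `//meta[@name="Product.Name"]/@content` rather than evaluating the XPath itself.

```python
from html.parser import HTMLParser


class MetaContentFinder(HTMLParser):
    """Collects the content attribute of every <meta> tag with a given name.

    Hypothetical helper for illustration only. It mirrors what the XPath
    //meta[@name="..."]/@content would select.
    """

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Only <meta> elements are relevant (the //meta step).
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Keep the tag only if its name attribute matches (the [@name=...] step),
        # then read its content attribute (the /@content step).
        if attributes.get("name") == self.meta_name and "content" in attributes:
            self.values.append(attributes["content"])


html = (
    "<html><head>"
    '<meta name="Product.Name" content="Widget Detection API">'
    "</head><body></body></html>"
)

finder = MetaContentFinder("Product.Name")
finder.feed(html)
print(finder.values)  # ['Widget Detection API']
```

If the list comes back empty, the tag name, the `name` value, or the `content` attribute in your document probably differs from what the XPath expects.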
The following table shows some more examples of XPaths for different file formats.
Document Type | Target Content | Corresponding XPath |
---|---|---|
DITA | The topic title in the following code block. <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"> <topic id="topic_hdx_w2s_2p"> <title>git: Distributed and Shared Access to Content</title> | /topic/title |
HTML | The document name in the following code block. <meta name="Product.Version" content="Version 6.2.173"> <meta name="Document.Name" content="Widget Detection API Developer's Guide"> <meta name="Document.Id" content="192893721"> <meta name="Date.Created" content="2019/2/1, 16:08 (GMT)"> | //meta[@name="Document.Name"]/@content |
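The DITA row can be checked locally in a similar way. This sketch assumes the topic is well-formed XML and uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath, so the absolute path `/topic/title` is emulated by checking the root element and then looking up its direct `title` child. The closing `</topic>` tag and the `<body>` element are added here only to make the sample complete.

```python
import xml.etree.ElementTree as ET

# The DITA snippet from the table, closed off so that it parses as XML.
dita = """<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_hdx_w2s_2p">
  <title>git: Distributed and Shared Access to Content</title>
  <body><p>Sample body added for illustration.</p></body>
</topic>"""

root = ET.fromstring(dita)

# Emulate /topic/title: the root must be <topic>, and <title> must be
# a direct child of it.
title = root.find("title") if root.tag == "topic" else None
topic_title = title.text if title is not None else None

print(topic_title)  # git: Distributed and Shared Access to Content
```

Because `/topic/title` is an absolute path, it only matches when `topic` is the root element. A topic nested inside another element would need a different expression.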
Once you've set up a custom field, it's time to see if it works!
To test your data-mapping configuration, follow these steps:
- Check a document that will match the Content Profile.
For example, if you just edited the "Published HTML" Content Profile, you would check an HTML file published on your website.
After the check has finished, open the Scorecard and check that the matching Content Profile is the one that you just updated.
- Go to Reporting > Analytics > Scorecard Archive.
- Select your field:value combination in the Field filter.
For example, suppose that you checked a document with the following meta tag:
<meta name="Product.Name" content="My Test Product">
You want to see if Acrolinx correctly extracted the value "My Test Product". In this case, you'd look for "Product: My Test Product" in the Field filter.
The following screenshot shows how the field filter would look when data was successfully extracted from our example meta tag:
When you click the filter value, you should see the Scorecard for the file you just checked. This means that your data is being extracted correctly. Acrolinx automatically detected the product name from the metadata in the document.
If you didn't see the results you expected, check the extraction settings in the Content Profile. Acrolinx can't extract data from elements that are excluded or ignored.
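When troubleshooting, it can help to confirm that your expression matches anything in the test document before you revisit the Content Profile. The sketch below is one way to do that locally, assuming the page can be written as well-formed XHTML. The function `extract_meta_content` is a hypothetical helper, not an Acrolinx API, and the "Field: value" label simply follows the format described for the Field filter above.

```python
import xml.etree.ElementTree as ET

# Test page written as well-formed XHTML so the XML parser accepts it.
sample = """<html>
  <head>
    <meta name="Product.Name" content="My Test Product" />
  </head>
  <body><p>Hello</p></body>
</html>"""


def extract_meta_content(document, meta_name):
    """Return the non-empty content values of <meta> tags with the given name.

    Hypothetical helper that approximates //meta[@name="..."]/@content
    with the XPath subset that ElementTree supports.
    """
    root = ET.fromstring(document)
    matches = root.findall(f".//meta[@name='{meta_name}']")
    return [m.get("content") for m in matches if m.get("content")]


values = extract_meta_content(sample, "Product.Name")

# Build the label you would expect to find in the Field filter.
filter_label = f"Product: {values[0]}" if values else None
print(filter_label)  # Product: My Test Product
```

An empty result here suggests that the XPath itself is wrong. If the value is found locally but is missing from the Field filter, the extraction settings are the more likely cause.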