When you add content to a collection, Acrolinx visualizes each page of content with a content details card. You'll see a page title and the corresponding Acrolinx Score on each details card. But in some cases, your page titles might not show up correctly on your details cards. For example, a title might be listed as "Unknown."
To make sure that the preferred page title appears on each card, you can set up custom title extraction.
Acrolinx scans for h1, h2, and title tags by default. If you notice that the default settings don't extract the right page titles, you can define custom XPath-based extraction settings.
Define custom settings under Title Extraction > Set Title Extraction. Any custom settings that you make will override the default settings.
To set custom title extraction in Content Cube, do the following:
-
Go to Content Cube > Profile and settings > Admin Console > Advanced.
-
Under Title Extraction > Set Title Extraction, add your preferred title extraction settings to the following columns:
Priority: For each page, Content Cube checks the settings row by row, starting with the first row of the title extraction table, and applies the XPath of the first row whose URL pattern matches the page URL. Place more specific settings higher in the table than more general settings.
ID: Unique internal identifier for each extraction setting.
URL Pattern: Regular expression applied to the page URL. If it matches, the associated XPath title source is evaluated for the page.
Title Source: An XPath expression that resolves to one or more nodes on the page (for example, title or h1). If it resolves to a list of nodes, Content Cube uses the first node that has nonempty text.
To add multiple extractions, click Add New for each new setting.
Note
In most cases, the extraction settings are simple. For example:
-
Use the first title element:
//title
-
If h1 doesn't work, use title:
//h1 | //title
-
Optional: Test your title extraction.
-
Click Save.
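The Priority and URL Pattern columns described above can be illustrated with a short Python sketch. It models only the URL-matching step: rules are checked in table order, and the first rule whose URL pattern matches the page URL wins. The `TitleRule` type, the `first_matching_rule` function, and the example URLs are hypothetical, and the sketch assumes a pattern may match anywhere in the URL (`re.search`); Content Cube may apply patterns differently.

```python
import re
from typing import NamedTuple, Optional, Sequence


class TitleRule(NamedTuple):
    """One row of a hypothetical title extraction table."""

    rule_id: str       # value shown in the ID column
    url_pattern: str   # regular expression tested against the page URL
    title_source: str  # XPath expression (not evaluated in this sketch)


def first_matching_rule(url: str, rules: Sequence[TitleRule]) -> Optional[TitleRule]:
    """Return the first rule, in table order, whose URL pattern matches the URL.

    Assumption: a pattern may match anywhere in the URL (re.search).
    The product might anchor patterns differently.
    """
    for rule in rules:
        if re.search(rule.url_pattern, url):
            return rule
    return None  # no custom rule matched, so the default extraction would apply


# Hypothetical rules for an example site.
specific = TitleRule("docs", r"^https://example\.com/docs/", "//h1")
general = TitleRule("site", r"^https://example\.com/", "//title")
url = "https://example.com/docs/install"

# General rule first: it also matches the docs URL and hides the specific rule.
print(first_matching_rule(url, [general, specific]).rule_id)  # site
# Specific rule first: the docs URL gets its own XPath.
print(first_matching_rule(url, [specific, general]).rule_id)  # docs
```

Because the general pattern also matches the docs URL, putting it first means the specific rule is never reached, which is why more specific settings belong higher in the table.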
Before you save your title extraction settings, you can test them in the Test Title Extraction tab.
To test your extraction settings, do the following:
-
Go to Content Cube > Profile and settings > Admin Console > Advanced > Title Extraction.
-
Open the Test Title Extraction tab. There, you'll see the following columns:
Page URL: URL of the page whose title you want to extract.
Outer HTML: The page's outer HTML. To capture the HTML of JavaScript-rendered apps, use the browser's developer tools to inspect the page and copy the HTML from there:
-
Open the page that you want to test in Google Chrome.
-
Right-click anywhere on the page.
-
In the pop-up menu, click Inspect. Then, right-click the html element that contains your title and go to Copy > Copy outer HTML.
Extracted Title: Title extracted from the outer HTML based on your settings.
Mapping ID: Tells you which rule was applied to extract the title. The value is either:
-
the ID of a setting in the Set Title Extraction tab, or
-
“fallback,” which means that the default extraction was applied.
-
Add the URL for a test page to the Page URL column.
-
Paste a snippet of the page's outer HTML in the Outer HTML column.
-
Click Run.
After you click Run, the Extracted Title and Mapping ID columns show the extracted title and the ID of the rule that produced it.
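To reason about what a test run reports, the following Python sketch imitates one on a pasted outer HTML snippet. It is not Content Cube's implementation. The rule table (`RULES`), the `run_test` and `first_nonempty_text` names, the h1, h2, title fallback order, the choice to stop after the first matching rule, and the "Unknown" result are all assumptions for illustration. It uses Python's `xml.etree.ElementTree`, which understands only a small XPath subset (no unions such as `//h1 | //title`) and requires well-formed, XHTML-like markup.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Sequence, Tuple

# Hypothetical rule table in priority order: (mapping ID, URL pattern, XPath).
RULES = [
    ("blog", r"/blog/", "//h1"),
    ("all", r".*", "//title"),
]

# Assumed fallback order for the default h1, h2, and title scan.
DEFAULT_TAGS = ("h1", "h2", "title")


def first_nonempty_text(root: ET.Element, path: str) -> Optional[str]:
    """Return the text of the first node matched by `path` that is not empty."""
    # ElementTree supports only a small XPath subset and rejects absolute
    # paths on an element, so "//h1" is rewritten as ".//h1".
    if path.startswith("/"):
        path = "." + path
    for node in root.findall(path):
        text = "".join(node.itertext()).strip()
        if text:
            return text
    return None


def run_test(page_url: str, outer_html: str,
             rules: Sequence[Tuple[str, str, str]] = RULES) -> Tuple[str, str]:
    """Return (extracted title, mapping ID), imitating one test run."""
    # Wrapping lets a snippet with several top-level elements parse; the
    # markup itself must still be well-formed for ElementTree.
    root = ET.fromstring("<snippet>" + outer_html + "</snippet>")
    for mapping_id, url_pattern, title_source in rules:
        if re.search(url_pattern, page_url):
            title = first_nonempty_text(root, title_source)
            if title is not None:
                return title, mapping_id
            break  # assumption: only the first matching rule is tried
    for tag in DEFAULT_TAGS:
        title = first_nonempty_text(root, ".//" + tag)
        if title is not None:
            return title, "fallback"
    return "Unknown", "fallback"  # assumption: no title found anywhere


html = ("<html><head><title>Acme Blog</title></head>"
        "<body><h1> </h1><h1>Release notes</h1></body></html>")
print(run_test("https://example.com/blog/release", html))  # ('Release notes', 'blog')
print(run_test("https://example.com/about", html))         # ('Acme Blog', 'all')
```

In this model, the empty first h1 is skipped because only nodes with nonempty text count, and a custom rule is considered only when its URL pattern matches the page URL. A "fallback" mapping ID therefore usually points at the URL Pattern column first.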