If you're familiar with the ins and outs of web crawling, you can customize your crawl with advanced crawl settings in Content Cube. These settings are useful if you only want to track certain parts of your website. They can also help if Acrolinx runs into problems during a crawl.
To add advanced crawl settings, do the following:
- Go to Reporting > Content Cube settings > Web crawling.
  - For a new domain or subdomain, click Add new domain.
  - For an existing domain or subdomain, click Edit crawl settings next to the domain name.
- Choose the settings that you'd like to apply to your crawl:
| Setting | Description |
|---|---|
| Never crawl URLs with query parameters | Turns off the option to specify query parameters. Selected by default. |
| Only crawl URLs with these query parameters | Specify the query parameters that you want the Acrolinx-bot to crawl. |
| Never crawl URLs with these query parameters | Specify the query parameters that you want the Acrolinx-bot to ignore. |
| Respect nofollow tags | When selected, the Acrolinx-bot honors nofollow directives. When cleared, the Acrolinx-bot ignores nofollow directives and crawls these pages. |
| Respect noindex tags | When selected, the Acrolinx-bot honors noindex directives. When cleared, the Acrolinx-bot ignores noindex directives and crawls these pages. |
| Follow alternates | The Acrolinx-bot will crawl any links listed as "alternate." |
| Turn on AJAX crawling | The Acrolinx-bot will crawl AJAX applications. |
| Follow canonicals | The Acrolinx-bot will crawl any URLs specified in canonical tags. |
| Turn on JavaScript crawling | The Acrolinx-bot will crawl JavaScript-rendered content. |
| Follow HTTP redirects (3xx) | The Acrolinx-bot will crawl every page in a page's redirect chain. |
| Turn on mobile crawling | The Acrolinx-bot will identify itself as a mobile device. Note: By default, the Acrolinx-bot identifies itself as a desktop device. |
| Follow links on error pages (4xx and 5xx) | The Acrolinx-bot will crawl any links on 4xx and 5xx error pages. |
| Crawl behind sign-in | Provide HTTP basic authentication sign-in details for a password-protected site that you want the Acrolinx-bot to crawl. |
| Custom request headers | Specify any authentication headers that the Acrolinx-bot needs to access your content. |
- Click Save to start crawling.
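The three query-parameter settings described above are easiest to compare side by side. The following Python sketch is not Acrolinx code; it's a hypothetical filter that assumes URLs without a query string are always crawled, and that "Only crawl URLs with these query parameters" requires every parameter on a URL to be on your list. The function name `should_crawl` and its `mode` values are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs


def should_crawl(url, mode, params=frozenset()):
    """Decide whether a URL passes a query-parameter rule.

    `mode` is a hypothetical name for the three settings:
      "never"   - Never crawl URLs with query parameters
      "only"    - Only crawl URLs with these query parameters
      "exclude" - Never crawl URLs with these query parameters
    """
    # Collect parameter names, including ones without a value (e.g. "?preview").
    query_keys = set(parse_qs(urlsplit(url).query, keep_blank_values=True))
    if not query_keys:
        # Assumption: URLs without a query string are unaffected by these rules.
        return True
    if mode == "never":
        return False
    if mode == "only":
        # Assumption: every parameter on the URL must be in the allowlist.
        return query_keys <= set(params)
    if mode == "exclude":
        # Skip the URL if it carries any blocked parameter.
        return query_keys.isdisjoint(params)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with `mode="exclude"` and `params={"sessionid"}`, a URL such as `https://example.com/docs?page=2` is crawled, but `https://example.com/docs?sessionid=x` is skipped.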
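Crawl behind sign-in uses HTTP basic authentication, in which the client sends the user name and password, base64-encoded, in an `Authorization` header (RFC 7617). This Python sketch shows how that header value is built. It might also serve as a reference if your site expects an `Authorization` header that you enter under Custom request headers, but that use is an assumption, not documented Acrolinx behavior. The helper name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic authentication header as defined in RFC 7617."""
    # Basic auth sends base64("username:password"). The credentials are only
    # encoded, not encrypted, so the site should be served over HTTPS.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("Aladdin", "open sesame"))
```

Running it with the example credentials from RFC 7617 prints `{'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}`.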
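The nofollow and noindex settings refer to standard robots directives: a page-level `<meta name="robots" content="noindex, nofollow">` tag, and `rel="nofollow"` on individual links. As a rough illustration, and not a description of how the Acrolinx-bot is implemented, this Python sketch uses the standard library's `html.parser` to find both. The class name `RobotsDirectiveParser` is hypothetical, and the sketch doesn't check the `X-Robots-Tag` HTTP response header, which can carry the same directives.

```python
from html.parser import HTMLParser


class RobotsDirectiveParser(HTMLParser):
    """Collect page-level robots directives and links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.page_directives = set()  # e.g. {"noindex", "nofollow"}
        self.nofollow_links = []      # href values of rel="nofollow" links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # Directives are a comma-separated list in the content attribute.
            content = attrs.get("content") or ""
            self.page_directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif tag == "a" and "nofollow" in (attrs.get("rel") or "").lower().split():
            # rel can hold several space-separated values, such as "nofollow noopener".
            self.nofollow_links.append(attrs.get("href"))


page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    '<body><a href="/a" rel="nofollow">A</a><a href="/b">B</a></body></html>'
)
parser = RobotsDirectiveParser()
parser.feed(page)
print(sorted(parser.page_directives), parser.nofollow_links)
```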