If you pass the switch --baseline on the command line, this program doesn't start the webhook receiver. Instead, it runs in a batch mode that downloads repositories and checks all their files. Baseline checking works like this:
- Read the local configuration. It must have the :data-repo-content-url option set (see the configuration sketch after this list).
- Read the remote configuration if available.
- Read the remote repository state from the repository registry. This is a file called repositories.edn or repositories.json, and you can find it in the same folder as the remote configuration file. We'll describe this file later on.
- If the file exists and there are repositories that don't have a :last-baseline value, proceed with those repositories.
- For each repository:
  - Extract the contents.
  - Load the repo-specific configuration if it exists.
  - Filter all files according to the current filter settings.
  - Check each relevant file with Acrolinx.
  - Create a summary of how many files were successfully checked.
  - Update the repository state file with the summary and the timestamps of the last baseline check (regardless of success). This update uses the BatchID of the baseline check in the commit message.
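The following is a minimal sketch of a local configuration for baseline mode, assuming the configuration file is an EDN map of option keywords to values. Only the :data-repo-content-url key itself comes from this page; the URL value is a placeholder:
;; Hypothetical minimal local configuration for baseline mode.
;; The URL is a placeholder for your own data repository.
{:data-repo-content-url "https://api.github.com/repos/orga/config-data-repo/contents/"}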
Usually, you run a baseline check regularly using Cron or something similar.
The repository registry is a file called repositories.edn or repositories.json. It lives in the folder that the :data-repo-content-url points to and is never local.
This file uses either EDN or JSON format. It's a vector with one (nested) two-element vector per repository. Each nested vector has the repository URL as its first element and a hash map as its second. The hash map contains information about that repository, such as the timestamp of the last baseline check. When this program writes the file, it sorts the entries by repository URL.
Example:
[["https://api.github.com/repos/orga/my-repo-1/" {:last-baseline nil}] ["https://api.github.com/repos/orga/my-repo-2/" {:last-baseline #inst "2019-03-01T12:00:00.000-00:00" :last-total 5 :last-success 5}]]
The repository URLs in this file aren't URLs you'd see in your browser. Instead, they're API repository URLs as returned by the GitHub API (a lookup sketch follows).
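If you only know the browser URL of a repository, the following Clojure sketch shows one way to look up its API repository URL. It isn't part of this program, it assumes the org.clojure/data.json library is on the classpath, and the owner and repository names are placeholders:
;; Sketch only: look up the API repository URL for a repository.
(require '[clojure.data.json :as json])

(let [owner "orga"
      repo  "my-repo-1"
      body  (slurp (str "https://api.github.com/repos/" owner "/" repo))]
  ;; The "url" field of the response is the kind of API repository URL
  ;; that's used as the first element of each registry entry.
  (get (json/read-str body) "url"))
;; => "https://api.github.com/repos/orga/my-repo-1"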
For versions 1.7.0 and later, this file is maintained automatically. You don't need to add new repositories for baseline checking; they're added during normal webhook operations.
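To make the registry structure concrete, here's a small Clojure sketch, not taken from this program, that reads such a registry and lists the repositories that still need a baseline check (those without a :last-baseline value):
;; Sketch only: pick the repositories that still need a baseline check.
(require '[clojure.edn :as edn])

(defn repos-needing-baseline
  "Returns the URLs of all registry entries without a :last-baseline value."
  [registry]
  (->> registry
       (filter (fn [[_url info]] (nil? (:last-baseline info))))
       (map first)))

;; With a local copy of the example registry shown above:
(repos-needing-baseline (edn/read-string (slurp "repositories.edn")))
;; => ("https://api.github.com/repos/orga/my-repo-1/")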
Warning
Be careful when editing this file manually: a syntax error can cause you to lose information. The file always lives in a Git repository, so you can use the history to track all changes.
- A syntax error can cause the baseline check to fail without warning. You may only find some log messages on the server side.
- Huge repositories may fail to download or may interfere with your daily Acrolinx usage by flooding Acrolinx with checks. Consider running baseline checks only at night or over the weekend.
- To limit the number of parallel requests to Acrolinx, use the configuration file option :acrolinx-parallel-workers. The default value is 4. The maximum number of parallel requests made to Acrolinx is the smaller of :acrolinx-parallel-workers and the number of available threads (see the configuration sketch after this list).
- To limit the size of the files processed during a baseline run, use the configuration file option :file-size-limit. Specify it as :file-size-limit "Number Unit", where Unit can be B, kB, or MB. For example, :file-size-limit "1 MB".
- There's no defined behavior for editing conflicts on the repository state file.
- Baseline checks must not run at the same time with the same repository state file. There's no defined behavior for conflicting runs.
- All GitHub API limits and rate limits apply.
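As a minimal sketch, assuming the configuration file is an EDN map of option keywords to values, the two throttling options above could be set like this (4 is the documented default, "1 MB" is the example value from above):
;; Sketch only: throttling options for baseline checks.
{:acrolinx-parallel-workers 4   ;; default number of parallel workers
 :file-size-limit "1 MB"}       ;; files above this size aren't checked in baseline runs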