What is Anubis, and why use it?
Many libraries are facing issues with AI bots aggressively harvesting data, traffic that often amounts to a near-DDoS attack.
What is Anubis?
Anubis is a JavaScript-based protection tool designed to stop bots, scrapers, and automated crawlers from accessing a website.
How it works
1. A visitor accesses the website.
Instead of immediately loading the site, the visitor’s browser is given a small JavaScript challenge.
2. Anubis runs JavaScript in the browser of the end user.
It checks whether the environment behaves like a real browser, i.e. whether JavaScript is enabled and functioning.
3. Based on the result, Anubis decides:
– If the request looks like it’s from a real user, access is granted to the site.
– If it looks like a bot, access is blocked or redirected.
Why it works
Most crawlers and scrapers don’t run JavaScript or behave like real browsers. They often send requests too quickly, use inconsistent headers, or skip browser rendering entirely. Because of this, they fail the challenge and are blocked before they can access the actual site.
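Under the hood, the challenge is a small proof-of-work puzzle: the visitor's browser has to find a nonce such that the SHA-256 hash of the challenge plus the nonce starts with a given number of zeroes (that number is the difficulty we will configure below). A conceptual sketch in shell, not Anubis' actual browser-side code:

challenge="example-challenge"   # in reality issued by Anubis per visitor
nonce=0
# difficulty 4: keep hashing until the hex digest starts with four zeroes
while true; do
  hash=$(printf '%s%s' "$challenge" "$nonce" | sha256sum | cut -d' ' -f1)
  case "$hash" in
    0000*) break ;;
  esac
  nonce=$((nonce + 1))
done
echo "solved: nonce=$nonce hash=$hash"

A real browser solves this in a fraction of a second, but a crawler that never executes JavaScript never even attempts it.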
Technical side
Anubis acts as a "filtering proxy", placed between Koha's Apache server and the end users. This guide is written so that you can follow along without diving too deep into the technical details.
The simplest approach is configuring Apache on ports 80 and 443. Usually, port 80 either redirects traffic to the encrypted port 443 or is accessible only internally (less common).
Here, we'll use port 80 as the Koha entry point, but note that we will bind Apache only to localhost. The intranet part of Koha can remain unchanged.
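The resulting request flow looks like this (ports as configured below):

client --HTTPS--> Apache (*:443) --> Anubis (localhost:8082) --> Apache (localhost:80) --> Koha OPAC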
# OPAC
<VirtualHost localhost:80>
<IfVersion >= 2.4>
Define instance "<instancename>"
</IfVersion>
Include /etc/koha/apache-shared.conf
Include /etc/koha/apache-shared-opac-plack.conf
Include /etc/koha/apache-shared-opac.conf
ServerName <instancename>
ServerAlias <instancename>
SetEnv KOHA_CONF "/etc/koha/sites/<instancename>/koha-conf.xml"
AssignUserID <instancename>-koha <instancename>-koha
ErrorLog log
TransferLog log
</VirtualHost>
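Note that <VirtualHost localhost:80> alone does not control which addresses Apache accepts connections on; that is decided by the Listen directives (on Debian-style systems in /etc/apache2/ports.conf). To make sure the plain-HTTP entry point really is local-only, bind it explicitly:

# /etc/apache2/ports.conf
Listen 127.0.0.1:80
Listen 443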
In the SSL configuration, we remove all Koha-specific settings and instead proxy all traffic to Anubis.
<VirtualHost *:443>
<IfVersion >= 2.4>
Define instance "<instancename>"
</IfVersion>
ServerName <servername>
AssignUserID <instancename>-koha <instancename>-koha
SSLEngine on
Include /etc/apache2/ssl/ssl-options.conf
SSLCertificateFile /etc/letsencrypt/live/catalog.<instancename>.de/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/catalog.<instancename>.de/privkey.pem
Include /etc/letsencrypt/options-ssl-apache.conf
RequestHeader set "X-Real-Ip" expr=%{REMOTE_ADDR}
RequestHeader set X-Forwarded-Proto "https"
ProxyPass / http://localhost:8082/
ProxyPassReverse / http://localhost:8082/
</VirtualHost>
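Depending on your setup, Apache's name-based virtual host matching on localhost:80 may need to see the original Host header once the request has passed through Anubis. If the wrong virtual host answers, the ProxyPreserveHost directive (added inside the <VirtualHost *:443> block above) is worth trying:

# pass the client's original Host header on to the backend
ProxyPreserveHost On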
Now let us set up Anubis as a proxy to filter out the bot traffic going to Koha. It is going to replace our HTTPS listener and route legitimate traffic to the plain-HTTP Koha listener that is not reachable from the outside world. First we install the anubis binary somewhere sensible, like /usr/local/bin.
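The exact artifact names vary per release and architecture, so check the releases page of the Anubis project for the current download. Once you have the binary unpacked, something like this puts it in place:

# copy the unpacked binary into place and prepare the config directory
install -m 0755 ./anubis /usr/local/bin/anubis
mkdir -p /etc/anubis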
We’ll use systemd to manage anubis, so let’s create a systemd service file looking like this:
[Unit]
Description=Anubis HTTP defense proxy
After=network.target

[Service]
ExecStart=/usr/local/bin/anubis
Restart=always
RestartSec=30s
EnvironmentFile=/etc/anubis/env
LimitNOFILE=infinity
DynamicUser=yes
CacheDirectory=anubis/hks3
CacheDirectoryMode=0755
StateDirectory=anubis/hks3
StateDirectoryMode=0755
ReadWritePaths=/run

[Install]
WantedBy=multi-user.target
Save this as /etc/systemd/system/anubis.service. Don't forget to run systemctl daemon-reload so that systemd picks the file up.
We have the executable, but we still need some configuration. Our /etc/anubis/env (the EnvironmentFile referenced by the unit above) will look like this:
BIND=localhost:8082
BIND_NETWORK=tcp
DIFFICULTY=4
POLICY_FNAME=/etc/anubis/botPolicies.json
TARGET=http://localhost
# random key; generate your own with: openssl rand -hex 32
ED25519_PRIVATE_KEY_HEX=897077318cbba0b62a6e43494dd69a3485b9a184ebb7b6145d6eecc605ac169d
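Do not reuse the example key above; every installation should have its own. Generate a fresh one and paste it into the env file:

# print a random 32-byte key as hex for ED25519_PRIVATE_KEY_HEX
openssl rand -hex 32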
The botPolicies.json file should look something like this:
{
  "bots": [
    {
      "name": "well-known",
      "path_regex": "^/\\.well-known/.*$",
      "action": "ALLOW"
    },
    {
      "name": "API",
      "path_regex": "^/api/.*$",
      "action": "ALLOW"
    },
    {
      "name": "favicon",
      "path_regex": "^/favicon\\.ico$",
      "action": "ALLOW"
    },
    {
      "name": "robots-txt",
      "path_regex": "^/robots\\.txt$",
      "action": "ALLOW"
    },
    {
      "name": "everyone",
      "user_agent_regex": ".",
      "action": "CHALLENGE"
    }
  ]
}
Now everything should be set up and we should be ready to go! Let's first start anubis and verify that it works:
systemctl start anubis
systemctl status anubis
The log line should show the target (http://localhost) and our other settings, indicating that the env file was loaded properly.
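You may also want to run systemctl enable anubis so the service comes back after a reboot. The proxy can be smoke-tested directly with curl: a client that does not execute JavaScript should receive the challenge page, while the allow-listed paths from botPolicies.json should pass straight through to Koha:

# no JavaScript: expect Anubis' challenge page instead of the OPAC
curl -s http://localhost:8082/ | head -n 5
# allow-listed path: expect Koha's own robots.txt
curl -s http://localhost:8082/robots.txt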
If you still want/need the 80 => 443 redirect, you may add
RewriteEngine on
RewriteCond %{SERVER_NAME} =<instancename>
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [END,NE,R=permanent]
to your Apache default config (e.g. 000-default.conf).
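The configuration above relies on a few Apache modules that may not be enabled yet; on Debian/Ubuntu (assuming the stock Apache packaging) they can be switched on with:

a2enmod proxy proxy_http headers ssl rewrite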
Now let’s apply our new apache2 configuration
systemctl restart apache2
If we now navigate to our Koha instance, we should see a short (at least with the DIFFICULTY=4 we configured) splash screen verifying that we're not a primitive bot. This process only happens once: as long as we have a within.website-x-cmd-anubis-auth cookie set, we will be able to use Koha without further interruptions, and hopefully much faster than before, now that all the pesky AI crawlers are bouncing off Anubis' challenge.
If you are using Anubis, please consider donating: https://anubis.techaro.lol/docs/funding