cgit ❤ Anubis
With the recent AI hype, the web is becoming a more hostile place, and most
people would like to have their “gitweb” instances protected from aggressive
and pointless crawling. People who self-host cgit are usually not the same
people who try to solve all bot problems with Cloudflare. Thankfully, there is
a tool called Anubis, developed by Xe Iaso. It works by sitting between your
web server and the HTTP service you want to protect.
The problem is that cgit is a CGI application, as you could already guess from
its name, which means that it does not listen on any HTTP port or socket. For
each incoming HTTP request, a new process is started that reads some environment
variables, prints the response to stdout and exits.
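To make the CGI model concrete, here is a rough Python sketch of what a CGI gateway (such as uWSGI's cgi plugin or fcgiwrap) does for every request: start a fresh process with the request described in environment variables, then split its stdout into headers and body at the first blank line. `DEMO_CGI` is a hypothetical stand-in for cgit.cgi, and `run_cgi` and `parse_cgi_output` are illustrative names; the environment is trimmed to a handful of variables, whereas a real gateway passes many more (SCRIPT_NAME, CONTENT_LENGTH, HTTP_* headers) and feeds any request body on stdin.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical stand-in for cgit.cgi: reads CGI variables from its
# environment and writes headers, a blank line and a body to stdout.
DEMO_CGI = r'''
import os, sys
sys.stdout.write("Content-Type: text/plain\n")
sys.stdout.write("Status: 200 OK\n")
sys.stdout.write("\n")
sys.stdout.write("path=%s query=%s\n" % (os.environ["PATH_INFO"], os.environ["QUERY_STRING"]))
'''


def parse_cgi_output(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split CGI output into lower-cased headers and the body.

    The header block ends at the first empty line; both LF and CRLF
    line endings are accepted.
    """
    headers: dict[str, str] = {}
    lines = raw.split(b"\n")
    i = 0
    while i < len(lines):
        line = lines[i].rstrip(b"\r")
        i += 1
        if not line:
            break
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, b"\n".join(lines[i:])


def run_cgi(argv: list[str], method: str, path: str, query: str = "") -> tuple[dict[str, str], bytes]:
    """Handle one request the CGI way: a brand-new process per call."""
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "QUERY_STRING": query,
    }
    # The child sees only these variables; whatever it prints is the response.
    proc = subprocess.run(argv, env=env, capture_output=True, check=True)
    return parse_cgi_output(proc.stdout)
```

For example, `run_cgi([sys.executable, "-c", DEMO_CGI], "GET", "/repo/log/", "h=master")` spawns one short-lived process and returns its headers and body; a second call spawns another. That per-request process is exactly why cgit needs something like uWSGI in front of it to expose a socket.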
Normally I’d use a combination of fcgiwrap and spawn-fcgi to translate CGI into
FastCGI and make it work with Nginx, but I need to put Anubis in between, and it
speaks neither of the two protocols natively. So I’m going to use uWSGI.
This guide applies to the following software versions:
- cgit 1.2.3_p20240802
- uWSGI 2.0.27
- Anubis 1.16.0
- Nginx 1.27.5
General overview
What happens when a client requests “https://git.example.com”?
- Nginx process (as user nginx:nginx) terminates TLS, does some stuff and
forwards the request to Anubis.
- Anubis process (as user cgit:nginx) either allows, denies or requires a
challenge. On success, it forwards the request to the uWSGI HTTP socket.
- uWSGI process (as user cgit:nginx) spawns a new cgit process.
- cgit process generates a response, which is then returned to the client.
Configuration
uWSGI setup
My final uWSGI configuration looks as follows:
# /etc/uwsgi.d/cgit.ini
[uwsgi]
# enable master process
master = true
# set/append a logger
logger = syslog
# load uWSGI plugins
plugins = cgi
# add a cgi mountpoint/directory/script
cgi = /var/www/localhost/cgi-bin/cgit.cgi
# bind to the specified UNIX/TCP socket using default protocol
# (NB: This socket will be used by Nginx later)
socket = /run/uwsgi_cgit/uwsgi_cgit.sock
# bind to the specified UNIX/TCP socket using HTTP protocol
# (NB: This socket will be used by Anubis later)
http-socket = /run/uwsgi_cgit/http_cgit.sock
# set uwsgi protocol modifier1
# (NB: CGI uses modifier '9')
http-modifier1 = 9
# force the specified modifier1 when using HTTP protocol
http-socket-modifier1 = 9
# set permissions for sockets
# (NB: It should be read-writeable by both Nginx and Anubis processes)
chmod-socket = 660
# setuid to the specified user/uid
uid = cgit
# setgid to the specified group/gid
gid = nginx
# set master process name
procname-master = uwsgi cgit
# spawn the specified number of workers/processes
processes = 1
# run each worker in prethreaded mode with the specified number of threads
threads = 2
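The chmod-socket comment is where this setup most easily goes wrong: Nginx runs as nginx:nginx and Anubis as cgit:nginx, so both reach the sockets only through the shared nginx group. Below is a small, hypothetical Python check (`check_socket` is not part of uWSGI, Anubis or Nginx) that reports whether a socket file has mode 660 and the expected group; the defaults mirror this guide, so adjust them if your users or groups differ.

```python
from __future__ import annotations

import grp
import os
import stat


def check_socket(path: str, group: str = "nginx", mode: int = 0o660) -> list[str]:
    """Return a list of problems with the socket at `path`; empty means OK."""
    problems: list[str] = []
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        problems.append(f"type: {path} is not a socket")
    actual_mode = stat.S_IMODE(st.st_mode)
    if actual_mode != mode:
        problems.append(f"mode: {path} is {actual_mode:o}, expected {mode:o}")
    try:
        actual_group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid without a name in the group database
        actual_group = str(st.st_gid)
    if actual_group != group:
        problems.append(f"group: {path} belongs to {actual_group}, expected {group}")
    return problems
```

Once uWSGI is running, `check_socket("/run/uwsgi_cgit/uwsgi_cgit.sock")` and `check_socket("/run/uwsgi_cgit/http_cgit.sock")` should both return an empty list if the permissions match what the comments above ask for.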
Make sure that your uWSGI build has routing support and that these plugins are
available:
- cgi
- corerouter
- http
- syslog
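Before putting Anubis in front, it can help to talk to uWSGI's HTTP socket directly (`curl --unix-socket` does the same job). Here is a minimal Python sketch under that idea: `UnixHTTPConnection` and `fetch` are hypothetical helpers built on the standard library's `http.client`, with the connection redirected to a Unix domain socket; the Host value git.example.com is the one used throughout this guide.

```python
from __future__ import annotations

import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that connects to a Unix domain socket, not TCP."""

    def __init__(self, socket_path: str, timeout: float = 10.0) -> None:
        # "localhost" is only a placeholder; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch(socket_path: str, path: str = "/", host: str = "git.example.com") -> tuple[int, bytes]:
    """GET `path` over the Unix socket and return (status, body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        # An explicit Host header replaces the placeholder "localhost".
        conn.request("GET", path, headers={"Host": host})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

If the uWSGI half works, `fetch("/run/uwsgi_cgit/http_cgit.sock")` should return status 200 and cgit's index page. Pointing it at the Anubis socket later exercises the same path with Anubis in the loop.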
Anubis setup
It only makes sense to challenge paths that AI crawlers can scrape infinitely.
In fact, no bot should ever scrape gitweb instances, because that is orders of
magnitude more wasteful than cloning repositories and examining them locally.
In contrast, snapshots, Atom feeds and “About” pages should always be available
to everyone.
The following Anubis policy meets the conditions:
# /etc/anubis/cgit.policy.json
{
"bots": [
{
"name": "cgit-expensive",
"path_regex": "^/.+/(refs|log|tree|commit|diff)/.*$",
"action": "CHALLENGE"
}
]
}
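It is worth checking which cgit URLs the path_regex actually catches. Anubis is written in Go, but this pattern uses nothing where Go's RE2 and Python's `re` behave differently, so a quick Python check is a fair approximation. The sketch assumes the regex is matched against the URL path without the query string; the sample paths for a repository called "repo" are made up for illustration.

```python
import re

# Same pattern as "path_regex" in /etc/anubis/cgit.policy.json.
EXPENSIVE = re.compile(r"^/.+/(refs|log|tree|commit|diff)/.*$")

# Hypothetical sample paths and whether the rule should match them.
SAMPLES = {
    "/repo/log/": True,
    "/repo/tree/src/main.c": True,
    "/repo/commit/": True,
    "/repo/refs/": True,
    "/repo/about/": False,
    "/repo/atom/": False,
    "/repo/snapshot/repo-1.0.tar.gz": False,
    "/repo/log": False,  # no trailing slash, so "/log/" never appears
}


def is_challenged(path: str) -> bool:
    """True if the cgit-expensive rule would send this path to a challenge."""
    return EXPENSIVE.search(path) is not None


for sample, expected in SAMPLES.items():
    result = is_challenged(sample)
    verdict = "ok" if result == expected else "UNEXPECTED"
    print(f"{verdict:10} challenged={result!s:5} {sample}")
```

The last sample is worth keeping in mind: the pattern relies on the trailing slash, so /repo/log without it is not matched.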
You will also need to configure Anubis via environment variables or,
alternatively, using command-line flags:
# /etc/anubis/cgit.env
TARGET="unix:///run/uwsgi_cgit/http_cgit.sock"
BIND="/run/anubis_cgit/anubis.sock"
BIND_NETWORK="unix"
POLICY_FNAME="/etc/anubis/cgit.policy.json"
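TARGET here has to be the http-socket path from the uWSGI configuration with a unix:// prefix, and a typo on either side only shows up as failed requests. The sketch below is a hypothetical consistency check (neither tool ships it): it reads the uWSGI ini with Python's `configparser`, using `strict=False` because uWSGI allows repeated keys and `interpolation=None` because uWSGI has its own % placeholders, and it parses the env file as simple KEY="value" lines.

```python
from __future__ import annotations

import configparser


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def target_matches(uwsgi_ini: str, anubis_env: str) -> bool:
    """True if Anubis' TARGET points at uWSGI's http-socket."""
    # strict=False: repeated keys are tolerated (the last one wins here).
    # interpolation=None: uWSGI's own %-placeholders are left untouched.
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read_string(uwsgi_ini)
    http_socket = cfg.get("uwsgi", "http-socket")
    return parse_env(anubis_env).get("TARGET") == "unix://" + http_socket
```

With the two files from this guide, reading /etc/uwsgi.d/cgit.ini and /etc/anubis/cgit.env into strings and passing them to `target_matches` should return True.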
Nginx setup
I serve my cgit instance both via clearweb and overlay networks (Tor, I2P,
Yggdrasil). Since AI crawlers only know about the former, I don’t have to
challenge security-conscious people using the latter, especially given that
they often browse with JavaScript disabled.
If that doesn’t apply to you, you can proxy_pass to Anubis unconditionally.
location @cgit {
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-Ip $remote_addr;
if ($host = "git.example.com") {
proxy_pass http://unix:/run/anubis_cgit/anubis.sock;
break;
}
include uwsgi_params;
uwsgi_param HTTP_HOST $host;
uwsgi_modifier1 9;
if ($host != "git.example.com") {
uwsgi_pass unix:/run/uwsgi_cgit/uwsgi_cgit.sock;
break;
}
}
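Taken together, the Nginx block and the Anubis policy decide who can ever see a challenge. The following is a toy Python model of that decision, not anything Nginx or Anubis actually runs; `route` and its return labels are hypothetical, and it assumes Anubis passes through requests that match no rule in the policy.

```python
import re

CLEARWEB_HOST = "git.example.com"
# Same pattern as the "cgit-expensive" rule in the Anubis policy.
EXPENSIVE = re.compile(r"^/.+/(refs|log|tree|commit|diff)/.*$")


def route(host: str, path: str) -> str:
    """Toy model of where a request ends up under this guide's setup."""
    if host != CLEARWEB_HOST:
        # Second "if": uwsgi_pass straight to uwsgi_cgit.sock.
        return "uwsgi"
    if EXPENSIVE.search(path):
        # First "if": proxied to Anubis, which issues a challenge.
        return "anubis:challenge"
    # Assumption: requests matching no policy rule are passed on to uWSGI.
    return "anubis:pass"
```

One consequence is visible in the model: as written, any Host other than git.example.com (an overlay address, but also a bare IP, if this server block is the one that receives it) takes the uWSGI branch and never reaches Anubis.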
Make sure that you have the following Nginx modules installed:
Acknowledgements
I want to thank:
I hope this guide was helpful and not entirely wrong! Follow me on the Fediverse
and read my occasional writings in the Geminispace.
Spotted a typo or bad grammar? See this file’s source in Markdown on my git
server.
Posted 2025-04-22