cgit ❤ Anubis

With the recent AI hype, the web is becoming a more hostile place, and most people would like to protect their “gitweb” instances from aggressive and pointless crawling. People who self-host cgit are usually not the same people who try to solve every bot problem with Cloudflare. Thankfully, there is a tool called Anubis, developed by Xe Iaso. It sits between your web server and the HTTP service you want to protect.

The problem is that cgit is a CGI application, as you could probably guess from its name, so it does not listen on any HTTP port or socket. Instead, a new process is started for each incoming HTTP request. It reads the request from environment variables (and the request body, if any, from stdin), prints the response to stdout and exits.
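To make the CGI model concrete, here is a toy CGI program in Python. It is only an illustration (cgit itself is written in C), and the render function name is made up for this sketch. It never touches a socket: the request arrives as standard CGI environment variables such as REQUEST_METHOD, PATH_INFO and QUERY_STRING, and the response (a Status header, other headers, an empty line, then the body) goes to stdout.

```python
#!/usr/bin/env python3
"""Toy CGI program: no sockets, just environment in and stdout out."""
import os
import sys


def render(environ):
    """Build a CGI response (headers, blank line, body) from CGI variables."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    body = f"{method} {path}?{query}\n"
    # A CGI response: "Status" and "Content-Type" headers, an empty line, the body.
    return (
        "Status: 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n"
        + body
    )


if __name__ == "__main__":
    sys.stdout.write(render(os.environ))
    # The process exits here; the next request gets a brand-new process.
```

Whatever sits in front of such a program (fcgiwrap, uWSGI, a classic web server) is responsible for turning an HTTP request into these variables and the program’s output back into an HTTP response.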

Normally I’d use a combination of fcgiwrap and spawn-fcgi to translate CGI into FastCGI and make it work with Nginx, but I need to put Anubis in the middle, and Anubis speaks neither CGI nor FastCGI; it only proxies HTTP. So I’m going to use uWSGI, which can run CGI scripts and serve them over both HTTP and its native uwsgi protocol.

This guide is relevant for the given software versions:

General overview

What happens when a client requests “https://git.example.com”?

  1. Nginx process (as user nginx:nginx) terminates TLS, sets the forwarding headers (Host, X-Forwarded-*, X-Real-Ip) and forwards the request to Anubis.
  2. Anubis process (as user cgit:nginx) either allows or denies the request, or requires a challenge. On success, it forwards the request to the uWSGI HTTP socket.
  3. uWSGI process (as user cgit:nginx) spawns a new cgit process.
  4. The cgit process generates a response, which is then returned to the client.
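Steps 3 and 4 are where CGI meets HTTP. The sketch below is a simplified, hypothetical illustration of what a CGI gateway such as uWSGI’s cgi plugin has to do for each request; it is not uWSGI’s actual code, and run_cgi and its parameters are made up for this example. It starts a fresh process with the request described in environment variables, feeds the body on stdin, then splits the script’s output into headers and body and turns the Status header into an HTTP status code.

```python
import subprocess


def run_cgi(argv, method, path, query="", body=b"", host="git.example.com"):
    """Run one CGI process for one request (hypothetical helper).

    Returns (status_code, headers, payload). Simplified on purpose:
    no streaming, no timeouts, only a handful of CGI variables.
    """
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "QUERY_STRING": query,
        "CONTENT_LENGTH": str(len(body)),
        "HTTP_HOST": host,
    }
    # A brand-new process per request: request metadata in the environment,
    # request body on stdin, response on stdout.
    out = subprocess.run(argv, input=body, env=env, capture_output=True, check=True).stdout

    # The header block ends at the first empty line (CRLF or bare LF).
    crlf, lf = out.find(b"\r\n\r\n"), out.find(b"\n\n")
    if crlf != -1 and (lf == -1 or crlf < lf):
        head, payload = out[:crlf], out[crlf + 4:]
    elif lf != -1:
        head, payload = out[:lf], out[lf + 2:]
    else:
        raise ValueError("CGI output has no header/body separator")

    headers = {}
    for line in head.decode("latin-1").splitlines():
        if line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    # CGI scripts report their status via a "Status:" header; 200 is the default.
    status = int(headers.pop("status", "200").split()[0])
    return status, headers, payload
```

A real gateway also streams the output, enforces timeouts and passes many more variables (REMOTE_ADDR, SCRIPT_NAME and so on); those are left out here to keep the per-request lifecycle visible.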

Configuration

uWSGI setup

My final uWSGI configuration looks as follows:

# /etc/uwsgi.d/cgit.ini

[uwsgi]
# enable master process
master = true

# set/append a logger
logger = syslog

# load uWSGI plugins
plugins = cgi

# add a cgi mountpoint/directory/script
cgi = /var/www/localhost/cgi-bin/cgit.cgi

# bind to the specified UNIX/TCP socket using default protocol
# (NB: This socket will be used by Nginx later)
socket = /run/uwsgi_cgit/uwsgi_cgit.sock

# bind to the specified UNIX/TCP socket using HTTP protocol
# (NB: This socket will be used by Anubis later)
http-socket = /run/uwsgi_cgit/http_cgit.sock

# set uwsgi protocol modifier1
# (NB: CGI uses modifier '9')
http-modifier1 = 9

# force the specified modifier1 when using HTTP protocol
http-socket-modifier1 = 9

# set permissions for sockets
# (NB: They should be readable and writable by both Nginx and Anubis processes)
chmod-socket = 660

# setuid to the specified user/uid
uid = cgit

# setgid to the specified group/gid
gid = nginx

# set master process name
procname-master = uwsgi cgit

# spawn the specified number of workers/processes
processes = 1

# run each worker in prethreaded mode with the specified number of threads
threads = 2
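Why 9? uWSGI dispatches every request to a plugin by a numeric modifier1, and 9 selects CGI. On the uwsgi socket, modifier1 is the first byte of the packet header (Nginx sets it with uwsgi_modifier1 9), while http-socket-modifier1 = 9 forces the same value for requests arriving over HTTP. The sketch below follows the documented uwsgi packet layout (one-byte modifier1, 16-bit little-endian size of the variables block, one-byte modifier2, then key/value pairs each prefixed with a 16-bit little-endian length). The function names are hypothetical, and Nginx builds these packets for you; this only shows what travels over uwsgi_cgit.sock.

```python
import struct


def encode_uwsgi_request(variables, modifier1=9, modifier2=0):
    """Serialize CGI-style variables into a uwsgi protocol request header."""
    payload = b""
    for key, value in variables.items():
        k, v = key.encode("latin-1"), value.encode("latin-1")
        # Each key and each value is prefixed with its own u16 LE length.
        payload += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    if len(payload) > 0xFFFF:
        raise ValueError("uwsgi variables block exceeds 64 KiB")
    # Header: modifier1 (u8), datasize (u16 LE), modifier2 (u8); no padding.
    return struct.pack("<BHB", modifier1, len(payload), modifier2) + payload


def decode_uwsgi_request(packet):
    """Inverse of encode_uwsgi_request: return (modifier1, modifier2, variables)."""
    modifier1, size, modifier2 = struct.unpack_from("<BHB", packet, 0)
    variables, pos, end = {}, 4, 4 + size
    while pos < end:
        (klen,) = struct.unpack_from("<H", packet, pos)
        key = packet[pos + 2 : pos + 2 + klen].decode("latin-1")
        pos += 2 + klen
        (vlen,) = struct.unpack_from("<H", packet, pos)
        value = packet[pos + 2 : pos + 2 + vlen].decode("latin-1")
        pos += 2 + vlen
        variables[key] = value
    return modifier1, modifier2, variables
```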

Make sure that you have routing support in uWSGI and these plugins are available:


Anubis setup

It only makes sense to challenge paths that AI crawlers can scrape endlessly, i.e. per-revision views such as refs, logs, trees, commits and diffs. In fact, no bot should ever scrape gitweb instances, because doing so is orders of magnitude more wasteful than cloning the repositories and examining them locally.

In contrast, snapshots, Atom feeds and “About” pages should always be available to everyone.

The following Anubis policy meets these requirements:

# /etc/anubis/cgit.policy.json

{
    "bots": [
        {
            "name": "cgit-expensive",
            "path_regex": "^/.+/(refs|log|tree|commit|diff)/.*$",
            "action": "CHALLENGE"
        }
    ]
}
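To double-check which URLs hit the rule, the path_regex can be tried out locally. This Python sketch assumes that Anubis matches the pattern against the request path without the query string; the pattern uses only syntax that Go’s regexp package (Anubis is written in Go) and Python’s re interpret the same way. The repository name cgit.git and the needs_challenge helper are hypothetical.

```python
import re

# path_regex from the Anubis policy above.
CHALLENGE_RE = re.compile(r"^/.+/(refs|log|tree|commit|diff)/.*$")


def needs_challenge(path):
    """Return True if a request path would hit the "cgit-expensive" rule."""
    return CHALLENGE_RE.match(path) is not None


for path in [
    "/cgit.git/log/",
    "/cgit.git/tree/ui-tree.c",
    "/cgit.git/commit/",
    "/cgit.git/about/",
    "/cgit.git/atom/",
    "/cgit.git/snapshot/cgit-1.2.3.tar.xz",
    "/",
]:
    print(f"{path:40} {'CHALLENGE' if needs_challenge(path) else 'pass'}")
```

The rule needs a slash after the view name, so a hand-typed /cgit.git/log without the trailing slash would not match it.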

You will also need to configure Anubis, either through environment variables or, alternatively, with command-line flags:

# /etc/anubis/cgit.env

TARGET="unix:///run/uwsgi_cgit/http_cgit.sock"
BIND="/run/anubis_cgit/anubis.sock"
BIND_NETWORK="unix"
POLICY_FNAME="/etc/anubis/cgit.policy.json"
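When something in the chain misbehaves, it helps to talk to each UNIX socket directly and find the hop that fails. A minimal Python sketch, assuming you run it as a user allowed to open the sockets (for example, a member of the nginx group); UnixHTTPConnection and fetch are hypothetical helpers built on the standard http.client module.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that dials a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10.0):
        # "localhost" is only a placeholder; the Host header is sent explicitly.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch(socket_path, path="/", host="git.example.com", headers=None):
    """Send one GET request over a UNIX socket and return (status, body)."""
    request_headers = {"Host": host}
    request_headers.update(headers or {})
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path, headers=request_headers)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

For example, fetch("/run/uwsgi_cgit/http_cgit.sock") should return cgit’s index page straight from uWSGI, while fetch("/run/anubis_cgit/anubis.sock", headers={"X-Real-Ip": "127.0.0.1"}) goes through Anubis first; the extra header mimics the X-Real-Ip header that the Nginx configuration sets for Anubis.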


Nginx setup

I serve my cgit instance over both the clearweb and overlay networks (Tor, I2P, Yggdrasil). Since AI crawlers only know about the former, I don’t have to challenge security-conscious people using the latter, especially given that they often browse with JavaScript disabled, while Anubis’s proof-of-work challenge relies on JavaScript.

If that’s not your case, you can proxy_pass to Anubis unconditionally.

location @cgit {
    # Clearweb host: proxy over HTTP to Anubis, passing the client details
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-Ip $remote_addr;
    if ($host = "git.example.com") {
        proxy_pass   http://unix:/run/anubis_cgit/anubis.sock;
        break;
    }

    # Any other host (Tor, I2P, Yggdrasil): talk uwsgi to uWSGI directly
    include          uwsgi_params;
    uwsgi_param      HTTP_HOST $host;
    uwsgi_modifier1  9;
    if ($host != "git.example.com") {
        uwsgi_pass   unix:/run/uwsgi_cgit/uwsgi_cgit.sock;
        break;
    }
}

Make sure that you have the following Nginx modules installed:

Acknowledgements

I want to thank:


I hope this guide was helpful and not entirely wrong! Follow me on the Fediverse and read my occasional writings in the Geminispace.

Spotted a typo or bad grammar? See this file’s source in Markdown on my git server.

Posted 2025-04-22