operations

Traefik

Traefik is used as a reverse proxy, certificate manager and loadbalancer for sn06 and sn07.
It is deployed using an ansible-playbook which uses usegalaxy-eu’s Traefik role. This role internally initializes a swarm cluster on the target host, creates secrets, the specified network and docker swarm services.
Docker swarm was used for mainly two reasons:

  1. Secret handling: Secrets are not saved inside env files, but are encrypted on disk and only available to the container they are attached to.
  2. Scalability and future failover safe deployments: By using Docker swarm you can quite easily add a second proxy node and use e.g. keepalived as ingress.
  3. Features: Traefik comes with many options for middlewares and Plugins. If you would like, for instance, to block certain IP ranges, this could be easily done using a plugin.

Basics

| | | | —————— | ————————– | | Default user | rocky |
| IP | 132.230.103.37 | | Host | traefik.galaxyproject.eu |
| Traefiks logs | /var/log/traefik |

To have a pretty output use hl:

sudo ./hl /var/log/traefik/traefik./glog --follow

The machine is a ESXi VM. The University provides this fancy dashboard. Log in with your university handle and password.

Docker swarm basics

docker service ls shows the “services” which are similar to the ones in kubernetes.
docker service rm deletes the service and all its respective containers, in case you would like to start from clean slate.
docker service logs would not work with Traefik, because it writes directly to /var/log/traefik

New subdomains

To add new subdomains, add a new line to the file files/traefik/rules/template-subdomains.yml in the infrastructure-playbook repo. The language there is a go template for a yaml file, which might look similar to ansible at first. (In case you wonder about the syntax).
The line should look like the ones above, like this scheme:


Be careful, the word subdomain in the second colum needs to stay literaly “subdomain”, only the 3rd column is changed to the new subdomain, but without any .usegalaxy.eu.
Once this is deployed, Traefik will automatically create a router for it and fetch certificates for the subdomain as well as a wildcard certificate for ITs.
If you did everything correctly, the new router appears on Traefik’s dashboard.

How to debug

🚑 Galaxy not reachable

no available server

This could mean that either

Self signed certificate warning

Could only appear when many certs have to be fetched newly at the same time.
AWS route53 has a harsh rate limit of 5 req/s, if Traefik tries to create and check the TXT records during DNS-01 challenge for all subdomains, this could result in >100 req/s. It will take some time and Traefik will get more and more certs. If you see error messages after 1h, you can try to restart Traefik.

Letsencrypt i/o error

Probably a egress issue with Docker networks. Recreating the bridge helped:

# systemctl stop docker
# iptables -t net -F
# ip link set docker0 down
# brctl delbr br100
# systemctl start docker