Traefik is used as a reverse proxy, certificate manager and load balancer for sn06 and sn07.
It is deployed using an Ansible playbook that uses usegalaxy-eu’s Traefik role. Internally, this role initializes a swarm cluster on the target host and creates the secrets, the specified network and the Docker Swarm services.
Docker Swarm was chosen mainly for two reasons:
| | |
| --- | --- |
| Default user | rocky |
| IP | 132.230.103.37 |
| Host | traefik.galaxyproject.eu |
| Traefik logs | /var/log/traefik |
For prettier output, use hl:
sudo ./hl /var/log/traefik/traefik.log --follow
The machine is an ESXi VM. The university provides a dashboard for it. Log in with your university handle and password.
docker service ls lists the “services”, which are similar to the ones in Kubernetes.
docker service rm deletes a service and all of its containers, in case you would like to start from a clean slate.
docker service logs does not work with Traefik, because Traefik writes its logs directly to /var/log/traefik.
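When checking the swarm, it can help to see at a glance which services are not running all of their replicas. The following is a minimal sketch and not part of the playbook: `degraded_services` is a hypothetical helper name, and it assumes input lines of the form `name running/desired`, as printed by `docker service ls` with a `--format` template.

```sh
#!/bin/sh
# Hypothetical helper: print the names of services whose running replica
# count differs from the desired count.
# Expects lines like "<name> <running>/<desired>" on stdin, for example from:
#   docker service ls --format '{{.Name}} {{.Replicas}}'
degraded_services() {
    while read -r name replicas _rest; do
        # Skip empty lines.
        [ -n "$name" ] || continue
        running=${replicas%%/*}   # part before the slash
        desired=${replicas##*/}   # part after the slash
        if [ "$running" != "$desired" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

On the Traefik host this could be used as `docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services`. No output means every service reports as many running replicas as desired.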
To add a new subdomain, add a new line to the file files/traefik/rules/template-subdomains.yml in the infrastructure-playbook repo. In case you wonder about the syntax: the file is a Go template for a YAML file, which might look similar to Ansible at first.
The new line should follow the same scheme as the lines already in the file.
Be careful: the word subdomain in the second column must stay literally “subdomain”. Only the third column is changed to the new subdomain, without the .usegalaxy.eu suffix.
Once this is deployed, Traefik automatically creates a router for the subdomain and fetches certificates for it, as well as a wildcard certificate for ITs.
If you did everything correctly, the new router appears on Traefik’s dashboard.
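A common mistake when adding a subdomain is to enter the full host name instead of the bare subdomain. The following is a minimal sketch of a pre-commit sanity check. `is_bare_subdomain` is a hypothetical helper and not part of the infrastructure-playbook repo, and the example names are made up. It only checks the suffix rule described above, not whether the name is a valid DNS label.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the argument is non-empty and does not
# already contain the usegalaxy.eu domain (with or without a trailing dot).
is_bare_subdomain() {
    case "$1" in
        ''|usegalaxy.eu|*.usegalaxy.eu|*.usegalaxy.eu.)
            return 1 ;;
        *)
            return 0 ;;
    esac
}

# Example with made-up names:
#   is_bare_subdomain "ecology"              succeeds
#   is_bare_subdomain "ecology.usegalaxy.eu" fails
```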
- […] aws.yml)
- route53
- hosted zones
- usegalaxy.eu
- […] A Record for usegalaxy.eu and point it to sn06’s public IP address (132.230.223.239)
- proxy_pass all requests to one headnode directly.
404 not found

Most likely something happened to the router.
- Check the files in files/traefik/rules/: usegalaxy-eu-router.yml for usegalaxy.eu and template-subdomains.yml for subdomains. Take a close look at the HostRegexp rule.
- Check that the servers in usegalaxy-eu-service.yml are correct and reachable.
Bad Gateway error

- […] sn07.galaxyproject.eu directly. Traefik should automatically skip unhealthy hosts, see the dashboard and docs.
- […] /etc/traefik/acme.json and delete all its contents (not the file itself), then restart Traefik using docker restart <Traefik container name>
- […] usegalaxy-eu-service.yml

no available server

This could mean that either
sudo tailscale up --accept-dns=false --advertise-tags=tag:critical
docker restart <Traefik container name>
This should not be necessary, because it is set in group_vars/all.yml
This can only happen when many certificates have to be fetched from scratch at the same time.
AWS Route 53 has a strict rate limit of 5 req/s. If Traefik tries to create and check the TXT records for all subdomains during the DNS-01 challenge, this can result in >100 req/s. It will take some time, and Traefik will obtain more and more certificates as it goes. If you still see error messages after 1 h, you can try restarting Traefik.
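To get a feeling for why this takes a while, the lower bound on the time needed can be estimated from the rate limit alone. The following is a rough sketch. `min_seconds` is a hypothetical helper, and the request count in the example is an assumption for illustration, not a measured value. Real runs take longer, because throttled requests are retried.

```sh
#!/bin/sh
# Hypothetical helper: minimum number of seconds needed to send a given
# number of requests at a given rate limit (requests per second), rounded up.
min_seconds() {
    requests=$1
    rate=$2
    echo $(( (requests + rate - 1) / rate ))
}

# Example (assumed numbers): 600 API requests at the 5 req/s limit
min_seconds 600 5
```

With these assumed numbers the lower bound is 120 seconds, even if not a single request is rejected.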
Probably an egress issue with Docker networks. Recreating the bridge helped:
# systemctl stop docker
# iptables -t nat -F
# ip link set docker0 down
# brctl delbr br100
# systemctl start docker