Traefik is used as a reverse proxy, certificate manager, and load balancer for sn06 and sn07.
It is deployed with an Ansible playbook that uses usegalaxy-eu’s Traefik role. Internally, this role initializes a Swarm cluster on the target host and creates the secrets, the specified network, and the Docker Swarm services.
Docker Swarm was chosen mainly for two reasons:
| | |
| --- | --- |
| Default user | rocky |
| IP | 132.230.103.37 |
| Host | traefik.galaxyproject.eu |
| Traefik logs | /var/log/traefik |
For prettier output, use `hl`:
sudo ./hl /var/log/traefik/traefik.log --follow
The machine is an ESXi VM. The university provides a dashboard for it; log in with your university handle and password.
- `docker service ls` shows the “services”, which are similar to the ones in Kubernetes.
- `docker service rm` deletes a service and all of its containers, in case you want to start from a clean slate.
- `docker service logs` does not work with Traefik, because Traefik writes its logs directly to `/var/log/traefik`.

To add a new subdomain, add a new line to the file `files/traefik/rules/template-subdomains.yml` in the infrastructure-playbook repo. The file is a Go template for a YAML file, which explains the syntax and why it might look similar to Ansible at first.
The new line should follow the same scheme as the lines already in the file.
Be careful: the word `subdomain` in the second column must stay literally “subdomain”. Only the third column changes to the new subdomain, without the `.usegalaxy.eu` suffix.
Once this is deployed, Traefik automatically creates a router for the new subdomain and fetches a certificate for it, as well as a wildcard certificate for interactive tools (ITs).
If everything was done correctly, the new router appears on Traefik’s dashboard.
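The rule that the third column holds only the bare subdomain, without `.usegalaxy.eu`, is easy to get wrong when copying hostnames. A minimal sketch of a check follows; `bare_subdomain` is a hypothetical helper, not part of the infrastructure-playbook, and the set of allowed characters is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of the infrastructure-playbook): reduce a
# hostname to the bare subdomain expected in template-subdomains.yml.
# The function body runs in a subshell, so its variables stay local.
bare_subdomain() (
  name=$1
  name=${name%.}               # drop a trailing dot, e.g. "foo.usegalaxy.eu."
  name=${name%.usegalaxy.eu}   # drop the domain suffix if it was included
  case $name in
    ''|*[!a-z0-9.-]*)
      # Assumption: only lowercase letters, digits, dots and hyphens are valid.
      echo "invalid subdomain: '$1'" >&2
      exit 1
      ;;
  esac
  printf '%s\n' "$name"
)

bare_subdomain "example.usegalaxy.eu"   # prints "example"
```

The printed value is what would go into the third column of the new line.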
- `aws.yml`)
- `route53` hosted zones
- `usegalaxy.eu`
- `A Record` for usegalaxy.eu and point it to sn06’s public IP address (`132.230.223.239`)
- `proxy_pass` all requests to one headnode directly.

404 not found
Most likely something happened to the router.
- `files/traefik/rules/`
- `usegalaxy-eu-router.yml` for usegalaxy.eu and `template-subdomains.yml` for subdomains. Take a close look at the `HostRegexp` rule.
- `servers` in `usegalaxy-eu-service.yml` are correct and reachable.

`Bad Gateway` error
- `sn07.galaxyproject.eu` directly. Traefik should automatically skip unhealthy hosts, see the dashboard and docs
- `/etc/traefik/acme.json` and delete all its contents (not the file itself), then restart Traefik using `docker restart <Traefik container name>`
- `usegalaxy-eu-service.yml`

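Deleting the contents of `acme.json` while keeping the file itself can be done by truncating it in place, which leaves the owner and permissions untouched (Traefik expects this file to have mode 600). The following is a minimal sketch; `empty_in_place` is a hypothetical helper name, and the example call is commented out on purpose.

```shell
#!/bin/sh
# Sketch: empty a file in place so that the file itself, its owner and
# its mode are preserved. Only the contents are removed.
empty_in_place() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  : > "$1"   # truncate to zero bytes without recreating the file
}

# Example (run as root on the Traefik host, then restart Traefik):
# empty_in_place /etc/traefik/acme.json
```

After the restart, Traefik has to request all certificates again, so expect some delay before every subdomain is served with a valid certificate.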
no available server
This could mean that either
sudo tailscale up --accept-dns=false --advertise-tags=tag:critical
docker restart <Traefik container name>
This should not be necessary, because it is set in group_vars/all.yml
This can only appear when many certificates have to be fetched anew at the same time. AWS Route 53 has a strict rate limit of 5 req/s. If Traefik tries to create and check the TXT records of the DNS-01 challenge for all subdomains at once, this can result in more than 100 req/s. Fetching will take some time, and Traefik will gradually obtain more and more certificates. If you still see error messages after 1 h, you can try restarting Traefik.
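To get a feeling for how long the backlog takes to clear under the 5 req/s limit, a rough lower bound can be computed. This is only a back-of-the-envelope sketch: the number of subdomains and the number of Route 53 requests per certificate are assumed values, not measurements, and retries after rejected requests make the real duration longer.

```shell
#!/bin/sh
# Lower bound, in whole seconds, for sending a given number of API
# requests under a requests-per-second limit (rounded up).
min_seconds() {
  total=$1
  rate=$2
  echo $(( (total + rate - 1) / rate ))
}

# Hypothetical example: 200 subdomains at an assumed 4 requests each,
# against the 5 req/s limit.
min_seconds $((200 * 4)) 5   # prints 160
```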
Probably an egress issue with Docker networks. Recreating the bridge helped:
# systemctl stop docker
# iptables -t nat -F
# ip link set docker0 down
# brctl delbr br100
# systemctl start docker