We use systemd instead of supervisor, which makes life much better for us. Previously, editing supervisor files was not trivial and some of the configuration was a bit esoteric. Scaling up and down was not something we could do without restarting zerglings.
Zerg mode is designed so that you can simply attach more zerglings to your pool in order to scale up. Previously we had been using a more advanced configuration with spinningfifo but were not able to use any of its benefits, so we have switched to a simplified version which should offer similar benefits.
We will keep a pool of zerglings attached, galaxy-zergling@{0..3}. By default we will keep only 0 and 1 enabled at boot; under heavy load or during a restart of zerglings we can enable additional ones as needed. These zerglings are configured with fewer processes than our previous heavy 8-12 process zerglings. This way, instead of running two heavy 8-proc zergs during a restart, we increase load only marginally by adding a single 4-proc zerg.
The current status of each process (compare to supervisorctl status):
galaxy@sn04:~$ gxadmin local status
galaxy-zergpool: Active: active (running) since Wed 2019-03-20 14:51:55 CET; 21h ago
galaxy-zergling@0: Active: active (running) since Thu 2019-03-21 12:03:02 CET; 1min 3s ago
galaxy-zergling@1: Active: active (running) since Thu 2019-03-21 10:41:54 CET; 1h 22min ago
galaxy-zergling@2: Active: inactive (dead)
galaxy-zergling@3: Active: inactive (dead)
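The status output above shows which pool members are spare. As a rough sketch, the same check could be scripted with `systemctl is-active`. The helper name `first_inactive_zergling` is hypothetical and not part of gxadmin; it assumes the pool is exactly galaxy-zergling@{0..3} as described earlier.

```shell
# Sketch only: first_inactive_zergling is a hypothetical helper name.
# Print the number of the first pool member (0..3) that is not active.
first_inactive_zergling() {
    for n in 0 1 2 3; do
        # is-active exits non-zero for inactive, failed, or unknown units
        if ! systemctl is-active --quiet "galaxy-zergling@${n}"; then
            echo "$n"
            return 0
        fi
    done
    echo "all zerglings are already active" >&2
    return 1
}
```

With 0 and 1 running, `first_inactive_zergling` prints the next free slot, so something like `systemctl start "galaxy-zergling@$(first_inactive_zergling)"` would bring up a spare.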
The current memory usage of each process is available (no supervisor equivalent):
galaxy@sn04:~$ gxadmin local memory
galaxy-zergpool: Memory: 2.8M (limit: 8.0G)
galaxy-zergling@0: Memory: 238.8M (limit: 16.0G)
galaxy-zergling@1: Memory: 10.5G (limit: 16.0G)
galaxy-zergling@2:
galaxy-zergling@3:
Equivalent to supervisorctl restart hd:
systemctl restart 'galaxy-handler@*'
In theory galaxy-handler@{0..11} should also work.
Equivalent to supervisorctl restart hd:handler_main_1:
systemctl restart galaxy-handler@1
This is the unintelligent way of doing it.
$ gxadmin local zerg-swap
In the future this will be automated, but the intelligent way is:
$ systemctl start galaxy-zergling@2 # One that is off
Wait until this comes up. The stats addresses are 127.0.0.1:401#, where # is the part after @, so you can just curl 127.0.0.1:4012 a few times until it responds. (gxadmin has a built-in function for this, but it isn’t exposed.)
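The manual "curl a few times" step could be wrapped in a small polling loop. This is a sketch, not the unexposed gxadmin function: `stats_addr` and `wait_for_zergling` are hypothetical names, the attempt count and delay are arbitrary defaults, and it assumes curl exits successfully once the stats address answers (which may depend on the curl version and on how the stats server is configured).

```shell
# Sketch only: stats_addr and wait_for_zergling are hypothetical helper names.

# Map a zergling instance number to its stats address (127.0.0.1:401#).
stats_addr() {
    printf '127.0.0.1:401%s\n' "$1"
}

# Poll the stats address until curl succeeds or we run out of attempts.
# Usage: wait_for_zergling INSTANCE [ATTEMPTS] [DELAY_SECONDS]
wait_for_zergling() {
    instance="$1"
    attempts="${2:-30}"   # assumed default, tune as needed
    delay="${3:-2}"       # assumed default, tune as needed
    i=1
    while [ "$i" -le "$attempts" ]; do
        if curl --silent --max-time 2 --output /dev/null "$(stats_addr "$instance")"; then
            echo "galaxy-zergling@${instance} is answering"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "galaxy-zergling@${instance} did not come up after ${attempts} attempts" >&2
    return 1
}
```

For the example above, `wait_for_zergling 2` would poll 127.0.0.1:4012.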
Once it’s up, there should be three zerglings running. We can do something like:
for i in {1..50}; do curl https://usegalaxy.eu/api/version; done;
to check that everything is responding. It is likely you’ll be spread across all three zerglings. If nothing errors, then you can go ahead and restart 1/2 in series.
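The loop above prints every response, so a single failure is easy to miss. A minimal sketch that counts failures instead, using a hypothetical helper named `check_responses` that runs any command a given number of times:

```shell
# Sketch only: check_responses is a hypothetical helper name.
# Run a check command COUNT times and report how many attempts failed.
# Usage: check_responses COUNT COMMAND [ARGS...]
check_responses() {
    count="$1"
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Any non-zero exit status of the check command counts as a failure
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "${failures}/${count} requests failed"
    # Succeed only if every attempt succeeded
    [ "$failures" -eq 0 ]
}
```

It could be called as `check_responses 50 curl --silent --fail https://usegalaxy.eu/api/version`, where `--fail` makes curl exit non-zero on HTTP error responses, so those are counted too.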
This should be an extremely rare phenomenon. This should only occur if:
Then it might be OK to scale up.
Scaling down should be done when it returns to normal.