Vexera was unresponsive for about 10 minutes (18:44-18:53 UTC). This issue is due to the deployment of a bogus update.
The update was quickly tested on our beta instance before the deployment, and no issues were found (we used only very simple tests/commands). Edit: See at the end
The issue was therefore only identified when the update got deployed to our production instance, as alarms about high errors rate fired.
We immediately initiated a rollback, but an issue in our orchestrator prevented it from automatically receiving the rollback instruction, delaying the recovery of Vexera.
Please accept our apologies for this outage 😦
Edit: After investigation, it looks like we had almost zero chances to identify the bug.
The bug occurs only if the user sending the message doesn't have any configuration option set in our database (eg. if you never used +userlocale
or such commands). We actually have user config set on our accounts on both our dev and beta instances…