Vexera - Vexera unresponsive – Incident details

Vexera unresponsive

Resolved
Major outage
Started over 2 years ago
Lasted 20 minutes

Affected

Bot

Major outage from 6:45 PM to 7:05 PM

Commands processing

Major outage from 6:45 PM to 7:05 PM

Updates
  • Resolved

    Vexera was unresponsive for about 10 minutes (18:44-18:53 UTC). The outage was caused by the deployment of a faulty update.

    The update was briefly tested on our beta instance before the deployment, and no issues were found (we ran only very simple tests and commands). Edit: see the end of this update.
    The issue was therefore only identified once the update was deployed to our production instance, when alerts about a high error rate fired.

    We immediately initiated a rollback, but an issue in our orchestrator prevented it from automatically receiving the rollback instruction, delaying the recovery of Vexera.

    Please accept our apologies for this outage 😦


    Edit: After investigation, it looks like we had almost no chance of identifying the bug before deployment.
    The bug occurs only if the user sending the message doesn't have any configuration option set in our database (e.g. if you have never used +userlocale or similar commands). Our own accounts have user config set on both our dev and beta instances, so our tests never hit the failing path…
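
For illustration only, this class of bug can be sketched in Python as follows. This is not Vexera's actual code: the `USER_CONFIG` table, the user IDs, the `locale` key, and both function names are hypothetical stand-ins. The sketch assumes the configuration lookup returns nothing for a user who has never set an option, which is the case the test accounts never exercised.

```python
DEFAULT_LOCALE = "en"  # hypothetical fallback value

# Hypothetical in-memory stand-in for the user configuration table.
# Only users who have set at least one option have an entry.
USER_CONFIG = {
    "configured_user": {"locale": "fr"},
}


def get_locale_unsafe(user_id: str) -> str:
    """Works for configured users, fails for users with no stored config."""
    config = USER_CONFIG.get(user_id)  # None when the user has no entry
    return config["locale"]  # raises TypeError when config is None


def get_locale(user_id: str) -> str:
    """Falls back to a default when the user has no stored config."""
    config = USER_CONFIG.get(user_id) or {}
    return config.get("locale", DEFAULT_LOCALE)
```

A pre-deployment check that includes at least one account with no stored configuration (here, any ID missing from `USER_CONFIG`) would exercise the failing path that configured test accounts skip.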

  • Monitoring

    We rolled back an update and are currently monitoring the result.

  • Investigating

    Our monitoring system fired an alert regarding an elevated error rate. We are investigating!