Postmortem

wassim belhedi
Oct 4, 2020

Issue Summary

At 9:00pm GMT+1 on 25th September 2020, I was notified that our company had a server outage. More specifically, one of our servers was returning a 500 Internal Server Error. The root cause was a simple typo in one of the WordPress configuration files: on line 137 of /var/www/html/wp-settings.php, the path to a supporting library was entered as /class-wp-locale.phpp instead of /class-wp-locale.php. The typo was corrected and service was restored at 10:05pm, so the service was offline for approximately one hour. Impact on our customers was largely isolated to that one-hour period between 9:00pm and 10:00pm. Given the brevity of this outage, we classify the incident’s impact as minimal.

Timeline for Friday, 25th September 2020:

  • 9:00pm GMT+1: Outage occurs and is detected via a monitoring alert.
  • 9:00pm GMT+1: Notification is received and troubleshooting begins immediately.
  • 9:10pm GMT+1: A series of investigative curl requests yields nothing, so debugging moves on to strace, as initially recommended in Holberton School’s outage notification.
  • 9:15pm GMT+1: No progress has been made with strace alone; the next idea is to combine both commands.
  • 9:25pm GMT+1: The error is triggered by running curl and strace in separate terminals.
  • 9:39pm GMT+1: Source of the outage is determined and work begins on a fix.
  • 10:04pm GMT+1: Fix is run and service is restored.
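The curl-plus-strace step in the timeline can be sketched roughly as follows. This is a minimal sketch, not the exact commands used during the incident: it assumes the web server is Apache, and the log path /tmp/strace.log and the helper name missing_paths are hypothetical. The helper reads strace output and prints every path the traced process tried to open but could not find (ENOENT), which is the kind of line that points at a mistyped file name.

```shell
#!/usr/bin/env bash
# Terminal 1 (assumption: <PID> is one of the web server's worker processes):
#   sudo strace -f -p <PID> -o /tmp/strace.log
# Terminal 2 (trigger the failing request while strace is attached):
#   curl -sI http://localhost/
# Afterwards:
#   missing_paths < /tmp/strace.log

# Print each unique path that a traced system call failed to find.
# Reads strace output on stdin; only lines reporting ENOENT are considered,
# and the first quoted argument on such a line is taken as the path.
missing_paths() {
    sed -n 's/^[^"]*"\([^"]*\)".*ENOENT.*/\1/p' | sort -u
}
```

A path ending in .phpp in that output would stand out immediately next to the .php files around it.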

Issue

The outage was caused by a mistyped file extension in the WordPress config file /var/www/html/wp-settings.php. The resource specified on line 137 was incorrectly entered as /class-wp-locale.phpp instead of /class-wp-locale.php. To resolve the issue, a Puppet manifest was created. Using the exec resource, the manifest ran a sed command to replace every occurrence of .phpp with .php. This was adequate to fix the problem.
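The sed substitution at the heart of that fix might look like the sketch below. It shows only the shell side, not the Puppet manifest that wrapped it, and the function name fix_phpp_typo is hypothetical. Keeping a .bak copy of the original file is an added precaution, not something the incident fix is known to have done.

```shell
#!/usr/bin/env bash
# Replace every occurrence of ".phpp" with ".php" in the given file.
# -i.bak edits the file in place and keeps the original as "<file>.bak",
# so the change can be reverted if the substitution was too broad.
fix_phpp_typo() {
    local file="$1"
    sed -i.bak 's/\.phpp/.php/g' "$file"
}

# Example (path taken from the incident):
#   fix_phpp_typo /var/www/html/wp-settings.php
```

Because the pattern rewrites any .phpp it finds, it is worth diffing the file against its .bak copy before removing the backup.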

Future Prevention

We are putting measures in place to avoid a similar outage in the future. Namely, we are using the Datadog API and bash scripts to automatically fix this kind of error.
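As a rough illustration of the bash side of such automation, the sketch below checks a URL's HTTP status and decides whether remediation should run. It does not call the Datadog API; a plain curl probe stands in for the monitor here, and the function names, the URL, and the wiring in the comments are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Return success (0) when the given HTTP status code is a 5xx server error.
is_server_error() {
    case "$1" in
        5[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print only the HTTP status code for a URL.
# curl prints 000 when the request itself fails (e.g. connection refused).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Example wiring (hypothetical URL; the remediation step is a placeholder):
#   if is_server_error "$(http_status http://localhost/)"; then
#       echo "5xx detected, running remediation" >&2
#   fi
```

Separating the status check from the probe keeps the decision logic easy to test without a live server.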

Debugging: a real story (humour)

Admiral Grace Hopper (an early computing pioneer better known for inventing COBOL) liked to tell a story in which a technician solved a glitch in the Harvard Mark II machine by pulling an actual insect out from between the contacts of one of its relays, and she subsequently promulgated "bug" in its hackish sense as a joke about the incident (though, as she was careful to admit, she was not there when it happened).
