System malfunction due to certificate expiry
Aug 19 at 02:29pm CEST
Every year in July or August, the main SSL certificate for capptions.com and other assets needs to be renewed. Since 2021, certificates have been managed through our central infrastructure repository and deployed automatically through Terraform.
To make sure that the renewal proceeds without glitches or outages, we have the following reminders in place:
- Automated reminders from our certificate provider
- Calendar-based reminder for a manual check
Both reminders were acknowledged, and we verified internally that the renewed certificates were working. However, on the morning of August 19th, 2023, we started receiving notifications from a few customers that they couldn't log in or had issues uploading images.
After reviewing the situation, we found that certificate renewal for the following assets was excluded from the automated process and that our checks were insufficient:
- cdn.capptions.com, serving static assets including images and scripts for the login pages, causing the main login pages not to work and the web apps not to render
- download.capptions.com, serving customer-uploaded media, causing assets not to load or potentially breaking PDF generation
Our main regret is that users were unable to use our web apps on Saturday morning (CEST) and that our health monitors did not catch this issue.
Luckily, after the first client reports, we were able to recover the systems swiftly and had everything fully operational again within one hour.
We are now taking steps to prevent this issue from happening again by:
- Making sure the manual checks will catch these issues
- Fixing the automatic certificate deployment for the mentioned assets
- Extending our automatic monitors to catch similar issues
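As an illustration of the last step, a monitor along these lines could check every hostname separately rather than only the main domain. This is a minimal sketch and not our production monitor: the 30-day threshold, the function names, and the idea of running it from a scheduled job are assumptions made for the example. The hostnames are the ones named in this report.

```python
import socket
import ssl
import time

# Hostnames taken from this report; extend the list for other assets.
HOSTS = ["capptions.com", "cdn.capptions.com", "download.capptions.com"]
WARN_DAYS = 30  # hypothetical warning threshold, chosen for illustration


def days_remaining(not_after, now=None):
    """Days until a certificate's notAfter time (negative if already expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host, port=443, timeout=5.0):
    """Open a verified TLS connection and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_hosts(hosts, warn_days=WARN_DAYS, fetch=fetch_not_after):
    """Return one message per host whose certificate is failing or close to expiry."""
    problems = []
    for host in hosts:
        try:
            remaining = days_remaining(fetch(host))
        except OSError as exc:
            # Covers connection errors and ssl.SSLError, including the
            # verification failure raised for an already expired certificate.
            problems.append(f"{host}: TLS check failed ({exc})")
            continue
        if remaining < warn_days:
            problems.append(f"{host}: certificate expires in {remaining:.0f} days")
    return problems
```

Calling `check_hosts(HOSTS)` from a scheduled job and alerting on any returned message would flag a host such as cdn.capptions.com even when the certificate on the main domain has been renewed correctly.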
Capptions Web Applications