Erich Schonfeld has a post at Tech Crunch about how
Facebook recently experienced a data loss that resulted in evaporation of its user's e-mail notification preferences. He takes comfort in the fact that it was a loss that should provide only a minor, and short-lived, annoyance to users, but he also notes that had the preference been something that regarded privacy settings, the results could have been much worse.
As a developer, I have an idea of what might have happened in the data loss. I wouldn't want to be the person who was responsible for the data loss. It's good that we see these stories because it's a sobering reminder of the dangers of working with live data.
Almost every web-based application has to persist data. Facebook stores profile information in addition to preferences. E-mail services store messages and the preferences that govern the behavior of the application's user interface. Content management applications (blogs included) store posts, comments, and page layout information.
Usually this data goes into a relational database management system (RDBMS) like MySQL, MS SQL Server, or Oracle. But performance issues have
led some to question the database design best practices that have been in use over the past two or three decades. Others have one so far as to speculate that RDBMSes may be so ill suited to some data persistence uses that
one should consider using something else.
Regardless, the data you submit to a web application gets poked into some means of persistence, and then later on it is read by the application for generating pages (like this blog post) or making a decision (like whether to notify an administrator immediately when a blog post has been submitted for approval).
The world is neither perfect nor static. Sometimes a bug in an application can result in the creation of bogus database entries. It's not enough to stop the application from doing that again, it's also important to determine if the damaged data can be repaired. If the damage is on the order of hundreds or thousands of items, a script may need to be created to repair the damage.
Here is where things can get dangerous. Repair scripts sometimes need to make destructive changes (i.e. SQL
DELETE statements) to a table. It is very important for the script's author to have an understanding not only of the table being modified but also any quirks in the semantics of column entries.
For example, rows created early in the application's history may not have stored ID values for certain types of entries. When these things are not taken into account, a repair script may inadvertently delete rows that may not be valid in current application usage but are still essential pieces of data.
When things like this happen it's not always obvious that something has gone wrong in the application. For the most part things may work fine, but then all of the sudden people notice that information on certain pages is missing.
Times like this, it's nice to have a backup of the database from which to restore the lost data. My guess is that the Facebook development team decided that the loss was not severe enough to perform the recovery. On the other hand, if the decision was based on the lack of a recent backup to restore from, then shame on them.
Here at Compendium, we devoted a code review a couple months back to developing some standards and best practices for writing database repair and maintenance scripts. This helped us to move from an ad hoc patchwork of scripting styles to a more consistent, reliable, and reusable body of code. Some of the guidelines that emerged from those discussions:
- Proposed database schema changes should be vetted by all members of the engineering team. Once agreed upon, the system engineering team needs to be notified of these changes so that backup scripts can be adjusted.
- Environment settings (e.g. production or development) should be specifiable from the script's command line arguments and then determined programmatically by the framework if not specified there. There should be no hard coding of these values.
- If there are model classes or data access objects that can be used to manipulate the data, the script should use these rather than using hand crafted SQL strings.
- One-time use scripts should be named after the work item number that they were created for. Multiple use scripts should be flexible to handle a wide number of use situations, named for the tasks they perform.
- All repair and maintenance scripts should be placed under revision control.
These guidelines help us to ensure that when things need to be fixed or reconfigured, they are done with minimal risk to the data.