The Delete that Wasn’t

Monday, August 30th, 02010 at 11:11 UTC

There has been a privacy row over the last few years, which has picked up recently with users organizing a day to delete their Facebook accounts.

From a programming/development standpoint, it’s a very interesting conversation. While a delete is technically easy, infrastructurally it isn’t. In any high-end web application, your user data is just one more row in a table or collection. The cost of storing one more item is next to zero. Deleting that item is trivial in the grand scope of things, but the danger is that the delete cascades through the entire system and has unintended consequences!

Let’s look at any large community that has user accounts. As customers, we want to be able to purge ourselves from their system, but do we really? There is the age-old problem of the bait and switch. Let’s say that I sign up for a service using my regular username, ‘briansuda’. Over time I amass links to my account; it is referenced on websites, in the media, on paper, etc. Now the company turns evil and I want to quit. I want all my data deleted immediately. Assuming that is possible (it might not be, for reasons we’ll see later), all the information about that account under their control is deleted. The next day, some nefarious person realizes that I deleted the account ‘briansuda’, signs up with that name and takes over my persona. They don’t get any of the content, but they do get the inbound links and my good name. In that situation you probably wanted the evil company to deactivate your account rather than delete it. That prevents others from snapping up previously used accounts. We see this problem with domains. I have my own personal domain, suda.co.uk. If I let that lapse and someone else jumps on it, then they could go to all the common sites, enter my email address, press the forgot password link and become me. There isn’t much you can do about this issue because you don’t own domain names, you lease them.
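
The difference between deactivating and deleting a username can be sketched in a few lines. This is only an illustration: the UsernameRegistry class and its methods are hypothetical names, not any real site’s API, and a real service would keep this state in a database rather than in memory.

```python
class UsernameRegistry:
    """Toy in-memory registry (hypothetical) that tracks who holds a username."""

    def __init__(self):
        self._status = {}  # username -> "active" or "inactive"

    def sign_up(self, username):
        """Return True if the name was free and is now registered."""
        if username in self._status:
            return False  # the name is held, even by an inactive account
        self._status[username] = "active"
        return True

    def deactivate(self, username):
        """Close the account but keep the name reserved."""
        self._status[username] = "inactive"

    def hard_delete(self, username):
        """Remove every trace, which frees the name for anyone."""
        del self._status[username]


registry = UsernameRegistry()
registry.sign_up("briansuda")
registry.deactivate("briansuda")
squatter_got_name = registry.sign_up("briansuda")  # refused: name is still reserved
```

Had hard_delete been called instead of deactivate, the second sign_up would have succeeded and the newcomer would inherit every inbound link to the old account.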

In many out-of-the-box web frameworks, the database is normalized so that it is as efficient as possible, building relations between the tables. A database doesn’t know anything about what you put into it; it is information agnostic. To a company, some of that information is very important and some isn’t. Say the first thing you do when signing up for a new service is agree to some contract and receive an invoice. When you try to delete your account, all of these other bits are connected to you and will be deleted too. Companies can’t delete vital business information such as who they have invoiced and when, or other contractual data. It might be possible to make a hard-copy printout, but that will take time and resources. By being efficient in our application design and normalizing our database tables, we actually inhibit the ability to delete.
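
To make the cascade concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The users and invoices tables, their columns and the delete_user helper are all made up for illustration; a real framework would generate a much larger schema. The same delete is tried twice, once with the foreign key declared ON DELETE CASCADE and once with ON DELETE RESTRICT.

```python
import sqlite3


def delete_user(on_delete_rule):
    """Build a tiny hypothetical schema, then try to hard-delete the only user.

    Returns (user_was_deleted, invoices_left).
    """
    if on_delete_rule not in ("CASCADE", "RESTRICT"):
        raise ValueError("expected CASCADE or RESTRICT")
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
        CREATE TABLE invoices (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL
                REFERENCES users(id) ON DELETE {on_delete_rule},
            amount_cents INTEGER NOT NULL
        );
        INSERT INTO users (id, username) VALUES (1, 'briansuda');
        INSERT INTO invoices (user_id, amount_cents) VALUES (1, 4999);
    """)
    try:
        conn.execute("DELETE FROM users WHERE id = 1")
        deleted = True
    except sqlite3.IntegrityError:
        # RESTRICT refuses the delete while an invoice still points at the user.
        deleted = False
    invoices_left = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
    conn.close()
    return deleted, invoices_left


cascade_result = delete_user("CASCADE")    # user gone, and the invoice with them
restrict_result = delete_user("RESTRICT")  # invoice kept, so the user cannot go
```

Neither outcome is what the business wants: one silently loses the invoice record, the other makes the account impossible to delete at all.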

You certainly want an undo button for something as monumental as a system-wide delete! What if you left your computer on and logged in, and someone snuck up and pressed the delete button while you were away, maybe as a joke, maybe out of spite! You want to be able to recover all that data. Facebook actually has a grace period during which you can log back in and prevent the delete from occurring. I would guess that even after you let that lapse, parts of your account are still not fully deleted.
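
A grace period like that could look something like the sketch below. The 14-day window, the Account class and its method names are assumptions made up for this example; they are not taken from Facebook or any other real service.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=14)  # assumed length, purely illustrative


class Account:
    """Hypothetical account whose delete only takes effect after a grace period."""

    def __init__(self, username):
        self.username = username
        self.delete_requested_at = None  # None means no delete is pending

    def request_delete(self, now):
        """Record the request; nothing is removed yet."""
        self.delete_requested_at = now

    def is_purgeable(self, now):
        """True once a delete was requested and the grace period has run out."""
        return (
            self.delete_requested_at is not None
            and now - self.delete_requested_at >= GRACE_PERIOD
        )

    def log_in(self, now):
        """Logging in during the grace period cancels the pending delete."""
        if self.delete_requested_at is not None and not self.is_purgeable(now):
            self.delete_requested_at = None


requested = datetime(2010, 8, 30, 11, 11, tzinfo=timezone.utc)
account = Account("briansuda")
account.request_delete(requested)               # the prankster presses delete
account.log_in(requested + timedelta(days=2))   # the owner returns and undoes it
```

A background job would then only purge accounts for which is_purgeable returns True, so the prank above never reaches the actual data.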

There are also internal metrics that need to be considered. If you are using a Customer Relationship Management (CRM) tool, then you are tracking all sorts of information about your customers. When you lose a customer, you certainly need to remove them from your mailing lists, but you don’t want to outright delete them. You might need their data for metrics, such as the average order size, the time between projects, or preferences. Just because they are no longer active doesn’t mean that the data isn’t valuable. Imagine a scenario where a new customer is referred to you by someone who quit. The new customer says, “I want the same thing as he ordered”. If you deleted that information, now you are out two customers!

You certainly need to honor your customer’s will to be removed from the system, but not all of their data needs to go. Imagine a forum website: if several people were engaged in threaded conversations and you just deleted one person and all their data, then the discussion won’t make sense when you look back at it. It’s not just the loss of one person’s data; the archived forum threads become useless too. One solution is to keep the messages, but replace the text with “This member has removed their posts”. This lessens the confusion in threaded conversations, and also respects the will of the quitting customer.
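
As a rough sketch of that idea, assume each post is a plain dictionary with id, author, reply_to and text keys (a made-up structure, not any particular forum’s schema). The redact_member function below is likewise hypothetical.

```python
TOMBSTONE = "This member has removed their posts"


def redact_member(posts, member):
    """Return a copy of the thread with one member's text replaced.

    The post ids and reply_to links are left alone, so the thread structure
    stays intact for everyone else.
    """
    return [
        {**post, "text": TOMBSTONE} if post["author"] == member else post
        for post in posts
    ]


thread = [
    {"id": 1, "author": "alice", "reply_to": None, "text": "Has anyone tried this?"},
    {"id": 2, "author": "bob", "reply_to": 1, "text": "Yes, it works."},
    {"id": 3, "author": "alice", "reply_to": 2, "text": "Thanks!"},
]
redacted_thread = redact_member(thread, "bob")
```

The third post still has something to reply to, even though bob’s words are gone. Whether the author name should be blanked as well is a separate decision that this sketch leaves open.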

Depending on your company and dealings, you might be legally obligated to keep the data on file for 60-90 days or more. Some countries have data retention laws, which means that deleting a user outright might put your company into legal hot water! Going back to the idea of flagging a member as inactive rather than deleting them, that approach solves both problems.

Then there is the sticky question of who owns the data. Some sites are explicit in saying that “you own the data”, some explicitly say that they own it, and others never mention anything. Can you demand something be deleted that you don’t own? There are certainly legal instances where people are not free to keep or publish anything they want, but with the banal everyday text we write and publish, if you agreed to a EULA (which are legally dubious) or some Terms and Conditions, you might have waived the right to your content, or someone might claim it wasn’t copyrightable in the first place.

There was a comment on MetaFilter that sums it up nicely: “If you are not paying for it, you’re not the customer; you’re the product being sold.” That would explain why some companies are not willing to let you go! Having your data available to sell to companies interested in demographic breakdowns is why you can use their product for free. It isn’t out of the kindness of their hearts that you can log in without paying; your actions are being subsidized by advertisements or the sale of your data.

It just goes to show that as a user, you might want your account closed and all your data removed, but in a community setting removing just one person’s data could have a larger knock-on effect that ruins it for everyone else, or ruins the company’s business plan. Deactivating data and locking accounts is a much better way to deal with the problem of customer retention than outright deletion of data. Deleting from a database rarely has an Undo. Changing a flag from “active” to “inactive” can much more easily be rolled back.
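
Here is one way the flag could look, again using Python’s built-in sqlite3 module. The users table, the status column and the two helper functions are assumptions for the sake of the example, not a prescription.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT UNIQUE NOT NULL,
        status TEXT NOT NULL DEFAULT 'active'  -- hypothetical flag column
    );
    INSERT INTO users (username) VALUES ('briansuda');
""")


def set_status(conn, username, status):
    """Flip the flag; the row and everything linked to it stay where they are."""
    conn.execute(
        "UPDATE users SET status = ? WHERE username = ?", (status, username)
    )
    conn.commit()


def active_usernames(conn):
    """What the rest of the application would see when it lists members."""
    rows = conn.execute(
        "SELECT username FROM users WHERE status = 'active' ORDER BY username"
    )
    return [row[0] for row in rows]


set_status(conn, "briansuda", "inactive")  # the account disappears from the site
hidden = active_usernames(conn)
set_status(conn, "briansuda", "active")    # the undo is a one-line update
restored = active_usernames(conn)
```

The catch is that every query in the application has to remember to filter on the flag, which is the price paid for being able to undo.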

Next time you sign up for that fancy new web app, think to yourself: Can I actually delete my data?