Link Rot

Wednesday, March 9th, 02022 at 12:21 UTC

Link rot is like any other kind of decaying process. Wood slowly rots away in the forest, iron rusts. Everything slowly decays until we reach the heat death of the universe. Links on the Web are no exception.

In 01998, Tim Berners-Lee wrote Cool URIs don’t Change. The idea is that once you make a URL, it should last forever. It can redirect to somewhere else, but it should always resolve to some information! Link rot shows this isn’t achievable in practice. It is certainly something you should strive for, but we only ever lease domains, companies get bought and sold, and URLs do die. The question is: how fast?

There is one way to find out: do some personal research.

Online Bookmarks

Back in 02004, we signed up for a new start-up called del.icio.us by Joshua Schachter. It was an online bookmarking site. We were certainly an early adopter of del.icio.us, because we got the user name ‘brian’! We had used other services before and had even built our own, but this one was free and we didn’t have to maintain anything.

Since then, we’ve been filling it with bookmarks, some years more than others.

Pinboard analysis: 22% of links are gone

We have nearly 1400 bookmarks saved, spanning 18 years. We wanted to see how many of those URLs still work!

Pinboard gives you a few nice options for exporting and backing up your data. We chose the JSON blob since it is easy to read programmatically into our script, which then checks each URL for availability.

The code to check your own stats is online at https://github.com/optional-is/pinboard-analysis and is free to download, so you can export your Pinboard bookmarks and try this for yourself. There are instructions on how to get it all working and generating your own Link Rot stats table.

Table 1: A breakdown by year of the number of successful URLs, total URLs bookmarked and the percentage of non-link-rot URLs.

Year     Successful Bookmarks   Bookmarks   Success rate
2022                        3           3         100%
2021                       14          15         93.3%
2020                       18          18         100%
2019                       11          11         100%
2018                        2           2         100%
2017                        4           4         100%
2016                        7           9         77.8%
2015                       22          25         88.0%
2014                       35          37         94.6%
2013                       25          28         89.3%
2012                       44          52         84.6%
2011                       79          94         84.0%
2010                       51          63         81.0%
2009                       50          66         75.8%
2008                       70          83         84.3%
2007                      136         192         70.8%
2006                      196         271         72.3%
2005                      114         153         74.5%
2004                      205         266         77.1%
TOTALS                   1086        1392         78.0%
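
A per-year breakdown like Table 1 could be tallied roughly as follows. This is only a sketch, not the code from the repository: it assumes the export is a JSON list of objects with an "href" key and an ISO 8601 "time" key, and the names load_bookmarks, tally_by_year and success_rate are made up for illustration.

```python
import json
from collections import defaultdict


def load_bookmarks(path):
    """Read a bookmark export, assumed to be a JSON list of objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def tally_by_year(bookmarks, is_alive):
    """Return {year: (successful, total)} for a list of bookmark dicts.

    is_alive is any callable that takes a URL and returns True or False.
    """
    counts = defaultdict(lambda: [0, 0])
    for bookmark in bookmarks:
        year = bookmark["time"][:4]  # assumed ISO 8601, e.g. "2004-05-01T10:00:00Z"
        counts[year][1] += 1
        if is_alive(bookmark["href"]):
            counts[year][0] += 1
    return {year: tuple(pair) for year, pair in counts.items()}


def success_rate(successful, total):
    """Percentage of working links, rounded to one decimal place."""
    return round(100 * successful / total, 1)


# Made-up sample standing in for load_bookmarks("export.json").
sample = [
    {"href": "https://example.com/a", "time": "2004-05-01T10:00:00Z"},
    {"href": "https://example.com/gone", "time": "2004-06-01T10:00:00Z"},
    {"href": "https://example.org/b", "time": "2022-01-15T08:30:00Z"},
]
dead = {"https://example.com/gone"}
result = tally_by_year(sample, lambda url: url not in dead)
```

Passing the availability check in as a callable keeps the tallying separate from the network fetches, so the arithmetic can be tried on a small sample before checking 1400 real URLs.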

The code looks through your bookmarks and attempts to fetch each URL. If the HTTP status code was less than 400, we marked it as a success; if it was 400 or higher, we marked it as a failure. Without manually checking every URL, there might be some false positives among the successes: people selling existing domains, hosting provider redirects, etc. After some manual investigation, we realized that some domains were not allowing bots to crawl them. Our code was using cURL, which appears as a bot, so we faked a browser’s user-agent string and decreased our failure rate by ~4%.

There are other potential improvements, but they would only change the results by a small percentage. This is also only an analysis of our own bookmarks. What we saved over these years were mostly academic and professional sites. Those might have a different longevity than news, videos, social media or personal blog URLs. Your mileage may vary!
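
The check described above can be sketched in a few lines. This version swaps cURL for Python’s standard urllib, so it is an illustration of the rule rather than the repository’s actual code. The user-agent string, the 10-second timeout and the function names are assumptions, and treating "no response at all" as a failure is a choice made for this sketch.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent; any current browser string would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0"


def is_success(status):
    """Apply the rule from the text: an HTTP status below 400 counts as alive."""
    return status is not None and status < 400


def fetch_status(url, timeout=10):
    """Return the HTTP status after following redirects, or None if no response."""
    try:
        request = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx or 5xx status
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def link_is_alive(url):
    """Combine the fetch and the status rule into a single yes/no answer."""
    return is_success(fetch_status(url))
```

Because urlopen follows redirects, a bookmark that now redirects to a parked domain still counts as alive here, which is the same false-positive risk the text mentions.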

Del.icio.us History

Del.icio.us has its own interesting journey, from a one-man folksonomy start-up formed in late 02003, to a Yahoo! acquisition in late 02005, to being sunsetted and sold by Yahoo! in 02011. In that time, Pinboard.in started, and we jumped over there and imported all our bookmarks from del.icio.us. And to complete the circle, in 02017 Pinboard purchased Del.icio.us and is restoring it back to use.

We’ve met Joshua Schachter a few times, though he probably won’t remember. Back in 02006, we went to SXSW when it was still small-ish and cool. We were there on behalf of the microformats team. There we met Dan Connolly (who we later worked with on GRDDL) and via him, we rendezvoused one night outside a bar with Joshua. We still remember that year was the breakout year of “Dodgeball” (pre-Foursquare). You could see people on their pre-iPhones sending and receiving text messages about where the next party was happening. People would appear and disappear like a line of ants from venue to venue nearly instantly.

We vividly remember Dan having a Sidekick phone, the alternative to a BlackBerry, the one that swiveled out to reveal a full keyboard. So we met with Joshua; he and Dan chatted tech, and we sat in awe of meeting someone whose product we used and who was a speaker at one of the sessions on folksonomies!

At that session we sat in the audience with Thomas Vander Wal (the coiner of the term folksonomy) while he stirred in his chair over not being on the panel discussing the topic that he made popular.

Many years later, we were invited to an O’Reilly FooCamp where we again crossed paths with Joshua when he was trying out a new start-up idea around people power and distributed tasks.

So What?

The Web as we know it won out over other competing systems partly because of the looseness of the connections. The Xanadu Project had “unbreakable links” and “two way links”. The overhead of “two way links”, which require whoever you link to to link back, would have prevented spontaneous posts and made social media impossible!

The Web we have today is that middle ground: it is easy to publish, at the risk of things disappearing. When domains or sites like GeoCities disappear (or, inevitably, Medium), there are folks at archive.org feverishly trying to save as much content as possible, both as a historical reference and so that we can continue to link to and read content that has gone missing.

We talk about Language Death, when the last native speaker is gone. It is predicted that 90% of the world’s currently spoken languages will be extinct by 02050. We understand what that means culturally, historically and societally.

In 18 years, we’ve lost 22% of the content on the Web (based on an N of 1). Sure, the language and ideas are not extinct, but we have certainly lost something for future generations.

The Internet Archive works hard to keep all this knowledge available. Find a way to support them and their hard work.