Sunday, December 19, 2010

Web Link Auto-Updating

Every so often, we all come across outdated links on a web page. It seems to me that this situation is mostly avoidable, or at the very least resolvable to most people's satisfaction. Why not have a server-side script that regularly checks the availability of the links in your web pages? If a link is unavailable for a predefined length of time, the script points it to a recently cached copy and notifies the webmaster (maybe you). We could customize the script to use Google's cache, archive.org, or, if there is space on our server, a local cache.
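To make the idea concrete, here is a minimal sketch of the bookkeeping such a script might do for each link. Everything in it is an assumption for illustration: the names (LinkState, record_check, href_for), the seven-day grace period, and the choice of archive.org's "latest snapshot" URL form as the cache. It is not a finished tool, just the logic described above written down.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed value for the "predefined length of time"; pick whatever suits your site.
GRACE_PERIOD = timedelta(days=7)


@dataclass
class LinkState:
    """Hypothetical per-link record kept between runs of the script."""
    original_url: str
    down_since: Optional[datetime] = None  # first failed check of the current outage
    using_cache: bool = False              # True once the link points at the cached copy


def is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status.

    Uses a HEAD request to avoid downloading the page. Some servers reject
    HEAD, so a real script might fall back to GET before giving up.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


def cached_url(url: str) -> str:
    """Assumed cache: archive.org's most recent snapshot of the page."""
    return "https://web.archive.org/web/" + url


def record_check(state: LinkState, available: bool, now: datetime) -> bool:
    """Update the state after one check; return True if the webmaster should be notified."""
    if available:
        # The outage is over, but we do not switch back automatically here.
        state.down_since = None
        return False
    if state.down_since is None:
        state.down_since = now
    if not state.using_cache and now - state.down_since >= GRACE_PERIOD:
        state.using_cache = True
        return True
    return False


def href_for(state: LinkState) -> str:
    """The address the page should currently link to."""
    return cached_url(state.original_url) if state.using_cache else state.original_url
```

A scheduled job (cron, for instance) would call is_available for each link, pass the result to record_check, and regenerate the page's links with href_for. Note that the sketch deliberately leaves a link on its cached copy even after the original comes back.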

One difficulty with this scheme is deciding when to switch the "cached" link back to the original. If we do this automatically as soon as the original link becomes available again, we run the risk of that link pointing to a page that is significantly different from the one that was cached. We could decide on a metric of difference, and keep the link pointing to the cached content if the metric exceeds a threshold. This would still require human intervention at some point, but would give webmasters some breathing room.
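One possible metric of difference, sketched below, compares the words of the cached page with the words of the live page using Python's standard difflib module. The function names and the 0.3 threshold are my own assumptions and would need tuning; a real script would also want to strip the HTML markup first so that only the visible text is compared.

```python
import difflib

# Assumed cutoff: above this, the live page is treated as "significantly different".
DIFFERENCE_THRESHOLD = 0.3


def difference(cached_text: str, live_text: str) -> float:
    """Return 0.0 for identical texts, rising to 1.0 when they share no words."""
    cached_words = cached_text.split()
    live_words = live_text.split()
    matcher = difflib.SequenceMatcher(None, cached_words, live_words, autojunk=False)
    return 1.0 - matcher.ratio()


def can_restore_original(cached_text: str, live_text: str,
                         threshold: float = DIFFERENCE_THRESHOLD) -> bool:
    """True if the live page is close enough to the cached copy to link to it again."""
    return difference(cached_text, live_text) <= threshold
```

When can_restore_original returns False, the link stays on the cached copy and the decision is left to the webmaster, which is where the human intervention comes in.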

Doing this manually for sites of moderate to large size might be feasible if we observe that the population of link targets does not change significantly or go dark for more than x days per year.