Miguel Afonso Caetano<p><a href="https://tldr.nettime.org/tags/InternetArchive" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>InternetArchive</span></a> <a href="https://tldr.nettime.org/tags/DigitalArchiving" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DigitalArchiving</span></a> <a href="https://tldr.nettime.org/tags/LinkRot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LinkRot</span></a> <a href="https://tldr.nettime.org/tags/InternetHistory" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>InternetHistory</span></a> <a href="https://tldr.nettime.org/tags/DigitalPreservation" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DigitalPreservation</span></a>: "Based on our 2023 crawl of the 27.3M URLs (that were first archived by IA between 1996 and 2021), we made the following high-level findings.</p><p>1. The Median Lifespan of a URL is 2.3 Years</p><p>We measured the median lifespan of a URL to be 2.3 years (calculated on URLs that were dead in 2023). However, we observed very different behavior between root URLs and deep links.</p><p>- For root URLs, the median lifespan is 8.8 years.<br> - 10% of the root URLs died within 1 year, but 20% lived for over 20 years before dying.<br> - However, for root URLs that were first archived in the last 10 years of our study (2012-2021), the median lifespan was only 2.6 years, indicating overall shorter lifespans for newer webpages.</p><p>- For deep links, the median lifespan is 1.3 years. Over 50% of the deep links died within 1 year, and only 4% lasted for over 10 years before dying.</p><p>2. Only 35.3% of the webpages were still alive in 2023</p><p>Only 35.3% of the webpages were still alive (terminated in an HTTP 2xx status after following any redirects) in 2023.
</p><p>- However, nearly half of the URLs first archived between 1996 and 2000 were still alive; this is likely affected by the large proportion of root URLs in our dataset in the early years.</p><p>- For those URLs first archived between 2012 and 2021, about 40% were still alive in 2023, for both root URLs and deep links.</p><p>3. The remaining 64.7% are considered dead</p><p>We further categorized the dead webpages based on HTTP-level responses:</p><p>- HTTP 4xx (25.1%) - These returned HTTP 4xx status codes, meaning that the webpages were gone or inaccessible, but the webserver was still alive.<br>- Error (39.6%) - These resulted in a DNS failure, TCP connection timeout, HTTP 5xx status, invalid redirection, or some other error state."</p><p><a href="https://ws-dl.blogspot.com/2024/09/2024-09-20-some-urls-are-immortal-most.html" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">ws-dl.blogspot.com/2024/09/202</span><span class="invisible">4-09-20-some-urls-are-immortal-most.html</span></a></p>
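<p>The alive / HTTP 4xx / Error split quoted above could be approximated with a small classifier. This is only a sketch under stated assumptions, not the study's actual crawler: the names <code>classify</code> and <code>probe</code> are hypothetical, the standard-library <code>urllib</code> stands in for whatever tooling the authors used, and any final status that is neither 2xx nor 4xx is lumped into "error".</p>

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def classify(status: Optional[int]) -> str:
    """Map a final HTTP status (after redirects) to the quoted categories.

    `status` is None when no HTTP response was obtained at all
    (DNS failure, TCP timeout, invalid redirection, ...).
    """
    if status is not None and 200 <= status < 300:
        return "alive"
    if status is not None and 400 <= status < 500:
        return "http_4xx"
    # Assumption: 5xx, no response, or any other status counts as "error".
    return "error"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` once and classify the outcome (hypothetical helper).

    urllib follows redirects by default, so the status seen here is the
    one at the end of the redirect chain.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx (or unresolved 3xx) status.
        return classify(exc.code)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, connection timeout, malformed URL, broken response, ...
        return classify(None)
```

<p>Calling <code>probe</code> on each URL of a list and counting the three labels would give percentages comparable in spirit to those in the post, although a single request per URL is a much cruder check than a dedicated crawl.</p>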