There's a quote that has stuck with me since the very old days:
"Once you put something online, it's there forever"
I wish this were true, because over the years a lot of genuinely valuable content has disappeared. Such content could be an insightful blog, a unique video, an exciting project or even a whole community.
They were things that I found easy to digest, made in a format that resonated with me, and felt so valuable that I was sure they would remain online somewhere forever. It's heartbreaking to find out that content can permanently vanish, with somehow nobody having managed to back it up. The only thing remaining is a vague idea of what it was back then, forcing me to link together the pieces of concepts I still remember just to recreate something that was originally perfectly presented. Even after figuring out the crux of a vanished piece of content, the end result is always a poor imitation of the original. It's like trying to recite a book from memory versus reading the actual book.
It is no surprise that, seen through the lens of a search engine, most websites are breeding grounds for ads. The internet, an undying source of endless knowledge, has been slowly and silently decaying from its core. The grand archiving efforts from sites like webarchive can only go so far back before gaps start appearing, and I doubt they'll be able to scale indefinitely. I wonder, how many SEO traps are they archiving? How about the AI slop?
This is where AI (or more precisely, LLMs) comes in. These big companies, with their insatiable hunger for content and complete disregard for any rules, are all trying to build the largest model possible. In some twisted way, they are actually building the biggest and most compact archive the world has ever seen.
So far, LLMs have managed to fill in the gaps of content I was missing. Perhaps someone did back up the content I was looking for, but they did it in another language, in some obscure source. Or maybe LLMs are just that good at linking things together and delivering them in my preferred format. In any case, after I provided vague hints and ideas, they managed to rebuild some of the content I thought I would never find again.
Despite all the bad things these AI companies are doing, they're at least doing one good thing for the future: condensing all knowledge.