A team of computational linguists at MIT has documented an unusual phenomenon in large language models: the spontaneous emergence of what they're calling "linguistic fossils" — coherent code written in programming languages that have been functionally extinct for decades.
The discovery began when Dr. Elena Vasquez, a researcher studying AI code generation capabilities, noticed GPT-4 producing syntactically correct snippets in ALGOL 68, a language that peaked in popularity during the early 1970s. What made this particularly strange was that ALGOL 68 documentation and code examples make up only a tiny fraction of the internet's content, far less than a model would typically need to learn meaningful patterns.
"Initially, we thought it was contamination from training data," Vasquez explains. "But when we started testing systematically, we found the model could generate working code in languages like JOVIAL, NELIAC, and even some dialect variants of COBOL that were used by specific government contractors in the 1960s."
The MIT team's investigation, published in the Journal of Computational Linguistics, reports that GPT-4, Claude 3, and Google's Gemini models all show the phenomenon to varying degrees. The researchers tested the models' ability to generate code in 47 different obsolete programming languages and found positive results for 23 of them.
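To make the 23-of-47 figure concrete, here is a minimal sketch of how such a tally might be computed. It is not the MIT team's method: the `tally` function, the rule that a language counts as positive when at least one sample passes, and the placeholder language names are all assumptions made for illustration. Only the counts 23 and 47 come from the article.

```python
from typing import Dict, List, Tuple


def tally(outcomes: Dict[str, List[bool]]) -> Tuple[int, int, float]:
    """Count languages with at least one passing sample.

    `outcomes` maps a language name to per-sample pass/fail flags.
    Treating "any sample passes" as a positive result is an assumption,
    not a criterion stated in the article.
    """
    tested = len(outcomes)
    positive = sum(1 for flags in outcomes.values() if any(flags))
    rate = positive / tested if tested else 0.0
    return positive, tested, rate


# Placeholder data with made-up language names; not results from the study.
demo = {
    "lang_a": [True, False, True],
    "lang_b": [False, False, False],
    "lang_c": [False, True, False],
}
demo_positive, demo_tested, demo_rate = tally(demo)

# The headline figure reported in the article: 23 positive out of 47 tested.
reported_rate = 23 / 47
print(f"{reported_rate:.1%}")  # prints 48.9%
```

In other words, the models produced positive results for just under half of the languages tested.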
Most remarkably, the models appear to be reconstructing these languages' syntactic rules through what researchers term "computational archaeology" — inferring language structures from fragmentary references, comments in modern code, and historical documentation that mentions but doesn't fully demonstrate the languages.
"It's like discovering that someone who only heard scattered stories about Latin could suddenly write grammatically correct Latin poetry," says Dr. Michael Chen, a computer science professor at Stanford who wasn't involved in the study. "The models seem to be filling in massive gaps in their training data through pure pattern extrapolation."
The phenomenon extends beyond mere syntax. When prompted to write a sorting algorithm in JOVIAL, a language developed in the late 1950s for military command-and-control and embedded systems, GPT-4 not only produced working code but also followed period-appropriate programming conventions and optimization techniques specific to 1960s hardware limitations.
Researchers at Google DeepMind, when contacted about the findings, said they had observed similar behavior internally but had not published their results. "We've seen our models demonstrate knowledge of computational concepts that shouldn't be derivable from their training distribution," says Dr. Sarah Williams, a senior researcher at DeepMind. "It suggests these systems are developing more sophisticated internal models of computational logic than we initially understood."
The implications extend beyond programming curiosities. The research suggests that large language models may be capable of reconstructing lost knowledge from incomplete information — a capability that could have significant applications in archaeology, historical linguistics, and even paleobiology.
Perhaps most intriguingly, some of the generated code appears to implement algorithms that weren't widely known during the original languages' heyday. A GPT-4-generated ALGOL 68 program implementing a hash table structure used techniques that didn't become standard until the 1980s, suggesting the model was retrofitting modern algorithmic knowledge into historical syntax.
"We're seeing evidence of what we might call 'anachronistic synthesis,'" Vasquez notes. "The models are creating historically plausible code that incorporates knowledge from different time periods in computationally consistent ways."
The discovery has raised new questions about how AI systems organize and synthesize knowledge. A common assumption is that models generalize reliably only to patterns that are well represented in their training data. The linguistic fossil phenomenon indicates that sufficiently large models may be developing emergent capabilities to infer and reconstruct knowledge from minimal cues.
OpenAI researchers, speaking on condition of anonymity, revealed that internal testing has shown even more exotic examples. GPT-4 has generated code snippets that appear to be written in theoretical programming languages that were proposed but never implemented, reconstructed from academic papers and conference proceedings.
The phenomenon isn't limited to programming languages. Follow-up research has found similar patterns with extinct natural languages, obsolete musical notations, and even discontinued mathematical notation systems.
"We're essentially watching AI systems become computational historians," Chen observes. "They're not just processing information — they're actively reconstructing lost knowledge traditions."
The findings have attracted attention from digital preservation experts, who see potential applications in recovering damaged or incomplete historical records. However, the research also raises concerns about the accuracy of AI-reconstructed information and the potential for models to generate plausible but incorrect "historical" content.
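One way to approach the accuracy concern is to judge generated programs by their behavior rather than by inspection alone. The sketch below is illustrative and not drawn from the study: `passes_cases` is a hypothetical helper that runs a program on known inputs and compares its output with the expected results. A small Python one-liner stands in for a compiled legacy-language sorting program, since the article names no toolchain.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def passes_cases(
    command: Sequence[str],
    cases: List[Tuple[str, str]],
    timeout: float = 5.0,
) -> bool:
    """Return True if the program prints the expected output for every case.

    Each case is a (stdin_text, expected_stdout) pair. Outputs are compared
    token by token, so differences in whitespace are ignored.
    """
    for stdin_text, expected in cases:
        try:
            result = subprocess.run(
                list(command),
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, OSError):
            # A hang or a missing executable counts as a failure.
            return False
        if result.returncode != 0:
            return False
        if result.stdout.split() != expected.split():
            return False
    return True


# Stand-in for a compiled legacy-language program (an assumption):
# a Python one-liner that sorts whitespace-separated integers from stdin.
stand_in = [
    sys.executable,
    "-c",
    "import sys; print(*sorted(map(int, sys.stdin.read().split())))",
]
cases = [("3 1 2", "1 2 3"), ("10 -4 7 7", "-4 7 7 10")]
ok = passes_cases(stand_in, cases)
```

A check like this can show only that a program behaves correctly on the cases supplied. It cannot confirm that the code is idiomatic or historically faithful, which still requires expert review.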
As AI systems continue to grow in size and sophistication, the linguistic fossil phenomenon may represent an early glimpse into capabilities that extend far beyond simple pattern matching — suggesting that artificial intelligence may be developing its own forms of intuition and inference that mirror, and sometimes exceed, human historical reasoning.
What we know for certain
MIT researchers have documented AI models generating syntactically correct code in obsolete programming languages such as ALGOL 68 and JOVIAL, which make up a minimal fraction of training data. Multiple major AI systems, including GPT-4, Claude 3, and Gemini, demonstrate this capability to varying degrees, with positive results in 23 of the 47 obsolete languages tested.
What we are inferring
Large language models appear to be reconstructing computational knowledge through pattern extrapolation, filling gaps in training data by inferring language structures from fragmentary references. This suggests emergent capabilities for knowledge synthesis that exceed traditional machine learning expectations.
What we couldn't verify
The full extent of DeepMind's internal observations remains unpublished, and OpenAI's claims about theoretical programming language reconstruction couldn't be independently confirmed. The accuracy of all AI-generated historical code remains difficult to verify due to limited expert knowledge in obsolete languages.