This ancient language puzzle was impossible to solve—until a PhD student cracked the code

The discovery makes it possible to translate any word written in Sanskrit.
A page from an 18th-century copy of Dhātupāṭha by Pāṇini from the Cambridge University Library. Cambridge University Library


A PhD student studying at the University of Cambridge has solved a puzzle that has stumped scholars since the fifth century BCE. Rishi Rajpopat decoded a rule taught by Pāṇini, an Indian grammarian who is believed to have lived in present-day northwest Pakistan and southeast Afghanistan. Scholars have referred to him as one of the fathers of linguistics.

Sanskrit is an ancient and classical Indo-European language from South Asia and the sacred and literary language of Hinduism. It is also the language in which much of India’s greatest science, philosophy, poetry, and other secular literature has been written. It is spoken in the country by roughly 25,000 people today.


“Some of the most ancient wisdom of India has been produced in Sanskrit, and we still don’t fully understand what our ancestors achieved,” said Rajpopat, who first learned Sanskrit as a high school student and is now at the University of St. Andrews, in a statement. “We’ve often been led to believe that we’re not important, that we haven’t brought enough to the table. I hope this discovery will infuse students in India with confidence, pride, and hope that they too can achieve great things.”

With Rajpopat’s discovery, scholars can now construct millions of grammatically correct words in Sanskrit. The findings were published as Rajpopat’s PhD thesis in 2021.

Rajpopat decoded a 2,500-year-old algorithm that makes it possible to use Pāṇini’s “language machine” accurately for the first time. Pāṇini’s system consists of 4,000 rules and is detailed in the Aṣṭādhyāyī. Considered his greatest work, the Aṣṭādhyāyī is believed to have been written around 500 BCE. It is meant to work like a machine: the base and suffix of a word are fed in, and a step-by-step process should turn them into grammatically correct words and sentences.

“Pāṇini had an extraordinary mind, and he built a machine unrivaled in human history,” said Rajpopat. “He didn’t expect us to add new ideas to his rules. The more we fiddle with Pāṇini’s grammar, the more it eludes us.”

Often, two or more of Pāṇini’s rules can be applied at the same step in the process, which has left scholars agonizing over which rule to choose.

An algorithm is needed to resolve this rule conflict, which affects millions of Sanskrit words, including certain forms of the commonly used “mantra” and “guru.” Pāṇini had a metarule to help the user decide which rule should be applied if a rule conflict occurred, but it has been misinterpreted by scholars for the last 2,500 years.


Traditionally, Pāṇini’s metarule has been interpreted as follows: in the event of a conflict between two rules of equal strength, the rule that comes later in the grammar’s serial order wins. Rajpopat argues instead that when two conflicting rules apply to the left and right sides of a word respectively, Pāṇini wanted us to choose the rule applicable to the right side.
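
The difference between the two readings of the metarule can be sketched in a few lines of Python. This is a toy illustration only, not Pāṇini’s grammar or Rajpopat’s implementation: the `Rule` class, its fields, and the two rules `rule_A` and `rule_B` are hypothetical, and real rule conflicts involve far more detail than a serial number and a position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str     # hypothetical label, not a real rule from the Aṣṭādhyāyī
    serial: int   # where the rule sits in the grammar's serial order
    operand: int  # index of the part of the word it acts on (0 = leftmost)


def pick_traditional(conflicting):
    """Traditional reading: the rule later in the serial order wins."""
    return max(conflicting, key=lambda rule: rule.serial)


def pick_right_side(conflicting):
    """Reading Rajpopat argues for: the rule acting on the right side wins."""
    return max(conflicting, key=lambda rule: rule.operand)


# Two made-up rules that both apply at the same step of a derivation.
rule_a = Rule("rule_A", serial=10, operand=1)  # earlier rule, right-hand part
rule_b = Rule("rule_B", serial=20, operand=0)  # later rule, left-hand part

print(pick_traditional([rule_a, rule_b]).name)  # rule_B
print(pick_right_side([rule_a, rule_b]).name)   # rule_A
```

In this made-up case the two readings pick different rules, which is the kind of disagreement that would send a derivation down a different path.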

“I had a eureka moment in Cambridge. After nine months trying to crack this problem, I was almost ready to quit, I was getting nowhere. So I closed the books for a month and just enjoyed the summer, swimming, cycling, cooking, praying and meditating,” said Rajpopat. “Then, begrudgingly I went back to work, and, within minutes, as I turned the pages, these patterns started emerging, and it all started to make sense. There was a lot more work to do but I’d found the biggest part of the puzzle.”

Working from the interpretation that Pāṇini expected the rule applicable to the right side to be chosen, Rajpopat found that the ancient scholar’s language machine produced grammatically correct words consistently and with almost no exceptions.

Over the next two-and-a-half years, he worked to solve problems in what he had found and presented. Beyond helping scholars understand more Sanskrit texts, the algorithm that runs Pāṇini’s grammar could potentially be taught to computers.

“Computer scientists working on Natural Language Processing gave up on rule-based approaches over 50 years ago,” said Rajpopat. “So teaching computers how to combine the speaker’s intention with Pāṇini’s rule-based grammar to produce human speech would be a major milestone in the history of human interaction with machines, as well as in India’s intellectual history.”