The Algebra of Language

Using ideas from theoretical physics, Matilde Marcolli has created a new mathematical framework for Noam Chomsky's model of language.

by Whitney Clavin

When mathematician Matilde Marcolli worked as a postdoc at MIT in the late 1990s, she used to sit in on classes taught by famed linguist Noam Chomsky. At the time, Chomsky was teaching an early version of his minimalist program of language, which breaks down all human languages to innate computations made by the brain. "I sat in this class, and I kept thinking, 'OK, I'm not understanding what is going on. But I think there is something that I, as a mathematician, should be able to understand because this looks so familiar,'" she says.

Thirty years later, Marcolli has written a forthcoming book with Chomsky and Bob Berwick, a professor of computer science and computational linguistics at MIT, that elucidates the math underlying Chomsky's models of language.

"Noam's work was so precise that it was easy to translate into math," says Marcolli, the Robert F. Christy Professor of Mathematics and Computing and Mathematical Sciences at Caltech. "After Bob and I started working on this with Noam, everything magically fell into place. I'm still surprised by how smoothly the mathematical formalism came together. Language, or human language syntax, is all about structure, while mathematics, and especially algebra, is really also the study of structure."

Chomsky's latest model of linguistics, developed over the last 10 years, further simplifies the minimalist model that first fascinated Marcolli more than two decades ago. The new version explains in detail how the human brain is born with an ability to comprehend and manipulate language to create meaningful sentences. At the model's core is a single computational operation, called merge, which essentially joins two words, or two parts of a sentence, together. These so-called syntactic objects can be merged with others to form and comprehend increasingly complicated sentences.
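
As a rough illustration of how little machinery the operation requires, merge can be sketched in a few lines of code. The snippet below is our own minimal Python sketch, not code from the forthcoming book; the representation of syntactic objects as nested, unordered sets is the only assumption it makes.

# A minimal sketch of the merge operation: it takes two syntactic
# objects (words or previously merged phrases) and returns a new,
# unordered object containing both. All hierarchical structure comes
# from applying this one operation to its own outputs.
def merge(alpha, beta):
    return frozenset([alpha, beta])

the_cat = merge("the", "cat")       # {the, cat}
clause = merge("hungry", the_cat)   # {hungry, {the, cat}}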

"There is one very basic operation in language that makes it possible for all humans to compute the elaborate structures that exist in a sentence," Marcolli says. "Our brains juggle the entire structure of a sentence, including all the building pieces, and we are able to manipulate them. This ability for language happens very early on, even when babies are exposed to so few examples."

Take the sentence "The apple was eaten." In Chomsky's model, tree-like diagrams show how our brains merge the elements of the sentence together. First, the words "the" and "apple" are merged to form the phrase "the apple." This phrase is then merged with "eaten," and the tense of the verb, "was," is merged in next. At this point, our brains have built the structural components of the sentence. The phrase "the apple" is then extracted from the tree and joined at the root with the remaining tree structure to create the final sentence.
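
In the set notation often used in the minimalist literature, the successive steps can be written schematically (our own simplified rendering, not the book's exact notation) as:

{the, apple}
{eaten, {the, apple}}
{was, {eaten, {the, apple}}}
{{the, apple}, {was, {eaten, {the, apple}}}}

In the last step, "the apple" appears twice; roughly speaking, only the higher copy is pronounced, which is why the sentence surfaces as "The apple was eaten."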

Marcolli likens Chomsky's model to the mid-century mobiles by the artist Alexander Calder, in which objects dangle in a hierarchical fashion from distinct levels or tiers. In this metaphor, the whole mobile represents the overall structure of language, while the different levels are substructures. Within any level, there are yet more substructures—the dangling syntactic objects, which can be moved around and rearranged.  

For instance, when we transform the sentence "The cat is hungry" into "Is the cat hungry?" we are extracting words, or substructures, from one level of the mobile and reinserting them in a different position. Linguists call this transformation "movement." However, while these mobiles illustrate how our brains process language, they do not by themselves determine the order in which we speak or write the words.

"Once the mobiles, or trees, have been structurally built to denote the meaning of a sentence, the actual ordering of words in a sentence is like placing the three-dimensional Calder mobile on a table," Marcolli says. This “planar embedding,” or flattening of the mobile, creates a time-ordered sentence.

"The ordering of words in the sentence is not the first thing that our brain perceives," she says. "When we try to understand the meaning of a sentence, we immediately see that certain elements of the sentence are closely related to other ones, regardless of how far apart they are in the actual sentence. We don't perceive the linear ordering of words. We perceive the structure."

Our brains perceive the structures of sentences, not just the strings of words. The sentence "I saw someone with a telescope," for example, can take on two different meanings, corresponding to two different tree structures. In the first reading, I have a telescope, and I used it to see someone. In the second, the person I saw has a telescope. Viewed simply as strings of words, the two sentences are identical, but our brains perceive the different structures behind the two meanings.
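
In rough bracket notation (our own simplification of the full trees), the two readings group the same words differently:

I [saw someone] [with a telescope]   (the telescope is the instrument of seeing)
I saw [someone [with a telescope]]   (the telescope belongs to the person seen)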

Where, then, does the math come in? Marcolli had done prior work on mathematics and linguistics starting in 2015, partly in collaboration with Berwick, a former student of Chomsky's. Among other topics, that research involved developing new mathematical tools to analyze the evolution of different languages throughout history. Then, around 2019, Marcolli began looking into Chomsky's new minimalist model and realized that its operations of assembling and disassembling language substructures could be described with a type of math known as Hopf algebras (named after the German mathematician Heinz Hopf, who began work on the math tool in the 1940s).  

Hopf algebras underlie critical calculations performed in several areas of science, most notably theoretical physics. One of these calculations, called renormalization, is widely used in quantum field theory. Renormalization allows researchers to reassign meaningful values to mathematical computations that have gone haywire due to nonsensical values of infinity. It is used in tandem with Feynman diagrams, a separate revolutionary tool developed by longtime Caltech professor Richard Feynman to describe the behavior of subatomic particles.

The method of renormalization was pioneered in the late 1940s and 1950s, while later work by Caltech alum Kenneth Wilson (PhD '61) in the 1970s made the tool more accessible. However, Hopf algebras' role in renormalization theory did not come to light until much later. This was achieved beginning in 1999 by French mathematician Alain Connes and German physicist Dirk Kreimer. They and others spent the next several years working out the details and showed explicitly how Hopf algebras are at the core of the renormalization procedure for Feynman diagrams. Marcolli herself worked with Connes on this research from 2004 to 2008.

"It was very difficult to comprehend what was really going on in renormalization," she says. "It took a long time to actually understand. It was one of those things that mathematicians didn't want to look into, like some kind of black magic. But the important thing that Connes and Kreimer understood is that Hopf algebras are the key to explaining renormalization in a way that's very clear."

When Marcolli realized that Hopf algebras also explain the merge operation of Chomsky's model of language generation, she first discussed it with Berwick, who then suggested they contact Chomsky about a possible collaboration. "I was afraid that Noam would say 'get lost.' After all, he’s been thinking about this model for decades, and I am just an outsider, but he seriously wanted to discuss the idea and understand the math," Marcolli says.

In renormalization physics problems, all combinatorial substructures need to have compatible values in order to complete the calculation. Hopf algebras serve as a tool for subtracting the problematic infinities over a set of substructures in a consistent fashion. In language, says Marcolli, Hopf algebras play a similar role: they give each piece of the sentence—specifically the substructures of a linguistic tree that are moved around—a compatible meaning so that the overall sentence is meaningful.
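
The shared structural ingredient is the coproduct, the part of a Hopf algebra that splits an object into an extracted piece and the piece left behind. In the Connes-Kreimer Hopf algebra of rooted trees used in renormalization, for example, the coproduct of a tree T takes the standard form (quoted here as background; the coproduct in the language model acts on syntactic structures rather than Feynman graphs):

\Delta(T) = T \otimes 1 + 1 \otimes T + \sum_{c} P^{c}(T) \otimes R^{c}(T)

Here the sum runs over admissible cuts c of the tree, P^{c}(T) is the forest of branches pruned off by the cut, and R^{c}(T) is the trunk containing the root. Extracting a substructure while keeping coherent track of what remains is, roughly, the same bookkeeping needed when a phrase is moved within a sentence.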

"When we parse a sentence, we want to assign it meaning, and we have to do this over all the substructures of the sentence. This is what we do when assigning meaningful physics values. It's the same thing," Marcolli says.

Now that Marcolli and her colleagues have written several papers on the algebraic model of language, which will appear as chapters of their forthcoming book, Marcolli is sharing what they learned with others. Last spring, she began teaching a graduate course at Caltech (Ma 191: Mathematical Models of Generative Linguistics) on the subject, and she has hosted two workshops at the Institute’s new Richard N. Merkin Center for Pure and Applied Mathematics (supported by a grant from Caltech's Center for Evolutionary Science). The events have drawn together linguists, mathematicians, theoretical physicists, and computer scientists, including some engineers currently working on AI language-learning models such as ChatGPT.

One of the more satisfying implications of this research, Marcolli says, is that it vindicates a philosophy that Chomsky has championed for most of his career: that language can and should be studied with the same methods and tools used in the physical sciences.


The work by Marcolli, Chomsky, and Berwick that will appear in the forthcoming MIT Press book is funded by the National Science Foundation, the Foundational Questions Institute, and Caltech's Center for Evolutionary Science.
