Episode #426 from 33:00
That's for sure. Yeah. Can you talk to what is syntax and what is grammar? You wrote a book on syntax.
Introduction
0:00
Naively I certainly thought that all humans would have words for exact counting, and the Piraha don't. Okay, so they don't have any words for even one. There's not a word for one in their language. And so there's certainly not a word for two, three or four. And so that blows people's minds often. Yeah, that's blowing my mind.
Human language
1:13
As a kid in school, when we had to structure sentences and English grammar, I found that process interesting. I found it confusing as to what it was I was told to do. I didn't understand what the theory was behind it, but I found it very interesting. When you look at grammar, you're almost thinking about it like a puzzle, almost a mathematical puzzle.
Generalizations in language
5:19
Yeah. What do you find most beautiful about human language? Maybe the form of human language, the expression of human language.
Dependency grammar
11:06
Well, what I mean is in language, there's three components to the structure of language. One is the sounds. Cat is C, A and T in English. I'm not talking about that part. Then there's two meaning parts, and those are the words. And you were talking about meaning earlier. Words have a form and they have a meaning associated with them. And so cat is a full form in English and it has a meaning associated with whatever a cat is. And then the combinations of words, that's what I'll call grammar or syntax, that's when I have a combination like the cat or two cats, okay, where I take two different words there and put them together and I get a compositional meaning from putting those two different words together. And so that's the syntax. And in any sentence or utterance, whatever, I'm talking to you, you're talking to me, we have a bunch of words and we're putting them together in a sequence, it turns out they are connected, so that every word is connected to just one other word in that sentence. And so you end up with what's called technically a tree, it's a tree structure, where there's a root of that utterance, of that sentence. And then there's a bunch of dependents, like branches from that root that go down to the words. The words are the leaves in this metaphor for a tree. A tree is also a mathematical construct.
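The tree structure described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-word sentence, the head assignments, and the function name `is_dependency_tree` are hypothetical and not taken from the conversation. Each word stores the index of the one word it depends on, and exactly one word, the root, depends on nothing.

```python
from typing import List, Optional


def is_dependency_tree(heads: List[Optional[int]]) -> bool:
    """Check that a head list describes a tree.

    heads[i] is the index of the word that word i depends on,
    or None if word i is the root of the sentence.
    """
    n = len(heads)
    # A tree has exactly one root.
    roots = [i for i, h in enumerate(heads) if h is None]
    if len(roots) != 1:
        return False
    # Every head must point at a real word in the sentence.
    if any(h is not None and not 0 <= h < n for h in heads):
        return False
    # Following heads upward from any word must reach the root
    # without visiting the same word twice (no cycles).
    for start in range(n):
        seen = set()
        node = start
        while heads[node] is not None:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


# Hypothetical example sentence with hand-assigned heads (an assumption):
# "the" depends on "cat", "cat" depends on "sleeps", "sleeps" is the root.
words = ["the", "cat", "sleeps"]
heads = [1, 2, None]

tree_ok = is_dependency_tree(heads)
cycle_ok = is_dependency_tree([1, 0, None])  # "the" and "cat" point at each other
print(tree_ok, cycle_ok)
```

Storing one head per word is what makes "every word is connected to just one other word" hold by construction; the function only has to check for a single root and the absence of cycles.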
Morphology
21:05
Morphology is the connections between the morphemes onto the roots. In English, we mostly have suffixes. We have endings on the words, not very much but a little bit, as opposed to prefixes. Some words depending on your language can have mostly prefixes, mostly suffixes or both. And then several languages have things called infixes, where you have some general form for the root and you put stuff in the middle, you change the vowels, stuff like that. That's fascinating, that's fascinating. In general, there's what, two morphemes per word? One or two, or three.
Evolution of languages
29:40
We have a little bit of old English to modern English because there was a writing system and we can see how old English looked. The word order changed for instance, in old English to middle English to modern English. And so we could see things like that. But most languages don't even have a writing system. Of the 7,000, only a small subset of those have a writing system. And even if they have a writing system, it's not a very modern writing system and so they don't have it ... For Mandarin, for Chinese, we have a lot of evidence for a long time, and for English, and not for much else. German a little bit but not for a whole lot of ... Long-term language evolution, we don't have a lot. We have snapshots, is what we've got of current languages. You get an inkling of that from the rapid communication on certain platforms. On Reddit, there's different communities and they'll come up with different slang, usually from my perspective, driven by a little bit of humor or maybe mockery or whatever, just talking shit in different kinds of ways. And you could see the evolution of language there because I think a lot of things on the internet, you don't want to be the boring mainstream. You want to deviate from the proper way of talking. And so you get a lot of deviation, rapid deviation. Then when communities collide, you get ... Just like you said, humans adapt to it. And you could see it through the lines of humor. It's very difficult to study but you can imagine a hundred years from now, well, if there's a new language born for example, we'll get really high resolution data.
Noam Chomsky
33:00
Thinking and language
1:17:06
Well, that's a really interesting question. What is the difference between language, written or communicated, versus thought? What to you is the difference between them? Well, you or anyone has to think of a task which they think is a good thinking task, and there's lots and lots of tasks which would be good thinking tasks. And whatever those tasks are, let's say it's playing chess, that's a good thinking task, or playing some game or doing some complex puzzles, maybe remembering some digits, that's thinking, a lot of different tasks we might think. Maybe just listening to music is thinking. There's a lot of different tasks we might think of as thinking.
LLMs
1:30:36
Well, let's take a stroll there. You wrote that the best current theories of human language are arguably large language models, so this has to do with form. It's a kind of a big theory, but the reason it's arguably the best is that it does the best at predicting what's English, for instance. It's incredibly good, better than any other theory, but there's not enough detail.
Center embedding
1:43:35
And how they generate language, process language and generate language. That's fascinating. So in that sense, they're perfect. If we can just linger on the center embedding thing, that's hard for LLMs to produce and that seems really impressive because it's hard for humans to produce. And how does that connect to the thing we've been talking about before, which is the dependency grammar framework in which you view language, and the finding that short dependencies seem to be a universal part of language? So why is it hard to complete center embeddings? So what I like about dependency grammar is it makes the cognitive cost associated with longer distance connections very transparent. Basically, it turns out there is a cost associated with producing and comprehending connections between words which are not beside each other. The further apart they are, the worse it is. We can measure that and there is a cost associated with that.
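One simple way to make that distance cost concrete is to add up how far each word sits from the word it depends on. The sketch below is an assumption-laden illustration, not the measure used in any particular study: the two example sentences, their hand-assigned heads, and the function name `total_dependency_length` are all hypothetical.

```python
from typing import List, Optional


def total_dependency_length(heads: List[Optional[int]]) -> int:
    """Sum of distances, in word positions, between each word and its head.

    heads[i] is the index of the word that word i depends on,
    or None for the root, which contributes no distance.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)


# Hypothetical hand-made analyses (assumptions, not from the conversation).

# "the dog chased the cat that ran": the relative clause comes at the end.
right_branching = [1, 2, None, 4, 2, 6, 4]

# "the cat that the dog chased ran": the relative clause sits in the middle,
# so "cat" is separated from "ran" by the whole embedded clause.
center_embedded = [1, 6, 5, 4, 5, 1, None]

right_total = total_dependency_length(right_branching)
center_total = total_dependency_length(center_embedded)
print(right_total, center_total)
```

Under these assumed analyses the center-embedded order has the larger total, because the subject and its verb are pulled apart by the material embedded between them.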
Learning a new language
2:10:02
You mentioned one of the things is a way to measure a language is learning problems. So, what's the correlation between everything we've been talking about and how easy it is to learn a language? Are short dependencies correlated to ability to learn a language? Is there some kind of... Or the dependency grammar, is there some kind of connection there? How easy it is to learn? Well, of all the languages in the world, none that we know of right now is any better than any other with respect to optimizing dependency lengths, for example. They all kind of do it, and do it well. They all keep them low. So, I think of every human language as some kind of an optimization problem, a complex optimization problem to this communication problem. And so they've solved it. They're just noisy solutions to this problem of communication. There's just so many ways you can do this.
Nature vs nurture
2:13:54
To what degree is language, and this is returning to Chomsky a little bit, innate? You said that Chomsky used the idea that some aspects of language are innate to explain away certain things that are observed. How much are we born with language at the core of our mind brain? The answer is, I don't know, of course. I'm an engineer at heart, I guess and I think it's fine to postulate that a lot of it's learned. And so I'm guessing that a lot of it's learned. I think the reason Chomsky went with innateness is because he hypothesized movement in his grammar. He was interested in grammar and movement's hard to learn. I think he's right: movement is a hard thing to learn, to learn these two things together and how they interact. And there's a lot of ways in which you might generate exactly the same sentences and it's really hard.
Culture and language
2:20:30
Well, that's a fascinating effect. You mentioned Bolivia. What's the connection between culture and language? You've also mentioned that much of our study of language comes from W-E-I-R-D, WEIRD people, western, educated, industrialized, rich, and democratic. So when you study remote cultures such as around the Amazon jungle, what can you learn about language? So that term WEIRD is from Joe Henrich. He's at Harvard. He's a Harvard evolutionary biologist. And so he works on lots of different topics and he basically was pushing that observation that we should be careful about the inferences we want to make when we're in psychology or mostly in psychology, I guess, about humans. If we're talking about undergrads at MIT and Harvard, those aren't the same. These aren't the same things. And so if you want to make inferences about language, for instance, there's a lot of other kinds of languages in the world than English and French and Chinese. And so maybe for language, we care about how culture, because cultures can be very, I mean, of course English and Chinese cultures are very different, but hunter-gatherers are much more different in some ways. And so if culture has an effect on what language is, then we kind of want to look there as well as looking.
Universal language
2:34:58
Yeah. Do you have a sense why universal languages like Esperanto have not taken off? Why do we have all these different languages? Well, my guess is the function of a language is to do something in a community. I mean, unless there's some function to that language in the community, it's not going to survive. It's not going to be useful. So here's a great example. Language death is super common. Okay? Languages are dying all around the world, and here's why they're dying. It's like, yeah, I see this. It's not happening right now in either the Tsimane or the Piraha, but it probably will. So there's a neighboring group called Moseten, which is, I said that it's an isolate. It's actually there's a dual, there's two of them. So it's actually, there's two languages which are really close, which are Moseten and Tsimane, which are unrelated to anything else. And Moseten is unlike Tsimane in that it has a lot of contact with Spanish and it's dying, so that language is dying. The reason it's dying is there's not a lot of value for the local people in their native language.
Language translation
2:39:21
Do you have hope for machine translation that it can break down the barriers of language? So while all these different diverse languages exist, I guess there's many ways of asking this question, but basically how hard is it to translate in an automated way from one language to another? There's going to be cases where it's going to be really hard. So there are concepts that are in one language and not another. The most extreme kinds of cases are these cases of number information. So good luck translating a lot of English into Piraha. It's just impossible. There's no way to do it because there are no words for these concepts that we're talking about. There's probably the flip side. There's probably stuff in Piraha, which is going to be hard to translate into English on the other side. And so I just don't know what those concepts are. The space, the world space is a little different from my world space, so I don't know what the things they talk about, things it's going to have to do with their life as opposed to my industrial life, which is going to be different. And so there's going to be problems like that always. Maybe it's not so bad in the case of some of these spaces, and maybe it's going to be hard for others. And so it's pretty bad in number. It's extreme, I'd say in the number space, exact number space. But in the color dimension, that's not so bad. But it's a problem that you don't have to talk about the concepts.
Animal communication
2:42:36
I met a guy named Aza Raskin, who does a lot of cool stuff, really brilliant, works with Tristan Harris on a bunch of stuff, but he was talking to me about communicating with animals. He co-founded Earth Species Project where you're trying to find the common language between whales, crows and humans. And he was saying that there's a lot of promising work that even though the signals are very different, the actual, if you have embeddings of the languages, they're actually trying to communicate similar type things. Is there something you can comment on that? Is there promise to that in everything you've seen in different cultures, especially remote cultures, that this is a possibility or no? That we can talk to whales? I would say yes. I think it's not crazy at all. I think it's quite reasonable. There's this sort of weird view, well, odd view, I think that to think that human language is somehow special. I mean, maybe it is. We can certainly do more than any of the other species, and maybe our language system is part of that. It's possible. But people have often talked about how, like Chomsky, in fact, has talked about how human, only human language has this compositionality thing that he thinks is sort of key in language. And the problem with that argument is he doesn't speak whale, and he doesn't speak crow, and he doesn't speak monkey. They say things like, well, they're making a bunch of grunts and squeaks. And their reasoning is like, that's bad reasoning. I'm pretty sure if you asked a whale what we're saying, they'd say, well, I'm making a bunch of weird noises.