Music to my ears: how language evolved

24 November 2017
Posted by Matz Larsson.

Did you know that people with roughly the same leg length tend to move in tandem?

This usually happens unconsciously. Between footfalls there are short intervals with relatively low noise levels, during which we can perceive sounds from the environment. This matters less in today's society but, in the past, rhythmic movements would have helped our ancestors notice a pursuing sabre-toothed tiger, or the stealthy approach of a malicious stalker. Because behaviours that carry a survival benefit are selected for, they become more common within a species. This can also happen if animals (including humans) experience the beneficial behaviour as stimulating or enjoyable, which produces a surge of the "reward molecule" dopamine in the brain. Individuals – and families – with this trait would thus have been more likely to walk in step. Meanwhile, less rhythmically inclined individuals – perhaps those who were bad at paced walking – would literally have stomped themselves out of the gene pool by being eaten!

Rhythmic behaviour probably stimulated dopamine production in safer surroundings too. Clapping hands, stamping, howling around the campfire... from there, the step to dance and music was probably a small one. Dopamine certainly flows when people in the modern era listen to music. Similar selective mechanisms may therefore have increased the ability of early humans to perceive, recollect and mimic sounds. Charles Darwin, and many scholars after him, suggested that musical ability was a necessary precursor to the evolution of language.

The so-called "motor theory" of language evolution includes the idea that gestures laid the groundwork for the development of human language. This theory is very much about hand movements; that is, language evolved through the observation and imitation of gestures. The "tool use sound theory of language", on the other hand, is a partly revamped version of the motor theory. It suggests that language development was stimulated by the sounds of our movements. The human brain skilfully analyses the sound waves created when tools are used, and these specific sounds may have played a crucial role. If early ancestors were able to mimic a tool use sound, that sound may have taken on a symbolic function. The day someone managed to imitate the sound of a cutting knife or an axe blow with the help of mouth, hands or otherwise was an important first step. If two people agreed that a vocalisation symbolised a certain object or a particular event, the first word was, in principle, created. The ability to mimic the sounds of implements could only create a limited number of words, of course, but through minor changes to a sound, or a gradual change of its meaning, more words might eventually be formed.

A new way to communicate about tools may have had selective value. Communication could take place when individuals were out of sight of each other, or in darkness, which could have improved the survival of both the individual and the group. Now one could talk about implements. This, in turn, would have stimulated the development of tools and the opportunity to transfer knowledge about them. All this may have led to increased tool use, which in turn created new kinds of tool use sound – and so on. Some researchers, such as Michael A. Arbib, have suggested that tool use was linked to the development of syntax, i.e. how words are joined together to form phrases, clauses and sentences. These researchers believe that the various stages of tool use resemble sentence structure. Implement sounds may be one piece of the puzzle here. Why? They occur in a definite succession, and shifting sounds form part of the chain of events when complex tool use is performed. Tools are usually controlled with our hands, and Bass and Chagnaud have demonstrated very strong links in the vertebrate brain between vocal communication – the sounds produced by the vocal organs – and motor control of the upper extremities.

The idea may also be valid for language development today. Indeed, many modern languages contain sound symbolism. For example, studies have shown that a nonsense word such as "baluma" will be associated with round objects, whereas the word "takiki" is associated with a pointed shape. Sound-mimicking – onomatopoeic – words, such as "whiz", "sweeping" and "slicing", are further examples.

Of course, early humans might have mimicked other sounds in nature too, like the "meow" of a cat or the rushing wind. But tool use specifically engages many more senses: hearing, touch, sight and proprioception, i.e. the sense that helps us keep track of where our fingers, arms and legs are in three-dimensional space and in relation to each other. Also, the motor neurones – nerve cells that generate motion – interact when tools are used. All of this stimulates the creation of association chains, an important component of language and language development. Spoken language is based on the human ability to create associations between complex audio information (vocalisations) and other sensations, such as sight and touch.

There are so-called "mirror neurones" in the brain. These cells can become active when a monkey cracks a peanut, but also when a monkey sees another monkey doing the same thing. The same applies to the sound generated when the monkey itself, or another monkey, cracks a peanut. So the ape does not even have to see the event to stimulate audio-visual mirror neurones: just hearing implement sounds is sufficient. One result of tool use training in monkeys was that more multimodal neurones were activated. Multimodal neurones are those that can be activated by more than one type of stimulus, such as auditory and visual experiences. Again, this is something that stimulates the creation of association chains in the brain.

Which part of the brain has received most of this stimulation during evolution? Consider that about 90 per cent of us humans are right-handed. When a right-hander seizes an axe or a knife, the hand usually works in the right visual field (on the right-hand side of the body). In that situation, it is the left hemisphere neurones that are active; they control the hand's movements. Similarly, virtually all sensory information about the right hand reaches the left side of the brain. The left hemisphere will therefore have received large doses of multimodal stimulation during tool use. The hypothesis is thus consistent with the left hemisphere's dominance in language processing.

Sounds created as a side effect of locomotion are labelled "incidental sounds of locomotion", or ISOL for short. Monkeys in the canopy move unpredictably and irregularly through typically diverse vegetation. Similarly, when non-human primates are on the ground, they do not move particularly regularly. When humans began to walk on two legs, one result was more rhythmic and predictable ISOL – a regularity likely to help individuals keep pace with one another. The evolution of such synchronous behaviour is far from clarified, but there are similarities with both fish in the sea and birds in the air. Basically, we are the fish that scrambled up on land, more than two hundred million years ago. Although "upgrades" and adaptations to new ecological niches followed in vertebrate descendants, some basic structures of the brain are likely to have been preserved. In a suitable ecological situation, synchronous behaviour may pay off again – and the switch to bipedal walking may be one example.

To understand this, maybe we should turn our eyes – and ears – down to the depths. Who has not been impressed by the rapid, synchronous movements of shoaling fish? Moving synchronously may offer several acoustic advantages. Theoretical models suggest that fish and birds, swimming and flying in large groups, can use ISOL to navigate within the group. ISOL contains potentially valuable information about a neighbour's distance, size, speed and frequency of wing movements. If all the animals in a group stop swimming, or flapping, at the same time, a sudden noise reduction follows, and the fish or birds can eavesdrop on their surroundings more efficiently. Thus, synchronous movement makes it easier to perceive acoustic information. Moreover, ISOL is likely to help the group synchronise its movements. Fin-beats, wingbeats or footsteps are clearly audible to the closest fish, bird or human and can, in theory, serve as a metronome. By listening to ISOL, nearby animals (or humans) may mutually adjust their speed, distance and pacing.

To summarise, this hypothesis suggests that schooling fish may use acoustic mechanisms analogous to those of a human orchestra to achieve synchronisation. The ability to synchronise movements, to listen and to imitate sounds is fundamental to musicality. The hypothesis holds that bipedal walking stimulated the evolution of music, which in turn may have been critical for the evolution of language...
