Note: Originally published as part of the Guggenheim Museum Bilbao’s exhibition Architecture Effects.
Before designing the Labyrinth of Crete and creating artificial wings for himself and his son, Daedalus was said to have constructed a number of talking sculptures. The sculptures, self-animated, were so cunning that they would escape if not closely watched. Aristotle wrote that if tools could be animated like Daedalus’s sculptures, there would be no need for slavery and servitude. Elsewhere, the philosopher wrote that inanimate objects could never have a soul because inanimate objects could never have a voice. Archimedes built the first known mechanical flutist, a kind of automaton that would be copied for millennia afterwards. Meanwhile, the experiments of an Alexandrian barber, Ctesibius, led to the creation of the field of pneumatics. In tribute to Ctesibius, Hero of Alexandria built a pneumatic organ and hissing dragons. Hero also constructed androids capable of performing a short play. The mechanical actors were operated by a pulley system and were fully programmable. Their voices, however, were provided by human actors.
Eight hundred years later, in the city of Baghdad, many ancient Greek, Chinese, and Persian mechanical devices, along with several new ones, were published in the Book of Ingenious Devices. In it, the brothers Banū Mūsā describe a hydraulic flutist like those found in ancient Greece. A visitor to Baghdad during its golden age might have encountered mechanical birds of all kinds, singing from clocks and false trees. Visitors may have also seen a combination lock based on the letters of the alphabet, android servants, and a mechanical elephant that functioned as a clock. In the twelfth century, Ismail al-Jazari built an entire band of programmable musicians that floated on a pond and played for partygoers at the king’s palace. The band members were given fully articulated faces and bodies capable of fifty kinds of motions. None, however, were able to sing.
With the help of a colleague, Roger Bacon built a bronze head that could answer any question posed to it. Like singing mechanical birds in Baghdad, speaking bronze heads, also known as “brazen heads,” appeared throughout Europe during the late Middle Ages and early modern periods. Bacon’s brazen head was said to have worked through alchemy and clockwork mechanisms. Likewise, the Dominican bishop and saint, Albertus Magnus, built a man of brass that could answer any question posed to it. It also performed domestic chores. Albertus’s pupil, Thomas Aquinas, became so annoyed by the automaton’s talkativeness that he smashed it to pieces with a hammer.
Leonardo da Vinci built a full-body android dressed in armor whose arms were manipulated by programmable controls hidden in the device’s chest. When the android’s jaw opened, a hidden system of drums sounded. Decades later, Leonardo built a mechanical lion as a gift for King Francis I. The lion walked and produced flowers from its chest, but did not roar. During Queen Elizabeth I’s visit to Kenilworth Castle, Robert Dudley, first Earl of Leicester, arranged for the Virgin Queen to be serenaded by a singer riding a mechanical dolphin. A six-person orchestra, concealed inside the dolphin, provided musical accompaniment. Athanasius Kircher planned to build a speaking statue that could answer questions posed to it by the Queen of Sweden. He also created plans for a talking head and an organ that imitated the sounds of animals. It is unlikely that any of these machines were built.
René Descartes believed that androids could be constructed to imitate human speech, but that no android could produce spontaneous, meaningful answers to questions. He also believed that the heart was a steam engine and animals were machines. The physician Julien Offray de La Mettrie went further and declared that men, too, were machines. La Mettrie would later die after gorging himself on truffled pâté. La Mettrie’s century, the eighteenth, became the golden age of automata, though few were capable of vocalization. Jacques de Vaucanson’s famous mechanical duck was one of the few exceptions: not only could his duck defecate, but it also quacked. Vaucanson created yet another flutist and a pipe player, but neither could vocalize. Vaucanson’s pipe player, however, was equipped with a highly articulated metal tongue capable of playing faster than any human being. The inventor sold his automata to some Lyonnais businessmen, whereupon they were resold and eventually lost.
Forty years after Vaucanson first displayed his duck, the Saint Petersburg Academy of Sciences offered a reward for the best treatise on the mechanical reproduction of vowel sounds. The competition’s winner, Christian Gottlieb Kratzenstein, submitted a monograph describing a device capable of synthesizing the five vowels a, e, i, o, u. Such a device, it was surmised, would lead to mass unemployment among language teachers. At the same time, Wolfgang von Kempelen, creator of the Mechanical Turk, was building a series of mechanical jaws, mouths, and glottises in order to reproduce human speech. Kempelen invited Benjamin Franklin to observe the machine in Versailles. The American was impressed by what he saw. Among other words and phrases, the talking machine could pronounce “exploitation,” “opera,” and “Constantinopolis.” Inspired by Kempelen, the German astronomer and hypochondriac Joseph Faber spent twenty-five years building a machine that could speak English, French, and German. The device, named “Euphonia,” was made up of a keyboard and organ attached to a reproduction of a human mouth, throat, and larynx. One version included a mechanical woman that spoke to the audience. P. T. Barnum exhibited Euphonia at London’s Egyptian Hall, where visitors paid one shilling to watch and hear it speak. Soon thereafter, Faber committed suicide. The American physicist Joseph Henry imagined harnessing Euphonia to a telegraph and simultaneously broadcasting sermons to many distant congregations. Alexander Melville Bell saw Euphonia while it was on view in London and challenged his son, Alexander Graham Bell, to build a machine like it. Though still a teenager, young Alexander and his brother succeeded in building their own version. Its first word was “mama.”
Thomas Edison manufactured and sold a twenty-two-inch-tall doll with a miniature phonograph embedded in its chest. The doll sang “Mary Had a Little Lamb” in what one newspaper described as a “flat, uninflected whine.” After the toy’s commercial failure, Edison buried more than 7,000 unsold dolls somewhere on his laboratory property. In the symbolist novel A Future Eve, a fictional Edison built an adult female android that spoke through a phonograph hidden in its chest. According to the novel’s author, “The soul is a notebook of phonographic recordings.” Marcel Schwob, in his 1892 short story “The Talking Machine,” wrote that an inventor’s speaking voice lacked “dynamics” because, as the inventor explains, “dynamics belong[s] to the soul, and I have suppressed mine.” In Salomo Friedlaender’s 1916 short story “Goethe Speaks into the Gramophone,” a professor exhumes Goethe’s corpse and uses the skeleton and fictional phonographic technology to reconstruct the poet’s voice. After listening to Goethe deliver a monologue on optics, the professor tosses the phonograph under a speeding train.
In Queens, New York, a two-story-tall abstract painting of a man speaks random English sentences to an audience of amazed onlookers. Beneath the painting, a telephone operator makes the picture speak by playing fragments of speech on an organ and keyboard. The operator takes requests for phrases from the audience. They respond with tongue twisters like “potentiometer” and “non-intercommunicability.” The operator, keyboard, and painting are part of American Telephone and Telegraph’s 1939 World’s Fair pavilion. The speaking man, the “Voder,” uses a primitive version of a voice synthesis technology later known as the “vocoder.” First imagined by a Bell Labs engineer while he was recovering in the hospital, the vocoder was the first technology to successfully synthesize all of human speech. The Voder would spawn a number of inventions during the twentieth century, including the encrypted communication network used by the Allies during World War II. But, in 1939, the Voder was still a novelty, like Vaucanson’s flutist or Kempelen’s speaking mouth. The machine, despite the excitement, had its limitations. It scared as much as it entertained, with one journalist calling it “The Terrifying Metal Man.” Its keyboard system was complicated and required highly trained and dexterous operators. And despite the technology’s eventual success, the Voder stumbled on basic words. When asked by an audience member to sing a lullaby, the Voder stumbled and failed, unable to pronounce the double L in “lullaby.”
The first song sung by a computer was “Daisy Bell (Bicycle Built for Two).” The rendition was performed in 1961 at Bell Labs by a vacuum-tube mainframe computer the size of a large room. The song was also sung by the dying HAL 9000 in Stanley Kubrick’s 2001: A Space Odyssey. The director asked the Canadian actor reading HAL’s lines, Douglas Rain, to sing the song close to fifty times. Before casting Rain, Kubrick asked the American actor Martin Balsam to record the role, but rejected the voice-over because Balsam’s delivery was too “colloquially American.”
Isaac Asimov’s first robot, Robbie, communicated in pantomime. Harl Vincent’s electronic robot, Rex, spoke through a “sound-wave outlet from his loudspeaker throat.” Gnut, the benevolent robot invader in Harry Bates’s “Farewell to the Master,” spoke by re-creating the voice of its god-like master, Klaatu. In the movie adaptation of the story, The Day the Earth Stood Still, the robot invader was mute. In Harlan Ellison’s “I Have No Mouth, and I Must Scream” the sentient computer communicated in printed telegraphic “talkfields,” which Ellison reproduced throughout the short story. In Arthur C. Clarke’s “Dial F for Frankenstein,” the telephone system became sentient, but said nothing, satisfied with playing sounds of the sea or harp strings in the wind. Like the HAL 9000, the computer in Stanisław Lem’s Golem XIV spoke in a human voice, its language and accent unspecified. Other Lem machines spoke with “a metallic voice,” “a voice like thunder,” “a husky voice,” “a melodious voice,” and “a hollow voice, as if . . . from an empty barrel.” Later literary robots and sentient computers would use the voices of Jack Nicholson, dead spouses, animal cries, and the default text-to-speech programs found in most laptops.
In 1985, after losing the ability to speak due to an emergency tracheotomy, Stephen Hawking began using a commercially available text-to-speech device, the CallText 5010. With the aid of a handheld clicker, Hawking used a program run on an Apple II computer to assemble the words he wished the CallText 5010 to speak. However, the CallText voice used by Hawking was not made specifically for him. Called “Perfect Paul,” the voice was recorded by Dennis Klatt for mass-market use. Hawking was English, and his new voice’s accent was American, though people sometimes told him the voice sounded Scandinavian or Scottish. Over the next thirty years, Hawking refused to alter Perfect Paul or upgrade the CallText hardware. When the hardware eventually began to fail, Hawking faced the possibility of losing his voice a second time. A number of engineers, after learning of the physicist’s difficulties, volunteered to preserve his voice by porting it to new hardware. Several months after the project was completed, Hawking died. Following his funeral service, a message from Hawking, recorded with the Perfect Paul voice, was broadcast into the nearest black hole, 1A 0620-00.
Descartes is still right, at least for now: today, no machine can carry on a convincing conversation with us for very long. Without Bacon’s alchemy, today’s brazen heads lack even the smallest of small talk. Regardless, artificial oracles and singers are everywhere. Rock stars are replaced with holograms, and artificially intelligent software composes new pop songs. Most American pop singers rely on pitch correction software, making radio sound as if it were dominated by a single artificial voice. At the same time, artificial intelligence software is used to create false audio recordings of public figures, fueling fears that society will no longer trust audio evidence. Technology companies offer services to reconstruct the voices of assassinated politicians and dead loved ones. Objects, long thought of as incapable of speech, regularly ask us about our needs and desires. These objects not only speak to us and interrogate us, but also study and record our behavior. After more than two thousand years, the voice, once proof of the soul, is reduced to just another interface.1
1 Several excellent books contributed to this short history of speaking machines. They are, in order of reference: Allah’s Automata: Artifacts of the Arab-Islamic Renaissance (800–1200), edited by Siegfried Zielinski and Peter Weibel; Edison’s Eve: A Magical History of the Quest for Mechanical Life, by Gaby Wood; Gramophone, Film, Typewriter, by Friedrich A. Kittler; How to Wreck a Nice Beach: The Vocoder from World War II to Hip-Hop, by Dave Tompkins; and Machines That Think: The Best Science Fiction Stories About Robots and Computers, edited by Isaac Asimov, Martin H. Greenberg, and Patricia S. Warrick. Many articles supplemented this reading, though they are too numerous to list here.