Programming the Sound of One Hand Clapping
Forty miles south of San Francisco, on a windy knoll overlooking the choice farmland colonized a century or so ago by Leland Stanford—and turned into a university by his wife—sits a collection of computer hardware, software and diverse human talents that likely qualifies as the most sophisticated musical instrument on the planet. Already it is capable of imitating a range of orchestral instruments from violin to trumpet so flawlessly that not even a trained musician can hear the difference. Ultimately the device will produce an unlimited range of sound. Its developers envision the day when composer or performer will sit at an organ-sized console—some amalgam of computer terminal and keyboard instrument—and with the touch of a finger produce the sound of a 16-voice chorus. Or, perhaps, a 16-collie chorus. Or any sound the composer can imagine—and, quite likely, some that he can’t.
The entire venture began with a fairly simple question. “Why is it,” asked John Chowning, the energetic and personable composer/professor and guiding force behind Stanford’s brand-new Center for Computer Research in Music and Acoustics, “that electronic sound does not have the richness of interest that exists in natural sound?” In only two decades, the unique tones of electronic synthesizers have managed to infiltrate nearly every facet of popular music as well as spawn an entirely new school of electronic composition. But the unique can also become the overly familiar and, as Chowning points out, “Very often, in electronic music, one can hear what’s going on. You learn to recognize a filtered square wave, say, or a ring-modulated sound. And so it loses a sort of mystery. That’s not to say electronic sounds should be like natural sounds, but there seems to be some quality lacking.”
Nonetheless, Chowning—who has worked with computer music for more than ten years—believes the loudspeaker will be a major instrument of the future. And so he and his associates, who range from an electrical engineer to a perceptual psychologist, have set out to define those qualities that have thus far separated the sound produced by semiconductor chips and loudspeakers from that of reeds, strings and resonant chambers. Their tool—that “ultimate instrument”—is a roomful of computer hardware called a PDP-10 and worth close to a million dollars. “What we have for the first time,” Chowning says, “is the ability to control the musical event from the most elemental level to the most formal.” The initial results of that ability have attracted the rapt attention of electronic musicians from Pasadena to Paris—and, in the last year, more than half a million dollars in federal grant money from both the National Science Foundation and the National Endowment for the Arts.
With good reason. Deep in the recesses of Stanford’s plate-glass and redwood Artificial Intelligence Laboratory, one can sit in Chowning’s tiny lab, surrounded by Altec speakers, and hear a clarinet note—artificially produced by the computer but uncannily lifelike—slowly mutate into a cello tone. Just as soon as it is a cello, it changes with equal fluidity into an oboe—and then into a French horn, and then back in circular fashion to the original clarinet. And later, during a Chowning computer composition called “Turenas,” the air fills with the sound of a tiny, glasslike chime, wandering lazily around the room—and gradually retreating to a distance seemingly greater than the bounds of the lab—until the chime begins to change into the sound of a thousand-pound gong powerful enough to shake the linoleum floor.
The demonstration is both amazing and discomfiting—the aural equivalent of the motion-picture technique that transforms fresh-faced actors into wolfmen. Yet the sounds have such palpable realism that one understands immediately that the computer is in a class by itself when it comes to fooling the human senses. The science behind this deception has as much to do with what we can’t hear as what we can.
Chowning’s earliest work was with distance cues: how one tells where sound is coming from and the dimensions of the space in which it occurs. “In an orchestra,” he explains, “we have different distances and angles, and a complete, reverberant field. All this seemed to be missing in electronic music, because whatever I did, it came from the same point—absolutely without character as far as presence was concerned.” After several years of research, Chowning developed a combination of computer-generated reverberation and doppler shift (the velocity-induced change of pitch one hears, say, in a passing train whistle) that could make a sound seem to emanate from wherever the composer cared to put it—even, by altering the reverberation, from somewhere outside the room. Chowning could, moreover, make it seem that the sound source was moving—not simply in the ping-pong fashion of stereo and quad demo recordings, but in terms of three-dimensional space.
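The doppler shift the article mentions is easy to state in code. What follows is a minimal sketch, not the Stanford program itself: the standard formula for the pitch a stationary listener hears from a source moving toward or away from him.

```python
# Doppler shift for a moving sound source and a stationary listener.
# This is the textbook formula, offered only to illustrate the cue
# Chowning simulated; the names and values here are illustrative.
SPEED_OF_SOUND = 343.0  # metres per second, in air at room temperature

def doppler_pitch(source_freq_hz, radial_velocity):
    """Perceived frequency when the source moves at `radial_velocity`
    metres per second (positive means approaching the listener)."""
    return source_freq_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_velocity)

# A 440 Hz tone approaching at 20 m/s sounds sharp (about 467 Hz);
# receding at the same speed, it sounds flat (about 416 Hz).
approaching = doppler_pitch(440.0, 20.0)
receding = doppler_pitch(440.0, -20.0)
```

Sweeping `radial_velocity` smoothly as a simulated source circles the listener is what bends the pitch the way a passing train whistle does.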
Unfortunately, the four-speaker blend of constantly changing information is hideously complex. That’s where the computer comes in. Chowning demonstrates by sitting down in front of a bright green cathode-ray tube and using a light-pen (a pencil-shaped probe that allows one to “draw” on a computer screen) to sketch, electronically, a complicated swirling pattern of sound motion around a graphic representation of the listening space.
“If we tried to plot the control patterns for that, for each of the four speakers,” Chowning says, “it would be impossibly complex. But the program here doesn’t care.” He types a brief command on the keyboard of the computer terminal, and in a moment the screen displays graphs that describe precisely the amplitude and frequency changes required for each of the four speakers, over time. When those instructions are fed to the speakers controlling a set of quiet bell tones, the location of those bells seems to follow, exactly, the pattern that Chowning originally sketched. “The result,” he concludes, “is to liberate sound from the loudspeaker.”
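The article does not say exactly how the program derives the four speaker amplitudes, so the following is only a plausible sketch of one ingredient: an equal-power crossfade between the pair of corner speakers that bracket the source’s direction. The speaker angles, the panning scheme and all names here are assumptions for illustration; Chowning’s actual program also accounted for distance and reverberation.

```python
import math

# Hypothetical equal-power panning across four corner loudspeakers.
SPEAKER_ANGLES = [45.0, 135.0, 225.0, 315.0]  # degrees, one per corner

def quad_gains(source_angle_deg):
    """Amplitude gain for each of four corner speakers, given the
    direction of the illusory source in degrees."""
    gains = [0.0] * 4
    a = (source_angle_deg - 45.0) % 360.0
    i = int(a // 90.0)            # counter-clockwise speaker of the pair
    frac = (a % 90.0) / 90.0      # position between the pair, 0..1
    gains[i] = math.cos(frac * math.pi / 2)        # equal-power crossfade
    gains[(i + 1) % 4] = math.sin(frac * math.pi / 2)
    return gains
```

The equal-power rule keeps the total acoustic power constant as the source glides from speaker to speaker, so the loudness does not dip mid-path; evaluating `quad_gains` along Chowning’s light-pen curve would yield exactly the kind of per-speaker amplitude graphs the screen displays.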
Commercial quad is an obvious market for this sort of super-real simulation; Chowning and an associate are considering an album of Chowning’s computer compositions. But the product may be too far ahead of present packaging: most quad-encoding systems have “holes”—points of information loss—that might render Chowning’s localization methods less effective. Chowning himself is not sure how much information must be retained for the illusion to still operate. “When Columbia was testing different systems for quad a few years ago,” Chowning says, “they took some of our tapes for subjects. We still haven’t heard how it came out.”
It’s not difficult to accept the fact that a computer can create the illusion of moving sound and reverberant space; what is striking is just how effective that illusion can be. But how does one make a synthetic clarinet tone? Better yet, how does one make a clarinet turn into a cello?
The process begins with finding out how a real clarinet tone works. When Chowning’s research associates, John Grey and J. “Andy” Moorer, used the computer to dismantle various natural tones, they encountered an interesting perceptual question: how much acoustical information in a tone is really necessary to re-create the impression of the original? The answer is surprising: a fully lifelike synthesis requires far less information than the musical instrument produced in the first place. “It was a bit awesome,” says Chowning. “We had no idea the ear throws away so much information.”
John Grey, a soft-spoken, long-haired psychologist whose specialty is dubbed “psychoacoustics,” demonstrates with a tape of a single real clarinet tone. Initially, the tone has been electronically dissected, first into its fundamental frequency, and then into each separate harmonic—odd squeaks and squawks that are multiples of the fundamental—ascending all the way past the limits of audibility. He then uses the computer to rearrange the tape so that one first hears the fundamental by itself, and then with each harmonic added, one by one. The result is a strange, unidentifiable sound—the fundamental standing alone—gradually turning into a clarinet. Grey poses the crucial question: “When does it first become a clarinet? And when does it stop changing?”
Grey turns to the computer screen and summons up an intricate, three-dimensional plotting of the same dissected clarinet tone, displaying the contour of the fundamental and each of the first eight harmonics, arranged like a series of progressively smaller mountains rising from the coordinates of the graph. Each curve is jagged and intricate. Grey types out another order on the keyboard. Abruptly the plotting of the natural clarinet tone vanishes and is replaced by another image. The parameters of the graph are the same as before, but this time the harmonics have been drawn in by Grey himself as far simpler constructions of straight lines. When the computer generates the sound that Grey’s sketch represents, however, it sounds precisely like the original clarinet. Grey shakes his head: “I still have a hard time convincing myself that you can take a bunch of harmonics, add them up and get a real tone.”
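Grey’s trick—summing a fundamental and its harmonics, each under a simple straight-line amplitude envelope—can be sketched in a few lines. The envelope break-points and harmonic strengths below are invented for illustration, not Grey’s measured clarinet data.

```python
import math

SAMPLE_RATE = 20000  # samples per second; an illustrative figure

def envelope(t, duration, attack=0.05, release=0.1):
    """Piecewise-linear amplitude: ramp up, hold, ramp down."""
    if t < attack:
        return t / attack
    if t > duration - release:
        return max(0.0, (duration - t) / release)
    return 1.0

def additive_tone(fundamental_hz, harmonic_amps, duration=0.5):
    """Samples of a tone built by adding harmonics of one fundamental,
    each shaped by the same simple line-segment envelope."""
    n_samples = int(duration * SAMPLE_RATE)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        env = envelope(t, duration)
        s = sum(amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
                for k, amp in enumerate(harmonic_amps))
        samples.append(env * s)
    return samples

# Odd harmonics dominating the spectrum gives a clarinet-like hollowness:
tone = additive_tone(220.0, [1.0, 0.04, 0.5, 0.08, 0.3])
```

Refining the sketch—giving each harmonic its own straight-line contour rather than one shared envelope—is precisely what Grey drew on the screen.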
But one can, and with that discovery the possibilities became virtually unlimited. Grey demonstrates, for example, that by selectively attenuating only the fundamental of the synthetic clarinet tone, one can very accurately re-create a muted clarinet. “And we’re still getting better at fooling people, in simpler and simpler ways,” he says.
The process that makes all of this delicate and precise manipulation of sound possible is called “direct digital synthesis”—an entirely different method from the analog synthesis used in all commercial synthesizers. While the analog synthesizer constructs sound as one constant, fluctuating signal, the digital unit generates tens of thousands of individual pieces (digits) per second. Each signifies the strength of the signal at a particular, discrete moment in time; when converted finally to an analog signal and run through a loudspeaker, the digits create sound waves. Digital synthesis therefore provides control over the generated sound down to the level of each individual digit.
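The idea can be made concrete in a few lines: the computer emits a stream of numbers, each one the instantaneous signal strength at a discrete moment, and a digital-to-analog converter turns that stream into a voltage for the loudspeaker. This toy version (the sample rate and names are illustrative, not Stanford’s) generates the digits for a pure sine tone.

```python
import math

# "Tens of thousands of individual pieces per second": here, 40,000.
SAMPLE_RATE = 40000

def sine_samples(freq_hz, n_samples):
    """One digit per discrete instant n / SAMPLE_RATE."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

# One millisecond of a 440 Hz tone is only 40 digits.
one_millisecond = sine_samples(440.0, SAMPLE_RATE // 1000)
```

Because every digit is computed, any of them can be altered before conversion—which is exactly the control that analog circuitry, with its one continuous signal, cannot offer.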
Digital synthesis will not arrive in your local music store next week. The system at Stanford includes not only a hefty computer and some thoroughly hairy programming, but an additional unit—for converting digital to analog and vice versa—that cost $7000 and took nine months to build. But the advantages of digital synthesis over analog are so overwhelming that it is just a matter of time before the digital method is adapted for commercial purposes. One company is rumored to have a digital organ under development.
One can, moreover, use a similar technique to take “natural” music, digitize it, store it in the computer memory bank and then reproduce and manipulate it with a precision impossible in even the finest contemporary recording studio equipment. Mixing, music storage and even the automated production of scores will likely be revolutionized by digital technology—as will the music itself.
Loren Rush, another composer and researcher, plays a tape of a digitally processed natural trombone phrase altered, very subtly, to produce a sound that, while still a trombone, has an attack impossible for the most skillful trombonist to execute. Rush, who has written a piece for computer and orchestra to be performed in seven cities next season, calls this “a kind of music that explores impossibilities: presenting performers in contexts that strain them to their limits.”
The Stanford group is now working toward its next major goal, the development of systems to allow “real-time” synthesis—the final step required to bring the computer into its own as an accessible musical instrument for both composition and, ultimately, live performance. Says Chowning: “Tape is a loser—noise, drop-out, etc.—and we’d like to avoid that, using real-time equipment, so that you actually do the computation as the sound is generated.”
The field is still so new that even the researchers have a hard time predicting the directions these new techniques will take, and what effect they will have on music. The Stanford researchers emphasize that simulation of natural instruments is only an experimental way station for exploring how complex, satisfying, natural sound works. In fact, the real advantage of computer music is what one researcher calls “the spaces between the instruments.”
Pierre Boulez, the expatriate French composer/conductor, spent two weeks last summer studying at the Stanford Center. When his New York Philharmonic contract ends in 1977, he will return to Paris to oversee a multimillion-dollar complex of labs, studios and performance spaces, all dedicated to the exploration of computers in music. “Music,” he has said, “cannot move forward without science.”
Speaking of the computer while at the Center, he said, “It is like learning to play a new instrument, or learning to speak a new language, such as Japanese. It would not be easy, but one could do it, no?” John Chowning has an answer to that which may carry a hint of the future: “Learning a programming language is really no more difficult than learning counterpoint.”