The first developments in speech recognition predate the invention of the modern computer by more than 50 years. Alexander Graham Bell, whose wife was deaf, was inspired to experiment with transmitting speech. He initially hoped to create a device that would transform audible words into a visible picture that a deaf person could interpret. He did produce spectrographic images of sounds, but his wife was unable to decipher them. That line of research eventually led to his invention of the telephone.
For several decades, scientists developed experimental methods of computerized speech recognition, but the computing power available at the time limited them. Only in the 1990s did computers powerful enough to handle speech recognition become available to the average consumer. Current research could lead to technologies that, for now, are more familiar from an episode of "Star Trek." The Defense Advanced Research Projects Agency (DARPA) has three teams of researchers working on Global Autonomous Language Exploitation (GALE), a program that will take in streams of information from foreign news broadcasts and newspapers and translate them. The agency hopes to create software that can instantly translate between two languages with at least 90 percent accuracy. "DARPA is also funding an R&D effort called TRANSTAC to enable our soldiers to communicate more effectively with civilian populations in non-English-speaking countries," said Garofolo, adding that the technology will undoubtedly spin off into civilian applications, including a universal translator.
A universal translator is still far in the future, however; it's very difficult to build a system that combines automatic translation with speech recognition. According to a recent CNN article, the GALE project is "DARPA hard," that is, "difficult even by the extreme standards" of the agency. Why? One problem is building a system that can flawlessly handle roadblocks like slang, dialects, accents and background noise. The different grammatical structures used by languages can also pose a problem. For example, Arabic sometimes uses single words to convey ideas that are entire sentences in English.
At some point in the future, speech recognition may become speech understanding. The statistical models that allow computers to decide what a person just said may someday allow them to grasp the meaning behind the words. That would be a huge leap in computational power and software sophistication, but some researchers argue that speech recognition development offers the most direct line from the computers of today to true artificial intelligence. We can talk to our computers today. In 25 years, they may very well talk back.
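To give a rough sense of what "statistical models that decide what a person just said" means, here is a minimal toy sketch in Python. It assumes the standard decision rule used in statistical recognizers: score each candidate transcription by combining a language-model probability (how plausible the words are) with an acoustic probability (how well they match the audio), then pick the highest-scoring one. The candidate phrases and all of the probability values below are made up for illustration; they do not come from any real recognizer.

```python
import math

# Toy illustration of the statistical decision rule behind most recognizers:
# choose the word sequence W that maximizes P(W) * P(audio | W).
# All numbers here are invented log-probabilities, not output from a real
# acoustic model or language model.

candidates = {
    # hypothesis: (log P(W) from a language model, log P(audio | W) from an acoustic model)
    "recognize speech":   (math.log(0.020),   math.log(0.00050)),
    "wreck a nice beach": (math.log(0.0001),  math.log(0.00055)),
    "recondite peach":    (math.log(0.00001), math.log(0.00030)),
}

def best_hypothesis(cands):
    """Return the transcription with the highest combined log score."""
    return max(cands, key=lambda w: cands[w][0] + cands[w][1])

# "recognize speech" wins: its acoustic score is slightly lower than
# "wreck a nice beach," but the language model finds it far more plausible.
print(best_hypothesis(candidates))
```

Real systems weigh millions of such hypotheses at once, which is why the jump from "which words were spoken" to "what the speaker meant" demands so much more computation and modeling.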
The potential problems with speech recognition were on public display recently in a Windows Vista demonstration. The system performed flawlessly at opening programs and accessing documents, but when it came to transcribing text, it wasn't very accurate. The problems likely stemmed from background noise and echo in the large, audience-filled auditorium where the demo took place. A video of the incident soon spread across the Internet, hurting the reputations of both Windows Vista and speech recognition in general.