By Barbara Hale
Researchers have developed a prototype system to help visitors locate campus parking lots and buildings by talking with a computer-controlled map that responds not only to the spoken word but also to natural hand gestures.
"There still is a lot of work to be done but we have a pretty fair, ground-level, demonstration model of a system in which a person can interact with a computer by using the most natural human mode of communication -- talking while gesturing," said project leader Rajeev Sharma, assistant professor of computer science and engineering.
"Besides the current application, the system could potentially be adapted to help tourists locate the sights in large cities, shoppers to find stores in malls, visitors to find patients in hospitals or even for roles in crisis management, mission planning and briefing."
In a recent demonstration, doctoral student Sanshzar Kettebekov stood about 5 feet away from a map of the University Park campus projected on a 4-foot-by-3-foot screen.
"Scroll," he said gently into the cordless microphone attached to his T-shirt, and the map moved. "Stop," Kettebekov directed, and the map stopped.
He waved his hand in the air and a little red hand appeared on the screen. As Kettebekov continued to gesture with his hand, the on-screen hand followed it, like a cursor obeying a mouse. When the red hand settled on one of the buildings, Kettebekov said, "Show me the nearest parking lot," and a bright blue line immediately appeared and connected the building to the closest lot.
The system is based on off-the-shelf hardware. The computer is a standard PC workstation equipped with a video camera, the system's "eye" on the gesturing human. A commercially available speech recognition package currently takes care of the conversation. However, the researchers developed new gesture recognition software and used footage of TV weather broadcasters narrating the weather to "train" it.
The new gesture recognition software is based on hidden Markov models (HMMs), a statistical technique for recognizing patterns that unfold over time. HMMs had been used in earlier gesture recognition systems, but only for predefined gestures such as those of sign language. The new approach, based on weathercaster movements, enables the computer to recognize and "understand" a rich store of natural gestures that occur in combination with speech.
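As a rough illustration of how HMM-based recognition works in general (this is not the researchers' software), the Python sketch below scores a short sequence of hand-motion symbols against two toy models using the forward algorithm and picks the more likely one. The gesture names, the three-symbol motion coding and every probability are invented for the example; a real system would learn such values from training footage.

```python
import math

# Hypothetical observation symbols: 0 = hand still, 1 = straight motion, 2 = curved motion.

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM."""
    n = len(start)
    # alpha[i] is proportional to P(observations so far, current state = i).
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid underflow; the log of the scales adds up to the log-likelihood.
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p

# Toy two-state models (all numbers are made up for illustration).
# Each entry is (start probabilities, transition matrix, emission matrix).
MODELS = {
    "point": (
        [0.8, 0.2],
        [[0.6, 0.4], [0.1, 0.9]],
        [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1]],   # state 0: reach, state 1: hold
    ),
    "circle": (
        [0.9, 0.1],
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.05, 0.15, 0.8], [0.6, 0.2, 0.2]],  # state 0: sweep, state 1: rest
    ),
}

def classify(obs, models=MODELS):
    """Return the name of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

if __name__ == "__main__":
    print(classify([1, 1, 0, 0, 0]))  # straight motion followed by a hold
    print(classify([2, 2, 2, 2, 0]))  # sustained curved motion
```

Each candidate gesture gets its own model, and recognition amounts to asking which model best explains the observed motion. Handling natural gestures, as opposed to a fixed vocabulary, depends on what the models are trained on rather than on changes to this scoring step.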
At this point, although the system recognizes quite a few human gestures and spoken words, it doesn't like small talk. You can't tell it, "Well, I'd like to go to the Creamery for an ice cream cone first and then stop off at Old Main before parking at Beaver Stadium."
At least not yet.
Yuhui Zhou, a master's degree candidate with a background in linguistics, is working on dialog design and feedback systems that will enable the computer to extract the most pertinent information from a human conversation stream. Doctoral candidate Jiongyu Cai is working on extracting the salient gestures from the random hand waving that most people use while talking. Kettebekov is trying to understand the combination of speech and gestures so that he can develop software that enables the computer to interpret gestures in the speech context.
The research team is also taking into account that people from different cultures gesture differently. At present, however, plans call for the map to respond only to English.
"Computer users have been slaves to the mouse and the keyboard too long. The equipment has, so far, limited the potential for human interaction with computers," Sharma said. "Incorporating gesture, which computer vision makes possible, allows us to imagine all kinds of potential applications. For example, I can imagine a computer you wear on your head, like a virtual reality helmet, that could help you repair your PC by telling you what to do and then 'watching' as you do it. Or, a wearable computerized surgical aide that could help direct a surgeon to the precise location of a tumor."
For now, the group is working to enable the computer to more effectively talk back to the user.
"We'd like to model the human/computer dialog so that the display could interactively influence the user input enabling the computer to play a more active role in the natural speech/gesture interface," Sharma said.