New AI Tech Boosts Synthesized Speech and Voice Recognition

Developments in speech recognition technology from IBM and California universities offer hope for patients suffering from speech loss and vocal paralysis.

IBM reported the creation of a faster and more energy-efficient computer chip capable of turbo-charging speech-recognition model output.

The explosive growth of large language models for AI projects has exposed the limits of conventional hardware, which translate into longer training periods and spiraling energy consumption.

IBM researchers seeking a solution say their prototype incorporates phase-change memory devices within the chip, performing the fundamental AI computations known as multiply–accumulate (MAC) operations directly in memory. This bypasses the standard, time- and energy-consuming routine of shuttling data between memory and processor, greatly speeding up the chip.
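To make the idea concrete, a MAC operation multiplies each input value by a stored weight and adds the product to a running sum; a neural-network layer is essentially millions of these. Below is a minimal Python sketch of that kernel, purely for illustration: IBM's chip performs the equivalent computation in analog form inside the phase-change memory cells rather than in software.

```python
# Minimal sketch of a multiply-accumulate (MAC) operation, the core
# kernel of neural-network inference. A conventional chip fetches the
# weights from memory on every pass; IBM's prototype instead stores
# them in phase-change memory and accumulates the products in place.

def mac(inputs, weights):
    """Multiply each input by its weight and accumulate the sum."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += x * w  # one multiply-accumulate step per pair
    return acc

# A single neuron's pre-activation is one MAC over its inputs.
print(mac([0.5, -1.0, 2.0], [0.1, 0.4, 0.25]))  # -> 0.15
```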

In processor-intensive speech recognition operations, IBM’s prototype achieved 12.4 trillion operations per second per watt, an efficiency level up to hundreds of times better than the most powerful CPUs and GPUs currently in use.
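For a sense of scale, that throughput figure can be inverted to give an energy budget per operation; a quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope: 12.4 trillion operations per second per watt
# implies an energy cost of 1 / 12.4e12 joules per operation.
ops_per_second_per_watt = 12.4e12
joules_per_op = 1.0 / ops_per_second_per_watt
print(f"{joules_per_op * 1e15:.1f} femtojoules per operation")  # ~80.6 fJ
```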

Meanwhile, researchers at UC San Francisco and UC Berkeley say they have devised a brain-computer interface for people who have lost the ability to speak, one that generates words from a user’s thoughts and attempts at vocalization. Edward Chang, chair of neurological surgery at UC San Francisco, said the goal is to restore a “full, embodied way of communicating, which is the most natural way for us to talk with others.”

According to Techxplore, Chang and his team implanted two tiny sensors on the surface of the brain of a woman suffering from ALS, a neurodegenerative disease that gradually robs a person of mobility and speech. The sensors were connected through a brain-computer interface to banks of computers running language-decoding software.

The woman went through 25 training sessions in which she read sets of a few hundred sentences while the decoder translated her brain activity, detecting phonemes and assembling them into words. The researchers then synthesized her voice from a recording of her speaking at a wedding years earlier and designed an avatar that mirrored her facial movements.
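The decoding step described above, recognizing phonemes and then stitching them into words, can be illustrated with a toy example. The sketch below uses a hypothetical two-word pronunciation dictionary and a greedy longest-match rule; the actual system relies on a trained neural decoder and a language model rather than a fixed lookup table.

```python
# Toy illustration of assembling decoded phonemes into words.
# The pronunciation dictionary here is hypothetical; a real decoder
# scores phoneme sequences with a neural network and a language model.

PRONUNCIATIONS = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}

def phonemes_to_words(phonemes, lexicon):
    """Greedily match the longest known phoneme run at each position."""
    words, i = [], 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):
            if tuple(phonemes[i:j]) in lexicon:
                words.append(lexicon[tuple(phonemes[i:j])])
                i = j
                break
        else:
            i += 1  # skip a phoneme the lexicon cannot place
    return words

decoded = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
print(phonemes_to_words(decoded, PRONUNCIATIONS))  # ['hello', 'world']
```

Decoding phonemes rather than whole words is what lets such a system scale: English can be spelled with roughly 40 phonemes, so the decoder’s output alphabet stays small even as the vocabulary grows.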

After four months of training, the model was able to track the subject’s attempted vocalizations and convert them into intelligible words; drawing on a training vocabulary of 125,000 words, it achieved an accuracy rate of 76%.

The system also translated the subject’s attempted speech at a rate of 62 words per minute, an improvement over past experiments but still far short of the roughly 160 words per minute of natural speech.

Although it is not yet a device people can use in everyday life, this scientific proof of concept is a major step toward a world in which people with paralysis can speak once more.