We often take speech for granted. In a few seconds or less, brain activity fires, vocal muscles contract, and words spill out. Scientists have long sought to replicate this flurry of activity with devices called brain-computer interfaces. But as with anything that involves the brain, it’s proved quite a challenge.
Now, with the rapid development of artificial intelligence and better ways of capturing brain signals, these challenges are getting easier to overcome. This week, two research groups, one at Stanford University and the other at the University of California, San Francisco (UCSF), published results from two new brain-computer interfaces (BCIs) that each team devised independently. The devices offer a major upgrade in how fast they translate brain activity into speech, and one even digitally conveys facial expressions. Each group used its BCI to help one person speak again: one participant who couldn’t communicate because of a brainstem stroke, and another who lost the ability to communicate because of amyotrophic lateral sclerosis (ALS), a neurodegenerative condition.
Findings from both studies were published separately on Wednesday in the journal Nature.
“These two papers are exciting because they are two independent data points that both show a big leap forward in the accuracy and generality of speech by [BCIs],” Francis Willett, the first author of the Stanford-led study and a staff scientist at the university’s Neural Prosthetics Translational Lab, told reporters at a press conference.
“With these new studies, it’s now possible to imagine a future where we can restore fluid conversation to someone with paralysis, enabling them to freely say whatever they want with accuracy high enough to be understood.”
A digital avatar
BCIs work like this: a neural implant, placed inside or outside the brain, picks up the brain’s electrical activity. A computer analyzes that activity and translates it into commands that a separate device, such as a prosthetic, a robotic hand, or a speech synthesizer, carries out.
There’s a big disclaimer with all of this. BCIs can’t read minds or extract information without the user’s awareness (at least for now).
To pick up on the brain’s activity, the Stanford participant, a 68-year-old woman with ALS, had a brain electrode implanted slightly into her cerebral cortex, to a depth roughly the thickness of two stacked quarters. The UCSF participant, a 47-year-old woman paralyzed by a brainstem stroke more than 18 years ago, had a mesh-like neural implant placed on the surface of her brain above the areas involved in speech.
Both BCIs used algorithms trained to recognize phonemes, the basic building blocks of speech sounds that differentiate words from each other, in neural activity and then turned them into text. For example, if you wanted to say “Hello,” the BCI would pick up on neural activity associated with four phonemes: “HH,” “AH,” “L,” and “OW.” Because there are only a little over 40 phonemes, compared with the enormous number of words in the English language, this approach enables the BCI to decode virtually any word with high accuracy.
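To make the phoneme idea concrete, here is a minimal sketch of how a stream of already-decoded phonemes could be turned back into words. It is not either team’s decoder: the tiny lexicon, the function name `phonemes_to_words`, and the greedy longest-match strategy are all assumptions made for illustration, and the sketch skips the hard part, which is inferring phonemes from neural activity in the first place.

```python
# Hypothetical toy lexicon mapping phoneme sequences to words.
# The symbols follow the "HH AH L OW" style used in the article;
# the entries themselves are illustrative assumptions.
LEXICON = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("Y", "EH", "S"): "yes",
    ("N", "OW"): "no",
}


def phonemes_to_words(phonemes, lexicon=LEXICON):
    """Greedily match the longest known phoneme run at each position."""
    words = []
    max_len = max(len(key) for key in lexicon)
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in lexicon:
                words.append(lexicon[chunk])
                i += length
                break
        else:
            # No lexicon entry starts here: mark it and move on.
            words.append("<unk>")
            i += 1
    return words


decoded = phonemes_to_words(["HH", "AH", "L", "OW", "W", "ER", "L", "D"])
print(decoded)
```

The point of the sketch is the one the article makes: a small, fixed set of phoneme symbols can spell out an open-ended vocabulary, so adding a word means adding a lexicon entry rather than teaching the system a new signal.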
To train their device, the Stanford researchers had their participant complete 25 training sessions, each four hours long, in which she taught the BCI’s algorithm her phoneme-associated brain signals. She did so by repeating sentences chosen at random from a large data set compiled from sample phone conversations.
Stanford’s BCI allowed the participant to communicate at an average of 62 words per minute, with a word error rate of around 24 percent for a 125,000-word vocabulary or 9 percent for a 50-word vocabulary.
“This means that about three in every four words were deciphered correctly,” said Willett.
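For readers curious how such a figure is calculated, word error rate is conventionally the number of word substitutions, insertions, and deletions needed to turn the decoded sentence into the intended one, divided by the number of intended words. The sketch below shows that standard calculation; the function name and the example sentences are assumptions for illustration and are not taken from either study.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (made up): one wrong word out of four.
rate = word_error_rate("please bring me water", "please bring the water")
print(rate)
```

Because inserted words also count as errors, the rate is only approximately “the share of words that came out wrong,” which is why a figure near 25 percent is usually glossed as roughly one word in four.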
The UCSF participant spent less than two weeks training her BCI, tirelessly repeating different phrases from a 1,024-word conversational vocabulary the researchers created.
“We decoded [sentences] with a 25 percent word error rate at 78 words per minute,” Sean Metzger, the UCSF paper’s first author, said during the press conference. “We also showed we were able to decode intelligible speech sounds from our participant’s neural activity. Using a clip from her wedding video, we were able to decode the sound into a voice that sounded like her own prior to her stroke.”
The UCSF team also created a digital avatar, driven by the participant’s neural activity, that can display nine different facial expressions ranging from happy to sad. The original idea for the avatar was to provide real-time feedback to the participant as she trained the BCI, said Edward Chang, a neurosurgeon at UCSF who led the study.
“Once [the avatar] started to work… it was a new experience for both us and our participant,” Chang told reporters. “For her to hear her own voice, to think about the words and hearing sounds coming from the machine of what she intended to say, but also to see a talking face.”
Paving the way for future BCIs
These developments are the latest in the ongoing effort to meld mind with machine as smoothly and effortlessly as possible for individuals in need. Companies like Neuralink and Synchron aim to commercialize BCIs for all consumers, though for the immediate future their goal is still to help people with spinal cord injuries and communication disorders. The researchers behind the new studies believe their findings could carry over to that market.
“My opinion is that the technologies we’re describing, the algorithmic advances that both of these papers describe, are really usable with a number of different kinds of technology,” Jaimie Henderson, a professor of neurosurgery at Stanford and his group’s lead researcher, said during the press conference. “We’re not wedded to the implant itself necessarily. I think there will be many opportunities to apply the fundamental neuroscientific principles and algorithmic advances we’re currently working on to the commercial world.”
Something like a digital avatar constructed from neural activity may even help people with physical disabilities take part in emerging interactive social platforms like the Metaverse, said Chang of UCSF.
“A lot of development is going on right now in the digital world where people can go on and essentially have meaningful interactions. This is something that’s changing society,” he said. “The key here is that people with paralysis [and other conditions] will have some means to participate in that more fully.”
The studies are a scientific leg up, but as with any developing technology, the researchers see room for improvement in future work. An obvious goal is to raise the words-per-minute rate to conversational levels and to improve decoding accuracy.
“Right now we’re getting one out of every four words wrong. I hope the next time… we get maybe one out of every 10 words wrong,” said Willett of Stanford. “I think one pathway we’re excited about exploring is [using] more electrodes. We need more information from the brain, we need a clearer picture of what’s happening.”
Henderson likened the groups’ efforts to the evolution of television broadcasting, from early receivers and grainy resolution to the brilliance of high definition and beyond.
“We’re sort of at the era of broadcast TV of the old days right now,” he said. “We need to continue to increase the resolution to HD and then on to 4K so we can continue to sharpen the picture and improve the accuracy.”