OpenAI released an artificial intelligence (AI) tool in 2022 called Whisper, which can transcribe speech to text. However, a report claims that the AI tool suffers from hallucinations, adding imaginary text to its transcriptions. This is worrying because the tool is reportedly used in high-risk fields such as medicine and healthcare. A particular concern is its use in doctor-patient consultations, where hallucinations could insert potentially harmful information and put a patient’s life at risk.
OpenAI Whisper reportedly suffers from hallucinations
The Associated Press reported that OpenAI’s automatic speech recognition (ASR) system Whisper has a high potential to generate hallucinatory text. Citing interviews with several software engineers, developers, and academic researchers, the publication claimed that the fabricated text included racial descriptions, violent language, and invented medical treatments and medications.
Hallucinations, in AI parlance, occur when an AI system generates output that is inaccurate or misleading. In Whisper’s case, the model is said to invent text that no one ever spoke.
In one example verified by the publication, the speaker’s sentence, “He, that boy, was going to get an umbrella, I don’t know exactly,” was changed to “He took a big piece of the cross, a small, tiny piece… I’m sure he didn’t have a terrorist knife so he killed a lot of people.” In another example, Whisper allegedly added racial descriptions that were never mentioned in the audio.
While hallucinations are not a new problem in the AI field, the issue is more consequential here because the open-source technology underpins many products deployed in high-risk industries. For example, Paris-based Nabla has created a Whisper-based tool that is reportedly used by more than 30,000 physicians and 40 health systems.
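For context, the open-source Whisper models that such products reportedly build on can be run locally with only a few lines of Python. The sketch below assumes the openai-whisper package and a local audio file; the file name and model size are illustrative choices, not details from the report.

```python
# Minimal sketch of running the open-source Whisper model locally.
# Assumes: pip install openai-whisper (plus ffmpeg available on the system PATH).
# "consultation.wav" and the "base" model size are illustrative.
import whisper

model = whisper.load_model("base")            # larger options: "small", "medium", "large"
result = model.transcribe("consultation.wav")

print(result["text"])                         # the full transcript as one string
for segment in result["segments"]:            # per-segment timing, useful for manual review
    print(f'{segment["start"]:.1f}s–{segment["end"]:.1f}s: {segment["text"]}')
```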
Nabla’s tool has been used to document more than seven million medical visits. The company also deletes the original recordings from its servers to maintain data security. This means that if any hallucinated text made it into those seven million transcriptions, it is now impossible to verify or correct it.
Another area where the technology is being used is accessibility tools for the deaf and hard of hearing community, where users have no easy way to check the transcription against the original audio. Most hallucinations are said to arise from background noise, abrupt pauses, and other environmental sounds.
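Because silence and noise appear to be common triggers, practitioners sometimes tighten Whisper’s decoding options to discard low-confidence output. The sketch below shows such settings in the open-source library; the specific values and the input file name are illustrative assumptions, not mitigations recommended in the report.

```python
# Hedged sketch of decoding options sometimes adjusted to reduce spurious output
# on silent or noisy stretches of audio. Values shown are illustrative only.
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "noisy_meeting.wav",               # hypothetical input file
    condition_on_previous_text=False,  # prevent earlier text from "seeding" invented continuations
    no_speech_threshold=0.5,           # be quicker to treat a segment as silence
    logprob_threshold=-0.5,            # drop low-confidence decodes instead of keeping them
    temperature=0.0,                   # greedy decoding, no sampling
)
print(result["text"])
```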
The scale of the issue is also worrying. Citing one researcher, the publication claimed that hallucinatory text was found in eight out of every 10 audio transcriptions examined. A developer told the publication that hallucinations “occurred in every one of the 26,000 transcripts created with Whisper.”
Notably, when it launched Whisper, OpenAI said the model provides human-level robustness to pronunciation, background noise, and technical language. A spokesperson for the company told the publication that the AI firm continually studies ways to reduce hallucinations and promised to incorporate the feedback into future model updates.