WellSaid Labs, a leading artificial intelligence (AI) voice company, unveiled a new technology today that allows users to direct the performance of AI voices in a more natural, nuanced way. The technology, called HINTS (Highly Intuitive Naturally Tailored Speech), lets content creators shape AI voices by adding contextual annotations, such as tempo or loudness adjustments, much as a film director guides an actor.

“We have long heard from our customers that they would like to have more direction in shaping our AI’s vocal outputs,” Michael Petrochuk, co-founder and CTO of WellSaid Labs, said in an exclusive interview with VentureBeat. “We wanted to develop a system that is intuitive and natural, that allows our model to predict natural performances based on the users’ production context, so that creatives can better see their artistic vision through.”

Meeting creative needs with AI

Unlike current methods of controlling AI voices through rigid markup languages or prompts, HINTS allows for fine-grained and interpolable adjustments. For example, users can slow a specific passage to exactly 0.7x tempo or make it 5 dB louder, with the AI voice responding naturally. Because the model is context-aware, annotations can be nested and layered across long scripts.
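To make the 0.7x and 5 dB figures concrete, here is a minimal Python sketch of how nested annotations could combine over a script. It is illustrative only and is not WellSaid's annotation format or API: the `Annotation` class, its fields, and the rule that nested tempo factors multiply while nested decibel offsets add are all assumptions made for this example. The decibel conversion itself is the standard amplitude formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation over character span [start, end) of a script."""
    start: int
    end: int
    tempo: float = 1.0        # 0.7 means 0.7x speed (slower)
    loudness_db: float = 0.0  # +5.0 means 5 dB louder


def effective_settings(annotations, position):
    """Combine every annotation covering `position`.

    Assumption for this sketch: nested tempo factors multiply
    and nested loudness offsets (in dB) add.
    """
    tempo = 1.0
    loudness_db = 0.0
    for a in annotations:
        if a.start <= position < a.end:
            tempo *= a.tempo
            loudness_db += a.loudness_db
    return tempo, loudness_db


def db_to_amplitude_ratio(db):
    """Standard conversion from a decibel change to a linear amplitude ratio."""
    return 10 ** (db / 20)


# A 0.7x passage with a louder phrase nested inside it.
script_annotations = [
    Annotation(start=0, end=100, tempo=0.7),
    Annotation(start=20, end=40, loudness_db=5.0),
]
```

Under these assumptions, a character at position 30 falls inside both spans and is read at 0.7x tempo and 5 dB louder, while position 50 is only slowed. A 5 dB increase corresponds to an amplitude ratio of about 1.78, and a 0.7x tempo stretches a passage to roughly 1.43 times its original duration.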

“Because it is using actual (consensually obtained) human data to make its final audio outputs, its annotated verbalizations are just as ‘realistic’ as unannotated outputs,” Petrochuk told VentureBeat. “Interestingly, we discovered in this research that not only is the model able to model a single dataset effectively, it can be generalized even further and use performances from multiple speakers to inform its use of prosody. We were floored when we first heard this, and it seriously highlights what’s to come with further research.”


Expanding creative possibilities

HINTS addresses a longstanding need for more customizable and director-focused AI voice tools. The new architecture could unlock creative possibilities for voice-based content across audiobooks, training narrations, marketing videos, and more. Early evaluation shows improvements in accuracy and naturalness.

The research also emphasizes responsible and ethical AI practices. “We have been committed to ethical innovation from the start,” said Petrochuk. WellSaid obtains explicit consent from voice donors, protects privacy, and moderates content to prevent misuse or deception. 

With vocal AI increasingly embedded in consumer tech and entertainment, HINTS demonstrates how the technology can become an empathetic storytelling medium, not just a vocal machine. While limitations remain compared to working with human talent, tools like HINTS bring us one step closer to truly expressive synthetic voices.
