Weeks after AI voice startup ElevenLabs launched its Sound Effects text-to-sound AI offering, the company is releasing an open-source tool to showcase its potential. The application analyzes an imported clip and, in “about 15 seconds,” generates multiple sound effect samples for creators to choose from.
Developers can access the app’s code on GitHub, and ElevenLabs has also published a website where the public can try out its Sound Effects API.
When you upload a video, the so-called Video to Sound Effects app extracts four frames at one-second intervals on the client side. Then, it sends those frames and a prompt to OpenAI’s GPT-4o to create a custom text-to-sound effects prompt. That prompt is then used to generate a sound effect through ElevenLabs’s Sound Effects API. Finally, the video and audio are combined on the client side into a single file ready for download that can be up to 22 seconds long.
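The four steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code ElevenLabs published: every function name here is hypothetical, the calls to GPT-4o and the Sound Effects API are passed in as placeholder callables rather than real client calls, and the sketch assumes frame sampling starts at 0 seconds.

```python
from typing import Callable, List, Sequence


def frame_timestamps(duration_s: float, count: int = 4, interval_s: float = 1.0) -> List[float]:
    """Timestamps (in seconds) at which to sample frames.

    Assumption: sampling starts at 0 s, and timestamps at or past the
    end of the clip are dropped.
    """
    return [i * interval_s for i in range(count) if i * interval_s < duration_s]


def video_to_sfx(
    duration_s: float,
    grab_frame: Callable[[float], bytes],               # hypothetical: frame at a timestamp
    describe_frames: Callable[[Sequence[bytes]], str],  # hypothetical stand-in for the GPT-4o step
    generate_sfx: Callable[[str], bytes],               # hypothetical stand-in for the Sound Effects API
    mux: Callable[[bytes], bytes],                      # hypothetical: combine audio with the video
) -> bytes:
    """Run the four steps in order and return the combined file."""
    frames = [grab_frame(t) for t in frame_timestamps(duration_s)]
    prompt = describe_frames(frames)   # frames -> text-to-sound-effects prompt
    audio = generate_sfx(prompt)       # prompt -> sound effect audio
    return mux(audio)                  # audio + video -> single downloadable file


# Usage with stub callables in place of the real services.
result = video_to_sfx(
    duration_s=10.0,
    grab_frame=lambda t: f"frame@{t}".encode(),
    describe_frames=lambda frames: f"sound for {len(frames)} frames",
    generate_sfx=lambda prompt: prompt.encode(),
    mux=lambda audio: b"video+" + audio,
)
```

Passing the external steps in as callables keeps the sketch runnable without API keys and makes the order of operations explicit.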
“We view it as a proof of concept of what people will be able to do with our SFX API,” Ammaar Reshi, ElevenLabs’ design lead, tells VentureBeat. “AI video creators are often searching for the perfect sound effect and we felt like we could speed up the workflow intelligently by understanding the frames in their videos and then suggesting the best output.” He says the company is excited about the different kinds of dynamic experiences people might build with this API, highlighting immersive video games as one example where sounds may be generated based on a player’s interaction.
The Sound Effects API allows developers to build fully custom AI sound effects from a short text description. Pricing is counted in characters: ElevenLabs charges 100 characters per generation when the duration is set automatically, or 25 characters per second when the duration is specified.
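To make the two pricing modes concrete, here is a small Python sketch of the arithmetic. The function name and constants are hypothetical, and rounding fractional results up to a whole character is an assumption; the article does not say how partial seconds are billed.

```python
import math
from typing import Optional

AUTO_DURATION_COST = 100   # characters per generation, automatic duration
COST_PER_SECOND = 25       # characters per second, set duration


def sfx_character_cost(duration_s: Optional[float] = None) -> int:
    """Character cost of one sound effect generation.

    Pass no duration for the automatic mode, or a duration in seconds
    for the set-duration mode. Rounding up is an assumption.
    """
    if duration_s is None:
        return AUTO_DURATION_COST
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return math.ceil(COST_PER_SECOND * duration_s)


auto_cost = sfx_character_cost()         # automatic duration
ten_second_cost = sfx_character_cost(10) # 10-second effect, set duration
```

Under these figures, a set duration of four seconds costs the same as an automatic generation, so shorter fixed-length effects are cheaper and longer ones more expensive.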
In a brief test, the video-to-sound effects app proved simple to use. After importing an audio-free video of a vehicle navigating an all-terrain environment, ElevenLabs’ AI generated four options, all sounding like a car traversing a gravel road. But while it’s amusing to apply sound effects to clips, the real benefit will likely come from integrating this capability into a larger system.
And as the AI video generation space heats up, ElevenLabs might be looking to stay ahead of everyone, developing new audio solutions it knows will be in demand by developers, filmmakers and creators.