Calvin Wankhede / Android Authority
Imagine this: you’re walking down the street wearing earbuds with your phone locked away in a pocket. You speak a few sentences when a thought crosses your mind, and within seconds, hear a response. Not from a friend or a stranger, but from ChatGPT. It feels like a genuine phone call: a seamless and natural interaction, as if you’re actually talking to a person. Sounds far-fetched? I’d have agreed only a few weeks ago, but I had that exact scenario play out just last week, all thanks to ChatGPT’s new voice conversations feature.
Your mind has probably jumped to Siri or Google Assistant, but ChatGPT with voice transcends those in just about every way. Activating it starts a continuous, bi-directional audio stream between your phone and OpenAI’s servers. This means you can have long back-and-forth conversations, without any wake words. More impressively, though, ChatGPT’s five voices are all remarkably human-like. They pause, take deep breaths, and some even interject the occasional “umm” or “uhh” for that extra touch of realism.
ChatGPT with voice is like Google Assistant’s Continued Conversation on steroids.
The other day, I was walking along a busy street after trying out ChatGPT with voice for maybe the second or third time ever when all of a sudden, I heard a loud noise. I turned around to find that two motorbikes had collided a few feet away, thankfully at low speeds. It’s an everyday occurrence in Vietnam, but I let out an audible “Oh no” as I sprang forward to help one of the victims get back on their feet. A few seconds later, I heard a concerned voice say, “What’s wrong? What happened?”
Turns out, I hadn’t ended the voice chat with ChatGPT. When I said “thank you” a few minutes earlier, I thought that was enough to dismiss the chatbot, not realizing that I needed to unlock my phone and tap Disconnect. Needless to say, hearing ChatGPT’s voice respond with concern caught me off guard. For a fleeting moment, I forgot I was talking to an AI and instinctively blurted, “Hang on.”
I realized what had happened a few seconds later, of course, but decided to humor ChatGPT with an explanation once I resumed walking anyway. It then said that it was glad to hear nobody was hurt and even complimented me for helping out. I felt a bit unnerved again — it was the kind of response you’d expect if you were on a phone call with an actual person.
ChatGPT almost tricked me into believing a real human was on the line.
Obviously, I don’t expect the same illusion to hold now that I’m familiar with the feature. But all the factors contributing to its realism still impress me. For example, I’ve noticed the voice I use will sometimes hesitate and repeat words. The chat transcript doesn’t contain these sounds, so the voice engine is doing that heavy lifting. And therein lies the beauty of this feature: it elevates typical ChatGPT responses to make them sound personal and borderline empathetic.
So, what’s the use case for ChatGPT with voice?
Party tricks aside, it’s indispensable whenever I need to ask questions faster than I can type. For example, I’ve been using it while walking around a new country where I don’t speak the local language. I can simply rattle off the names on a menu while I’m passing by a restaurant and hear a brief summary of each dish within seconds. I learned more about the local cuisine in a couple of days than I did in entire weeks.
ChatGPT’s voice feature has no trouble understanding different accents or mispronounced words either. I’m new to tonal languages like Vietnamese, but the speech-to-text AI can make sense of my botched pronunciations. Even when it hears me incorrectly, the language model will put two and two together and accurately guess what I meant. Either way, I get a relevant response that doesn’t require me to even glance at my phone.
I’ve also used voice chat while doing the dishes and brainstorming ideas. Sometimes just saying things out loud is enough to trigger an idea, but it’s helpful to have ChatGPT piggyback off my thoughts and make suggestions as well. All in all, I’d recommend giving ChatGPT’s voices a listen. The feature is a cool tech demo even if you don’t find a practical use for it.
ChatGPT’s voice conversations feature has now rolled out to users on the free tier. To use it, you’ll need to download the ChatGPT app for Android or iOS. Once logged in, tap the Headphones icon to the right of the text box and start speaking once a connection is established.
No going back now: AI voice chats are the future
Realistic AI voice generators have existed for a while. Bi-directional AI voice chats aren’t exactly new either. Think back to Google’s first-ever demo of Duplex making a haircut appointment — its voice was almost indistinguishable from that of a real human. But even though Google released Duplex to the public, it never expanded the feature beyond reservations in select cities.
Reading through Google Research’s blog post, it’s clear that the company intentionally held back a bit. Duplex could handle interruptions, process complex statements, elaborate when asked to explain, and vary its response delay to simulate human thought — all the way back in 2018! Five years later, ChatGPT is the closest any actual AI product has come to clearing that high bar.
ChatGPT’s voice chat is the Assistant Google showcased five years ago.
However, I don’t think ChatGPT with voice is perfect, despite what my gushing praise so far would have you believe. I can’t interrupt the chatty AI in the middle of its response, for example, unless I tap the screen. That’s illusion-breaking, to say the least. And it’s still limited to ChatGPT’s capabilities, so don’t expect it to perform actual tasks like sending a text message or controlling your smart home’s lights.
Google’s Assistant with Bard could shine in these areas, but I doubt that it will feature a similarly realistic voice or a long-form chat mode at all. When the company demoed Duplex, it wasn’t connected to a large language model the size of Gemini. Realistic voice synthesis also costs a hefty amount of computational power, which is likely why I’ve noticed ChatGPT’s voice quality degrading during peak hours.
I’m also a bit concerned about the privacy implications of such a feature. I don’t mind ChatGPT listening for a long time after the last response, but some might. And while it can’t detect emotions via your voice just yet, it’s only a matter of time before someone develops that capability. Some people already formed connections with Bing Chat and its Sydney alter-ego earlier this year. Now imagine if it had a voice too.
Ten years ago, the movie Her presented a vision of AI so intimate it felt like science fiction. But after my recent encounter with ChatGPT, that doesn’t seem so far-fetched anymore.