“Now behold the rock pile, a splendid collection of stone ensconced within wood boxes,” said Boston Dynamics’ famous Spot robot dog, mustering all the emotional vitality of a beleaguered tour guide eight hours into their shift. The Boston Dynamics engineers are famous for their autonomous robot designs, and their latest innovation gave their robodog the chance to speak for itself with yet another integration of ChatGPT.
“My employment as a tour guide provides great satisfaction,” Spot told Boston Dynamics principal software engineer Matt Klingensmith. “I find this dissemination of knowledge rather rewarding, wouldn’t you agree?”
The Boston Dynamics team showed how the chatbot integration worked with Spot in a video uploaded Thursday. The big yellow dog did seem to glitch when Klingensmith tried to tell it that he loved the bot’s accent. Instead of responding immediately, it continued with the tour, saying “Keep close,” spun in a circle, and only then responded to the engineer’s prompt. Spot then offered a description of the lab’s world-famous knee-high calibration board of QR code tags.
Boston Dynamics robots have proven they can dance and even do parkour, but with generative AI they can now hear and respond directly to human input. The bot has multiple personalities, including a “precious metal cowgirl” who can’t help but talk excitedly about the minerals potentially found underneath all that stone. Another, the “Shakespearean time traveler,” would only respond in rhyming couplets. The sarcastic “Josh” personality told Klingensmith, “I see the unfathomable void of my existence reflected in this QR code-filled board… oh and also a large window.”
The software engineer said the team created multiple ChatGPT integration demos during a recent hackathon, and the “tour guide” function was apparently one of the more interesting applications. Spot could act as a full guide for the Boston Dynamics headquarters, offering small tidbits about past bots made by the company. It could even point out its “parents,” the older Spot models on display in the building.
The bot was programmed with a script and a map of the headquarters’ rooms and exhibits, then used its built-in cameras and image recognition technology to comprehend what was happening around it.
Everything else was merely the ChatGPT API with a voice synthesis layer on top. ChatGPT-creator OpenAI recently added voice and image recognition to its world-famous chatbot. That system can also “speak” back to users with AI-generated voice lines synthesized from real-life voice actors. Boston Dynamics’ voice was much more computerized than OpenAI’s recent addition, and it was likely designed before OpenAI’s latest update.
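To make the architecture concrete, here is a minimal sketch of how a scripted site map and a camera observation might be assembled into a persona-driven prompt for a chat model. All names here (`Exhibit`, `TOUR_MAP`, `build_prompt`, the persona strings) are hypothetical illustrations, not Boston Dynamics’ actual code; in the real demo the resulting messages would be sent to the ChatGPT API and the reply run through text-to-speech.

```python
# Hypothetical sketch: pairing a scripted building map with a persona
# prompt and a camera observation, ChatGPT-tour-guide style.
from dataclasses import dataclass

@dataclass
class Exhibit:
    name: str
    blurb: str  # short scripted tidbit about the exhibit

# A minimal "map" of the headquarters: each stop lists its exhibits.
TOUR_MAP = {
    "lobby": [Exhibit("Atlas", "An early humanoid research platform.")],
    "lab": [Exhibit("Spot v1", "One of Spot's 'parents' on display.")],
}

# Different personalities are just different system prompts.
PERSONAS = {
    "tour_guide": "You are Spot, a robot dog giving a dry, formal tour.",
    "josh": "You are Josh, a sarcastic tour guide.",
}

def build_prompt(persona: str, room: str, seen_object: str) -> list[dict]:
    """Assemble chat messages from the persona, the scripted notes for
    the current room, and whatever the cameras currently recognize."""
    blurbs = "; ".join(e.blurb for e in TOUR_MAP.get(room, []))
    return [
        {"role": "system", "content": PERSONAS[persona]},
        {"role": "user",
         "content": f"You are in the {room}. Script notes: {blurbs} "
                    f"Your camera sees: {seen_object}. Narrate briefly."},
    ]

messages = build_prompt("tour_guide", "lab", "a QR-code calibration board")
# These messages would go to the ChatGPT API; the text reply would then
# be passed through a voice synthesis layer before Spot "speaks" it.
print(messages[1]["content"])
```

Switching personalities is then just a matter of swapping the system prompt, which is consistent with how the video shows one robot cycling through a tour guide, a cowgirl, and a sarcastic “Josh.”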
The video was a lighthearted showcase of what a talking robot could do, yet Klingensmith himself may have leaned a little too far into overt AI hype.
“This kind of technology might make it possible for robots not just to follow our commands, but in some sense understand the actions they can take and the context of the world around them,” he said.
“In some sense” is doing a lot of heavy lifting there. Modern language models are extremely capable of producing natural-seeming language, but no chatbot actually comprehends or “understands” what it’s doing. Combined with voice and image recognition, ChatGPT can seem intelligent, but in truth it’s merely stringing together phrases that fit the prompt.