Google has begun bringing a native understanding of video, audio and photos to its Bard AI chatbot with a new model called Gemini. Google Pixel 8 phone owners will be among the first to tap into its new artificial intelligence abilities.
The first incarnations of the new technology arrived Wednesday in dozens of countries through Google Bard’s Gemini update, but only in English. It offers text-based chat abilities that Google says improve AI performance in complex tasks like summarizing documents, reasoning and writing programming code. The bigger change with multimedia abilities (for example, understanding hand gestures in a video or figuring out the result of a child’s dot-to-dot drawing puzzle) will arrive “soon,” Google said.
Gemini is a dramatic departure for AI. Text-based chat is important, but humans must process much richer information as we inhabit our three-dimensional, ever-changing world. And we respond with complex communication abilities, like speech and imagery, not just written words. Gemini is an attempt to come closer to our own fuller understanding of the world.
Gemini comes in three versions tailored for different levels of computing power, Google said:
- Gemini Nano runs on mobile phones, with two varieties built for different levels of available memory. It’ll power new features on Google’s Pixel 8 phones, like summarizing conversations in its Recorder app or suggesting message replies in WhatsApp typed with Google’s Gboard.
- Gemini Pro, tuned for fast responses, runs in Google’s data centers and will power a new version of Bard, starting Wednesday.
- Gemini Ultra, limited to a test group for now, will be available in a new Bard Advanced chatbot due in early 2024. Google declined to disclose pricing details, but expect to pay a premium for this top capability.
The new version spotlights the breakneck pace of advancement in the new generative AI field, where chatbots create their own responses to prompts that we write in plain language rather than arcane programming instructions. Google’s top competitor, OpenAI, stole a march with the launch of ChatGPT a year ago, but already Google is on its third major AI model revision and expects to deliver that technology through products that billions of us use, like search, Chrome, Google Docs and Gmail.
“For a long time we wanted to build a new generation of AI models inspired by the way people comprehend and communicate with the world, an AI that feels more like a helpful collaborator and less like a smart piece of software,” said Eli Collins, a product vice president at Google’s DeepMind division. “Gemini brings us a step closer to that vision.”
OpenAI also supplies the brains behind Microsoft’s Copilot AI technology, including the newer GPT-4 Turbo AI model that OpenAI released in November. Microsoft, like Google, has major products like Office and Windows to which it’s adding AI features.
AI gets smarter, but it’s not perfect
Multimedia likely will be a big change compared to text when it arrives. But what hasn’t changed are the fundamental problems of AI models trained by recognizing patterns in vast quantities of real-world data. They can turn increasingly complex prompts into increasingly sophisticated responses, but you still can’t trust that they didn’t just provide an answer that was plausible instead of actually correct. As Google’s chatbot warns when you use it, “Bard may display inaccurate info, including about people, so double-check its responses.”
Gemini is the next generation of Google’s large language model, a sequel to the PaLM and PaLM 2 models that have been the foundation of Bard so far. Because Gemini is trained simultaneously on text, programming code, images, audio and video, it can handle multimedia input more efficiently than an approach that uses separate but interlinked AI models for each mode of input.
Examples of Gemini’s abilities, according to a Google research paper (PDF), are diverse.
Looking at a series of shapes consisting of a triangle, square and pentagon, it can correctly guess the next shape in the series is a hexagon. Presented with photos of the moon and a hand holding a golf ball and asked to find the link, it correctly points out that Apollo astronauts hit two golf balls on the moon in 1971. It converted four bar charts showing country-by-country waste disposal techniques into a labeled table and spotted an outlying data point, namely that the US throws a lot more plastic in the dump than other regions.
The company also showed Gemini processing a handwritten physics problem involving a simple sketch, figuring out where a student’s error lay, and explaining a correction. A more involved demo video showed Gemini recognizing a blue duck, hand puppets, sleight-of-hand tricks and other videos. None of the demos were live, however, and it’s not clear how often Gemini fumbles such challenges.
Gemini Ultra awaits further testing before appearing next year.
“Red teaming,” in which a product-maker enlists people to find security vulnerabilities and other problems, is underway for Gemini Ultra. Such tests are more complicated with multimedia input data. For example, a text message and photo could each be innocuous on their own, but when paired could convey a dramatically different meaning.
“We’re approaching this work boldly and responsibly,” Google CEO Sundar Pichai said in a blog post. That means a combination of ambitious research with big potential payoffs, but also adding safeguards and working collaboratively with governments and others “to address risks as AI becomes more capable.”
Editors’ note: CNET is using an AI engine to help create some stories. For more, see this post.