AI startup Anthropic said Monday that the latest version of its Claude family of AI models, Claude 3, exhibits “human-like understanding,” a bold, though not entirely unprecedented, assertion by a maker of generative AI chatbots.

Compared with prior versions, the Claude 3 family can handle more complicated queries with higher accuracy and enhanced contextual understanding, Anthropic said. The latest models are also better at analysis and forecasting, content creation, code generation, and conversing in languages like Spanish, Japanese and French, the company said. It's worth noting, however, that while chatbots can recognize patterns and predict text, they don't truly understand the meaning of words the way humans do.


In ascending order of horsepower, the three models in the Claude 3 family are: Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus.

The pace of updates and releases among generative AI companies has been accelerating since the release of the text-to-image model DALL-E in 2021. In February, Google released the latest version of its model, Gemini 1.0 Ultra, and teased Gemini 1.5 Pro. ChatGPT maker OpenAI debuted its GPT-4 Turbo model in November. Microsoft announced its "AI companion," Copilot, in September. All these companies are looking to stake a claim in a generative AI market projected to reach $1.3 trillion by 2032.

According to Anthropic, Opus outperforms its rivals on AI benchmarks like undergraduate-level expert knowledge, graduate-level expert reasoning and basic mathematics. To be fair, Google has said its Gemini 1.5 model has "the longest context window of any large-scale foundation model yet," the context window being a measure of how much text a model can consider at once. OpenAI, for its part, called its GPT-4 Turbo model "more capable [and] cheaper" than previous models, and it also supports multimodal capabilities like vision, image creation and text-to-speech.

Anthropic said its Claude 3 family sets “a new standard for intelligence,” with the models more accurate than previous models and better able to follow multistep instructions.

For example, compared to Claude 2.1, which came out in November, Opus has shown a twofold improvement in accuracy on open-ended questions, Anthropic said. In addition, the company will soon enable citations, making it easier for Claude 3 users to verify answers within reference material.

The Claude 3 models are also “significantly less likely” to refuse to answer harmless prompts than their predecessors, as they have “a more nuanced understanding of requests” and “recognize real harm,” Anthropic said. That means users who make queries that don’t violate any guidelines are more likely to get responses from the Claude 3 models.

As of Monday, Sonnet is available via claude.ai and Opus is accessible to Claude Pro subscribers.

Anthropic didn’t share a release date for Haiku, saying only that it’ll be “available soon.”
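For developers, the models are also available through Anthropic's API. Here's a minimal sketch using the company's Python SDK; the model ID shown is the launch-day identifier for Opus, and the snippet assumes an API key is set in the ANTHROPIC_API_KEY environment variable:

```python
# pip install anthropic
import anthropic

# The client reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-opus-20240229",  # launch-day model ID for Opus
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize War and Peace in three sentences."}
    ],
)

print(message.content[0].text)
```

Swapping in Sonnet (or, once it ships, Haiku) is just a matter of changing the model string.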

The Claude 3 models have a 200,000-token context window. One token is equivalent to about four characters, or roughly three-quarters of a word, in English.

Think of it this way: Leo Tolstoy's War and Peace is 587,287 words long. At three-quarters of a word per token, that's roughly 783,000 tokens, meaning Claude 3 can hold only about a quarter of the book in a single session.
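You can sanity-check that arithmetic yourself. The conversion below uses the rule of thumb above only; actual token counts vary with the tokenizer and the text:

```python
# Rule of thumb: 1 token ~ 4 characters ~ 0.75 English words.
WORDS_PER_TOKEN = 0.75

def words_to_tokens(word_count: int) -> int:
    """Rough token estimate for English prose."""
    return round(word_count / WORDS_PER_TOKEN)

war_and_peace_tokens = words_to_tokens(587_287)  # ~783,049 tokens
context_window = 200_000                         # Claude 3's window

print(war_and_peace_tokens)                            # 783049
print(f"{context_window / war_and_peace_tokens:.0%}")  # 26% -- about a quarter
```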

However, Anthropic said the model is capable of accepting inputs of more than 1 million tokens and that the company “may make this available to select customers who need enhanced processing power.”

By way of comparison, Google’s latest Gemini models can process up to 1 million tokens, while GPT-4 models have context windows of about 8,000 to 128,000 tokens.

Haiku versus Sonnet versus Opus

While Anthropic recommends Haiku for customer interactions, content moderation and tasks like inventory management, Sonnet, it says, “excels at tasks demanding rapid responses, like knowledge retrieval or sales automation.”

Opus, on the other hand, can plan and execute complex actions across APIs and databases; handle research and development tasks like brainstorming, hypothesis generation and even drug discovery; and perform advanced analysis of charts and graphs, financials and market trends, according to the company.

The Claude 3 models can process visual formats like photos, charts and graphs “on par with other leading models,” Anthropic said.
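Through the API, image input follows the same pattern as text. Here's a hedged sketch, assuming the same SDK setup as above and a chart saved locally as chart.png (a hypothetical file name):

```python
import base64
import anthropic

client = anthropic.Anthropic()

# Images are sent as base64-encoded content blocks alongside the text prompt.
with open("chart.png", "rb") as f:  # hypothetical local file
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-sonnet-20240229",  # launch-day model ID for Sonnet
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": image_b64}},
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }],
)

print(message.content[0].text)
```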

Claude 3 also shows fewer biases than its predecessors, according to the Bias Benchmark for Question Answering, a collection of question sets from academics at New York University that evaluates models for social biases against people in protected classes.

Anthropic also noted it has multiple teams focused on risks including misinformation, child sexual abuse material, election interference and “autonomous replication skills.” This means that with Claude 3, we may be less likely to see the kinds of unsettling responses that chatbots have been known to provide from time to time.

Red team evaluations, or those that seek out vulnerabilities in AI, showed that the models “present negligible potential for catastrophic risk at this time,” an Anthropic blog post said.

“As we push the boundaries of AI capabilities, we’re equally committed to ensuring that our safety guardrails keep apace with these leaps in performance,” the post added. “Our hypothesis is that being at the frontier of AI development is the most effective way to steer its trajectory towards positive societal outcomes.”

Anthropic said it plans to “release frequent updates” to the Claude 3 models “over the next few months.”

Editors’ note: CNET is using an AI engine to help create some stories. For more, see this post.

