Anthropic unveils Claude 3, surpassing GPT-4 and Gemini Ultra in benchmark tests

04.03.2024

Anthropic, a leading artificial intelligence startup, unveiled its Claude 3 series of AI models today, designed to meet the diverse needs of enterprise customers with a balance of intelligence, speed, and cost efficiency. The lineup includes three models: Opus, Sonnet, and the upcoming Haiku.

The star of the lineup is Opus, which Anthropic claims is more capable than any other openly available AI system on the market, even outperforming leading models from rivals OpenAI and Google.

“Opus is capable of the widest range of tasks and performs them exceptionally well,” said Anthropic cofounder and CEO Dario Amodei in an interview with VentureBeat.

Amodei explained that Opus outperforms top AI models like GPT-4, GPT-3.5 and Gemini Ultra on a wide range of benchmarks. This includes topping the leaderboard on academic benchmarks like GSM-8k for mathematical reasoning and MMLU for expert-level knowledge.

“It seems to outperform everyone and get scores that we haven’t seen before on some tasks,” Amodei said.

While companies like Anthropic and Google have not disclosed the full parameters of their leading models, the reported benchmark results from both companies imply Opus either matches or surpasses major alternatives like GPT-4 and Gemini in core capabilities.

This, at least on paper, establishes a new high-water mark for commercially available conversational AI.

Engineered for complex tasks requiring advanced reasoning, Opus stands out in Anthropic’s lineup for its superior performance.

Mid-range, speedy options are available

Sonnet, the mid-range model, offers businesses a more cost-effective solution for routine data analysis and knowledge work, maintaining high performance without the premium price tag of the flagship model.

Meanwhile, Haiku is designed to be swift and economical, suited for applications such as consumer-facing chatbots, where responsiveness and cost are crucial factors.

Amodei told VentureBeat he expects Haiku to launch publicly in a matter of “weeks, not months.”

New visual capabilities unlock new use cases

Each of the models unveiled today supports image input, a feature in high demand, especially for applications like text recognition in images.

“We haven’t focused as much on output modalities, because there’s less demand for that on the enterprise side,” Anthropic president and cofounder Daniela Amodei told VentureBeat, highlighting the company’s strategic focus on the features businesses seek most.

In addition, Claude 3 models demonstrate sophisticated computer vision abilities on par with other state-of-the-art models. This new modality opens up use cases where enterprises need to extract information from images, documents, charts and diagrams.

“A lot of [customer] data is either highly unstructured, or in some sort of visual format,” explained Daniela. “Just the process of having to manually copy that information to even be able to have it interact with a generative AI tool is quite cumbersome.”

Fields like legal services, financial analysis, logistics and quality assurance could benefit from AI systems that understand real-world visuals and text alike.

Walking the tightrope of bias in AI

Anthropic’s announcement comes on the heels of controversy surrounding Google’s new chatbot Gemini, which highlighted the difficulties tech companies face in releasing models that avoid perpetuating social bias.

Last week, people found that prompting Gemini to generate historical images resulted in depictions that appeared to overcorrect racial portrayals. For example, asking for pictures of Vikings or Nazi soldiers produced images of racially diverse groups that are unlikely to reflect historical reality.

Google responded by disabling Gemini’s image generation capabilities and issuing an apology, saying it had “missed the mark” in trying to increase diversity. But experts say the situation illustrates the constant balancing act around bias in AI.

Constitutional AI helps but isn’t perfect

Anthropic cofounder Dario Amodei emphasized in his interview with VentureBeat the difficulty of steering AI models, calling it an “inexact science.” He said the company has teams dedicated to assessing and mitigating various risks from its models.

“Our hypothesis is that being at the frontier of AI development is the most effective way to steer the trajectory of AI development towards a positive outcome for society,” said Dario.

However, Anthropic cofounder Daniela Amodei acknowledged that perfectly bias-free AI is likely unattainable with current methods.

“It’s almost impossible to create a perfectly neutral, generative AI tool, I think, both technically, but also because not everybody even agrees on what neutral is,” she said.

Part of Anthropic’s strategy is an approach called Constitutional AI, where models are aligned to follow principles defined in a “constitution.” But Dario Amodei admits even this technique isn’t perfect.

“We aim for models to be fair and ideologically and politically neutral, [but] you know, we haven’t got it perfectly,” he said. “I don’t think, you know, anyone has got it perfectly.”

Nonetheless, Dario believes Anthropic’s constitution of widely agreed-upon values helps safeguard against skewing models towards any partisan agenda, in contrast to accusations facing Gemini.

“Our goal is not to promote any particular political or ideological viewpoint,” he said. “We want our models to be suitable for everyone.”
