Is the New Claude 3 Language Model Really Stronger Than GPT-4?

At the beginning of March 2024, Anthropic set another milestone in the world of artificial intelligence with the release of Claude 3. This new generation of its Large Language Model (LLM) demonstrates impressive capabilities in areas such as analysis, forecasting, content creation, code generation and multilingual communication. 🌍

Who is Anthropic?

Anthropic is a US-based AI research company dedicated to the development of advanced and ethical AI systems. With a team of renowned experts and former members of OpenAI, Anthropic is a leader in the research and application of Large Language Models (LLMs). The company emphasizes safety, transparency and accountability in the development of AI technologies that have the potential to positively shape the future of humanity.

Near-human understanding and fluency 🧠

Back to Claude 3: according to Anthropic, its latest AI system reaches a near-human level of understanding and fluency, even on complex tasks. The company describes the model family as leading the frontier of general intelligence, and it opens up completely new possibilities for customer communication.

Compared to other AI systems, Anthropic writes, Claude 3 Opus outperforms its peers on most common benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA) and basic mathematics (GSM8K).

Comparison of Claude 3 with similar AI systems in several benchmarks. Source: https://www.anthropic.com/news/claude-3-family

You can see that Claude 3 Opus beats both GPT-4 and Google’s Gemini 1.0 Ultra in all benchmarks shown in the table. In addition, Claude 3 Haiku, the “smallest” model from Anthropic, beats GPT-4 in 2 of these benchmarks. Let’s now take a closer look at the different Claude 3 models.

Speed and cost efficiency 🚀

The three Claude 3 models - Opus, Sonnet and Haiku - offer different performance profiles for different use cases:

Comparison of the three Claude 3 models in terms of intelligence and cost. Source: https://www.anthropic.com/news/claude-3-family

Claude 3 Haiku is the fastest and most cost-effective model in its class and can analyze a data-rich research report with graphs in less than three seconds. It answers simple queries with unmatched speed and enables seamless AI experiences that mimic human interactions.
Claude 3 Sonnet balances intelligence and speed. It is twice as fast as Claude 2 and is well suited to tasks that require quick answers, such as knowledge retrieval or sales automation.
Claude 3 Opus is Anthropic’s most powerful AI model and masters even the most complex tasks with near-human understanding and fluency. It excels with high accuracy, visual capabilities, multilingualism and the handling of open-ended questions, while refusing harmless queries much less frequently. Opus can be used to implement the most demanding use cases in companies and explore the limits of what is possible in generative AI.

Visual capabilities 📊

Claude 3 has advanced visual capabilities and can process a variety of formats such as photos, diagrams and technical drawings. This opens up new opportunities for companies: according to Anthropic, some enterprise customers have up to 50% of their knowledge bases in visual or document formats.

Fewer refusals and higher accuracy ✅

The Claude 3 models demonstrate improved understanding of queries and refuse harmless prompts significantly less often than previous generations. Compared to Claude 2.1, Opus demonstrates a two-fold improvement in accuracy on challenging open-ended questions and is significantly less likely to provide incorrect answers.

Compared to Claude 2.1, the Claude 3 models refuse harmless prompts much less frequently. Source: https://www.anthropic.com/news/claude-3-family

“Needle In A Haystack” test

A particularly impressive example of the performance of Claude 3 Opus is the “Needle In A Haystack” test. In this test, a random, out-of-context sentence (the “needle”, e.g. about a pizza topping) is inserted into a long text (the “haystack”, e.g. a computer science essay). The AI system is then asked a question that can only be answered from the inserted sentence. Since many AI systems recall the beginning and end of a text input better than the middle, this is a way to measure how reliably the model finds relevant information in a large amount of data. Opus achieved near-perfect recall of over 99%.

Claude 3 Opus offers a context window of 200,000 tokens. According to Anthropic, the models can also accept inputs of more than 1 million tokens, which may be made available to select customers. Source: https://www.anthropic.com/news/claude-3-family

In some cases, the model even recognized limitations of the test itself by noting that the inserted passage appeared to have been artificially placed into the original text by a human. This result underlines Claude 3 Opus’ exceptional ability to identify relevant information in large amounts of data and process it contextually.
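To make the test procedure more tangible, here is a minimal Python sketch of a “Needle In A Haystack” run. It is not Anthropic’s evaluation harness. All names (build_haystack, recall_rate, edge_only_model, full_text_model) are hypothetical, the two “models” are toy stand-ins for a real LLM call, and scoring by simple substring match is a simplifying assumption.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def recall_rate(ask_model, filler_sentences, needle, question, expected, depths):
    """Fraction of insertion depths at which the answer contains `expected`."""
    hits = 0
    for depth in depths:
        document = build_haystack(filler_sentences, needle, depth)
        answer = ask_model(document, question)
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(depths)


def edge_only_model(document, question, window=200):
    """Toy stand-in: only 'sees' the first and last `window` characters."""
    visible = document[:window] + document[-window:]
    return "figs" if "figs" in visible else "I don't know"


def full_text_model(document, question):
    """Toy stand-in: 'sees' the whole document."""
    return "figs" if "figs" in document else "I don't know"


# Hypothetical test data: a long filler text and one out-of-context sentence.
FILLER = [f"Filler sentence {i} about computer science." for i in range(100)]
NEEDLE = "The best pizza topping is figs."
QUESTION = "What is the best pizza topping?"
DEPTHS = [0.0, 0.25, 0.5, 0.75, 1.0]

edge_recall = recall_rate(edge_only_model, FILLER, NEEDLE, QUESTION, "figs", DEPTHS)
full_recall = recall_rate(full_text_model, FILLER, NEEDLE, QUESTION, "figs", DEPTHS)
print(edge_recall)  # 0.4: the needle is only found at the very start and end
print(full_recall)  # 1.0: the needle is found at every depth
```

The toy edge_only_model mimics the weakness the test is designed to expose: it only finds the needle when it sits near the beginning or end of the text. In a real run, ask_model would wrap a call to the LLM under test, and the haystack would be long enough to fill a large part of the context window.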

Integration into your customer communication 🔄

Whether Claude 3 is better than GPT-4 probably depends on the user and the use case. With the LoyJoy Conversational Platform, you can seamlessly integrate the power of Claude 3 into your existing communication channels. Whether on your website, in social media or in messaging apps - your customers benefit from intelligent conversations everywhere. The focus is always on data protection and security in accordance with the GDPR. 🔒

Another important note on GDPR compliance: Claude 3 is currently only hosted in the USA, but will soon also be available from Frankfurt. GDPR-compliant use is already possible today via the standard contractual clauses. Those who want to play it safe and host exclusively in the EU will have to be patient. We will offer Frankfurt as a hosting location as soon as this option is available.

Now is the perfect time to get in! ⏰

Claude 3 opens up completely new opportunities to transform customer communication and stand out from the competition. Contact us today to find out how you can use the latest Generative AI for your business. Let’s shape the future of conversation together! 💬

— by Steffen Wichtrup