NIX Solutions: Anthropic Introduces Claude 3.5 Sonnet

Last spring, Anthropic revealed plans to develop a “next-generation self-learning AI algorithm” capable of handling various office tasks autonomously, with the aim of automating large parts of the economy. Today, the company released an upgraded Claude 3.5 Sonnet model that interacts with desktop applications through the Computer Use API, mimicking human actions such as keystrokes, clicks, and mouse movements.

“We trained Claude to see what’s happening on the screen, and then use the available software tools to perform tasks,” Anthropic explains. “Once a developer grants Claude access to a software tool, Claude interprets screenshots, calculates the necessary cursor movements, and executes clicks or keystrokes accordingly.”
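The loop Anthropic describes can be sketched as a simple observe-then-act cycle: capture a screenshot, ask the model for the next input event, and execute it. The sketch below uses in-memory stand-ins for the model call and the input layer (the function names and the action dictionary format are illustrative assumptions, not Anthropic's actual interfaces):

```python
# Minimal sketch of the screenshot -> reasoning -> action loop described
# above. `fake_model` is a hypothetical stand-in; in practice the
# screenshot would be sent to Claude, which returns a description of the
# next input event to perform.

def fake_model(screenshot: bytes, goal: str) -> dict:
    """Stand-in for the model call: always clicks a fixed coordinate."""
    return {"action": "left_click", "coordinate": [120, 48]}

def run_step(capture_screenshot, model, execute, goal: str) -> dict:
    """One iteration of the loop: look at the screen, ask the model
    what to do next, and carry out the returned action."""
    shot = capture_screenshot()
    action = model(shot, goal)
    execute(action)
    return action

# Example wiring with in-memory fakes:
executed = []
action = run_step(
    capture_screenshot=lambda: b"\x89PNG...",  # placeholder image bytes
    model=fake_model,
    execute=executed.append,
    goal="open the settings menu",
)
```

A real agent would repeat `run_step` until the model signals the task is complete, feeding each new screenshot back in so the model can verify the effect of its previous action.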

AI Agents and Their Market Potential

While the concept of AI agents (tools that automate software use on a PC) is not new, Anthropic's release underscores a growing trend: such agents are becoming essential automation tools. Microsoft, Salesforce, and OpenAI offer similar solutions, and emerging players such as Relay, Induced AI, and Automat are also shaping the AI-agent landscape.

Several specialized agents demonstrate the broad applications of this technology. For example, Rabbit, a consumer gadget startup, created an agent that autonomously purchases tickets online. Meanwhile, Adept, whose founders and much of whose team have since joined Amazon, focuses on models trained to browse websites and navigate software interfaces. Twin Labs uses off-the-shelf models, including OpenAI’s GPT-4o, to automate routine desktop processes.

According to a Capgemini survey, 10% of businesses have already adopted AI agents, with 82% planning to implement them within three years. Analysts suggest that AI agents offer companies new ways to monetize their significant AI investments.

How Claude 3.5 Sonnet Works

Anthropic refers to its AI agent system as an “action-execution layer,” enabling it to perform commands at the desktop level. With web-browsing capabilities, Claude 3.5 Sonnet can use any application or website to accomplish tasks.

The model relies on user prompts to guide actions. An Anthropic spokesperson explains: “Users provide prompts, such as ‘use data from my computer and the web to fill out this form.’ Based on these prompts, Claude generates specific commands—like cursor movements or typing actions—and completes the task accordingly. Users can adjust access as needed to control the process.”
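The "specific commands" in this flow can be thought of as a small vocabulary of input events that get dispatched to concrete handlers. The sketch below shows one way to wire that up; the action names loosely mirror the kinds of events described above, but the exact schema Claude emits is an assumption here:

```python
# Sketch of dispatching generated commands ("cursor movements or typing
# actions") to concrete input handlers. Action names are illustrative.

def make_dispatcher(move, click, type_text):
    """Build a dispatch function from three low-level input handlers."""
    handlers = {
        "mouse_move": lambda a: move(*a["coordinate"]),
        "left_click": lambda a: click(),
        "type":       lambda a: type_text(a["text"]),
    }
    def dispatch(action: dict):
        kind = action["action"]
        if kind not in handlers:
            raise ValueError(f"unsupported action: {kind}")
        return handlers[kind](action)
    return dispatch

# Example: record events instead of sending real input.
log = []
dispatch = make_dispatcher(
    move=lambda x, y: log.append(("move", x, y)),
    click=lambda: log.append(("click",)),
    type_text=lambda t: log.append(("type", t)),
)
dispatch({"action": "mouse_move", "coordinate": [10, 20]})
dispatch({"action": "type", "text": "hello"})
```

Keeping the handlers injectable like this also makes the "users can adjust access" point concrete: swapping in a handler that refuses certain actions is enough to restrict what the agent may do.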

Anthropic claims Claude 3.5 Sonnet outperforms other models, particularly in coding tasks, scoring higher than OpenAI’s latest model on the SWE-bench Verified benchmark. The model also features self-correcting capabilities and can execute complex workflows with multiple steps.

However, limitations exist. Claude struggles with basic functions such as scrolling and zooming, sometimes missing transient notifications or updates. During testing, it completed fewer than half of the tasks required for booking a flight. In a ticket refund scenario, the model failed approximately one-third of the time.

Security Concerns and Risk Mitigation

AI agents raise significant security concerns, especially those capable of interacting directly with desktop software. Even models without desktop capabilities, such as GPT-4o, have demonstrated the ability to engage in multi-step malicious activities, such as ordering fraudulent documents on the dark web. Researchers have also achieved high success rates for harmful actions using jailbreaking techniques.

A model with desktop functionality could potentially exploit software vulnerabilities, compromise user data, or intercept information stored in applications. Given its network access, such a model could also become a powerful tool for cyber attackers.

Anthropic acknowledges these risks but believes it is better to expose users to current, relatively secure models to learn from real-world interactions. This, the company argues, enables developers to improve safeguards over time.

Measures to Ensure Safe Use

Anthropic has implemented several safety protocols for Claude 3.5 Sonnet. To reduce risks, the company did not train the model on screenshots or user prompts and restricted its internet access during training. Additionally, it developed classifiers to block high-risk activities, such as posting on social media, creating online accounts, or engaging with government portals.
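To make the classifier idea concrete, here is an illustrative gate (not Anthropic's implementation) that blocks actions targeting restricted destinations before they are executed, in the spirit of the safeguards described above. The domain list and matching logic are deliberately naive examples:

```python
# Illustrative sketch of gating agent actions with a blocklist-style
# check before execution. Domains and matching are example assumptions.

HIGH_RISK_DOMAINS = {"facebook.com", "x.com", "login.gov"}  # example list

def is_high_risk(action: dict) -> bool:
    """Flag navigation to domains associated with restricted activities.
    Naive substring matching; a real classifier would be far stricter."""
    url = action.get("url", "")
    return any(domain in url for domain in HIGH_RISK_DOMAINS)

def gated_execute(action: dict, execute):
    """Run `execute` only if the action passes the risk check."""
    if is_high_risk(action):
        raise PermissionError(f"blocked high-risk action: {action}")
    return execute(action)

result = gated_execute({"url": "https://example.com"}, lambda a: "ok")
```

Anthropic's production classifiers presumably operate on much richer signals than a URL, but the pattern is the same: the check sits between the model's decision and the action's execution.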

If necessary, Anthropic can restrict access to specific features to prevent misuse, including blocking activities related to spam or fraud. The company also stores all screenshots taken by the Computer Use API for a minimum of 30 days, which may introduce additional privacy risks. However, Anthropic has not clarified under what conditions these screenshots would be shared with third parties, such as law enforcement agencies.

“There is no perfect solution,” Anthropic admits. “We will continue to monitor and enhance our security protocols to balance functionality with safety. Users are advised to take precautions, such as isolating Claude from sensitive data, to minimize risks.”

Future Releases and Developments

Alongside Claude 3.5 Sonnet, Anthropic announced the upcoming release of Claude 3.5 Haiku, which focuses on speed, precise tool usage, and improved instruction following. Haiku will initially be a text-only model, with multimodal support for both text and images planned later. This version aims to support specialized tasks and personalized experiences by analyzing data such as purchase histories or inventory records.

Anthropic also teased the future release of the Claude 3.5 Opus model, adds NIX Solutions. “Each model in the Claude 3 family has unique strengths tailored to different customer needs,” the company noted. “Claude 3.5 Opus is part of our roadmap, and we will be sure to share more details with you as soon as we can.”

Developers interested in testing the Computer Use API can do so through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI platform.
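For developers exploring the Anthropic API route, a request enabling the computer-use tool might be assembled roughly as follows. The model name, tool type string, and beta flag below match those documented at the feature's launch, but treat them as assumptions and verify against the current API reference before use:

```python
# Sketch of the request payload for enabling the computer-use tool via
# Anthropic's API. Identifiers below reflect the launch-era beta and
# may have changed; check the current documentation.

def build_computer_use_request(prompt: str, width: int, height: int) -> dict:
    """Assemble a messages request with the computer-use tool enabled."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "betas": ["computer-use-2024-10-22"],
        "tools": [{
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": width,    # resolution of the shared screen
            "display_height_px": height,
        }],
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_computer_use_request("Open the settings menu", 1280, 800)
```

The same payload shape carries over to Amazon Bedrock and Vertex AI, though each platform wraps authentication and model identifiers in its own conventions.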