Today, Cognition, a recently formed AI startup backed by Peter Thiel’s Founders Fund and tech industry leaders including former Twitter executive Elad Gil and DoorDash co-founder Tony Xu, announced a fully autonomous AI software engineer called “Devin.”

While there are multiple coding assistants out there, including the well-known GitHub Copilot, Devin is said to stand out from the crowd with its ability to handle entire development projects end-to-end, from writing the code and fixing the associated bugs to final execution. Cognition describes it as the first offering of its kind, and the startup has demonstrated that it can even handle projects posted on Upwork.

The announcement of Devin marks a significant shift in the AI-assisted development space, giving engineers a full-fledged AI worker for their projects, rather than a copilot that could merely write barebones code or suggest snippets.

However, Devin is not yet publicly available. The company is opening access only to a select few users, including Bloomberg journalist Ashlee Vance, who wrote about his experience using it.

What exactly can Devin do?

In a blog post today on Cognition’s website, Scott Wu, the founder and CEO of Cognition and an award-winning competitive programmer, explained that Devin can access common developer tools, including its own shell, code editor and browser, within a sandboxed compute environment to plan and execute complex engineering tasks requiring thousands of decisions.

The human user simply types a natural language prompt into Devin’s chatbot-style interface, and the AI software engineer takes it from there, developing a detailed, step-by-step plan to tackle the problem. It then begins the project using its developer tools, just as a human would, writing its own code, fixing issues, testing and reporting on its progress in real time, allowing the user to keep an eye on everything as it works.

If something doesn’t look right to the human observer, the user can also jump into the chat interface and give the AI a command to fix it. This, Cognition says, enables engineering teams to delegate some of their projects to the AI and focus on more creative tasks that require human intelligence.

In this way, Devin offers a new paradigm and a possible glimpse of how all software development, and computer work generally, may be done in the near future: by AI workers overseen by human supervisors.

Capable of handling a wide range of dev tasks

According to demos shared by Wu, Devin is capable of handling a range of tasks in its current form. These range from common engineering projects, like deploying and improving apps and websites end-to-end and finding and fixing bugs in codebases, to more complex ones, like setting up fine-tuning for a large language model given only a link to a research repository on GitHub, or learning how to use unfamiliar technologies.

In one case, it learned from a blog post how to run the code to produce images with concealed messages. Meanwhile, in another, it handled an Upwork project to run a computer vision model by writing and debugging the code for it.

In the SWE-bench test, which challenges AI assistants with GitHub issues from real-world open-source projects, the AI software engineer was able to correctly resolve 13.86% of the cases end-to-end, without any assistance from humans. In comparison, Claude 2 could resolve just 4.80%, while SWE-Llama-13b and GPT-4 could handle 3.97% and 1.74% of the issues, respectively. Moreover, all of these other models were assisted: they were told which file had to be fixed.

Performance of Devin in SWE-bench test
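
To put the quoted SWE-bench figures side by side, the short Python sketch below computes how many times higher Devin's reported resolution rate is than each other model's. The percentages are the ones cited above; the dictionary and function names are illustrative assumptions, and the ratios are only a rough guide, since the other models were run with assistance while Devin was not.

```python
# SWE-bench resolution rates (percent) as quoted in the article.
# Note: Devin's figure is unassisted; the other three were assisted
# (told which file to fix), so the ratios are a rough comparison only.
resolved_pct = {
    "Devin": 13.86,
    "Claude 2": 4.80,
    "SWE-Llama-13b": 3.97,
    "GPT-4": 1.74,
}


def relative_to(baseline: str, rates: dict[str, float]) -> dict[str, float]:
    """Return baseline_rate / other_rate for every model except the baseline."""
    base = rates[baseline]
    return {
        name: round(base / pct, 2)
        for name, pct in rates.items()
        if name != baseline
    }


# Hypothetical helper usage: compare every model against Devin.
ratios = relative_to("Devin", resolved_pct)
for name, ratio in ratios.items():
    print(f"Devin's reported rate is about {ratio}x that of {name}")
```

By this arithmetic, Devin's reported rate is a little under three times that of Claude 2 and roughly eight times that of GPT-4.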

Core technology remains undescribed

AI in software development is nothing new. There have been tools in this space for quite some time, from the popular GitHub Copilot and StarCoder to Replit, which has a few small AI coding models on Hugging Face, and Codeium, which recently raised $65 million in Series B funding at a valuation of $500 million.

However, most of these offerings have focused on using AI to assist with coding. They can generate barebones code from text prompts, summarize code using relevant IDE context or retrieve snippets, accelerating the team's workflow. With Devin, Cognition appears to be going a step (or multiple steps) further, providing a full-fledged AI worker to handle entire projects.

While the tool has yet to be independently tested, its biggest selling point is its ability to work through the many steps of a software engineering project while staying on track. Cognition has not shared how exactly it has achieved this, or whether it is using its own proprietary model or one from a third party, but it does note that the work is the result of its “advances in long-term reasoning and planning.”

Currently, the company is in the process of ramping up capacity and offering early access to Devin only to select users. It says interested parties looking to augment their engineering work can reach out via email to gain access. Broader access is expected to open up at a later stage.

Cognition also notes on its website that coding is “just the beginning,” which seems to indicate it may tap its reasoning advances to launch similar AI agents for other disciplines as well. The company has received $21 million in funding so far.
