OpenAI is best known for its advanced large language models (LLMs) used to power some of the most popular AI chatbots, such as ChatGPT and Copilot. Multimodal models can take chatbot capabilities to new heights by unleashing a new range of visual applications, and OpenAI just made one available to developers.
On Tuesday, in a post on X (formerly Twitter), OpenAI announced that GPT-4 Turbo with Vision, its latest GPT-4 Turbo model with image-understanding capabilities, is now generally available to developers through the OpenAI API.
This latest model maintains GPT-4 Turbo's 128,000-token context window and December 2023 knowledge cutoff. The main difference is its vision capability, which allows it to understand images and other visual content.
Before GPT-4 Turbo with Vision was made available, developers had to call separate models for text and images. Now, a single request to one model can handle both, simplifying development and opening the door to a wider range of use cases, as the sketch below illustrates.
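For illustration, here is a minimal sketch of what a combined text-and-image request might look like using OpenAI's Python SDK. The model name, prompt, and image URL are placeholders for the sake of the example, not details taken from OpenAI's announcement.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

# A single request carries both a text prompt and an image reference;
# no separate image model is needed.
response = client.chat.completions.create(
    model="gpt-4-turbo",  # GPT-4 Turbo with Vision (illustrative model name)
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this photo."},
                {
                    "type": "image_url",
                    # Placeholder URL; a base64-encoded data URL also works.
                    "image_url": {"url": "https://example.com/meal-photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

The same chat endpoint that handles plain text accepts the image content block, which is what makes the single-model workflow possible.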
OpenAI shared some ways developers are already using the model, and they are pretty fascinating.
For example, Devin, an AI software engineering assistant, leverages GPT-4 Turbo with Vision to better assist with coding. The health and fitness app Healthify uses the model to scan photos of users' meals and provide nutritional insights through photo recognition. Lastly, Make Real uses it to convert a user's drawing into a working website.
While GPT-4 Turbo with Vision is not yet available inside ChatGPT or to the general public, OpenAI teased that it will be coming to ChatGPT soon. If you are a developer looking to try the model through OpenAI's API, you can learn how to get started here.