Researchers at Apple have reportedly developed a new AI system called ReALM (Reference Resolution As Language Modeling) that can read and understand on-screen visual elements, essentially allowing it to make sense of what’s displayed on a user’s screen.

The research paper suggests that the new model reconstructs the screen as a textual layout, using “parsed on-screen entities” and their locations. This essentially captures the visual layout of the on-screen page in plain text, and according to the researchers, a model specifically fine-tuned for this approach could outperform even GPT-4 and lead to more natural, intuitive interactions.
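The paper describes its textual reconstruction in more detail, but as a rough illustration of the general idea, the sketch below (a simplified, hypothetical example in Python, not Apple’s actual code) shows how parsed on-screen entities with bounding-box coordinates could be serialized into a plain-text layout that roughly preserves their positions on screen:

```python
from dataclasses import dataclass

# Hypothetical representation of a parsed on-screen entity:
# its visible text plus the centre of its bounding box.
@dataclass
class ScreenEntity:
    text: str   # e.g. a phone number, button label, or link
    x: float    # horizontal centre of the bounding box
    y: float    # vertical centre of the bounding box

def render_screen_as_text(entities: list[ScreenEntity], row_tolerance: float = 10.0) -> str:
    """Group entities into rows by vertical position, then emit each row
    left-to-right, so the text roughly mirrors the on-screen layout."""
    rows: list[list[ScreenEntity]] = []
    for entity in sorted(entities, key=lambda e: e.y):
        if rows and abs(rows[-1][-1].y - entity.y) <= row_tolerance:
            rows[-1].append(entity)   # close enough vertically: same row
        else:
            rows.append([entity])     # start a new row
    lines = ["\t".join(e.text for e in sorted(row, key=lambda e: e.x)) for row in rows]
    return "\n".join(lines)

if __name__ == "__main__":
    screen = [
        ScreenEntity("Contact Us", x=160, y=40),
        ScreenEntity("555-0102", x=300, y=120),
        ScreenEntity("Call", x=80, y=120),
    ]
    print(render_screen_as_text(screen))
```

In this toy example, “Call” and “555-0102” end up on the same text line below “Contact Us,” mirroring how they are arranged on screen, which is the kind of textual representation a language model could then be fine-tuned to reason over.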

“Being able to understand context, including references, is essential for a conversational assistant,” reads the research paper. “Enabling the user to issue queries about what they see on their screen is a crucial step in ensuring a true hands-free experience in voice assistants.” The development could one day make its way to Siri, helping it become more conversational and deliver that “true hands-free experience.”

While it is unlikely that we’ll hear more about ReALM itself this year, we should learn more about Apple’s AI-related developments, including features coming to Siri, at WWDC 2024 on June 10th.

Read more about ReALM here.

Image credit: Shutterstock

Source: Apple | Via: VentureBeat

