Artificial intelligence (AI) scientists are increasingly finding ways to break the security of generative AI programs, such as ChatGPT, so it was only a matter of time before someone took the cybersecurity techniques used to vet users and applied them to vetting sources of data.
On Wednesday, startup IndyKite of San Francisco unveiled its bid to verify the data that goes into Gen AI as the raw material for its predictions.
The software “helps to ensure ‘baked-in’ trustworthiness of leveraged data in any business or analytics model, by employing an identity-centric approach to data where trust is paramount,” said IndyKite.
Three-year-old IndyKite has its pedigree in the identity management area of cybersecurity, more formally known as “identity and access management,” or IAM, a field best associated with Okta.
The IAM field has broadened in recent years to take on challenges beyond just securing enterprise networks and applications. For example, Google has filed patents pertaining to the application of IAM to Web 3, the blockchain-based distributed systems governing everything from the Internet of Things to cryptocurrencies. The technology is designed to vet access to sensitive data such as consumer medical histories without the data being copied from one database to another.
While details on the IndyKite system are thus far limited, it’s easy to see how using identity to control access to consumer data could be extended to controlling which data sources feed Gen AI.
Generative AI such as OpenAI’s ChatGPT has been the focus of controversy because such programs are trained on vast data sets comprising hundreds of gigabytes of data.
The data sets are the subject of multiple lawsuits by parties including The New York Times, alleging infringement of copyright.
OpenAI has said it will indemnify enterprise users of its software against suits.
In addition to infringement issues, creators of Gen AI are dealing with questions of where authoritative answers will come from. The approach known as “retrieval-augmented generation” seeks to connect large language models such as GPT-4 with databases as an oracle of truth. The RAG approach, however, presents its own challenges of dealing with data drift, which can contribute to a neural network model’s bias.
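To make the retrieval-augmented generation idea concrete, here is a minimal sketch in Python. It is a toy illustration, not how any particular product works: the function names (`tokenize`, `retrieve`, `build_prompt`) and the word-overlap ranking are assumptions made for this example, and real systems typically rank documents with vector embeddings and then pass the resulting prompt to a language model.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question.

    Word overlap is a stand-in (an assumption of this sketch) for the
    similarity search a real retrieval system would perform.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Prepend the retrieved documents to the question as context."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy document store; the contents are illustrative only.
documents = [
    "ForgeRock is an identity management company",
    "Neo4j is a graph database",
]
prompt = build_prompt("what is a graph database", documents)
```

The prompt, rather than the bare question, is what would be sent to the language model, which is why the quality and freshness of the underlying database matter so much to the answers.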
In theory, all those issues could be dealt with by methods that guarantee the provenance of data before it is ingested in training the programs.
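As a rough illustration of what checking provenance before ingestion could look like, the Python sketch below admits a record into a training set only if it comes from a registered source and its content still matches the digest recorded for it. This is not IndyKite's method, whose details have not been published. The `Record` type, the `TRUSTED_SOURCES` registry, and the source names are all hypothetical, invented for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source_id: str  # hypothetical identifier of the originating data source
    content: str    # the text that would be used for training
    sha256: str     # digest recorded when the source published the content


# Hypothetical registry of sources whose identity has been verified.
TRUSTED_SOURCES = {"hr-database", "licensed-news-feed"}


def digest(content):
    """Return the SHA-256 hex digest of a string."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_admissible(record, trusted=TRUSTED_SOURCES):
    """Admit a record only if its source is trusted and its content is intact."""
    if record.source_id not in trusted:
        return False
    return digest(record.content) == record.sha256


def filter_for_training(records, trusted=TRUSTED_SOURCES):
    """Keep only the records that pass the provenance check."""
    return [r for r in records if is_admissible(r, trusted)]


# Usage: one good record, one from an unknown source, one altered after signing.
good = Record("hr-database", "hello", digest("hello"))
unknown = Record("scraped-forum", "hello", digest("hello"))
tampered = Record("hr-database", "hello!", digest("hello"))
admitted = filter_for_training([good, unknown, tampered])
```

In this sketch only `good` survives the filter. A production system would presumably rely on cryptographic signatures and a managed identity service rather than a hard-coded set of names.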
IndyKite founder Lasse Andresen is a serial entrepreneur who previously founded ForgeRock, a competitor to Okta in identity management. ForgeRock was sold in August to private equity firm Thoma Bravo for $1.8 billion.
The IndyKite software leverages the popular Neo4j graph database management software, which builds a knowledge graph of the discovered relationships in an enterprise. “IndyKite ensures accurate and rich information across the corporate knowledge graph using Neo4J as a data backend,” states IndyKite.
IndyKite has received a total of $10.5 million in seed financing from Molten Ventures, Alliance Ventures, and SpeedInvest, according to Crunchbase.