The writer is a science commentator
Our lives, like stories, follow narrative arcs. Each one unfolds uniquely in chapters bearing familiar headings: school, career, moving home, injury, illness. Each storyline, or life, has a beginning, a middle and an unpredictable end.
Now, according to scientists, each life story is the chronicle of a death foretold. By using Denmark’s registry data, which contains a wealth of day-to-day information on education, salary, job, working hours, housing and doctor visits, academics have developed an algorithm that can predict a person’s life course, including premature death, in much the same way that large language models (LLMs) such as ChatGPT can predict sentences. The algorithm outperformed other predictive models, including actuarial tables used by the insurance industry.
That our complex existences can be parsed like scraps of text is both exhilarating and disconcerting. While we know that a generous income correlates with longer life expectancy, linking vast amounts of different data could unmask other ways in which social factors affect health. That could inform policymakers seeking to improve our odds of living longer, healthier lives.
On the minus side, there is something almost absurdly reductive about the idea of a DeathGPT. Each bead on the necklace of life — attending a class, a salary increase, losing a parent — feels too personal to power a predictable data set. But, in an age of big data, and AI to mine it, we will need to accept that those deeply felt qualitative experiences can be captured quantitatively in ways that, within error bars, sketch out individual destiny.
Sune Lehmann, from the Technical University of Denmark, who led the research published last month in Nature Computational Science, does not find the idea discombobulating. “I think the similarity between text and lives is deep and multi-faceted,” he told me by email. “It makes sense to me that our algorithm can predict the next step in human lives.”
Both language and life are sequences. The researchers, drawn from the University of Copenhagen and Northeastern University in Boston, exploited that similarity. First, they compiled a “vocabulary” of life events, creating a kind of synthetic language, and used it to construct “sentences”. A sample sentence might be: “During her third year at secondary boarding school, Hermione followed five elective classes.”
Just as LLMs mine text to figure out the relationships between words, the life2vec algorithm, fed with the reconstituted life stories of Denmark’s 6mn inhabitants from 2008 to 2015, mined these summaries for similar relationships.
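For readers who think in code, the idea of turning life events into a "vocabulary" can be sketched roughly as follows. This is an illustration only, not the researchers' method: the event labels, the two invented people and the function names are all assumptions made up for this example, and the real algorithm and data are not public.

```python
from typing import Dict, List

# Hypothetical life events for two invented people (not registry data).
lives: Dict[str, List[str]] = {
    "person_a": ["school:year_3", "electives:5", "job:teacher", "income:band_4"],
    "person_b": ["school:year_3", "job:nurse", "income:band_3", "diagnosis:injury"],
}


def build_vocabulary(sequences: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign each distinct event an integer ID, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for events in sequences.values():
        for event in events:
            if event not in vocab:
                vocab[event] = len(vocab)
    return vocab


def encode(events: List[str], vocab: Dict[str, int]) -> List[int]:
    """Turn one person's ordered events into a sequence of token IDs."""
    return [vocab[event] for event in events]


vocab = build_vocabulary(lives)
encoded = {person: encode(events, vocab) for person, events in lives.items()}
```

Once each life is a sequence of IDs like this, it can in principle be fed to the same kind of sequence model that is trained on sentences, which is the similarity the researchers exploited.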
Then came the moment of reckoning: how well could it apply that extensive training to make predictions from 2016 to 2020? Among algorithm test runs, the researchers studied a sample of 100,000 people aged 35-65, half of whom are known to have survived and half of whom died during that period. When prompted to guess which ones died, life2vec got it right 79 per cent of the time (random guessing gives a 50 per cent hit rate). It outperformed the next best predictive models, Lehmann said, by 11 per cent.
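Why random guessing scores 50 per cent on such a sample can be shown with a toy calculation. The sketch below uses ten invented outcomes, split evenly like the researchers' sample, and a made-up set of predictions; none of the numbers comes from the study.

```python
from typing import List


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Share of cases where the prediction matches the outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy balanced sample: 1 = died, 0 = survived (invented values).
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# A guesser that always says "survived" is right for exactly half the sample.
always_survive = [0] * len(actual)

# Hypothetical model output that gets one case wrong in each half.
toy_model = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

baseline_score = accuracy(always_survive, actual)
model_score = accuracy(toy_model, actual)
```

Because the sample is split evenly, any strategy that ignores the data lands at about one half, so a score well above that reflects signal the model has picked up.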
While the paper claims that “accurate individual predictions are indeed possible”, the algorithm furnishes a probability of death over a certain period rather than an exact date. There are caveats: what applies in Denmark might not apply elsewhere, and the algorithm encodes biases in the training data. Even so, given its potential to fine-tune risk prediction, the impact on the insurance industry will be worth watching. For their part, the researchers don’t want their work to be used by insurers, and are keeping the algorithm and data under wraps for now.
But more exciting than the results, the researchers stress, is that life2vec is general rather than task-specific. In existing predictive models, researchers must pre-specify variables that matter, such as age, gender and income. In contrast, this approach swallows all the data and can independently alight on relevant factors (it spotted that income counts positively for survival, for example, and that a mental health diagnosis counts negatively). This could point researchers to previously unexplored influences on health — and may uncover new links between apparently unrelated patterns of behaviour.
One of Lehmann’s growing concerns is privacy; he points out that companies such as Google are assembling muscular prediction machines, using an abundance of personal data culled from the internet.
This is an era of unparalleled predictability in human lives — and an era of unparalleled power for those who can read our stories before we have lived them.