
Good morning. Is the rally stalling? So far this week, the S&P 500 is off 2 per cent. More Federal Reserve officials are putting zero rate cuts in 2024 on the table. Chicago Fed president Austan Goolsbee yesterday said the “biggest danger” was housing inflation that does not fall, as it is widely expected to. Steady our nerves: robert.armstrong@ft.com and ethan.wu@ft.com.

Friday Interview: Philip Tetlock

Philip Tetlock is one of the most influential social scientists in the world. His work ranges widely: he has written extensively about forecasting, judgment, politics and moral values. But he is best known for two books. Expert Political Judgment found that many experts were systematically overconfident about the accuracy of their own predictions. Superforecasting posited that some people are in fact better at predicting the future than others, and that what sets those people apart is a straightforward combination of open-mindedness, attention to detail, good habits and a bit of technique.

This week he spoke to Unhedged about the implications of his work for market participants — and about the rise of AI.

Unhedged: Our readership will be especially interested in what you think about economic experts. Can we take your work to mean economists simply don’t have much of use to say about, for example, what’s going to happen to the economy in 2025?

Philip Tetlock: It’s an understandable reading, but it’s not accurate. It paints with too broad a brush. I don’t like talking about experts at large; I don’t like talking about economists at large. It is true that in some of the studies that I ran many experts, mostly in geopolitics, did not fare all that well. They thought they knew more about the future than they did. One of people’s big takeaways is a know-nothing view, that experts are clueless.

There is some truth to that: there is a hubris that does infect many expert communities. The other thing that reinforced the view that I’m anti-expert is the IARPA tournaments, which ran from 2011 to 2020 [the Intelligence Advanced Research Projects Activity is a research organisation operated by the US Office of the Director of National Intelligence]. We were in competition in some of those years with intelligence analysts. The question was: could the amateur superforecasters do better than the seasoned intelligence analysts who had access to classified information? That also worked out rather poorly for the subject matter experts. The superforecasters were winning by 20 per cent plus.

Unhedged: Let’s sharpen the question. Recent experience with a very fundamental economic variable, inflation, seemed to demonstrate that even over reasonably short periods of time, economic expertise provided a very limited edge in predicting inflation versus a very mechanical, rule-based approach. If economists can’t predict something as fundamental as inflation, what do they know?

Tetlock: It’s a deep question. There are some fields in which knowledge translates much more directly into predictive power than in others. Geophysicists, as far as I can tell, have a very accurate model of plate tectonics, grounded in very solid physical science. They can predict roughly where earthquakes are going to occur. But they can’t really predict when.

Unhedged: In fields where knowledge doesn’t translate very easily into predictive power, does your work tell us how we should proceed?

Tetlock: It certainly suggests we should become increasingly sceptical the further out experts claim to be able to see. But just to show you that I don’t like painting with a broad brush, I just wrote a paper which reports on some of the few 25-year forecasts I collected for Expert Political Judgment in the early 1990s. I wouldn’t say we have a great database, but there were enough forecasts to be able to draw some statistical conclusions.

There were two categories of forecasts. One was about border changes, either as a result of invasion or as a result of internal secession, and the other was about nuclear proliferation. The border change predictions turned out more or less as we expected. The people who had expertise in that domain weren’t really much better than a very casual reader of a serious newspaper would have been.

There were really only two episodes of nuclear proliferation that occurred in the 25-year time period: North Korea and Pakistan. India went officially nuclear but they had been nuclear before. And the nuclear proliferation experts did remarkably well. For someone whose early public intellectual reputation hinged on experts being stupid, it is kind of anomalous that I would be writing a paper like this in 2023. But I did, and it illustrates the peril of painting with a broad brush.

Unhedged: Why should nuclear proliferation be different?

Tetlock: This ties into your question about knowledge. I think the prerequisites for creating a bomb are very well understood. The economic and technological levers that the major powers have to put pressure on aspiring nuclear powers are very substantial. So it’s going to be a heavy lift to become a nuclear power. You have to be willing to withstand a lot of international censure. North Korea, of course, is in a category unto itself. And it became clear that the experts were anticipating all this quite accurately, even the approximate timeframe. And Pakistan became a nuclear power. So there were those two exceptions, but a lot of people thought there would be a lot of other nuclear powers: they thought that Iraq would become a nuclear power, Syria, Libya. With the end of the cold war some neorealists thought that Germany and Japan would become nuclear because they wouldn’t trust the US nuclear umbrella anymore. There were a lot of false positives — but they tended to be false positives by people who were not nuclear proliferation experts. Proliferation experts resisted the siren calls.

Unhedged: The last time we spoke, you said that in liquid markets, superforecasters don’t add much value. Does that remain true? Is the market itself a superforecaster?

Tetlock: Well, the efficient market hypothesis is what suggests it is a superforecaster. And I think there’s some truth to that. I don’t think I’ve seen evidence of superforecasters beating deep, liquid markets with any reasonable consistency. I have seen them beat toy prediction markets that aren’t very deep or liquid. You’ve probably seen the anomalies that pop up in those.

Unhedged: What about a question like, ‘Will the sovereign debt of country X fall into default within three years?’

Tetlock: They tend to be pretty good at questions like that — it’s right in their wheelhouse. You can use the longitudinal cross-sectional base rates, take the individuating information about the country into account and crank out a forecast. Argentina, of course, is an interesting case because it has this new president; the question there is whether all bets are off now.
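To make that recipe concrete, here is a minimal sketch in Python of the base-rate-plus-individuating-information approach. Everything in it is hypothetical: the 8 per cent base rate and the log-odds adjustments are invented for illustration, and folding evidence in as log-odds shifts is just one simple way to do the "cranking".

```python
import math

def adjusted_probability(base_rate: float, log_odds_adjustments: list[float]) -> float:
    """Start from a historical base rate, then fold in country-specific
    evidence expressed as log-odds shifts (all figures illustrative)."""
    log_odds = math.log(base_rate / (1 - base_rate))
    log_odds += sum(log_odds_adjustments)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical numbers: suppose roughly 8% of comparable sovereigns have
# defaulted within three years, then adjust for individuating information.
base_rate = 0.08
adjustments = [
    +0.9,   # heavy near-term foreign-currency maturities
    -0.4,   # new IMF programme in place
    +0.3,   # political uncertainty around the reform agenda
]
print(f"Adjusted three-year default probability: "
      f"{adjusted_probability(base_rate, adjustments):.0%}")
```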

Unhedged: How is AI going to change forecasting?

Tetlock: I’m not an AI expert, but it is an extremely impressive technology. It has some serious problems, yet it has an astonishing capacity to answer broad categories of questions extremely well. I have written a couple of papers on how we prompt large language models to think like superforecasters, or prompt them to think like particular schools of thought. For example, we might say, “simulate more hawkish and more dovish views on monetary policy of central banks in OECD countries”. The first test is whether the machine can pass an ideological Turing test — whether it can summarise those views so well that the observer feels it has characterised rather than caricatured them. That’s not a very demanding test for LLMs, because they just suck up words and rearrange them.

The more challenging test is to ask them about counterfactuals — counterfactuals that have not yet been widely discussed but are implicit in the belief system of the school of thought, its what-if beliefs.
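As a concrete illustration of that two-stage test, the sketch below builds one prompt asking a model to characterise a school of thought and a second probing its implicit what-if beliefs. It is entirely hypothetical: call_llm is a stand-in for whatever model API a reader might use, and the example topics are invented, not drawn from Tetlock's papers.

```python
# Minimal sketch of the two-stage probe described above: an "ideological
# Turing test" prompt, then counterfactual probes of a school's what-if
# beliefs. `call_llm` is a hypothetical placeholder for whichever model
# API you use; none of this is taken from Tetlock's work.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your preferred model API")

def characterise_school(school: str, topic: str) -> str:
    # Stage 1: can the model characterise, rather than caricature, the view?
    return call_llm(
        f"Summarise the {school} position on {topic} so faithfully that an "
        f"adherent would accept it as their own."
    )

def probe_counterfactual(school: str, topic: str, what_if: str) -> str:
    # Stage 2: the harder test, asking for beliefs the school has not
    # spelled out but that are implicit in its world view.
    return call_llm(
        f"Adopting the {school} position on {topic}, answer this "
        f"counterfactual it has not explicitly addressed: {what_if}"
    )

# Illustrative usage:
# characterise_school("dovish", "monetary policy at OECD central banks")
# probe_counterfactual("hawkish", "monetary policy at OECD central banks",
#                      "What if the 2021 inflation surge had been met with "
#                      "immediate rate rises?")
```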

Unhedged: And is the idea that in a way a forecast is analogous to a counterfactual judgment?

Tetlock: Very much so. It’s a conditional, an if-then. Let’s say we do succeed at reproducing the implicit counterfactual beliefs of various schools of thought on various topics — whether about inflation or a geopolitical outcome. Then let’s take this system that seemed to work well as a representation of this school of thought about what happened between 1990 and 2025, and use it in a forecasting tournament that stretches from 2025 to 2050. You take the nuclear proliferation experts who did pretty well in the last 25 years, get the LLMs to reproduce their belief systems, and then you put the LLMs in competition with them. The LLMs become participants in the tournament.

You can also ask them to get more creative. You ask the LLM: “When you look back on the way the world looked in 1990, and you look at the various schools of thought that existed, what do you think would be the optimal combination of viewpoints that would allow you to make really good predictions between 1990 and 2024? And how would you apply that combination going forward, for 2025-2050?” I think the AIs are likely to make more foxlike [ideologically heterogeneous, carefully weighted, provisional] than hedgehog-like [ideologically committed, deductive] judgments.

Unhedged: It sounds like you’re saying the AIs are just one additional set of participants in a collaborative structure.

Tetlock: That’s right. I think that “superintelligence” does have a meaning here, if the LLMs can do a better job of synthesising human schools of thought than humans can do. They can take those synthetic insights from the previous time period, apply them to the next time period and outperform both the humans and the LLMs that just represent single schools of thought. That would mean they’re operating at a superintelligent level.

Unhedged: This touches on your point that there’s an ideal mix between collaboration and competition, so-called adversarial collaboration.

Tetlock: You saying that makes me sad because it reminds me of Danny Kahneman [the renowned psychologist, author of Thinking Fast and Slow, who died in March]. He was a big proponent of this sort of collaboration. Being Danny, he was such a pessimist, and didn’t think adversarial collaboration worked very well. He didn’t think it could change people’s minds very much, based on his experience running experiments. But sometimes they were successful. My wife Barb was involved with one that Cass Sunstein wrote about in the New York Times recently. It was about the link between happiness and money. Each side did change its mind to some degree.

That kind of synthetic thinking [is where AI could come into play]. You could ask LLMs to broker adversarial collaborations. You could even ask LLMs representing rival schools of thought to have an adversarial collaboration between each other and see if they produced better syntheses than humans, who are maybe more stubborn and ego-driven.

Unhedged: What are the conditions under which adversarial collaboration can help and the conditions where it’s not helpful? Adversarial collaboration seems like what investment committees do. But one of your recent papers looked at whether adversarial collaboration can bridge people’s views on AI existential risk, and found that fundamental world views stopped viewpoints from converging.

Tetlock: We’re not giving up on adversarial collaboration, but [that research] was indeed sobering. Still, that paper may lead us to underestimate ad-collab. Some of the participants did generate “crux questions”, near-term questions that, once resolved, would induce each side to change its mind to some degree about the long-term outcome. And none of those crux questions is resolved yet.

There are reasons to suppose that human participants weren’t doing a very good job generating crux questions. Because in retrospect, the experts who were more concerned about AI risk changed their judgments not because of arguments from more AI-sceptical superforecasters, but because of new regulatory events in the UK and US coinciding with the study. In other words, they became less pessimistic about AI existential risk when they saw signs of a strong regulatory response.

The remarkable thing is that it never occurred to them to turn that into a crux question. That was happening right under their noses! So how good are people going to be at generating cruxes about 2026 if they can’t do it about news happening in May 2023?

Unhedged: Is part of the value of crux questions that they are a form of pre-commitment? In other words, they prevent ad hoc responses to changing facts.

Tetlock: Very much so. It makes it a little more awkward to wiggle out and maintain business as usual.

Unhedged: To bring it back to finance, what are some potential best practices for investors who want to do ad-collab? You already mentioned generating crux questions. Anything else?

Tetlock: Let me give you one example. Almost all our AI expert and superforecaster participants believed that extraordinary claims require extraordinary evidence. But they disagreed about who was making the extraordinary claim. Some thought that complete human extinction by 2100 was an extraordinary claim. And the AI experts asked, well, what happens when a more advanced species or civilisation encounters a less advanced one? It tends to be extinction for the less advanced one. So insofar as they believed that an autonomous species is emerging in AI research, and that we’re reaching an exponential take-off point where AI starts programming itself, it’s quite reasonable to suppose that human extinction could happen.

So there you have it. How do you reach agreement when you can’t agree on who’s making the extraordinary claim? I think in finance you wouldn’t have that, would you? There’d be much more consensus on what counts as an extraordinary claim. The Dow is going to go down 90 per cent? OK, that’s pretty extraordinary.

Unhedged: You might suppose that Keynesians and Austrian economists are living in different paradigms.

Tetlock: I agree the differences are quite stark. But you would ask a question like: how are things going to work out in Argentina? There are a lot of political reasons why [President] Javier Milei would fail. There are economic and political reasons why he might succeed. And I think both sides would recognise that. Keynesians would concede that government was totally out of control in Argentina, and Hayekians would concede that there are some political grounds on which Milei will be crucified. I wouldn’t imagine that to be as polarised a debate as what we observed with AI existential risk.

Unhedged: So to condense that into a piece of advice: you want to develop crux questions, keep the sides focused on particular circumstances, and try to constrain the impact of differing world views.

Tetlock: It really helps when the debate is grounded in specifics. I think the more that the AI debate becomes grounded in specifics, the better. When you’re talking at the level of high abstraction, where should the burden of proof lie about who is making extraordinary claims?

Unhedged: In your field of study, what’s the next frontier? If you had to pick questions that will define the next 30 years of work in your field, what would they be?

Tetlock: I think you put your finger on it: AI-human collaboration. The teams that can pull that off most effectively will have a big competitive advantage. We’re starting to see some of it in the work we’re doing on using LLMs to capture schools of thought. The schools of thought give feedback to the LLMs about what they’re getting right and wrong, and the LLMs are slowly learning.

We’re moving into a quite different world from the one I started in. I got my PhD in 1979. But one of my advisers, Bob Abelson at Yale, was on this quixotic quest to create an AI, and he was working with Roger Schank, a pretty famous computer science guy, to create computer programs that could answer questions. What happens next in a restaurant? How do you understand a newspaper article about a traffic accident? Not very successfully, by the way. But the field has just come so far, with so many hiccups along the way. I mean, the field has been written off as dead a couple of times in the last 40 years. And now it’s being hailed as world changing.

It seems possible that GPT-5 is only going to be a very marginal improvement on GPT-4, and that we’re reaching the limits on what can be done with these reinforcement learning neural networks. Some very prominent computer scientists are making that argument. But you’ve also got some very large companies that are betting many billions of dollars that AI is going to continue taking off.

One good read

Intel stock is up 60 per cent from the bottom last year. There’s a lot left to do.
