Two years ago, Yuri Burda and Harri Edwards, researchers at OpenAI, were trying to find out what it would take to get a large language model to do basic arithmetic. At first, things didn’t go too well. The models memorized the sums they saw but failed to solve new ones.
By accident, Burda and Edwards left some of their experiments running for days rather than hours. The models were shown the example sums over and over again, and eventually they learned to add two numbers—it had just taken a lot more time than anybody thought it should.
In certain cases, models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on, a behavior the researchers called grokking. Grokking is just one of several odd phenomena that have AI researchers scratching their heads. The largest models, and large language models in particular, seem to behave in ways textbook math says they shouldn’t.
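The piece doesn't spell out the researchers' exact setup, but grokking experiments in the literature are often run on toy arithmetic tasks such as modular addition. Below is a minimal, illustrative sketch of that kind of experiment, not OpenAI's actual code: the modulus, network size, and hyperparameters are assumptions, and the point is simply to train far past the moment the model memorizes its training sums while watching whether accuracy on held-out sums eventually jumps.

```python
# A grokking-style toy experiment (illustrative only): learn "a + b mod P"
# from half of all possible sums, and keep training long after the training
# set is memorized to see whether the held-out sums are eventually solved.
import torch
import torch.nn as nn

P = 97  # modulus; a common choice in grokking papers, assumed here
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Split the sums: one half to memorize, the other half as the real test.
perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_idx, test_idx = perm[:split], perm[split:]

def one_hot(batch):
    # Represent each (a, b) pair as two concatenated one-hot vectors.
    return torch.cat([nn.functional.one_hot(batch[:, 0], P),
                      nn.functional.one_hot(batch[:, 1], P)], dim=1).float()

model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
# Weight decay is typically reported as important for grokking to appear.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        preds = model(one_hot(pairs[idx])).argmax(dim=1)
        return (preds == labels[idx]).float().mean().item()

# Train for far longer than memorization takes, logging both accuracies;
# in a grokking run, test accuracy stays low for a long time, then jumps.
for step in range(20001):
    opt.zero_grad()
    loss = loss_fn(model(one_hot(pairs[train_idx])), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  "
              f"test acc {accuracy(test_idx):.3f}")
```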
This highlights a remarkable fact about deep learning, the fundamental technology behind today’s AI boom: for all its runaway success, nobody knows exactly how—or why—it works. Read the full story.
—Will Douglas Heaven
If you’re interested in the mysteries of AI, why not check out:
+ Why AI being good at math matters so much—and what it means for the future of the technology.
+ What the history of AI tells us about its future. IBM’s chess-playing supercomputer Deep Blue was eclipsed by the neural-net revolution. Now, the machine may get the last laugh. Read the full story.
+ What an octopus’s mind can teach us about AI’s ultimate mystery. Machine consciousness has been debated—and dismissed—since Turing. Yet it still shapes our thinking about AI. Read the full story.