Originally published on We Can Look Up - Substack, May 22, 2025.


☠️ Warning: reading this article may result in existential distress, continue at your own peril ☠️

I

Welcome to We Can Look Up.

You’re probably wondering why I’ve called you here today.

These articles may one day cover a range of topics, but right now, I want to urgently share my reflections on the progress of artificial intelligence (AI).

I’ve been obsessed with AI for almost 14 years.

It was a crisp winter night in 2011, and we were having drinks overlooking the lake at my undergraduate university. A friend asked me two simple questions that have haunted me ever since:

‘What do you think will happen when we build computers that are smarter than we are?’

and

‘When we do, who will build the next generation of even smarter computers?’

In the years since, I completed my medical degree, began training in psychiatry, and was fortunate to complete a two-year master’s in data science (Computational Biology) on scholarship at Harvard, with technical coursework and research in AI and machine learning.

I also spent a fair amount of this time contemplating humanity’s future. And thinking about our place in the universe.

And I’ve never stopped trying to answer those two simple questions from 14 years ago.

What does AI mean for humanity?

The Rising Tide

Please return your seat-backs to their full upright and locked position

Why am I writing about AI now?

Recently I’ve had a handful of friends and colleagues ask me, “What’s happening in AI?”

And over the past 14 years, and particularly during recent months and weeks, I’ve come to believe there’s a high probability each of the following statements is true:

  1. Independent AI systems can be built with capabilities (intelligence, agency, strategy, communication, etc.) greater than those of humans.

  2. On our current trajectory, they will be built, and soon:

    • More likely than not, within 5 years.

  3. If these systems are built, humans will very likely lose any and all control over our future.

  4. In many of the most plausible scenarios, if we build these systems, they will pose an existential risk to every person alive.

Yes.

If we build fundamentally uncontrollable beings that are more powerful than we are, I believe one likely outcome is our extinction.

Some people, when they see ideas like these, don’t even allow themselves to consider the possibility they may be true.

I certainly don’t expect you to agree with everything I write, and I’m not certain of each of these four propositions (stay tuned for future articles exploring these questions in more depth).

But I invite you to consider these ideas carefully for yourself and share your thoughts with me. What do you agree with and what do you disagree with? And why?

I hope you can update me and I sincerely hope I’m wrong.

“…there could be an algorithm that said, ‘Go penetrate the nuclear codes and figure out how to launch some missiles.’

If that’s its only job, if it’s self-teaching and it’s just a really effective algorithm, then you’ve got problems.”

Barack Obama - 2016

Stay with me - I promise there’s hope, and we can look up.


II

Sailing into the Singularity

AI Progress is Accelerating

A helicopter sits at rest.

The engine whirs into gear and the rotor blades slowly start to move.

As they gradually accelerate, the weight of the multi-tonne chunk of metal keeps it firmly rooted on the tarmac.

As the blades rotate faster and faster, the helicopter eventually, tentatively, begins to lift a corner of one of its skids a tiny fraction of an inch off the ground.

At first it teeters almost imperceptibly, and then hovers in place.

Finally, the entire machine lifts off the ground, rises slowly, and then soars, faster and faster, up into the sky.

Artificial General Intelligence - Humanity’s Final Invention?

The explicit target of each of the leading AI labs (OpenAI, Anthropic and DeepMind, plus xAI, Meta and DeepSeek) over the past decade has been to build Artificial General Intelligence (AGI).

AGI is an AI system that:

  • Matches humans in capabilities broadly
  • Can function in a wide range of situations, independently of human supervision

We can think of AGI as ‘general and human-level AI’.

As I discuss next, I believe we are currently a lot closer to AGI than virtually everyone realises, and after the past few months, a new and even steeper trajectory of progress may have begun (driven by the new generation of reasoning models).

How Far is AGI? Are We in the Latter Stages of the Game?

The Scaling Laws

Since GPT-1 was released in 2018, our primary pathway towards AGI has been to build much larger models with each new generation (larger datasets and greater computational power).

This has consistently resulted in progressively more capable foundational models - a phenomenon known as The Scaling Laws.
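
For a sense of what ‘laws’ means here: the best-known empirical fit, from DeepMind’s Chinchilla paper (Hoffmann et al., 2022), models a network’s loss as a smooth, predictable function of its parameter count and training data. Here’s a sketch of its form (the constants are empirical fits, and this is one formulation among several, not a fundamental law):

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022):
% predicted loss L for a model with N parameters trained on D tokens.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% E is the irreducible loss of the data itself; A, B, \alpha and \beta are
% fitted constants (\alpha and \beta are both roughly 0.3), so loss falls
% smoothly and predictably as models and datasets grow.
```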

The Scaling Laws (along with limits on data and hardware) may or may not be hitting a ceiling, but even if they are, I’m not reassured, because we are now exposed to multiple potential pathways to AGI.

Reasoning Models

A breakthrough came in late 2024, with the creation and release of reasoning models: models trained with reinforcement learning to produce a chain of thought (CoT) before answering.

Up until a few months ago, when asked a question, a standard large language model (LLM) would blurt out the first answer that came to mind and run with it, so to speak.

This lack of thought prior to responding was identified in mid-2024 by ex-OpenAI alignment researcher Leopold Aschenbrenner as one of the important remaining hurdles currently ‘hobbling’ AI on its path to AGI.

By contrast, you may have noticed that the new reasoning models (such as OpenAI’s o3) pause and think carefully about a response before answering.

They follow a series of reasoning steps, a ‘chain of thought’, before answering with a more intelligent response.
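
To make ‘chain of thought’ concrete, here’s a minimal sketch of the original prompting version of the idea (Wei et al., 2022). Everything here is illustrative: ask_model is a stand-in for whichever LLM API you happen to use, not a real library call, and the newest reasoning models have this behaviour trained in with reinforcement learning rather than bolted on via the prompt:

```python
# A toy sketch of chain-of-thought prompting. Illustrative only:
# ask_model is a placeholder for a real LLM API call.

def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model writes out its reasoning first."""
    return (
        f"Question: {question}\n"
        "Think through this step by step, showing your reasoning, "
        "and only then state your final answer."
    )

def answer_with_cot(ask_model, question: str) -> str:
    # The reply now contains intermediate reasoning steps followed by a
    # final answer, instead of a single blurted-out guess.
    return ask_model(build_cot_prompt(question))
```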

Scaling and chain-of-thought improvements are two separate pathways towards AGI, and they already appear to be compounding each other - chain of thought has squeezed a lot more juice out of GPT-4-level models.

Each time the chefs in the kitchen find a new ingredient (an ‘algorithmic improvement’), we watch these models grow brighter, and improvements are coming at breakneck speed.

Like many others, I expect the next 12 months to see significant improvement in ‘agents’ - models that are no longer stuck in chatbot shackles, but are capable of independently pursuing goals, roaming free on the internet.

o3 - A Christmas Eve Canary in the Coal Mine?

When do smart normies need to wake up to AI?

Christmas Eve Canary in the Coal Mine

But first, let’s rewind and consider some ‘ancient history’ (by AI standards) - Christmas, 2024.

On December 24th 2024, just as the world’s media were celebrating their holidays at home with family, OpenAI released news of its latest model, o3.

o3 is currently the world’s most powerful language model (among those released to the public anyway) and the most powerful in the new family of reasoning models. After months of internal review, o3 was released to the public on April 16th 2025.

Do We Have Human-Level Computer Programming Now?

In their Christmas Eve update, OpenAI shared evaluations of o3, demonstrating unexpected jumps in capability compared to the previous state-of-the-art models.

Among other dramatic gains in capabilities, o3 scored 2727 on Codeforces (a global competitive computer coding platform), which is equivalent to the 175th best human competitive coder on the planet.

If you’ve just tuned in, folks: that means that, as of five months ago, it seems AI can now write code at a level surpassing 99.99% of competitive human programmers.
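
A rough sanity check on that percentile (my arithmetic, and the user count is an assumption):

```latex
% If Codeforces has on the order of 10^6 rated accounts (an assumption),
% a world rank of 175 corresponds to roughly:
1 - \frac{175}{1\,000\,000} \approx 99.98\%
% so the '99.99%' figure is at least the right order of magnitude.
```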

Zvi Mowshowitz: “In the presentation, [CEO, Sam] Altman jokingly mentions that one person at OpenAI is a competition programmer who is 3000+ on Codeforces, so ’they have a few more months’ to enjoy their superiority. Except, he’s obviously not joking. Gulp.”

Anyone who’s used LLMs to code will know that until now, they have been useful and sometimes impressive, but haven’t threatened to automate the job of a professional software engineer.

And they certainly haven’t been capable of coding at a world-class level.

OpenAI may be hyping their latest model, and the real-world performance of models doesn’t always match their evaluation metrics.

But OpenAI have been reasonably honest in their evaluations previously, and people in the community are taking this update very seriously.

o3 was released to the public in April, and four weeks later it’s clear it’s a significant improvement on the previous state of the art. I’ll discuss new model capabilities in future articles. But if your experience with ChatGPT was trying the free version 18 months ago and noticing its drawbacks and hallucinations, then you may be surprised.

Here’s an article by Scott Alexander describing informal experiments conducted by him and Kelsey Piper. Kelsey elicited an eerie new capability from o3: guessing where a nondescript photograph was taken, using only extremely subtle cues - at a level rivaling or even exceeding the best humans in the world.

Could a human guess the exact location of this photograph (Marina State Beach, Monterey, CA)? o3 can, but don’t expect it to explain how.

Frontier models have progressed in leaps and bounds, and can now rapidly research and deliver reasonable opinions, at a PhD level, across virtually every field of human endeavor.

World-Expert Level Answers in Economics and Physics?

Tyler Cowen, a leading academic economist and writer, recently posted the following about o1 pro (the less powerful predecessor to o3) on February 3rd 2025 (emphasis mine):

“I find it very difficult to ask o1 pro an economics question it cannot answer. I can do it, but typically I have to get very artificial. It can answer, and answer well, any question I might normally pose in the course of typical inquiry and pondering.

In an economics test, or any other kind of naturally occurring knowledge test I can think of, it would beat all of you (and me).

Its rate of hallucination is far below what you are used to from other LLMs.

[…]

“o1 pro is the smartest publicly issued knowledge entity the human race has created (aside from Deep Research!). Adam Brown, who does physics at a world class level, put it well in his recent podcast with Dwarkesh. Adam said that if he had a question about something, the best answer he would get is from calling up one of a handful of world experts on the topic. The second best answer he would get is from asking the best AI models.

As Adam indicated, I think only a relatively small number of humans in the world can give better answers to what I want to know.”

While AI continues to have its naysayers, their ranks seem to be thinning. And it’s fair to note that Tyler Cowen has been skeptical of transformative AI over the past two years.

Am I Among the Final Cohorts of Human Doctors?

Research published in January 2025 in the Journal of the American Medical Association (JAMA) found that AI chatbots now surpass doctors in clinical decision-making.

Another study, this one from the University of Virginia School of Medicine, published in November 2024, reported that not only did AI rival human doctors in the accuracy of medical diagnosis, but combining AI with humans actually reduced accuracy in some cases. It seems doctors performed more accurately when they trusted the AI rather than their own clinical judgement.

You might say, ‘Well, AI might be able to diagnose disease, but it can’t empathise with patients’.

Well, another study, published in JAMA Internal Medicine in April 2023, found that evaluators preferred the bedside manner of ChatGPT over that of human doctors.

In this study, ChatGPT’s answers were rated significantly higher in both quality and empathy. 45.1% of ChatGPT’s responses were deemed empathetic or very empathetic, compared to only 4.6% of physicians’ responses.

So why don’t we have AI doctors?

These recent findings on clinical decision-making and diagnosis were only published in the past few months; it takes years to build new applications in technology, and many more years for new products to be accepted into clinical practice.

And there’s more to clinical medicine than diagnosis, clinical decision-making and communicating empathy. Doctors navigate a complex physical healthcare system.

How long will it be until we build more general systems capable of doing this as well?

When is smart, smart enough?

From computer programming, to economics, to physics, to medicine, the latest models are impressing experts in their fields.

AI is starting to reach human level or above in more and more areas.

The raw horsepower needed to build useful applications to support humans in most intellectual domains is already available now.

We don’t need greater capabilities to bring massive benefits to humanity in fields ranging from healthcare to scientific research.

In the coming months and years, as chatbots morph into more and more independent systems capable of initiating actions (i.e. ‘agents’), they will of course bring significant financial rewards to the people who control them (initially, at least, until they lose control).

But AI advancements risk triggering a ‘take-off’ - a hypothetical positive feedback loop in which AI races past us, like an express train hurtling through a station, as humanity stands on the platform, left behind in a cloud of dust.

As I explain below, human-level computer programming skills may just be one of the catalysts that accelerate us into ’take-off’.

How Far Away is Take-off?

Why we don’t need robot plumbers to trigger an intelligence explosion.

Take-off

‘Take-off’ describes a hypothetical shift in AI progress when AI starts to ‘recursively self-improve’.

Take-off could easily come before we reach AGI. For take-off to happen, we don’t need to build systems capable of automating every human field (for example, hairdressing, nursing, advanced mathematics or plumbing); we just need to build systems capable of automating AI research itself.

In this scenario, the AI can, of course, take care of the rest!
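
To see why automating AI research is the crucial threshold, here’s a toy model (entirely mine; the parameters are invented and nothing about real AI systems follows this equation). If the speed of AI research is proportional to the capability of the AI doing it, capability compounds exponentially:

```python
# Toy model of 'take-off': once AI does the AI research, progress
# compounds. Purely illustrative; all parameters are invented.

def simulate_takeoff(c0=1.0, k=0.5, years=10.0, dt=0.01):
    """Euler-integrate dC/dt = k * C: research speed scales with capability C."""
    c, t, trajectory = c0, 0.0, []
    while t < years:
        c += k * c * dt   # progress is proportional to current capability
        t += dt
        trajectory.append((t, c))
    return trajectory

if __name__ == "__main__":
    for t, c in simulate_takeoff()[::200]:  # sample every ~2 simulated years
        print(f"year {t:5.2f}: capability x{c:8.1f}")
```

Swap the growth law for dC/dt = k·C² and capability doesn’t just grow exponentially, it blows up in finite time - one mathematical caricature of a ‘singularity’.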

During 2024, there were reports out of Anthropic that AI had become useful enough to meaningfully speed up their software engineers’ coding, and Zvi Mowshowitz mused: was this the beginning of a slow take-off?

In recent weeks, Google announced news of AlphaEvolve, an AI system they say has already accelerated internal AI research.

And as we’ve discussed, the latest reasoning models like o3 seem to be capable of programming at a world-class level.

Are these the sparks that start a chain reaction accelerating frontier AI research?

Could news of the reasoning models be the canary in the AI coal mine?

“I fear that AI may replace humans altogether.

If people design computer viruses, someone will design AI that improves and replicates itself.

This will be a new form of life that outperforms humans

[…]

It would take off on its own, and re-design itself at an ever-increasing rate,”

Stephen Hawking - 2017

These puppies are unrelated to the article but I just wanted to lighten the mood a little.

See you in Part 2!