people, ideas, machines

Exploration, exploitation, and thinking

I’ve been thinking about AI and machine learning for about a decade. One of the surprising things is how often the field’s ideas spill over into everyday life.

In reinforcement learning (RL) there’s a tension between two forces: exploration and exploitation. Exploration means trying new things. Exploitation means sticking with what you already know.

RL agents learn by exploring heavily at first. Later, once they know more about their environment, they can safely exploit. But an agent that skips exploration gets stuck: it keeps repeating what it already knows, even when what it knows isn’t very good.
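The trap is easy to demonstrate on a toy multi-armed bandit. This is a minimal sketch, not anything from the post: the arm payouts, the epsilon values, and the running-average learning rule are all illustrative assumptions.

```python
import random

def run_bandit(epsilon, steps=5000, seed=0):
    """Epsilon-greedy play on a toy 3-armed bandit.

    The agent keeps a running-average value estimate per arm.
    With probability epsilon it explores a random arm; otherwise
    it exploits the arm with the highest current estimate.
    Returns the average reward per step.
    """
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.9]      # arm 2 is genuinely best
    estimates = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(3)                 # explore: try anything
        else:
            arm = estimates.index(max(estimates))  # exploit: best so far
        reward = true_means[arm] + rng.uniform(-0.1, 0.1)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps

# The pure exploiter locks onto the first arm that pays anything
# and never discovers the better one; a little exploration does.
print(run_bandit(epsilon=0.0))   # stuck near 0.2
print(run_bandit(epsilon=0.1))   # close to the best arm's 0.9
```

The greedy agent isn’t unlucky, it’s structurally blind: its estimates for the arms it never pulls stay at zero forever.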

Humans aren’t so different. We need a period of exploration, where we struggle and figure things out for ourselves. That’s how you build the mental muscles for judgment. If you outsource that too soon, you never develop them.

It reminds me of calculators. Once we had them, we could move on to harder maths. But we learned to add first, and that mattered. You can bash the keys all day, but if you don’t know what the symbols mean, the answers won’t mean anything either. Calculators helped once we’d done the hard part.

AI works the same way. It can take away the struggle. But the struggle is the point. That’s how you learn to think.

Early studies suggest that people who lean on AI too early don’t go as deep. They remember less. They do less real thinking. Over time it becomes a kind of thought atrophy.

The irony is that AI could help us explore more, but it usually does the opposite. It tempts us into premature exploitation. That’s why so many students and young workers risk getting stuck. They trade the long-term reward of thinking for the short-term reward of an easy answer.

This is the exploitation trap. You get an answer, but at the cost of the skills you need to find answers yourself, maybe even better ones. And the younger you are, the bigger the cost, because you may skip exploration entirely.

So what’s the right balance? Probably the same one RL agents use. Explore first, exploit later. But it’s never all or nothing. Even the most experienced agents still explore a little.
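That “explore first, exploit later, but never stop entirely” schedule is commonly implemented as an epsilon that decays over time but never reaches zero. A sketch, with illustrative constants:

```python
def epsilon_schedule(step, start=1.0, floor=0.05, decay=0.001):
    """Exploration rate: starts high, decays geometrically, never hits zero."""
    return max(floor, start * (1 - decay) ** step)

# Early on, the agent explores on almost every step...
print(epsilon_schedule(0))       # 1.0
# ...much later, it mostly exploits but keeps a small exploration budget.
print(epsilon_schedule(10_000))  # 0.05 (the floor)
```

The floor is the interesting part: even a well-trained agent reserves a sliver of its time for trying things it doesn’t expect to pay off.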

For people, it helps to name the goal. If the goal is output, use AI. If the goal is learning, by all means use AI, but embrace the struggle, because when answers are handed to you, it only feels like learning.

An RL agent that skips exploration never learns its environment. A generation that skips exploration may never learn to think.

Sources

The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects from a Survey of Knowledge Workers (Microsoft & Carnegie Mellon, 2025)

ChatGPT's Impact On Our Brains According to an MIT Study (TIME, 2025)

Does ChatGPT Make You Dumber? What a New MIT Study Really Found (Marketing AI Institute, 2025)

Increased AI use linked to eroding critical thinking skills (Phys.org, 2025)

Evaluating the Impact of AI Dependency on Cognitive Ability among Young Adults (AMH International, 2024)

#AI #RL #musing