When the student shows up with an upgrade, passing the test isn’t enough
There’s a new game of cat and mouse between AI systems and exams like Humanity’s Last Exam—created to show just how intelligent (or not) the most advanced models really are. The funny thing is that, using almost any reasonable definition, today’s models would already qualify as “intelligent” if we tested them against the original Turing Test, proposed by Alan Turing 75 years ago.
Not familiar with the Turing Test? The idea was simple but powerful: evaluate whether a machine could convince a person, through a text-based conversation, that it wasn’t a machine. If the evaluator couldn’t reliably tell the human from the system, the test was considered passed. As you can imagine, the latest models—like GPT-4.5—have largely passed that test: more than 70% of the time, people can’t tell whether they’re talking to a system or a human.
The system isn’t failing. It’s learning.
So yes, these systems are getting smarter, more capable, and more skilled—which makes new forms of evaluation increasingly necessary. Humanity’s Last Exam, developed by Scale AI in collaboration with the Center for AI Safety (CAIS) and over a thousand contributors, aims to probe the limits of knowledge and reasoning and to compare humans directly against models. To be honest, I’m not convinced this exam is enough to measure the intelligence of these systems—especially once we start seeing clearer signs of AGI in the real world. Maybe we should call it Humanity’s Almost Last Exam.
That brings me to a more interesting question (at least for me): how do we evaluate the intelligence, capabilities, and potential of students or employees who are already augmented by AI? How can I help my students become the best version of themselves, in a way that’s as enjoyable and seamless as possible?
Spoiler: your exam is also in beta
I’ve got my own, extremely fun game of cat and mouse with my graduate students—one that helps me see what they really know, get them to actually learn, and push their critical thinking to the limit in every class, on every assignment, at every moment. I know some of them use AI for their homework—and I get it—but that doesn’t mean it will always work, or that they can (or should) avoid thinking for themselves.
I’m also not a fan of using AI detectors to grade assignments or treating students as if they’re cheating. I’d rather raise expectations—for them and among them—and challenge them to solve complex problems, in real time, and at random. Every class, I learn how this generation of super-smart students is trying to “hack” the system using AI, and I figure out how far I can push them to unlock new capabilities without breaking their spirit in the process.
I even developed an AI system called Professor McEthical, which works with them 24/7 on any device and helps them learn complex concepts at their own pace. Just as I wouldn’t tell a student not to use spellcheck, a calculator, or Google, I wouldn’t tell them not to use AI. I get to turn on my students’ brains in different ways that challenge them in this new era of AI-enabled academic life. But they still have to understand the concepts deeply and deliver them on demand, like Netflix or HBO. I’ve got the classroom remote: if I don’t like what I hear, I switch to the next student, and their classmates can rate what they just heard—just like Netflix. Thumbs up? Or thumbs down?
We no longer grade assignments. We grade judgment.
I’ve also changed how I assess them: I design new systems, create different challenges, and ask different questions so that evaluations measure their intelligence and potential—not just a static list of requirements.
Just as the tests for AI systems have changed over the past few months, we also need to transform how we evaluate students, team members, and future hires. We live in an augmented world, and we can—and should—raise the standards of what we expect an augmented person to be able to do, in both academic and professional settings.
The world is changing, and we need to rethink how we assess people and how we encourage them to use the tools they have—thoughtfully. I know my former students will know exactly what I mean when I talk about expectations: I want them to be bold thinkers, critical thinkers, and AI-enabled thinkers—and to know the difference between the three. For my future students: take a breath, enjoy the ride, and do your reading… because 2025 brings a brand-new set of challenges.
And it’s not just about students. It applies to the teams we build, the people we guide, and the generations coming next. So maybe it’s worth asking one last question (or three):
Does your team truly know how to succeed in an augmented world? If you have to think about it, then that’s your new top priority. What’s your Almost Last Exam?
Originally published in Spanish for Fast Company Mexico:
https://fastcompany.mx/2025/05/07/alumno-upgrade-inteligencia-artificial-ia/