ChatGPT and Gemini Deliver a Shock: Outscoring the Top Candidate on the University of Tokyo Entrance Exam
- ChatGPT and Gemini outperformed the top-scoring human candidate in a simulation of the University of Tokyo's 2026 entrance exam.
- The models showed dramatic progress in mathematical reasoning, moving from failing grades to near-perfect scores in a single year.
- Human experts grading descriptive answers found the AI strong on logic but still struggling with context and visual interpretation.
The recent performance of Large Language Models (LLMs) on Japan's most prestigious entrance exams—specifically those of the University of Tokyo and Kyoto University—has sent ripples through both the academic and technological communities. For non-computer science students, it is vital to understand that this is not simply about an AI 'getting lucky' with multiple-choice questions. We are looking at a fundamental shift in how these systems handle complex, multi-step logical reasoning under strict, human-evaluated constraints.
The data provided by the startup LifePrompt, which subjected models including ChatGPT and Gemini to the 2026 secondary entrance examinations, illustrates a stark trajectory. In the University of Tokyo's 'Science III' category, the track famously known as the gateway to the medical profession, both leading models secured scores that surpassed the actual top-scoring human candidate. Scores exceeding 490 out of 550 points suggest that these models have moved well beyond simple pattern matching. The jump is most visible in mathematics: where a model might have struggled to reach 40 points in previous iterations, it now achieves near-perfect scores. This leap is emblematic of how rapidly reasoning capabilities, often driven by architectures that allow for deeper 'thinking' processes, are maturing.
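To put the figures above in perspective, here is a minimal back-of-the-envelope sketch. Only the 490-plus out of 550 total and the roughly 40-point prior mathematics score come from the article; the 120-point mathematics maximum used below is an illustrative assumption, not a figure from the study.

```python
# Back-of-the-envelope check of the exam figures reported above.
# Grounded in the article: a >490/550 total, and ~40 points in math
# a year earlier. The 120-point math maximum is an assumption made
# purely for illustration.

def percentage(score: float, max_score: float) -> float:
    """Return a score as a percentage of the maximum, rounded to 1 dp."""
    return round(100 * score / max_score, 1)

# Reported total: models scored above 490 out of 550 points.
print(f"Total: {percentage(490, 550)}%")          # 89.1%

# Mathematics, assuming a hypothetical 120-point section:
# ~40 points previously vs. a near-perfect score now.
print(f"Math, previous year: {percentage(40, 120)}%")   # 33.3%
print(f"Math, near-perfect:  {percentage(118, 120)}%")  # 98.3%
```

Even under these rough assumptions, the gap between a roughly one-third math score and a near-perfect one conveys why the article describes the year-on-year change as a leap rather than an increment.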
What makes this analysis particularly rigorous is the methodology: the examinations were not merely automated inputs scored by machine. The models produced descriptive, free-form written answers, which were then subjected to the same grueling grading standards used by human cram-school lecturers (specifically from Kawaijuku). When an expert grader evaluates a descriptive answer, they aren't just looking for the final number; they are analyzing the derivation, the logical flow, and the scientific rigor. The fact that these AI systems could withstand this level of scrutiny, while occasionally faltering on nuanced linguistic or visual tasks, provides a balanced view of their current capabilities.
However, as an aspiring technologist or observer, one must avoid the trap of 'AI exceptionalism.' While these models have mastered the rigid structure of a mathematics exam, they still exhibit what experts call 'contextual gaps.' The study noted significant struggles in subjects requiring deep empathy, metaphorical understanding, or complex visual interpretation, such as geography, history, or literature. It is here that the disparity between 'knowledge' and 'understanding' becomes palpable. The models often possess the data, but their ability to synthesize that data into the specific, culturally nuanced prose required by top-tier Japanese universities remains a hurdle.
Ultimately, this evolution suggests that the future of education is not about competing with the machine's ability to calculate or recall facts. Instead, it shifts the focus toward the uniquely human capacity for high-level context, nuanced output control, and the ability to verify AI-generated work. We are witnessing a transition from AI as a simple tool for retrieval to AI as a collaborative partner in reasoning. As these systems continue to evolve, the challenge for the next generation of students will be to learn how to wield these capabilities to supplement—rather than replace—the critical thinking skills that define academic excellence.