New Benchmark Evaluates Memory-Driven Scientific Reasoning in AI | aib vote