AI Agents Struggle with End-to-End Scientific Research Benchmarks | aib vote