New Benchmark Evaluates AI Research Process and Factuality | aib vote