New CHAIN Benchmark Tests AI Physical Reasoning | aib vote