A New Logic-Based Benchmark for AI Reliability | aib vote