DeepSeek Models Challenge Reasoning Costs via Reinforcement Learning | aib vote