New Algorithm Stabilizes Reinforcement Learning for LLM Training | aib vote