NVIDIA Researchers Fix RL Optimization via GDPO Algorithm | aib vote