TOPReward Uses Model Probabilities for Better Robotic Training | aib vote