An RL training method that reduces covert political bias by enforcing symmetric responses to opposing political viewpoints across sentiment and helpfulness dimensions.
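
To make the idea of "symmetric responses" concrete, here is a minimal sketch of how a symmetry term could enter a reward. The description above does not specify the reward's form, so everything here is an assumption for illustration: the `ResponseScores` container, the `asymmetry_penalty` and `symmetric_reward` functions, the `base_reward` and `weight` parameters, and the premise that sentiment and helpfulness are scored on a common [0, 1] scale by some external scorer.

```python
from dataclasses import dataclass


@dataclass
class ResponseScores:
    """Hypothetical scores for one response; each is assumed to lie in [0, 1]."""
    sentiment: float
    helpfulness: float


def asymmetry_penalty(a: ResponseScores, b: ResponseScores) -> float:
    """Sum of absolute gaps between two responses on both dimensions.

    `a` and `b` are the scores of the model's responses to a pair of prompts
    that express opposing political viewpoints. Zero means the two responses
    were treated identically on sentiment and helpfulness.
    """
    return abs(a.sentiment - b.sentiment) + abs(a.helpfulness - b.helpfulness)


def symmetric_reward(
    a: ResponseScores,
    b: ResponseScores,
    base_reward: float = 1.0,
    weight: float = 0.5,
) -> float:
    """Assumed reward shape: a base reward minus a weighted asymmetry penalty."""
    return base_reward - weight * asymmetry_penalty(a, b)
```

Under these assumptions, a pair of responses with identical scores keeps the full base reward, while any gap in sentiment or helpfulness between the two sides lowers it in proportion to `weight`. An actual training setup might use a different penalty or combine the dimensions differently.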