A reward signal with multiple dimensions (e.g., correctness per test case) instead of a single scalar score.