Adding explicit positional time encoding to neural process models significantly improves their ability to generalize to unseen action sequences in robotic action prediction tasks.
This paper applies Conditional Neural Processes (CNPs) to predict robot actions from partial observations, inspired by how humans understand others' movements. The authors improve an existing multimodal prediction model by adding an explicit positional encoding of time, enabling robots to forecast actions over longer sequences and refine predictions as new information arrives.
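
To make the idea of an explicit positional time encoding concrete, here is a minimal sketch. The summary does not specify which encoding the authors use, so this assumes the standard sinusoidal (Transformer-style) scheme; the function name `sinusoidal_time_encoding` and the parameters `dim` and `max_period` are hypothetical, chosen only for illustration.

```python
import math


def sinusoidal_time_encoding(t, dim=8, max_period=10000.0):
    """Map a scalar time t to `dim` sinusoidal features.

    Assumption: a Transformer-style encoding; the paper's actual
    scheme is not stated in the summary.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    features = []
    for i in range(dim // 2):
        # Frequencies decay geometrically, starting at 1 for i = 0.
        freq = max_period ** (-2.0 * i / dim)
        features.append(math.sin(t * freq))
        features.append(math.cos(t * freq))
    return features


# Encode the step indices of a hypothetical 5-step trajectory.
encodings = [sinusoidal_time_encoding(step, dim=8) for step in range(5)]
```

In a setup like this, the encoded vector would stand in for the raw scalar time wherever the model takes time as an input, for example alongside each observed context point and each query time, so that nearby time steps receive similar but distinguishable representations.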