Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

Shengqiang Zhang, Ruotong Liao, Volker Tresp, Barbara Plank, Hinrich Schütze | June 11, 2026 | arXiv

Key Takeaway

By operating on a VAR model's native bitwise predictions and residual code composition rather than on token streams, you can edit images more precisely while preserving backgrounds, with no retraining needed.

Summary

BitResEdit is a training-free editor for visual autoregressive (VAR) image generators that combines two techniques. BitEdit steers the model's bit-level predictions toward the target text description; ResEdit applies edits through the model's native residual code structure. Together they keep unedited regions exactly intact while making precise, localized changes that match the text prompt.
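
The two ideas can be illustrated with a toy NumPy sketch. It is not the authors' implementation: this summary does not give the paper's exact formulation, so every function name, parameter, and shape below is hypothetical. The sketch assumes that bit predictions are per-bit Bernoulli logits, that guidance is a classifier-free-guidance-style extrapolation whose scale is shrunk until the mean KL divergence from the source prediction stays under a budget, and that residual codes compose by summing upsampled per-scale maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli_kl(p, q, eps=1e-8):
    """Elementwise KL(Bernoulli(p) || Bernoulli(q))."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def guided_bit_logits(src_logits, tgt_logits, scale=3.0, max_kl=0.5, steps=30):
    """Hypothetical bit-level guidance: push source-prompt logits toward the
    target-prompt logits, limiting the step by a KL trust region."""
    delta = tgt_logits - src_logits
    p_src = sigmoid(src_logits)

    def mean_kl(s):
        return bernoulli_kl(sigmoid(src_logits + s * delta), p_src).mean()

    if mean_kl(scale) <= max_kl:
        return src_logits + scale * delta
    # The KL grows monotonically with the scale, so bisection finds the
    # largest scale that still fits the budget (lo always satisfies it).
    lo, hi = 0.0, scale
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean_kl(mid) <= max_kl:
            lo = mid
        else:
            hi = mid
    return src_logits + lo * delta

def upsample_nearest(x, size):
    """Nearest-neighbor upsampling of an (h, w, d) map to (size, size, d)."""
    h, w = x.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[rows][:, cols]

def compose(codes, size):
    """Assumed residual composition: sum of upsampled per-scale code maps."""
    return sum(upsample_nearest(c, size) for c in codes)

def residual_edit(src_codes, edit_codes, mask, size):
    """Hypothetical code-diff edit: add the masked difference between the
    edited and source compositions, leaving the rest of the source untouched."""
    src = compose(src_codes, size)
    diff = compose(edit_codes, size) - src
    return src + mask[..., None] * diff
```

In this toy setup, positions where `mask` is zero receive the source composition unchanged, which mirrors the background-preservation claim above; how the paper actually obtains the edit region and the guidance schedule is not specified in this summary.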

multimodal efficiency applications

Key Terms

visual-autoregressive-models classifier-free-guidance residual-code bernoulli-prediction kl-trust-region