OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
OmniStream handles perception, reconstruction, and action on continuous visual streams using causal spatiotemporal attention and 3D rotary position embeddings (3D-RoPE), with strong results across 29 datasets.
Yibin Yan, Jilan Xu, Shangzhe Di et al.