Inferring 3D structure and depth from a single 2D image or video frame without stereo or multi-view input.