Structure-from-Motion (SfM) is a robust vision pipeline that estimates camera parameters and a sparse 3D point cloud from an unordered set of images.

Why It Is Important

  • Robust and efficient pipeline that requires only a set of images to build a 3D map.
  • Widely used as a backend in 3D vision to estimate camera intrinsics and extrinsics, e.g. for photogrammetry or novel view synthesis (such as NeRF).
  • The sparse point cloud enables efficient and highly accurate localization of new images in the map.

Key Features

  1. Feature Detection and Matching: Local features are detected in each image and matched across images.
  2. Visual Localization: Estimating the camera pose of an image w.r.t. a sparse 3D map is at the core of incremental mapping pipelines.
  3. Bundle Adjustment: Camera poses and the 3D point cloud are jointly refined by a large non-linear optimization, called bundle adjustment, that minimizes the reprojection error.

  4. Applications: Structure-from-Motion is widely used in many computer vision tasks, such as:

    • Photogrammetry
    • 3D reconstruction
    • Visual localization
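
To make the bundle adjustment step more concrete, the following is a minimal sketch of the quantity it typically minimizes: the sum of squared reprojection errors between observed feature locations and projected 3D points. It assumes a simple pinhole camera without lens distortion. All names here (`project`, `reprojection_error`, the camera tuple layout) are hypothetical and chosen for illustration; they do not come from any particular SfM library, and a real pipeline would pass such residuals to a non-linear least-squares solver rather than only evaluate them.

```python
def project(point, rotation, translation, focal, principal):
    """Project a 3D world point to pixel coordinates (pinhole model, no distortion)."""
    # World frame -> camera frame: X_cam = R * X_world + t
    xc = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    # Perspective division, then scale by focal length and shift by principal point.
    u = focal * xc[0] / xc[2] + principal[0]
    v = focal * xc[1] / xc[2] + principal[1]
    return (u, v)


def reprojection_error(points, cameras, observations):
    """Sum of squared pixel residuals over all (camera, point, pixel) observations."""
    total = 0.0
    for cam_idx, pt_idx, (u_obs, v_obs) in observations:
        R, t, f, c = cameras[cam_idx]
        u, v = project(points[pt_idx], R, t, f, c)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# Hypothetical toy scene: two cameras one unit apart, both looking down +z.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [
    (IDENTITY, [0.0, 0.0, 0.0], 500.0, (320.0, 240.0)),
    (IDENTITY, [-1.0, 0.0, 0.0], 500.0, (320.0, 240.0)),
]
true_point = [0.5, 0.25, 4.0]

# Synthetic, noise-free observations generated from the true point.
observations = [
    (cam_idx, 0, project(true_point, R, t, f, c))
    for cam_idx, (R, t, f, c) in enumerate(cameras)
]

error_at_truth = reprojection_error([true_point], cameras, observations)
error_perturbed = reprojection_error([[0.5, 0.25, 4.5]], cameras, observations)
print(error_at_truth, error_perturbed)
```

In this toy setup the error vanishes at the true point and grows when the point's depth is perturbed. Bundle adjustment exploits exactly this signal, adjusting all camera poses and 3D points together until the total error is as small as possible.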


Structure-from-Motion is the state-of-the-art approach for accurately estimating camera parameters from an image collection, and it is widely used as a backend in computer vision systems.


  • LaMAR: Benchmarking Localization and Mapping for Augmented Reality (ECCV 2022) [Project page]
  • Camera Pose Estimation using Implicit Distortion Models (CVPR 2022) [Paper]
  • Pixel-Perfect Structure-from-Motion with Featuremetric Refinement (ICCV 2021) [Project page]
  • Back to the Feature: Learning Robust Camera Localization from Pixels to Pose (CVPR 2021) [Project page]