
Visual Image Localization in Mountainous Areas

Georges Baatz, Olivier Saurer, Kevin Köser and Marc Pollefeys
ETH Zürich, Switzerland
{gbaatz, saurero, kevin.koeser, marc.pollefeys}



We address the problem of localizing any given photograph (of a mountainous landscape) using vision techniques only. We propose an automated approach for very large scale visual localization that can efficiently exploit visual information and geometric constraints at the same time. We validate the system on the scale of a whole country (Switzerland, 40'000 km²) using a new dataset of more than 200 landscape query pictures with ground truth.


Quick Overview

The appearance of natural scenes changes dramatically with seasons, time of day or weather conditions. This makes the use of traditional patch-based features impractical. We use the horizon as a recognizable feature, since it remains stable over very long timespans.

Feature Descriptor

We take smaller parts of the horizon curve, which after smoothing and downsampling constitute local feature descriptors. These are quantized to obtain visual words.
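The descriptor construction described above can be sketched as follows. This is an illustrative simplification, not the paper's exact pipeline; the window size, smoothing sigma and sample count are assumed values:

```python
import numpy as np

def horizon_descriptor(horizon, start, width=32, n_samples=10, sigma=2.0):
    """Build a local descriptor from one window of the horizon curve.

    horizon: 1-D array of horizon heights (one value per image column).
    The window is Gaussian-smoothed, downsampled to n_samples values
    and mean-normalized. All parameter values are illustrative.
    """
    window = horizon[start:start + width].astype(float)
    # Gaussian smoothing via direct convolution (reflect-padded).
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(window, radius, mode='reflect')
    smooth = np.convolve(padded, kernel, mode='valid')
    # Downsample to a fixed number of samples.
    idx = np.linspace(0, len(smooth) - 1, n_samples).astype(int)
    desc = smooth[idx]
    # Subtract the mean so the descriptor is invariant to a constant
    # vertical offset of the horizon (e.g. from slight camera tilt).
    return desc - desc.mean()
```

Descriptors of this form can then be quantized (e.g. by k-means) into visual words.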

Voting Scheme

We propose a novel bag-of-words scheme that votes for both location and viewing direction simultaneously. This enables a rough geometric consistency check already at the voting stage.
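The joint voting idea can be sketched as a minimal inverted-file lookup. Here each database entry stores the absolute azimuth of a visual word at a location, and each query word carries its bearing relative to the (unknown) viewing direction; a match then implies a viewing direction, so votes accumulate over (location, direction-bin) pairs. This is a simplified sketch, not the paper's exact scheme:

```python
from collections import defaultdict

def vote(query_words, inverted_index, n_dir_bins=8):
    """Vote jointly for location and viewing direction.

    query_words: list of (visual_word, bearing_deg) pairs, where
      bearing_deg is the word's offset from the viewing direction.
    inverted_index: dict visual_word -> list of (location_id,
      azimuth_deg) database entries.
    Returns the (location_id, direction_bin) with the most votes,
    or None if nothing matched.
    """
    votes = defaultdict(int)
    bin_width = 360.0 / n_dir_bins
    for word, bearing in query_words:
        for loc, azimuth in inverted_index.get(word, []):
            # Implied absolute viewing direction for this match.
            direction = (azimuth - bearing) % 360.0
            votes[(loc, int(direction // bin_width))] += 1
    return max(votes, key=votes.get) if votes else None
```

Because inconsistent matches scatter their votes over different direction bins, geometric outliers are suppressed already at the voting stage.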


Our approach localizes 88% of the query images correctly within 1 km of the ground truth and estimates the full 3D orientation of the camera.

Supplementary Results

Robustness to Tilt

Our algorithm uses the fact that landscape images usually are not subject to extreme tilt angles. In this experiment, we virtually rotate the extracted horizon of the query images by various angles in order to simulate camera tilt and observe how recognition performance is affected.
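One way to simulate such a tilt, assuming the horizon samples are represented as viewing directions on the unit sphere in the camera frame (x right, y down, z forward, all assumptions of this sketch), is to rotate them about the camera's horizontal axis:

```python
import numpy as np

def simulate_tilt(directions, tilt_deg):
    """Rotate horizon sample directions (N x 3 unit vectors) about
    the camera's x-axis to simulate an up/down camera tilt.
    Illustrative sketch; the frame convention is an assumption."""
    t = np.radians(tilt_deg)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(t), -np.sin(t)],
                  [0.0, np.sin(t),  np.cos(t)]])
    return directions @ R.T
```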

Tilt versus Recognition

We designed our proposed feature descriptor to be robust with respect to camera tilt. It turns out that we still get over 60% recognition, even at 30° tilt (a large value, since photographers usually keep the horizon roughly straight ahead rather than far above or below the camera).

Robustness to Field-of-View

The field-of-view (FoV) extracted from the EXIF data may not always be 100% accurate. This experiment studies the effect of slight inaccuracies. We modify the FoV obtained from the EXIF by various percentages and plot the result against recognition performance on the entire query set.
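For reference, the horizontal FoV follows from the EXIF 35mm-equivalent focal length via the standard pinhole relation FoV = 2·atan(w / 2f) with w = 36 mm; perturbing it by a percentage is then straightforward. The function names are our own:

```python
import math

def fov_from_exif(focal_35mm_equiv):
    """Horizontal field-of-view (degrees) from the EXIF
    35mm-equivalent focal length, assuming a 36 mm frame width."""
    return math.degrees(2.0 * math.atan(36.0 / (2.0 * focal_35mm_equiv)))

def perturb_fov(fov_deg, percent):
    """Apply a relative error to the FoV, as in the robustness
    experiment (percent=+5 assumes a 5% wider FoV than the EXIF)."""
    return fov_deg * (1.0 + percent / 100.0)
```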

FoV versus Recognition

In the paper, we state that we only need to know an approximate value for the field-of-view. Here, we see that even if that value is off by ±5%, we still achieve 70-80% recognition.

Estimation of Intrinsics

We can even obtain a rough estimate of the camera intrinsics by hypothesizing different values for the field-of-view (FoV) and retaining the best one.
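This sweep amounts to a simple one-dimensional search. In the sketch below, match_cost is a placeholder for the system's matching stage (returning the best cost over all database locations for a given FoV hypothesis); the candidate range mirrors the 25°-45° interval used in the animation:

```python
import numpy as np

def estimate_fov(query_horizon, match_cost, fov_candidates=None):
    """Hypothesize candidate FoVs and keep the one with the lowest
    matching cost. match_cost(horizon, fov_deg) is assumed given;
    its implementation is outside this sketch."""
    if fov_candidates is None:
        fov_candidates = np.arange(25.0, 45.0 + 1e-9, 1.0)  # degrees
    costs = [match_cost(query_horizon, f) for f in fov_candidates]
    best = int(np.argmin(costs))
    return fov_candidates[best], costs[best]
```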

Field of View Animation

Every frame of the animation displays the matching costs arising from a different assumed FoV. For FoVs between 25° and 45°, the optimum (in blue) travels northwest along the camera's viewing direction. The last frame shows, for each location, the best matching cost over all FoVs. The minimum corresponds to the best combination of location and FoV.

Field of View versus Matching Cost

This plot shows the matching cost of the best location as a function of the FoV. For FoVs around 33° (the value from the EXIF tag), the matching cost is lower and varies more smoothly than further away.

Query Sets

Dataset: CH1 [226MB] (203 images)
Dataset: CH2 [1.2GB] (948 images)


We thank Simon Wenner for his help with rendering the DEMs. We also thank Hiroto Nagayoshi, José Henrique Brito, Lionel Heng and the Panoramio users bp_meier, fourpier, JGAlarcon, loamvalley, tompon and tressy for contributing photographs to the query set. This work has been supported through SNF grant 127224 by the Swiss National Science Foundation.