Research


Localization | 3D Reconstruction | Optimization Methods | Motion Analysis | Micro Aerial Vehicles | Semantic Labeling | Vision for Robotics | Interactive Image Editing

Localization

Visual Image Localization in Mountainous Areas

We address the problem of localizing any given photograph (of a mountainous landscape) using vision techniques only. We propose an automated approach for very large scale visual localization that can efficiently exploit visual information and geometric constraints at the same time. We validate the system on the scale of a whole country (Switzerland, 40'000 km²) using a new dataset of more than 200 landscape query pictures with ground truth.

Urban Location Recognition on Mobile Device

We address the problem of large-scale place-of-interest recognition in cell phone images of urban scenes. We go beyond earlier approaches by exploiting the 3D building information that is now widely available (e.g. from extruded floor plans) and massive street-view-like image data for database creation.

Camera Pose Voting for Large-Scale Image-Based Localization

Image-based localization approaches aim to determine the camera pose from which an image was taken. Finding correct 2D-3D correspondences between query image features and 3D points in the scene model becomes harder as the size of the model increases. Current state-of-the-art methods therefore combine elaborate matching schemes with camera pose estimation techniques that are able to handle large fractions of wrong matches. In this work we study the benefits and limitations of spatial verification compared to appearance-based filtering. We propose a voting-based pose estimation strategy that exhibits O(n) complexity in the number of matches and thus makes it feasible to consider many more matches than previous approaches, whose complexity grows at least quadratically. This new outlier rejection formulation enables us to evaluate pose estimation for 1-to-many matches and to surpass the state of the art. At the same time, we show that using more matches does not automatically lead to better performance.

Large-Scale Localization

Structure-based localization is the task of finding the absolute pose of a given query image w.r.t. a pre-computed 3D model. While this is almost trivial at small scale, special care must be taken as the size of the 3D model grows, because straightforward descriptor matching becomes ineffective due to the large memory footprint of the model, as well as the strictness of the ratio test in 3D. Recently, several authors have tried to overcome these problems, either by a smart compression of the 3D model or by clever sampling strategies for geometric verification. Here we explore an orthogonal strategy, which uses all the 3D points and standard sampling, but performs feature matching implicitly, by quantization into a fine vocabulary. We show that although this matching is ambiguous and gives rise to 3D hyperpoints when matching each 2D query feature in isolation, a simple voting strategy, which enforces that the selected 3D points are co-visible, can reliably find a locally unique 2D-3D point assignment. Experiments on two large-scale datasets demonstrate that our method achieves state-of-the-art performance, while the memory footprint is greatly reduced, since only visual word labels but no 3D point descriptors need to be stored.
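The co-visibility voting idea can be illustrated with a toy sketch. This is a simplified illustration, not the published algorithm: the function name `localize_by_voting`, the two lookup tables, and the rule "keep a match only if exactly one candidate point is visible in the winning database image" are all assumptions made for the example.

```python
from collections import Counter

def localize_by_voting(query_words, word_to_points, point_to_images):
    """Pick the database image that best explains the query, then
    resolve ambiguous visual-word matches using co-visibility.

    query_words:     visual word id of each query feature (hypothetical input)
    word_to_points:  word id -> list of 3D point ids quantized to that word
    point_to_images: 3D point id -> set of database images observing it
    """
    votes = Counter()
    for word in query_words:
        # All 3D points sharing this word form the ambiguous candidate set.
        images = set()
        for point_id in word_to_points.get(word, ()):
            images |= point_to_images[point_id]
        # Each query feature votes at most once per database image.
        votes.update(images)
    if not votes:
        return None, {}
    best_image = max(votes, key=votes.get)

    matches = {}
    for feature_idx, word in enumerate(query_words):
        # Keep only candidates that are co-visible in the winning image.
        visible = [p for p in word_to_points.get(word, ())
                   if best_image in point_to_images[p]]
        if len(visible) == 1:
            matches[feature_idx] = visible[0]
    return best_image, matches
```

Note that only word labels and visibility sets are stored here, no descriptors, which mirrors the memory argument made above.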

Toroidal Constraints for Two-Point Localization Under High Outlier Ratios

Localizing a query image against a 3D model at large scale is a hard problem, since 2D-3D matches become more and more ambiguous as the model size increases. This creates a need for pose estimation strategies that can handle very low inlier ratios. In this paper, we draw new insights on the geometric information available from the 2D-3D matching process. As modern descriptors are not invariant against large variations in viewpoint, we are able to find the rays in space used to triangulate a given point that are closest to a query descriptor. It is well known that two correspondences constrain the camera to lie on the surface of a torus. Adding the knowledge of direction of triangulation, we are able to approximate the position of the camera from two matches alone. We derive a geometric solver that can compute this position in under 1 microsecond. Using this solver, we propose a simple yet powerful outlier filter which scales quadratically in the number of matches. We validate the accuracy of our solver and demonstrate the usefulness of our method in real world settings.
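The torus constraint follows from the inscribed angle theorem: all camera centers that see two fixed 3D points under the same angle lie on a circular arc through both points, rotated about the line joining them. The sketch below only checks this property numerically; it is not the geometric solver described above. The function names and the choice of placing the two points at (-a, 0, 0) and (a, 0, 0) are assumptions for illustration.

```python
import math

def viewing_angle(c, p1, p2):
    """Angle at camera center c between the rays towards p1 and p2."""
    r1 = [p - q for p, q in zip(p1, c)]
    r2 = [p - q for p, q in zip(p2, c)]
    dot = sum(x * y for x, y in zip(r1, r2))
    n1 = math.sqrt(sum(x * x for x in r1))
    n2 = math.sqrt(sum(x * x for x in r2))
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def point_on_torus(a, theta, t, psi):
    """Camera center seeing (-a,0,0) and (a,0,0) under angle theta.

    t   parameterizes the circular arc in the xy-plane,
    psi rotates that arc about the x-axis (the line through both points).
    """
    radius = a / math.sin(theta)   # inscribed angle theorem
    height = a / math.tan(theta)   # offset of the circle center from the chord
    x = radius * math.cos(t)
    y = height + radius * math.sin(t)
    if y <= 0.0:
        raise ValueError("t must select the arc on the same side as the circle center")
    return (x, y * math.cos(psi), y * math.sin(psi))

p1, p2 = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
theta = math.radians(40.0)
for t in (0.0, 1.0, 2.5):
    for psi in (0.0, 1.3, 4.0):
        c = point_on_torus(1.0, theta, t, psi)
        print(round(math.degrees(viewing_angle(c, p1, p2)), 6))
```

Two correspondences therefore leave a two-parameter family of camera positions (t and psi in this sketch), which is why additional information such as the direction of triangulation is needed to pin the position down.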

3D Reconstruction

A Symmetry Prior for Convex Variational 3D Reconstruction

We propose a novel prior for variational 3D reconstruction that favors symmetric solutions when dealing with noisy or incomplete data. We detect symmetries from incomplete data while explicitly handling unexplored areas to allow for plausible scene completions. The set of detected symmetries is then enforced on their respective support domain within a variational reconstruction framework. This formulation also handles multiple symmetries sharing the same support. The proposed approach is able to denoise and complete surface geometry and even hallucinate large scene parts. We demonstrate in several experiments the benefit of harnessing symmetries when regularizing a surface.

Semantic 3D Modeling

In semantic 3D modeling the goal is to find a dense geometric model from images and, at the same time, infer the semantic classes of the individual parts of the reconstructed model. A semantically annotated dense 3D model gives a much richer representation of the scene than the geometry alone. For example, questions such as "what is the volume of a building?" can be answered directly. This is difficult with a purely geometric model, which lacks the knowledge of which parts of the geometry belong to the building. Moreover, by solving dense 3D reconstruction and class segmentation jointly, prior knowledge can be included, such as the fact that the ground is usually a surface that is close to horizontal.

PlaneSweepLib

PlaneSweepLib (PSL) is a library that implements the plane sweeping stereo matching algorithm. It is written in C++/CUDA by Christian Häne. It contains an implementation for the pinhole camera model and for the unified projection camera model (fisheye cameras). The package comes with small test datasets and applications for both camera models and runs on Linux and Windows. It is released under the terms of the GPLv3 license.

Distortion in Multiple View Geometry

Multiple view geometry is well understood for the case of ideal pinhole cameras, and many algorithms exist to estimate epipolar geometry, trifocal tensors or homographies. In this research we focus on the problem of multiple view relations between images with radial distortion. One important case arises in sequential approaches, where an unknown image (potentially with radial distortion) is registered to a set of previously calibrated images. Here, we introduce the single-sided radial fundamental matrix as well as algorithms for estimating and decomposing it.

Geometric Change Detection

We present an algorithm to detect changes in the geometry of an urban environment using some images observing its current state. The proposed method can be used to significantly optimize the process of updating the 3D model of a city changing over time, by restricting this process to only those areas where changes are detected. The method also accounts for the challenges involved in a large-scale application of change detection, such as inaccuracies in the input geometry, errors in the geo-location data of the images, and the limited amount of information due to sparse imagery.

Dense Reconstruction from Symmetry

A system is presented that takes a single image as an input (e.g. showing the interior of St. Peter's Basilica) and automatically detects an arbitrarily oriented symmetry plane in 3D space. Given this symmetry plane a second camera is hallucinated that serves as a virtual second image for dense 3D reconstruction, where the point of view for reconstruction can be chosen on the symmetry plane. This naturally creates a symmetry in the matching costs for dense stereo. Alternatively, we also show how to enforce the 3D symmetry in dense depth estimation for the original image. The two representations are qualitatively compared on several real-world images, which also validate our fully automatic approach for dense single-image reconstruction.

Discovering and Exploiting 3D Symmetries in Structure from Motion

We propose a new approach for structure from motion, where symmetry relations in the 3D structure are automatically recovered from multiple images and then imposed within a new constrained bundle adjustment formulation that incorporates robust priors on the expected model shape. Our approach significantly reduces drift through "structural" loop closures and improves the accuracy of reconstructions in urban scenes. We also use the discovered symmetries to estimate a natural coordinate system and complete the 3D model.

3D Modeling on the Go

We present a system for 3D reconstruction of large-scale outdoor scenes based on monocular motion stereo. Ours is the first such system to run at interactive frame rates on a mobile device (Google Project Tango Tablet), thus allowing a user to reconstruct scenes "on the go" by simply walking around them. We utilize the device's GPU to compute depth maps using plane sweep stereo. We then fuse the depth maps into a global model of the environment represented as a truncated signed distance function in a spatially hashed voxel grid. We observe that in contrast to reconstructing objects in a small volume of interest, or using the near outlier-free data provided by depth sensors, one can rely less on free-space measurements for suppressing outliers in unbounded large-scale scenes. Consequently, we propose a set of simple filtering operations to remove unreliable depth estimates and experimentally demonstrate the benefit of strongly filtering depth maps. We extensively evaluate the system with real as well as synthetic datasets.
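The fusion step can be sketched as a running weighted average of truncated signed distances stored in a hash map keyed by integer voxel coordinates. This is a minimal CPU sketch under stated assumptions, not the system's GPU implementation: the class name `HashedTSDF`, the per-ray sampling scheme, the unit weight per observation, and the default voxel size and truncation distance are all hypothetical, and the depth-map filtering described above is not included.

```python
import math

class HashedTSDF:
    """Sparse TSDF volume: only voxels near observed surfaces are stored."""

    def __init__(self, voxel_size=0.1, truncation=0.3):
        self.voxel_size = voxel_size
        self.truncation = truncation
        self.voxels = {}  # (i, j, k) -> [tsdf, weight]

    def _key(self, point):
        # Spatial hashing: integer voxel coordinates serve as the dictionary key.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate_ray(self, origin, direction, depth):
        """Fuse one depth measurement along a unit-length viewing ray."""
        n_samples = int(round(2 * self.truncation / self.voxel_size))
        start = depth - self.truncation
        touched = {}
        for i in range(n_samples):
            s = start + (i + 0.5) * self.voxel_size  # distance along the ray
            point = [o + s * d for o, d in zip(origin, direction)]
            # Positive in front of the surface, negative behind, clamped to [-1, 1].
            tsdf = max(-1.0, min(1.0, (depth - s) / self.truncation))
            touched.setdefault(self._key(point), tsdf)
        for key, tsdf in touched.items():
            old_tsdf, weight = self.voxels.get(key, (0.0, 0.0))
            self.voxels[key] = [(old_tsdf * weight + tsdf) / (weight + 1.0),
                                weight + 1.0]

grid = HashedTSDF()
grid.integrate_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0)
grid.integrate_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.2)
print(len(grid.voxels), grid.voxels[(0, 0, 9)])
```

Because memory is allocated only for voxels inside the truncation band, the grid can cover unbounded outdoor scenes without reserving a dense volume.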

Merging the Unmatchable: Stitching Visually Disconnected SfM Models

Recent advances in Structure-from-Motion not only enable the reconstruction of large scale scenes, but are also able to detect ambiguous structures caused by repeating elements that might result in incorrect reconstructions. Yet, it is not always possible to fully reconstruct a scene. The images required to merge different sub-models might be missing or it might be impossible to acquire such images in the first place due to occlusions or the structure of the scene. The problem of aligning multiple reconstructions that do not have visual overlap is impossible to solve in general. An important variant of this problem is the case in which individual sides of a building can be reconstructed but not joined due to the missing visual overlap. In this paper, we present a combinatorial approach for solving this variant by automatically stitching multiple sides of a building together. Our approach exploits symmetries and semantic information to reason about the possible geometric relations between the individual models. We show that our approach is able to reconstruct complete building models where traditional SfM ends up with disconnected building sides.

Indoor-Outdoor 3D Reconstruction Alignment

Structure-from-Motion can achieve accurate reconstructions of urban scenes. However, reconstructing the inside and the outside of a building into a single model is very challenging due to the lack of visual overlap and the change of lighting conditions between the two scenes. We propose a solution to align disconnected indoor and outdoor models of the same building into a single 3D model. Our approach leverages semantic information, specifically window detections, in multiple scenes to obtain candidate matches from which an alignment hypothesis can be computed. To determine the best alignment, we propose a novel cost function that takes both the number of window matches and the intersection of the aligned models into account. We evaluate our solution on multiple datasets.

Non-Parametric Structure-Based Calibration of Radially Symmetric Cameras

We propose a novel two-step method for estimating the intrinsic and extrinsic calibration of any radially symmetric camera, including non-central systems. The first step consists of estimating the camera pose, given a Structure from Motion (SfM) model, up to the translation along the optical axis. As a second step, we obtain the calibration by finding the translation of the camera center using an ordering constraint. The method makes use of the 1D radial camera model, which allows us to effectively handle any radially symmetric camera, including non-central ones. Using this ordering constraint, we show that we are able to calibrate several different (central and non-central) Wide Field of View (WFOV) cameras, including fisheye, hyper-catadioptric and spherical catadioptric cameras, as well as pinhole cameras, using a single image or jointly solving for several views.
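The reason the first step can ignore both the distortion function and the translation along the optical axis is that, in a radially symmetric camera, an image point always lies on the line from the distortion center in the direction (x, y) of the point in camera coordinates. The sketch below checks this radial-line constraint for two central models only; the function names, the focal constants, and the assumption that the distortion center is at the image origin are illustrative.

```python
import math

def project(point_cam, radial_fn):
    """Project a camera-frame point with an arbitrary radial mapping r(theta)."""
    x, y, z = point_cam
    rho = math.hypot(x, y)
    theta = math.atan2(rho, z)  # angle between the ray and the optical axis
    r = radial_fn(theta)        # image radius assigned by the camera model
    return (r * x / rho, r * y / rho)

def radial_residual(image_point, point_cam):
    """2D cross product of the image point and (x, y); zero under the 1D radial model."""
    u, v = image_point
    x, y, _ = point_cam
    return u * y - v * x

pinhole = lambda theta: 500.0 * math.tan(theta)  # perspective mapping
fisheye = lambda theta: 300.0 * theta            # equidistant fisheye mapping

for shift_z in (0.0, 1.5):                       # translation along the optical axis
    point = (0.3, -0.4, 2.0 + shift_z)
    for model in (pinhole, fisheye):
        print(radial_residual(project(point, model), point))
```

The residual does not depend on z or on the radial mapping, so the remaining unknowns (the axial translation and the mapping itself) have to be recovered in a separate step.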

Privacy Preserving Structure-from-Motion

We present the first full Structure-from-Motion pipeline based on privacy preserving line features.

Optimization Methods

Tight Convex Labeling

In this work we present a unified view on Markov random fields and recently proposed continuous tight convex relaxations for multi-label assignment in the image plane. These relaxations are far less biased towards the grid geometry than Markov random fields. It turns out that the continuous methods are non-linear extensions of the local polytope MRF relaxation. In view of this result a better understanding of these tight convex relaxations in the discrete setting is obtained. Further, a wider range of optimization methods is now applicable to find a minimizer of the tight formulation. We propose two methods to improve the efficiency of minimization. One uses a weaker, but more efficient continuously inspired approach as initialization and gradually refines the energy where it is necessary. The other one reformulates the dual energy enabling smooth approximations to be used for efficient optimization. We demonstrate the utility of our proposed minimization schemes in numerical experiments.

Motion Analysis

Unstructured VBR

We present an algorithm designed for navigating around a performance that was filmed as a "casual" multi-view video collection: real-world footage captured on handheld cameras by a few audience members. The objective is to easily navigate in 3D, generating a video-based rendering (VBR) of a performance filmed with widely separated cameras. Casually filmed events are especially challenging because they yield footage with complicated backgrounds and camera motion. Such challenging conditions preclude the use of most algorithms that depend on correlation-based stereo or 3D shape-from-silhouettes.

Marker-less Motion Capture of Interacting People

The project aims to infer the poses of a character acting in an environment filmed by a set of video cameras. Once the poses are estimated, a free-viewpoint video of the entire action can be generated.

Articulated and Restricted Motion Subspaces and Their Signatures

The project aims to analyse and categorize different types of restricted motion. Once the type is found, we show how to compute the parameters of the motion with linear tools.

Micro Aerial Vehicles

Vision Controlled MAV

This is a multi-year, student-driven project to create autonomous flying systems using pure onboard processing. We are focused on computer vision on Micro Air Vehicles, which allowed us to win the EMAV 2009 Indoor Autonomy Competition.

sFly: Swarm of Micro Flying Robots!

The objective of the sFly project is to develop several small and safe helicopters which can fly autonomously in city-like environments and which can be used to assist humans in tasks like rescue and monitoring.

Semantic Labeling

Efficient Structured Parsing of Facades Using Dynamic Programming

We propose a sequential optimization technique for segmenting a rectified image of a facade into semantic categories. Our method retrieves a parsing which respects common architectural constraints and also returns a certificate for global optimality. In contrast to our method, the facade labeling problem is typically tackled as a classification task or as grammar parsing. Neither approach is capable of fully exploiting the regularity of the problem. Our technique therefore significantly improves accuracy compared to the state of the art while being an order of magnitude faster. In addition, for 85% of the test images we obtain a certificate for optimality.
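To give a flavor of how dynamic programming can enforce an architectural constraint, the sketch below labels the rows of a rectified facade so that labels appear in a fixed top-to-bottom order (for instance sky, then wall, then shop). This is a deliberately reduced one-dimensional example and not the method summarized above: the function name `parse_rows`, the per-row cost table, and the single ordering constraint are assumptions for illustration.

```python
def parse_rows(costs):
    """Minimum-cost labeling of image rows with non-decreasing label indices.

    costs[r][l] is the (hypothetical) classifier cost of giving row r label l;
    labels are indexed in their allowed top-to-bottom order.
    """
    n_rows, n_labels = len(costs), len(costs[0])
    dp = [list(costs[0])]   # dp[r][l]: best cost of rows 0..r ending in label l
    back = [[0] * n_labels]
    for r in range(1, n_rows):
        row_dp, row_back = [], []
        best_prev, best_label = float("inf"), 0
        for l in range(n_labels):
            # Running minimum over all previous labels l' <= l.
            if dp[r - 1][l] < best_prev:
                best_prev, best_label = dp[r - 1][l], l
            row_dp.append(costs[r][l] + best_prev)
            row_back.append(best_label)
        dp.append(row_dp)
        back.append(row_back)
    # Backtrack from the cheapest final label.
    label = min(range(n_labels), key=lambda l: dp[-1][l])
    total = dp[-1][label]
    labels = [label]
    for r in range(n_rows - 1, 0, -1):
        label = back[r][label]
        labels.append(label)
    labels.reverse()
    return labels, total

costs = [[0.0, 1.0, 1.0],
         [1.0, 0.0, 1.0],
         [0.4, 0.6, 1.0],   # noisy row: a per-row classifier would pick label 0
         [1.0, 0.0, 1.0],
         [1.0, 1.0, 0.0]]
print(parse_rows(costs))
```

Choosing the cheapest label independently per row would yield an ordering violation in the third row; the dynamic program finds the cheapest labeling among those that respect the constraint, which for this chain-structured toy problem is exact.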

Vision for Robotics

V-Charge: Automated Valet Parking and Charging for e-Mobility

V-Charge is an EU-funded project that aims at fully autonomous charging and parking of electric vehicles. It is based on the vision that mobility will undergo important changes due to the required drastic decrease in CO2 production. The main idea is that a driver drops off the car at a special zone, called the drop-off zone, for example in front of a train station or an airport. The car then drives itself to the charging spot and, once charged, moves automatically to a parking spot until it is called back to the pickup zone using a mobile phone application.

Interactive Image Editing

Interactive High-Quality Green-Screen Keying via Color Unmixing

Due to the widespread use of compositing in contemporary feature films, green-screen keying has become an essential part of post-production workflows. To comply with the ever-increasing quality requirements of the industry, specialized compositing artists spend countless hours using multiple commercial software tools, while eventually having to resort to manual painting because of the many shortcomings of these tools. Due to the sheer amount of manual labor involved in the process, new green-screen keying approaches that produce better keying results with less user interaction are welcome additions to the compositing artist's arsenal. We found that, contrary to the common belief in the research community, production-quality green-screen keying is still an unresolved problem with its own unique challenges. In this paper, we propose a novel green-screen keying method utilizing a new energy minimization-based color unmixing algorithm. We present comprehensive comparisons with commercial software packages and relevant methods in the literature, which show that the quality of our results is superior to any other currently available green-screen keying solution. Importantly, using the proposed method, these high-quality results can be generated using only one-tenth of the manual editing time that a professional compositing artist requires to process the same content with all previous state-of-the-art tools at their disposal.

Unmixing-Based Soft Color Segmentation for Image Manipulation

We present a new method for decomposing an image into a set of soft color segments, which are analogous to color layers with alpha channels that have been commonly utilized in modern image manipulation software. We show that the resulting decomposition serves as an effective intermediate image representation, which can be utilized for performing various, seemingly unrelated image manipulation tasks. We identify a set of requirements that soft color segmentation methods have to fulfill, and present an in-depth theoretical analysis of prior work. We propose an energy formulation for producing compact layers of homogeneous colors and a color refinement procedure, as well as a method for automatically estimating a statistical color model from an image. This results in a novel framework for automatic and high-quality soft color segmentation, which is efficient, parallelizable, and scalable. We show that our technique is superior in quality compared to previous methods through quantitative analysis as well as visually through an extensive set of examples. We demonstrate that our soft color segments can easily be exported to familiar image manipulation software packages and used to produce compelling results for numerous image manipulation applications without forcing the user to learn new tools and workflows.
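The layer analogy can be made concrete with a toy example: a pixel is modeled as an alpha-weighted sum of layer colors whose alphas sum to one. The sketch below handles only two layers with fixed, known colors and solves for the alphas by a clamped least-squares projection; it stands in for, and is much simpler than, the energy formulation described above. The function names and the example colors are assumptions.

```python
def unmix_two_layers(pixel, color_a, color_b):
    """Alphas (alpha_a, alpha_b) with alpha_a + alpha_b = 1 that best explain pixel."""
    diff = [a - b for a, b in zip(color_a, color_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("layer colors must differ")
    # Least-squares projection of the pixel onto the segment between the colors.
    alpha_a = sum((p - b) * d for p, b, d in zip(pixel, color_b, diff)) / denom
    alpha_a = min(1.0, max(0.0, alpha_a))
    return alpha_a, 1.0 - alpha_a

def composite(alphas, colors):
    """Recombine layers: sum over layers of alpha * color, per channel."""
    return tuple(sum(a * c[k] for a, c in zip(alphas, colors)) for k in range(3))

green = (0.1, 0.8, 0.2)        # hypothetical background layer color
foreground = (0.9, 0.6, 0.5)   # hypothetical foreground layer color
pixel = composite((0.3, 0.7), (green, foreground))
alphas = unmix_two_layers(pixel, green, foreground)
print(alphas, composite(alphas, (green, foreground)))
```

Because the layers recombine to the original pixel, each layer can be edited on its own (for example recolored) and the image recomposited, which is what makes such a decomposition useful as an intermediate representation.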

Designing Effective Inter-Pixel Information Flow for Natural Image Matting

We present a novel, purely affinity-based natural image matting algorithm. Our method relies on carefully defined pixel-to-pixel connections that enable effective use of information available in the image. We control the information flow from the known-opacity regions into the unknown region, as well as within the unknown region itself, by utilizing multiple definitions of pixel affinities. Among other forms of information flow, we introduce color-mixture flow, which builds upon local linear embedding and effectively encapsulates the relation between different pixel opacities. Our resulting novel linear system formulation can be solved in closed-form and is robust against several fundamental challenges of natural matting such as holes and remote intricate structures. Our evaluation using the alpha matting benchmark suggests a significant performance improvement over the current methods. While our method is primarily designed as a standalone matting tool, we show that it can also be used for regularizing mattes obtained by sampling-based methods. We extend our formulation to layer color estimation and show that the use of multiple channels of flow increases the layer color quality. We also demonstrate our performance in green-screen keying and further analyze the characteristics of the affinities used in our method.
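The general principle behind affinity-based matting, letting opacity flow from known pixels to unknown ones along strong pixel-to-pixel connections, can be sketched on a single scanline. The example below uses only one nearest-neighbor color affinity and solves the resulting linear system by Gauss-Seidel iteration; the multiple flow definitions and the closed-form solver described above are not reproduced. The function name, the Gaussian affinity, and `sigma` are assumptions.

```python
import math

def propagate_alpha(colors, known, sigma=0.1, iterations=500):
    """Propagate opacity along a scanline.

    colors: grey value per pixel; known: {pixel index: alpha} constraints.
    Minimizes sum_i w_i * (alpha_i - alpha_{i+1})^2 subject to the known alphas.
    """
    n = len(colors)
    # Affinity between neighbors i and i+1: close to 1 for similar colors.
    w = [math.exp(-((colors[i] - colors[i + 1]) ** 2) / sigma ** 2)
         for i in range(n - 1)]
    alpha = [known.get(i, 0.5) for i in range(n)]
    for _ in range(iterations):
        for i in range(n):
            if i in known:
                continue
            num = den = 0.0
            if i > 0:
                num += w[i - 1] * alpha[i - 1]
                den += w[i - 1]
            if i < n - 1:
                num += w[i] * alpha[i + 1]
                den += w[i]
            if den > 0.0:
                # Each unknown alpha becomes the affinity-weighted mean of its neighbors.
                alpha[i] = num / den
    return alpha

row = [0.0, 0.0, 0.05, 0.9, 1.0, 1.0]   # dark background, bright foreground
print(propagate_alpha(row, {0: 0.0, 5: 1.0}))
```

The weak affinity across the color edge keeps background and foreground opacities from mixing, so the transition lands at the edge rather than being spread linearly between the two known pixels.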

Semantic Soft Segmentation

Accurate representation of soft transitions between image regions is essential for high-quality image editing and compositing. Current techniques for generating such representations depend heavily on interaction by a skilled visual artist, as creating such accurate object selections is a tedious task. In this work, we introduce semantic soft segments, a set of layers that correspond to semantically meaningful regions in an image with accurate soft transitions between different objects. We approach this problem from a spectral segmentation angle and propose a graph structure that embeds texture and color features from the image as well as higher-level semantic information generated by a neural network. The soft segments are generated fully automatically via eigendecomposition of the carefully constructed Laplacian matrix. We demonstrate that otherwise complex image editing tasks can be done with little effort using semantic soft segments.


© CVG, ETH Zürich webmaster: psarlin@inf.ethz.ch