Learning on Riemannian Manifolds for Interpretation of Visual Environments

Ph.D. Thesis Cuneyt Oncel Tuzel


Classical machine learning techniques provide effective methods for analyzing data when the parameters of the underlying process lie in a Euclidean space. However, various parameter spaces commonly occurring in computer vision problems violate this assumption. We derive novel learning methods for parameter spaces having Riemannian manifold structure and present several practical applications for scene analysis.

The mean shift algorithm on Lie groups generalizes the mean shift procedure, an unsupervised learning technique for vector spaces. The derived procedure can be used to cluster data points which form a matrix Lie group. We present an application of the new algorithm to the problem of estimating multiple 3D rigid motions from noisy point correspondences in the presence of outliers. The approach estimates all the motions simultaneously and does not require prior specification of the number of motion groups.
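As a rough illustration of the idea, the following sketch runs a mean shift iteration on the rotation group SO(3), a simpler matrix Lie group than the 3D rigid motions treated in the thesis. It is not the thesis implementation: the Gaussian kernel, the function names (so3_exp, so3_log, mean_shift_so3) and the bandwidth parameter are assumptions made for this example, and the logarithm is only reliable for rotation angles well below pi.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    """Exponential map from the Lie algebra (3-vector) to SO(3), Rodrigues formula."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)

def so3_log(R):
    """Logarithm map from SO(3) to a 3-vector (assumes angle well below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

def mean_shift_so3(samples, start, bandwidth, iters=50, tol=1e-10):
    """Move `start` towards a local mode of the sample density on SO(3)."""
    X = start.copy()
    for _ in range(iters):
        # Express every sample in the tangent space at the current estimate.
        tangents = np.array([so3_log(X.T @ R) for R in samples])
        d2 = np.sum(tangents**2, axis=1)
        # Gaussian kernel weights (an assumption of this sketch).
        weights = np.exp(-0.5 * d2 / bandwidth**2)
        step = weights @ tangents / weights.sum()
        # Map the weighted mean back to the group.
        X = X @ so3_exp(step)
        if np.linalg.norm(step) < tol:
            break
    return X
```

Starting the iteration from each data point and grouping the points whose iterations converge to the same mode gives a clustering without fixing the number of clusters in advance.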

We present a novel geometric framework to learn a supervised classification model for data points lying on a connected Riemannian manifold. The structure of the classifier is an additive model, where the weak learners are trained on the tangent spaces of the manifold. The derived algorithm is applied to the pedestrian detection problem, which is known to be among the hardest detection tasks.
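To make the tangent-space idea concrete, the sketch below assumes, purely for illustration, that the data points are symmetric positive definite matrices under the affine-invariant metric. Each point is mapped to the tangent space at a fixed base point, and a single least-squares linear learner is fitted on the resulting vectors. The choice of manifold, the fixed base point, the least-squares learner and all function names are assumptions of this example; the full additive model is not reproduced here.

```python
import numpy as np

def sym_fn(S, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * fn(vals)) @ vecs.T

def tangent_coords(base, Y):
    """Orthonormal coordinates of Y in the tangent space at `base`."""
    inv_root = sym_fn(base, lambda v: 1.0 / np.sqrt(v))
    L = sym_fn(inv_root @ Y @ inv_root, np.log)
    iu = np.triu_indices(L.shape[0])
    # Off-diagonal entries appear twice in the matrix, hence the sqrt(2) scale.
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return scale * L[iu]

def fit_weak_learner(base, mats, labels):
    """Least-squares linear learner on tangent vectors; labels are +1 or -1."""
    feats = np.array([tangent_coords(base, M) for M in mats])
    A = np.hstack([feats, np.ones((len(feats), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, labels, rcond=None)
    return coef

def predict(base, coef, M):
    """Sign of the learner response for a single matrix M."""
    x = np.append(tangent_coords(base, M), 1.0)
    return np.sign(x @ coef)
```

In a boosted model, one such learner would be fitted per round and the responses summed; the base point could also be re-estimated each round rather than held fixed as it is here.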

We describe a regression model where the response parameters form a Lie group. The model is applied to the affine tracking problem, where the motion is estimated as a parameter of the image observations. We present a generalization of the learning model to build an invariant object detector from an existing pose-dependent detector. The proposed model can accurately detect objects in various poses, while the search space is only a fraction of the size required by existing detection methods.
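One common way to regress onto a Lie group, sketched below under stated assumptions, is to take the matrix logarithm of each training response, fit an ordinary ridge regression in the Lie algebra, and map predictions back with the matrix exponential. The example treats 2D affine motions as 3x3 homogeneous matrices, uses truncated power series that are only valid for small motions near the identity, and uses hypothetical function names; it is not presented as the estimator developed in the thesis.

```python
import numpy as np

def expm_series(a, terms=20):
    """Matrix exponential by truncated Taylor series (small-norm inputs only)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k
        result = result + term
    return result

def logm_series(M, terms=60):
    """Matrix logarithm by power series; requires ||M - I|| < 1."""
    D = M - np.eye(M.shape[0])
    result = np.zeros_like(D)
    power = np.eye(M.shape[0])
    for k in range(1, terms):
        power = power @ D
        result = result + ((-1) ** (k + 1)) * power / k
    return result

def fit_lie_regression(features, motions, ridge=1e-6):
    """Ridge regression from feature rows to Lie algebra elements log(M)."""
    targets = np.array([logm_series(M).ravel() for M in motions])
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)

def predict_motion(weights, feature):
    """Predict a group element: linear map to the algebra, then exponentiate."""
    return expm_series((feature @ weights).reshape(3, 3))
```

Because the prediction passes through the exponential map, the output is always a valid group element, which would not be guaranteed if the matrix entries were regressed directly.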

Other contributions of the thesis include a novel region descriptor and an online learning algorithm for estimating the background statistics of a scene, which are utilized for several challenging problems such as matching, tracking, texture classification and low frame rate tracking.

The thesis contains 176 pages.