3D Reconstruction of Dynamic Scenes from Monocular Video
As humans we take the ability to perceive the dynamic world around us in three dimensions for granted. From an early age we can grasp an object by adapting our fingers to its 3D shape; we can understand our mother’s feelings by interpreting her facial expressions; or we can effortlessly navigate through a busy street. All of these tasks require some internal 3D representation of shape, deformations and motion.
Building algorithms that can emulate this level of human 3D perception has proved to be an extremely challenging task. In this tutorial I will show progress from early systems, which captured sparse 3D models with primitive representations of deformation, towards the most recent algorithms, which can capture every fold and detail of hands or faces in 3D using video sequences taken with a single consumer camera as input. There is now great short-term potential for commercial uptake of this technology.
Professor Lourdes Agapito obtained her BSc, MSc and PhD (1996) degrees from the Universidad Complutense de Madrid (Spain). She held an EU Marie Curie Postdoctoral Fellowship at the University of Oxford's Robotics Research Group before being appointed as a Lecturer at Queen Mary University of London in 2001. In 2008 she was awarded an ERC Starting Grant to carry out research on the estimation of 3D models of dynamic scenes from monocular video sequences. In July 2013 she joined the Department of Computer Science at University College London (UCL), where she leads a research team that focuses on developing algorithms for 3D understanding of the real world from video.
Lourdes was Program Chair for CVPR 2016, the top annual conference in computer vision; she was also Program Chair for 3DV'14 and has served as Area Chair for the top computer vision conferences. Lourdes is Associate Editor for the International Journal of Computer Vision (IJCV) and IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), a member of the Executive Committee of the British Machine Vision Association and a member of the EPSRC Peer Review College.
Intro to Reinforcement Learning
This tutorial will give a brief introduction to the fundamental concepts in reinforcement learning. These include the exploration/exploitation dilemma, the credit assignment problem, bandit algorithms, planning and learning in Markov decision processes, and deep reinforcement learning.
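The exploration/exploitation dilemma mentioned above is easiest to see in the bandit setting: an agent must balance trying arms it knows little about against repeatedly pulling the arm that currently looks best. The sketch below is purely illustrative and not part of the tutorial materials; the reward probabilities, the epsilon value and the function name are all assumptions chosen for the example.

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Illustrative epsilon-greedy agent on a Bernoulli multi-armed bandit.

    With probability `epsilon` the agent explores (pulls a random arm);
    otherwise it exploits the arm with the highest estimated mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # Bernoulli reward drawn from the arm's true success probability.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

# The agent should end up pulling the best arm (mean 0.8) most often.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Even this tiny example exhibits the credit assignment flavour of the full RL problem: the estimates only become reliable for arms the agent actually chooses to pull.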
Shimon Whiteson is an associate professor in the Department of Computer Science at the University of Oxford, and a tutorial fellow at St. Catherine’s College. His research focuses on artificial intelligence, with a particular focus on machine learning and decision-theoretic planning. In addition to theoretical work on these topics, he has in recent years also focused on applying them to practical problems in robotics and search engine optimisation. He studied English and Computer Science at Rice University before completing a doctorate in Computer Science at the University of Texas at Austin in 2007. He then spent eight years as an Assistant and then an Associate Professor at the University of Amsterdam before joining Oxford as an Associate Professor in 2015. He was awarded an ERC Starting Grant from the European Research Council in 2014 and a Google Faculty Research Award in 2017.
Probabilistic and Deep Models for 3D Reconstruction
3D reconstruction from multiple 2D images is an inherently ill-posed problem. Prior knowledge is required to resolve ambiguities and probabilistic models are desirable to capture the ambiguities in the reconstructed model. In this talk, I will present two recent results tackling these two aspects. First, I will introduce a probabilistic framework for volumetric 3D reconstruction where the reconstruction problem is cast as inference in a Markov random field using ray potentials. Our main contribution is a discrete-continuous inference algorithm which computes marginal distributions of each voxel's occupancy and appearance. I will show that the proposed algorithm allows for Bayes optimal predictions with respect to a natural reconstruction loss. I will further demonstrate several extensions which integrate non-local CAD priors into the reconstruction process. In the second part of my talk, I will present a novel framework for deep learning with 3D data called OctNet which enables 3D CNNs on high-dimensional inputs. I will demonstrate the utility of the OctNet representation on several 3D tasks including classification, orientation estimation and point cloud labeling. Finally, I will present an extension of OctNet called OctNetFusion which jointly predicts the space partitioning function with the output representation, resulting in an end-to-end trainable model for volumetric depth map fusion.
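The key idea behind OctNet's efficiency is adaptive space partitioning: a dense voxel grid spends memory uniformly, whereas an octree subdivides only where the scene content is mixed. The following sketch illustrates that partitioning idea on a binary occupancy grid; it is a toy illustration under my own assumptions, not the OctNet implementation (which additionally defines convolution and pooling on the octree cells).

```python
import numpy as np

class OctreeNode:
    """Toy adaptive partition of a cubic occupancy grid.

    A cell becomes a leaf where occupancy is uniform and is split into
    eight children where it is mixed -- the space-saving idea behind
    octree-based 3D representations such as OctNet's.
    """

    def __init__(self, grid, x0, y0, z0, size):
        block = grid[x0:x0 + size, y0:y0 + size, z0:z0 + size]
        if size == 1 or block.all() or (~block).all():
            self.children = None               # uniform region: stop here
            self.value = bool(block[0, 0, 0])
        else:
            h = size // 2                      # mixed region: split into octants
            self.children = [OctreeNode(grid, x0 + dx, y0 + dy, z0 + dz, h)
                             for dx in (0, h) for dy in (0, h) for dz in (0, h)]

    def num_leaves(self):
        if self.children is None:
            return 1
        return sum(c.num_leaves() for c in self.children)

# A mostly empty 8x8x8 grid with a single occupied voxel: the octree
# needs only 22 leaf cells instead of 512 dense voxels.
grid = np.zeros((8, 8, 8), dtype=bool)
grid[0, 0, 0] = True
root = OctreeNode(grid, 0, 0, 0, 8)
```

Because memory now scales with the surface complexity of the scene rather than the volume of the grid, much higher resolutions become feasible, which is exactly what makes 3D CNNs on such inputs practical.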
Andreas Geiger is a Max Planck Research Group Leader at the MPI for Intelligent Systems in Tübingen heading the Autonomous Vision Group (AVG), and a Visiting Professor at ETH Zürich. Prior to this, he was a research scientist in the Perceiving Systems department at MPI Tübingen. He studied at KIT, EPFL and MIT and received his PhD degree in 2013 from the Karlsruhe Institute of Technology. His research interests are at the intersection of 3D reconstruction and visual scene understanding, with a particular focus on rich semantic and geometric priors for bridging the gap between low-level and high-level vision. He is particularly interested in autonomous driving applications. His work has received several prizes, including the Heinz Maier-Leibnitz Prize, the Ernst-Schoemperlen Award, as well as best paper awards at CVPR, GCPR and 3DV. He is an associate member of the Max Planck ETH Center for Learning Systems and serves as an area chair and associate editor for major computer vision venues (CVPR, ECCV, PAMI).
Deep Learning for 3D Localization
The first part of the talk will describe a novel method for 3D object detection and pose estimation from color images only. We introduce a "holistic" approach that relies on a representation of a 3D pose suitable to Deep Networks and on a feedback loop. This approach, like many previous ones, is however not sufficient for handling objects with an axis of rotational symmetry, as the pose of these objects is in fact ambiguous. We show how to resolve this ambiguity with a combination of classification and regression. The second part will describe an approach bridging the gap between learning-based approaches and geometric approaches, for accurate and robust camera pose estimation in urban environments from single images and simple 2D maps.
Dr. Vincent Lepetit is a Full Professor at the LaBRI, University of Bordeaux, and an associate member of the Inria Manao team. He also supervises a research group in Computer Vision for Augmented Reality at the Institute for Computer Graphics and Vision, TU Graz. He received his PhD degree in Computer Vision in 2001 from the University of Nancy, France, after working in the ISA INRIA team. He then joined the Virtual Reality Lab at EPFL as a post-doctoral fellow and became a founding member of the Computer Vision Laboratory. He became a Professor at TU Graz in February 2014, and at the University of Bordeaux in January 2017. His research interests include computer vision and machine learning, and their application to 3D hand pose estimation, feature point detection and description, and 3D registration from images.