Deep knowledge of the world is necessary if we are to have autonomous and intelligent agents and artifacts that can assist us in everyday activities, or even carry out tasks entirely independently. One way to factorize the complexity of the world is to associate information and knowledge with stable entities, animate or inanimate, such as a person or a vehicle. In this talk I’ll survey a number of recent efforts to create and annotate reference representations for objects based on 3D models, with the aim of delivering such information to new observations as needed. In this object-centric view, the goal is to use these reference representations for aggregating information and knowledge about object geometry, appearance, articulation, materials, physical properties, affordances, and functionality.
We acquire such information in a multitude of ways, both from crowd-sourcing and from establishing direct links between models and signals, such as images, videos, and 3D scans -- and through these to language and text. The purity of the 3D representation allows us to establish robust maps and correspondences for transferring information among the 3D models themselves -- making our current 3D repository, ShapeNet, a true network. Furthermore, the network can act as a regularizer, allowing us to benefit from the “wisdom of the collection” in performing operations on individual data sets or in map inference between them. This effectively enables us to add missing information to signals through computational imagination, giving us, for example, the ability to infer what an occluded part of an object in an image may look like, or what other object arrangements may be possible, based on the world-knowledge encoded in ShapeNet and other repositories.
The talk will also briefly discuss current approaches to designing deep neural network architectures appropriate for operating directly on irregular 3D data representations, such as meshes or point clouds, as well as ways to learn object function from observing multiple action sequences involving objects.