Brand, M., Cooper, P., & Birnbaum, L. (1995). Seeing physics, or: Physics is for prediction. Proceedings of the Workshop on Physics-Based Modeling in Computer Vision. doi:10.1109/PBMCV.1995.514679

Summary

In this paper, Brand et al. argue for including physical knowledge in vision and robotics systems in order to describe scenes and make predictions about them. As they say, “there are physical reasons for why scenes are arranged the way they are, and we need to be aware of those reasons in order to understand the scene well enough to reach in and do something about it” (pg. 144). They describe two systems, SPROCKET and MugShot, which use physical knowledge to reason about and interact with physical scenes.

This paper makes use of qualitative physical reasoning and might fit better in the mental models category, but because it applies physical reasoning to robotics I’m including it here instead.

Methods

SPROCKET views images of gears and essentially performs scene reconstruction, using physical knowledge to constrain the possible interpretations. It has explicit symbolic knowledge of how such scenes can be put together (e.g., that two meshing gears limit axial translation and fix axial rotation and non-axial translation) and uses a constraint network to resolve the image into a scene that satisfies all the constraints.
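
To make that concrete, here is a minimal sketch (my own, not the paper’s implementation) of what a constraint network over gear interpretations might look like: each detected gear starts with a set of candidate spin directions, and the “meshing gears counter-rotate” constraint is propagated until the interpretations are consistent. The gear names and the restriction to spin direction are assumptions for illustration.

```python
# Hypothetical sketch (not the paper's actual code) of a SPROCKET-style
# constraint network: each detected gear has a set of candidate
# interpretations, and the physical constraint that meshing gears must
# counter-rotate is propagated until all interpretations are consistent.

# Candidate spin directions for each detected gear: clockwise ("cw") or
# counter-clockwise ("ccw"); unknown until constrained.
candidates = {
    "gear_A": {"cw", "ccw"},
    "gear_B": {"cw", "ccw"},
    "gear_C": {"cw", "ccw"},
}

# Symbolic physical knowledge: which pairs of gears mesh in the image.
meshes = [("gear_A", "gear_B"), ("gear_B", "gear_C")]

# Observation (e.g., from image motion): gear_A spins clockwise.
candidates["gear_A"] = {"cw"}

def consistent(spin_a, spin_b):
    """Two meshing gears must rotate in opposite directions."""
    return spin_a != spin_b

# Arc-consistency style propagation: prune candidates with no consistent
# partner on the other side of a meshing constraint, until a fixed point.
changed = True
while changed:
    changed = False
    for a, b in meshes:
        for x, y in ((a, b), (b, a)):
            keep = {s for s in candidates[x]
                    if any(consistent(s, t) for t in candidates[y])}
            if keep != candidates[x]:
                candidates[x] = keep
                changed = True

print(candidates)  # gear_B is forced to "ccw" and gear_C to "cw"
```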

MugShot is a system that allows robots to pick up objects with handles (e.g., mugs) and uses physical knowledge to inform how much force to apply. Importantly, it takes into account the physics of the robot’s effector as well as that of the object it is trying to grasp. The paper doesn’t really go into detail about how this works, saying only that “predicting the amount of force feedback to expect and the amount of force to apply requires visual observation as well as a causal-physical model of the object” (pg. 149). I would be interested to know what this actually entails: does it estimate the center of mass of the mug? Does it actually compute the points at which it would need to grasp the mug in order to keep it in equilibrium? Or does it just apply force on the fly as needed?
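
Since the paper doesn’t say, here is a back-of-the-envelope guess at one piece of what that prediction might entail: if the system estimates the mug’s mass (say, from vision plus a density prior) and the friction coefficient of the effector, it can compute how hard to squeeze the handle so that friction supports the weight. The numbers and the simple friction model are entirely my assumptions, not MugShot’s method.

```python
# Back-of-the-envelope guess (not MugShot's actual model) at predicting the
# squeeze force for a two-finger friction grip on a mug handle: friction on
# both contact surfaces must support the mug's weight, 2 * mu * N >= m * g.
G = 9.81  # gravitational acceleration, m/s^2

def required_squeeze_force(mass_kg, mu, safety_factor=1.5):
    """Minimum normal (squeeze) force in newtons to hold the object.

    mass_kg       -- estimated mass of the mug (in a real system this would
                     have to come from vision plus priors; assumed here)
    mu            -- friction coefficient between effector and handle
    safety_factor -- margin for estimation error and dynamic loads
    """
    weight = mass_kg * G
    return safety_factor * weight / (2.0 * mu)

# Example: a 0.35 kg mug and rubber-on-ceramic friction of roughly 0.6.
print(f"{required_squeeze_force(0.35, 0.6):.1f} N")  # ~4.3 N
```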

Algorithm

n/a

Takeaways

Having physical knowledge can greatly improve the ability of AI and robotics systems to reason about the world. One quote in particular stood out: “As might be expected in this bottom-up attempt at segmentation, image processing takes more than an order of magnitude more time than with SPROCKET’s semantic guidance”. I wonder whether this would still be true with today’s object recognition systems.

I do wonder, though, how systems like SPROCKET and MugShot come to possess their physical knowledge. Encoding a series of rules seems brittle: what if a rule is not always correct? What if there is a system that can’t be resolved? I suppose that imbuing a system with a physics simulator is also a bit like hard-coding rules, but a simulator is more general in the sense that it can be applied to any physical situation, as long as the initial conditions (e.g., positions and meshes) of the system can be constructed. That’s a pretty big requirement, though. Any physical reasoning system still has a ways to go before it is completely general, but perhaps something like deep learning will allow us to construct the appropriate inputs to the simulator, resulting in something more general.
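
To spell out what I have in mind, here is a purely speculative sketch of that pipeline: a learned perception module constructs the simulator’s initial conditions (positions, meshes, masses), and a general-purpose simulator does the prediction. Every name here is hypothetical; this is an architecture sketch, not an implementation.

```python
# Purely speculative sketch of the pipeline above: a learned perception
# module constructs the simulator's initial conditions, and a general
# physics simulator does the prediction. All names here are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class Body:
    name: str
    position: tuple   # (x, y, z) estimated from the image
    mesh: str         # estimated mesh or shape primitive
    mass: float       # estimated mass

def perceive(image) -> List[Body]:
    """Stand-in for a learned front end: image -> bodies with initial conditions."""
    raise NotImplementedError("replace with a learned perception model")

def simulate(bodies: List[Body], seconds: float) -> List[Body]:
    """Stand-in for an off-the-shelf physics simulator rolling the scene forward."""
    raise NotImplementedError("replace with a general-purpose simulator")

def predict(image, horizon=1.0):
    # The same simulator applies to any scene the perception module can
    # parameterize -- which is exactly where the generality would come from.
    return simulate(perceive(image), horizon)
```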