A team of researchers at Microsoft Research has modified a regular smartphone camera and a webcam to capture Kinect-like depth, using simple hardware changes and machine learning techniques.
The team converted a conventional monocular 2D camera by removing the near-infrared filter that normally blocks unwanted light from photographs. They then made the ordinary camera act as an infrared camera by adding a filter that lets only infrared light through, along with a ring of several cheap near-infrared LEDs. Machine learning then turns the resulting near-infrared intensity images into depth maps.
Microsoft claims that in experiments the approach outperformed a conventional light fall-off baseline and is comparable to high-quality consumer depth cameras, but with dramatically reduced cost, power consumption, and form factor.
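The light fall-off baseline mentioned above relies on the inverse-square law: under the LED ring's active illumination, the observed near-infrared intensity falls off roughly with the square of distance, so depth can be estimated as proportional to one over the square root of intensity. A minimal sketch of that idea follows; the calibration constant and the example intensity image are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def light_falloff_depth(nir_intensity, k=1.0, eps=1e-6):
    """Estimate per-pixel depth from near-infrared intensity.

    Assumes active illumination from the LED ring and roughly uniform
    surface reflectance, so intensity ~ k / depth**2 (inverse-square law).
    `k` is a hypothetical calibration constant fit from known-distance shots.
    """
    intensity = np.clip(nir_intensity.astype(np.float64), eps, None)
    return np.sqrt(k / intensity)

# Example: a synthetic 4x4 intensity image (brighter pixels = closer surfaces).
nir = np.array([[0.9, 0.8, 0.4, 0.1]] * 4)
print(light_falloff_depth(nir, k=0.25))
```

A baseline like this breaks down when surface reflectance varies, which is one reason the learned approach performs better.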
The modified devices successfully tracked the depth of users' hands and faces.
"Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real-time," says Microsoft.
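The paper's hybrid classification-regression forests are trained on real camera data; as a rough illustration of the general idea of regressing metric depth from near-infrared intensity patches, here is a hedged sketch using a plain random forest regressor from scikit-learn on synthetic stand-in data. The patch features, training data, and forest settings are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: each sample is a flattened 5x5 NIR intensity patch,
# and the target is the metric depth (in metres) at the patch centre.
n_samples, patch_size = 5000, 5 * 5
depth = rng.uniform(0.2, 1.0, size=n_samples)                 # 20 cm to 1 m range
patches = (0.25 / depth[:, None] ** 2) + rng.normal(0, 0.01, (n_samples, patch_size))

# A plain regression forest; the paper instead uses a hybrid scheme that first
# classifies coarse depth ranges and then regresses a refined value within each.
forest = RandomForestRegressor(n_estimators=20, max_depth=12, n_jobs=-1)
forest.fit(patches, depth)

test_patch = (0.25 / 0.5 ** 2) * np.ones((1, patch_size))     # surface at ~0.5 m
print(forest.predict(test_patch))                              # roughly 0.5
```

In the real system, per-pixel predictions like this run over full near-infrared frames in real time, producing a dense depth map comparable to a consumer depth camera's output.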
The research team led by Sean Ryan Fanello, Cem Keskin, and Shahram Izadi will present a paper on the work Tuesday at SIGGRAPH, a computer graphics and interaction conference in Vancouver, British Columbia.