In a recent blog post, Google detailed how it has been working on depth perception in videos where both the camera and subject are moving. As a starting point, the study needed access to a vast amount of data to train the AI, and the first logical step was training it to detect people in a scene where the camera was moving but the people were static.

As it turns out, Google had the perfect resource for this data in the form of YouTube videos that were filmed for the Mannequin Challenge. In this challenge, a person or group of people would stand completely still as a camera panned around their position. Google used 2,000 videos from the challenge to help train its AI to detect human figures in a variety of scenes.

What makes this study even more interesting is that Google is teaching its AI to create depth maps from footage shot with a single camera. Typically, sensing depth in a scene requires multiple cameras.
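
For context on why a second camera usually helps, here is a minimal sketch of the textbook stereo relation, in which depth equals focal length times the distance between the cameras, divided by how far a point shifts between the two images (its disparity). It assumes an idealized, rectified two-camera rig and is not Google's method; the function name and the sample numbers are purely illustrative.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Estimate depth (in meters) of a point seen by two side-by-side cameras.

    Assumes an idealized, rectified stereo pair (hypothetical setup):
      focal_px     - focal length of both cameras, in pixels
      baseline_m   - distance between the two camera centers, in meters
      disparity_px - horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity would mean the point is infinitely far away.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Illustrative numbers only: nearby objects shift more between the two views.
    for disparity in (70.0, 35.0, 7.0):
        depth = depth_from_disparity(focal_px=700.0, baseline_m=0.1, disparity_px=disparity)
        print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

With only one camera there is no second view to measure that shift against, which is why estimating depth from single-camera footage is the harder problem.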