Forward-looking: Nvidia has developed a technique that uses neural networks to create smooth slow-motion video from standard footage. Variable-length multi-frame interpolation uses machine learning to "hallucinate" the transitions between frames of film, then inserts these artificially created images between the originals to seamlessly slow down the final footage.

I’m not sure why, but people just love to watch slow-motion videos. In fact, the format is so popular that Gavin Free and Dan Gruchy have a YouTube channel devoted entirely to it, The Slow Mo Guys, which has almost 1.5 billion views and over 11 million subscribers. Free saw a niche to be filled, since creating slow-motion video is not practical for most people: the equipment is extremely expensive, and with footage shot at over 300,000 fps, storage quickly becomes a problem.

Filters exist that convert regular video to slow motion, but the result is somewhat choppy because they simply intersperse duplicate frames to lengthen the footage. However, Nvidia researchers think they have developed a way to create slow-motion video that is even smoother than footage taken with high-speed cameras like the ones Free and Gruchy use on their channel.
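To see why duplicate-frame filters look choppy, here is a minimal sketch of the idea in Python. The function name `slow_by_duplication` and the use of letters as stand-in frames are illustrative assumptions, not taken from any real filter.

```python
def slow_by_duplication(frames, factor):
    """Stretch footage by repeating every frame `factor` times.

    No new image content is created: motion still advances in the same
    discrete jumps, each jump is just held on screen for longer.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [frame for frame in frames for _ in range(factor)]


# Three stand-in frames, slowed down by a factor of 3.
print(slow_by_duplication(["A", "B", "C"], 3))
# ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
```

The output is three times longer, but the jump from "A" to "B" is exactly as abrupt as before, which is the gap an interpolation approach tries to fill.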

According to VentureBeat, “Scientists from Nvidia, the University of Massachusetts Amherst, and the University of California, Merced engineered an unsupervised, end-to-end neural network that can generate an arbitrary number of intermediate frames to create smooth slow-motion footage.”

The technique has been dubbed “variable-length multi-frame interpolation,” and it uses machine learning to fill in the gaps between frames of a video to create smooth-running, slow-motion versions.

“You can slow it down by a factor of eight or 15 — there’s no upper limit,” said Nvidia’s Senior Director of Visual Computing and Machine Learning Research Jan Kautz.
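As a rough illustration of what an arbitrary slow-down factor means in practice, the sketch below works out where the new frames would sit in time. It assumes evenly spaced intermediate frames and playback at the original frame rate; the function names are hypothetical and not part of Nvidia's system.

```python
from fractions import Fraction


def intermediate_times(factor):
    """Time positions (0 = first frame, 1 = second frame) of the frames
    to synthesize between one pair of original frames."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [Fraction(i, factor) for i in range(1, factor)]


def output_frame_count(n_input_frames, factor):
    """Total frames after inserting factor - 1 new frames into every gap."""
    return (n_input_frames - 1) * factor + 1


print(intermediate_times(4))      # three new frames at 1/4, 1/2 and 3/4
print(output_frame_count(30, 8))  # 233
```

Under these assumptions, an 8x slow-down needs seven synthesized frames per gap, so one second of 30 fps video grows from 30 frames to 233.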

The technique uses two convolutional neural networks (CNNs) in tandem. The first estimates the optical flow between two consecutive frames in both directions, forward and backward in time. From these estimates it generates what is called a “flow field,” a 2D field of predicted motion vectors for the frame to be inserted between the two.
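To make the idea of a flow field concrete, here is a minimal sketch that derives the flow for an in-between frame from a flow estimated between two real frames. It assumes constant-velocity motion and a locally smooth flow, which is a simplification chosen for illustration and not necessarily the approximation the researchers use. The name `intermediate_flows` and the list-of-tuples layout are assumptions as well.

```python
def intermediate_flows(flow_0_to_1, t):
    """Approximate the flows from an in-between frame at time t.

    flow_0_to_1 is a 2D grid of (dx, dy) motion vectors from frame 0 to
    frame 1. Assuming each pixel moves at constant velocity, a pixel seen
    at time t came from a point t of the way back along its vector and
    will travel the remaining (1 - t) of the vector to reach frame 1.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0 and 1")
    to_frame0 = [[(-t * dx, -t * dy) for dx, dy in row] for row in flow_0_to_1]
    to_frame1 = [[((1 - t) * dx, (1 - t) * dy) for dx, dy in row]
                 for row in flow_0_to_1]
    return to_frame0, to_frame1


# A single pixel that moves 2 pixels to the right between the two frames.
back, ahead = intermediate_flows([[(2.0, 0.0)]], 0.5)
print(back[0][0][0], ahead[0][0][0])  # -1.0 1.0
```

Halfway between the frames, the pixel is one pixel away from its position in each of them, which is what the two returned fields encode.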

“A second CNN then interpolates the optical flow, refining the approximated flow field and predicting visibility maps in order to exclude pixels occluded by objects in the frame and subsequently reduce artifacts in and around objects in motion. Finally, the visibility map is applied to the two input images, and the intermediate optical flow field is used to warp (distort) them in such a way that one frame transitions smoothly to the next.”
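The warp-and-blend step described in that quote can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: grayscale images stored as nested lists, nearest-neighbor sampling, a hand-written visibility map in place of the CNN's prediction, and a visibility-weighted average as the blending rule. The names `backward_warp` and `blend` are hypothetical.

```python
def backward_warp(image, flow):
    """Build a new image by sampling `image` at (x + dx, y + dy) for every
    pixel, using nearest-neighbor lookup clamped to the image borders."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out


def blend(warped0, warped1, vis0, t, eps=1e-8):
    """Visibility-weighted average of the two warped frames at time t.

    vis0 is in [0, 1]: 1 means the pixel is trusted in frame 0 only,
    0 means it is trusted in frame 1 only (assumed vis1 = 1 - vis0).
    """
    out = []
    for r0, r1, rv in zip(warped0, warped1, vis0):
        row = []
        for a, b, v in zip(r0, r1, rv):
            w0, w1 = (1 - t) * v, t * (1 - v)
            row.append((w0 * a + w1 * b) / (w0 + w1 + eps))
        out.append(row)
    return out


# A bright dot moves from x=0 (frame 0) to x=2 (frame 1); synthesize t=0.5.
frame0, frame1 = [[1, 0, 0, 0]], [[0, 0, 1, 0]]
flow_t0 = [[(-1, 0)] * 4]        # where each pixel was in frame 0
flow_t1 = [[(1, 0)] * 4]         # where each pixel will be in frame 1
vis0 = [[0.0, 0.5, 0.5, 0.5]]    # x=0 has no valid source in frame 0
mid = blend(backward_warp(frame0, flow_t0), backward_warp(frame1, flow_t1), vis0, 0.5)
print([round(v, 3) for v in mid[0]])  # [0.0, 1.0, 0.0, 0.0]
```

Without the visibility map, the clamped lookup at the left border would smear a ghost copy of the dot into x=0. Down-weighting that pixel in frame 0 leaves a single dot at x=1, a small-scale version of the artifact reduction the quote describes.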

The results are remarkable, as you can see in the video above. Even footage shot at 300,000 fps by the Slow Mo Guys was slowed down further and looks smoother than the original.

The technique runs on Nvidia Tesla V100 GPUs using the cuDNN-accelerated PyTorch deep learning framework. Given those hardware requirements, don’t expect to see a commercial version released anytime soon.

According to Kautz, the system needs a lot of optimization before the team can get it running in real time. He also says that even when it does get commercialized, most of the processing will have to be done in the cloud due to hardware limitations in the devices where the filter would likely be used.

If you are into the technical details, the team has published a paper outlining the method on arXiv, the preprint server hosted by the Cornell University Library.