SinFusion: Training Diffusion Models on a Single Image or Video

Supplementary Material

Nearest-neighbour field (NNF) comparison (SinFusion vs VGPNN [1])

As explained in the paper, our video generation method does not simply copy spatio-temporal chunks from the input; it generates never-before-seen frames consistent with the main motions in the video.
This can be seen in the following video depiction of Fig. 9.
Each row contains four videos. From left to right:
- VGPNN [1] generated video: a video generated by VGPNN. The generation copies large spatio-temporal chunks as-is from the input video.
- VGPNN NNF color map: the NNF map corresponding to the VGPNN video. Every large uniformly-colored chunk is copied as-is from the original video.
- SinFusion generated video: a video generated by SinFusion (ours). Our video is more diverse and does not simply copy chunks from the input video.
- SinFusion NNF color map: the NNF map corresponding to our generated video. The varied color map represents diverse directions to nearest-neighbour patches, indicating that our method does not copy large existing chunks from the single input video.
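To make the NNF color maps concrete, the following is a minimal sketch of how such a field can be computed. It is a hypothetical simplification: brute-force patch matching on 2D grayscale frames, whereas the actual comparison uses spatio-temporal video patches and a color-coding of the resulting offsets.

```python
import numpy as np

def nnf_map(generated, source, patch=3):
    """Brute-force nearest-neighbour field (illustrative simplification:
    2D grayscale frames instead of spatio-temporal video chunks).
    For every patch in `generated`, find the (dy, dx) offset to its most
    similar patch in `source`. Large uniformly-valued regions in the
    offset map indicate chunks copied verbatim from the source."""
    H, W = generated.shape
    h, w = source.shape
    offsets = np.zeros((H - patch + 1, W - patch + 1, 2), dtype=int)
    for y in range(H - patch + 1):
        for x in range(W - patch + 1):
            query = generated[y:y + patch, x:x + patch]
            best, best_d = (0, 0), np.inf
            for sy in range(h - patch + 1):
                for sx in range(w - patch + 1):
                    d = np.sum((query - source[sy:sy + patch, sx:sx + patch]) ** 2)
                    if d < best_d:
                        best_d, best = d, (sy - y, sx - x)
            offsets[y, x] = best
    return offsets
```

Visualizing the offsets (e.g. mapping (dy, dx) to hue and saturation) yields the color maps shown above: a patchwork of large flat regions for copy-based generation, varied colors for generation that recombines small patches.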

VGPNN
Generated Video, NNF Map

SinFusion
Generated Video, NNF Map



Projector Ablation

Here we compare several videos generated by our model with and without the Projector model.
This demonstrates the importance of the Projector in removing small artifacts produced by our auto-regressive Predictor model.
Notice how the videos generated without the Projector (right column) slowly accumulate visual artifacts and degrade to poor quality.
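The failure mode can be sketched with a toy numerical example (purely illustrative; these are not the actual SinFusion models): an autoregressive "predictor" that makes a small error at every step drifts further and further from the data manifold, unless each output is projected back onto it before being fed to the next step.

```python
import numpy as np

def rollout(steps, use_projector, seed=0):
    """Toy illustration of error accumulation in autoregressive generation.
    The 'manifold' here is the unit sphere; the 'predictor' adds a small
    random error each step, and the 'projector' renormalizes the frame
    back onto the manifold. Returns the final distance from the manifold."""
    rng = np.random.default_rng(seed)
    frame = np.ones(8) / np.sqrt(8)  # start exactly on the manifold
    for _ in range(steps):
        # predictor: small per-step error
        frame = frame + 0.05 * rng.standard_normal(8)
        if use_projector:
            # projector: snap the frame back onto the manifold
            frame = frame / np.linalg.norm(frame)
    return abs(np.linalg.norm(frame) - 1.0)
```

Without the projection step the per-step errors compound into a random walk away from the manifold, mirroring the artifact accumulation visible in the right column above; with it, every frame stays on the manifold.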

Input Video
Generation example
Predictor & Projector

Generation example
No Projector (Only Predictor)



Comparison to VDM

As described in the paper, we show a basic qualitative comparison between our single-video DDPM and a VDM [2] trained on a single video.
Top: generated videos using our method.
Bottom: generated videos using VDM [2].
For further explanation, please see the discussion in the supplementary material details file.

VDM Generation samples



Relevant references:
[1] Haim, N., Feinstein, B., Granot, N., Shocher, A., Bagon, S., Dekel, T., & Irani, M. (2022). Diverse Video Generation from a Single Video. arXiv preprint arXiv:2205.05725.
[2] Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., & Fleet, D. J. (2022). Video Diffusion Models. arXiv preprint arXiv:2204.03458.