
Videos with various length and fps #741

Open
kkjh0723 opened this issue Apr 4, 2019 · 14 comments
Labels: question (Further information is requested), Video (Video related feature/question)

Comments

@kkjh0723 commented Apr 4, 2019

Hi,
Thanks for the nice library. I found DALI while looking for a video loader for action recognition. I noticed that DALI cannot yet handle videos of varying resolution, as described in issue #725, which is necessary for public datasets such as Kinetics.

Another necessary component might be processing videos with various lengths and frame rates.
It seems that VideoReader only supports extracting the whole video as batches of sequences of length sequence_length. I'm not sure, though, since I've only tested video_test.py.

  1. I wonder if it is possible to randomly extract one short "clip" (a sequence defined by sequence_length and step) from each video. This seems to be common practice in the training phase on the Kinetics dataset.
    For evaluation, people often extract several clips at equal intervals along the whole video.
  2. Additionally, it would be nice if we could resample all videos to the same fps, since videos vary in frame rate. I often use ffmpeg with the fps filter when I extract frames manually.

Hopefully these are already possible; if not, do you have any plans to support such features?
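(For reference, a minimal sketch of what a DALI video pipeline of that era might look like, using the old-style ops.VideoReader API; the file paths and parameter values are placeholders, and random_shuffle only approximates drawing one random clip per video.)

```python
# Hedged sketch, not an official recipe: build short fixed-length sequences
# ("clips") with DALI's VideoReader. Paths and parameter values are placeholders.
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, filenames, sequence_length):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.reader = ops.VideoReader(device="gpu",
                                      filenames=filenames,
                                      sequence_length=sequence_length,
                                      step=-1,             # -1: consecutive, non-overlapping windows
                                      stride=1,            # keep every frame inside a clip
                                      random_shuffle=True, # draw sequences in random order
                                      initial_fill=16)

    def define_graph(self):
        # with `filenames` and no labels, the reader outputs only the frame sequences
        return self.reader()

pipe = VideoPipe(batch_size=2, num_threads=2, device_id=0,
                 filenames=["video_0.mp4", "video_1.mp4"],  # placeholder paths
                 sequence_length=16)
pipe.build()
out = pipe.run()  # out[0]: batch of clips shaped (batch, sequence_length, H, W, C)
```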

JanuszL added the question label Apr 8, 2019
@JanuszL (Contributor) commented Apr 8, 2019

Hi,
Currently, it is not possible to do all the things you requested (am I right, @Kh4L?). Nevertheless, we are putting them in our backlog.
Do you have a particular dataset and network in mind that you are using and want to plug DALI into?
Tracked as DALI-694 and DALI-695.

@Kh4L (Contributor) commented Apr 8, 2019

Hi @kkjh0723 ,

If I understand correctly, these are the features you are asking for:

  • Support for video files with different spatial dimensions (height, width). This is currently not supported, but it is in our backlog with higher priority.
  • Support for various lengths: this is currently not supported, but it is in our backlog.
  • FPS: currently we extract all the frames from the video. Are you asking for an option to get only a subset of the frames, with a given stride (e.g., every n-th frame)?

@kkjh0723 (Author) commented Apr 8, 2019

Thanks, @JanuszL.
I want to use DALI with PyTorch for the Kinetics and AVA datasets. I want to follow the preprocessing of the Non-local network and the SlowFast network. The Non-local network code is open but written in Caffe2 (link). It seems they read the videos directly (from LMDB) and preprocess them on GPUs.

Currently, I'm using a 3D-ResNet based codebase. For data loading, it extracts the frames from the videos and saves them as JPEG files, then loads and processes the frames with multiple workers via the PyTorch DataLoader.
I hit a problem when the data is stored on NFS and multiple training jobs access it at the same time: GPU utilization drops from around 70% to 20%. I suspect that reading thousands of small JPEG files causes an I/O bottleneck, so I thought a video reader could help (although using a database such as LMDB would be even better).

Thanks, @Kh4L.
Exactly, for the first two.
As for the FPS, what I had in mind is: when the video's fps is higher than the target fps, do exactly what you said (drop frames); but when the video's fps is lower than the target fps, duplicate frames, just like ffmpeg's fps filter does (see the sketch below).
I'm not sure this is the typical setting used in video research, but it seems reasonable to me.
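(Not a DALI feature, just a plain-Python sketch of the index mapping described above: drop frames when the source fps is higher than the target, repeat them when it is lower, roughly what ffmpeg's fps filter does.)

```python
# Hedged sketch: map output frame indices at a target fps back to source-frame
# indices; a higher source fps drops frames, a lower source fps repeats them.
def resample_indices(num_src_frames, src_fps, target_fps):
    duration = num_src_frames / src_fps
    num_out = int(round(duration * target_fps))
    return [min(num_src_frames - 1, int(round(i * src_fps / target_fps)))
            for i in range(num_out)]

print(resample_indices(9, 30, 10))  # [0, 3, 6]                    (30 fps -> 10 fps drops frames)
print(resample_indices(3, 10, 30))  # [0, 0, 1, 1, 1, 2, 2, 2, 2]  (10 fps -> 30 fps repeats frames)
```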

JanuszL added the Video label Apr 9, 2019
@aBlueDragon

I am also looking for a feature that supports loading videos of different lengths. I want to mention that an alternative way of dealing with this kind of data is to load the full videos in a mini-batch and then pad the shorter ones to the length of the longest video. This is also very common in video tasks such as captioning. It would be much appreciated if this could be supported in DALI one day.

@JanuszL (Contributor) commented May 30, 2019

@aBlueDragon - what kind of padding do you have in mind: dummy frames, replication of the last frame, or something else?
Also, by loading a full video, do you mean creating a sequence with all the frames from that file?

@aBlueDragon

@JanuszL Padding with zeros would be fine, but the data loader has to return the actual length of each sample so that we know the real length of the padded videos.
Yes, loading the full video means loading all the frames from the file, which is generally within 300 frames.
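(As an illustration of that scheme, a plain-PyTorch sketch outside DALI, assuming decoded clips as (T, H, W, C) tensors.)

```python
# Hedged sketch: zero-pad decoded videos of different lengths to the longest
# one in the mini-batch and return the true lengths alongside the padded tensor.
import torch

def pad_video_batch(videos):
    """videos: list of tensors shaped (T_i, H, W, C) with varying T_i."""
    lengths = torch.tensor([v.shape[0] for v in videos])
    max_len = int(lengths.max())
    padded = torch.zeros((len(videos), max_len) + tuple(videos[0].shape[1:]),
                         dtype=videos[0].dtype)
    for i, v in enumerate(videos):
        padded[i, : v.shape[0]] = v
    return padded, lengths

# e.g. clips of 120 and 300 frames -> padded shape (2, 300, H, W, C), lengths [120, 300]
```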

@JanuszL (Contributor) commented May 30, 2019

@aBlueDragon - thanks for the explanation.

@frankrao

Vote for various-length support.

We can encode videos to the same resolution and the same fps, but there is no way to make them the same length. I don't know of any real case with a fixed length.

@mzolfaghari commented Aug 26, 2019

@raofengyun Usually you need to sample N frames from the video using one of the following approaches:

  • Splitting the video into fixed-length clips of N frames (the last clip is padded by repeating the last frame to reach the fixed size). Each time, you pass one clip to the network. C3D and the two-stream network (K. Simonyan, 2014) used this method.
  • Sampling N frames randomly from the entire video. In this case, each video is split into N segments and one frame is selected randomly from each segment. This technique is used in recent papers such as TSN, ECO, and TSM (see the sketch after the dataset list below).
  • Sampling N frames from a time interval [start_frame/time, end_frame/time]. This is used for instructional videos or sequential actions, where each video contains multiple consecutive actions.

@JanuszL I think adding these three sampling techniques to your framework would be quite helpful for the video understanding community.
Sample datasets:

  • Kinetics
  • UCF101
  • EpicKitchen
  • SomethingSomething
  • YouCook2
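(A small sketch of the second approach, segment-based sampling as in TSN, independent of DALI; the num_segments value and the test-time "segment center" rule are illustrative assumptions.)

```python
# Hedged sketch: split a video of num_frames frames into num_segments equal
# segments, then pick one frame per segment (random in training, center at test).
import random

def tsn_sample_indices(num_frames, num_segments, train=True):
    seg_len = num_frames / num_segments
    indices = []
    for s in range(num_segments):
        start = int(round(s * seg_len))
        end = max(start, int(round((s + 1) * seg_len)) - 1)
        indices.append(random.randint(start, end) if train else (start + end) // 2)
    return indices

print(tsn_sample_indices(300, 8, train=False))  # [18, 56, 93, 130, 168, 206, 243, 280]
```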

@cvnovice95

@mzolfaghari I agree with your idea; for action recognition, we need these different frame samplers.

@JanuszL (Contributor) commented May 6, 2020

Hi,
Some updates on the request.

  • videos with different resolutions are supported
  • videos with different lengths are supported (as long as the length is greater than the requested sequence length - no padding yet)
  • using the file_list argument (as in this example) you can specify allowed ranges of sequences for each video file (see the sketch below)
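(A hedged sketch of the file_list mechanism; the exact column semantics - label, start, end, and whether the range is given in frame numbers or seconds - should be verified against the linked example and the DALI documentation. Paths, labels, and ranges below are placeholders.)

```python
# Hedged sketch: restrict each video to an allowed range of sequences via file_list.
file_list_txt = """\
/data/kinetics/video_0.mp4 0 10 220
/data/kinetics/video_1.mp4 1 0 300
"""
with open("file_list.txt", "w") as f:
    f.write(file_list_txt)

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class VideoListPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(VideoListPipe, self).__init__(batch_size, num_threads, device_id)
        self.reader = ops.VideoReader(device="gpu",
                                      file_list="file_list.txt",
                                      sequence_length=16,
                                      random_shuffle=True)

    def define_graph(self):
        # with file_list the reader returns both the frame sequences and the labels
        frames, labels = self.reader()
        return frames, labels
```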

@huangjun12

(Quoting @mzolfaghari's list of the three sampling approaches above.)

The second approach (segment-based random sampling) is implemented in https://github.com/SunGaofeng/DALI.
For a use case, please refer to https://github.com/PaddlePaddle/models/tree/release/1.8/PaddleCV/video/models/tsn.

@JanuszL (Contributor) commented Jul 14, 2020

@huangjun12 - if you think it is useful for the rest of the community, you can file a PR using the code from https://github.com/SunGaofeng/DALI.

@SunGaofeng

(Quoting @JanuszL's suggestion above.)

The VideoReader for the TSN model was developed by modifying the code of the original DALI repo. I did this in such a hurry that little thought was given to staying compatible with the original code and to fitting more models. Maybe @huangjun12 can put more time into this subject to cover more classical models.
