The model fuses visual features (frames) with other available data to determine what content is most "important".
Unlike standard summarization, these videos are categorized by specific topics, allowing for multiple summaries of the same video depending on the user's interest. Dataset Context h336305.mp4
Each video file, such as h336305.mp4, is annotated with scores that rank individual frames based on how well they represent a specific topic. The model fuses visual features (frames) with other