Conference paper Open Access
Apostolidis, Evlampios; Adamantidou, Eleni; Metsai, Alexandros; Mezaris, Vasileios; Patras, Ioannis
This paper presents a new video summarization approach that integrates an attention mechanism to identify the significant parts of the video, and is trained in an unsupervised manner via generative adversarial learning. Starting from the SUM-GAN model, we first develop an improved version of it (called SUM-GAN-sl) that has a significantly reduced number of learned parameters, performs incremental training of the model's components, and applies a stepwise label-based strategy for updating the adversarial part. Subsequently, we introduce an attention mechanism to SUM-GAN-sl in two ways: i) by integrating an attention layer within the variational auto-encoder (VAE) of the architecture (SUM-GAN-VAAE), and ii) by replacing the VAE with a deterministic attention auto-encoder (SUM-GAN-AAE). Experimental evaluation on two datasets (SumMe and TVSum) documents the contribution of the attention auto-encoder to faster and more stable training of the model, resulting in a significant performance improvement with respect to the original model and demonstrating the competitiveness of the proposed SUM-GAN-AAE against the state of the art. Software is publicly available at: https://github.com/e-apostolidis/SUM-GAN-AAE
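The abstract only outlines the architecture; as a rough illustration of the SUM-GAN-AAE idea (a frame scorer whose score-weighted features are reconstructed by a deterministic attention auto-encoder in place of the VAE), the following PyTorch sketch shows one plausible structure. All layer sizes, the additive attention formulation, and the module names are assumptions for illustration only, not the authors' released implementation (see the GitHub link above for that).

```python
# Hypothetical sketch of a SUM-GAN-AAE-style summarizer; dimensions and the
# attention formulation are assumptions, not the authors' released code.
import torch
import torch.nn as nn


class FrameScorer(nn.Module):
    """Predicts a frame-level importance score in [0, 1]."""
    def __init__(self, feat_dim=1024, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, x):                       # x: (B, T, feat_dim)
        h, _ = self.lstm(x)
        return self.out(h).squeeze(-1)          # (B, T) importance scores


class AttentionAutoEncoder(nn.Module):
    """Deterministic auto-encoder with additive attention over encoder states,
    standing in for the VAE of the original SUM-GAN (illustrative only)."""
    def __init__(self, feat_dim=1024, hidden=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(feat_dim + hidden, hidden)
        self.attn = nn.Linear(2 * hidden, 1)
        self.recon = nn.Linear(hidden, feat_dim)

    def forward(self, x):                       # x: (B, T, feat_dim)
        enc, (h, c) = self.encoder(x)           # enc: (B, T, hidden)
        h, c = h.squeeze(0), c.squeeze(0)
        outputs = []
        for t in range(x.size(1)):
            # attention weights over all encoder states given current decoder state
            scores = self.attn(torch.cat([enc, h.unsqueeze(1).expand_as(enc)], dim=-1))
            alpha = torch.softmax(scores, dim=1)        # (B, T, 1)
            context = (alpha * enc).sum(dim=1)          # (B, hidden)
            h, c = self.decoder(torch.cat([x[:, t], context], dim=-1), (h, c))
            outputs.append(self.recon(h))
        return torch.stack(outputs, dim=1)       # reconstructed features (B, T, feat_dim)


if __name__ == "__main__":
    frames = torch.randn(2, 60, 1024)            # e.g. CNN frame features
    scores = FrameScorer()(frames)               # (2, 60)
    weighted = scores.unsqueeze(-1) * frames     # score-weighted frame features
    recon = AttentionAutoEncoder()(weighted)     # reconstruction passed to the adversarial part
    print(scores.shape, recon.shape)
```

In the adversarial training loop described in the paper, the reconstruction would be compared against the original frame features by a discriminator, driving the scorer to keep the frames needed to reproduce the video; the sketch above only covers the forward pass of the two generator-side components.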
Name | Size | |
---|---|---|
mmm2020_lncs11961_1_preprint.pdf (md5:beaa35b2015dcdd1b1c3f0c86dcc9199) | 978.8 kB | Download |
Views | 1,066 |
---|---|
Downloads | 221 |
Data volume | 216.3 MB |
Unique views | 1,049 |
Unique downloads | 208 |