movie trailer, in terms of time and effort reduction. We evaluated
the quality of our AI trailer with an extensive user study. Our AI
trailer has been viewed around 3M times on YouTube. Finally, we
explored applications of multimedia technology to another creative
paradigm: tropes, which are commonly used in movies. This
research investigation is the first of many into what we hope will
be a very promising area of machine and human creativity, especially
in the arena of creative film editing. We're very excited about
pushing the possibilities of how AI can augment the expertise and
creativity of individuals.
ACKNOWLEDGMENTS
The authors would like to thank 20th Century Fox for this great
collaboration, which led to the creation of the world's first joint
human- and machine-made trailer for a full-length feature film, Morgan.