GPU-Trained System Understands Movies

Researchers from Karlsruhe Institute of Technology, MIT, and the University of Toronto have published MovieQA, a dataset of 7,702 reasoning questions and answers drawn from 294 movies. The dataset and its accuracy metrics provide a well-defined challenge for question-answering machine learning algorithms.
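The evaluation metric is standard multiple-choice accuracy: each question comes with five candidate answers, exactly one of which is correct. Here is a minimal sketch of that metric in Python (the five-way format is from the paper; the function name and data layout are illustrative):

```python
def multiple_choice_accuracy(predictions, ground_truth):
    """Fraction of questions where the predicted answer index matches
    the correct one (MovieQA provides 5 candidates per question)."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Example: 3 of 4 questions answered correctly -> 0.75
print(multiple_choice_accuracy([2, 0, 4, 1], [2, 0, 4, 3]))
```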

The questions range from simpler ‘Who’ did ‘What’ to ‘Whom’ questions that can be solved by computer vision alone, to ‘Why’ and ‘How’ something happened in the movie, questions that can only be answered by exploiting both the visual information and the dialogue.

Figure: Examples from the MovieQA dataset. For illustration, a single frame is shown, but every question and answer is time-stamped to a much longer clip in the movie. Notice that while some questions can be answered using vision or dialogue alone, most require both: vision can locate the scene set by the question, and semantics extracted from the dialogue can answer it.

MovieQA is unique in that it contains multiple sources of information: full-length movies, plot synopses, subtitles, scripts, and DVS (Descriptive Video Service, which narrates movie scenes for the visually impaired).
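Each question is therefore grounded in several aligned text and video sources. A hypothetical sketch of what one QA record might look like when loaded in Python follows; the field names and the example question are assumptions for illustration, not the dataset's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MovieQAEntry:
    """Illustrative record tying one question to its evidence sources.
    Field names are hypothetical; see the MovieQA paper for the real schema."""
    movie: str                     # e.g. "Forrest Gump"
    question: str                  # free-form reasoning question
    candidate_answers: List[str]   # five multiple-choice options
    correct_index: int             # index of the right answer
    clip_start: float              # time-stamped video alignment (seconds)
    clip_end: float
    plot_sentences: List[str] = field(default_factory=list)  # plot synopsis
    subtitles: List[str] = field(default_factory=list)       # dialogue
    dvs: List[str] = field(default_factory=list)             # scene narration

entry = MovieQAEntry(
    movie="Forrest Gump",
    question="Why does Forrest start running?",
    candidate_answers=["To escape bullies", "To catch a bus",
                       "To win a race", "To chase Jenny", "To get fit"],
    correct_index=0,
    clip_start=512.0,
    clip_end=540.0,
)
```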

To cope with the dataset's large vocabulary and the sheer volume of training data, the researchers relied on an NVIDIA TITAN Black GPU.

In early 2016, the researchers plan to launch an online benchmark with 15,000 questions and 75,000 answers, which will encourage others to contribute.

Read the research paper >>

About Brad Nemire

Brad Nemire is on the Developer Marketing team and loves reading about all of the fascinating research being done by developers using NVIDIA GPUs. Reach out to Brad on Twitter @BradNemire and let him know how you’re using GPUs to accelerate your research. Brad graduated from San Diego State University and currently resides in San Jose, CA.