Inverse Visual Question Answering with Multi-Level Attentions

Resource Type
  • Inverse Visual Question Answering (iVQA) is a recent task that emerged from the need to improve visual and language understanding. It tackles the challenging problem of generating a question that corresponds to a given image-answer pair. Current state-of-the-art iVQA models represent images in the conventional way, using a convolutional neural network (CNN) to extract visual features. Although some models leverage semantic concepts to enhance the answer cue, they assign the same importance weight to every concept without considering its correlation with the answer. Moreover, existing iVQA models rely mainly on conventional recurrent neural networks for question modelling, while attention-based sequence learning for question modelling, which could help reduce the number of model parameters, remains unexplored. In this research, we address these issues by developing two novel deep multi-level attention models for the task of inverse visual question answering.

Thesis Degree Level
Thesis Degree Name
Thesis Degree Discipline
Rights Notes
  • Copyright © 2019 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
Date Created
  • 2019

