Object Detection and Tracking for Creation of Interactive Videos

Abstract
  • Object detection is a fundamental step in creating interactive videos. In this thesis, we propose a new method for object detection that combines object recognition with tracking in a single neural network. Specifically, we use GoogLeNet as a feature extractor and then apply a long short-term memory (LSTM) network to adjust the feature vectors extracted by GoogLeNet according to the context provided by the feature vectors of the previous frame. We feed the output of the LSTM to a classifier and a regressor, as in the OverFeat network, to obtain predicted confidences and predicted bounding boxes. We pre-train the feature extractor on the ImageNet dataset and then evaluate our network on the OTB100 dataset. We compare our results with those obtained without tracking. Our model performs better at predicting objects in frames where occlusion and background clutter appear, and it produces more consistent object bounding boxes across frames.
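
The data flow described in the abstract (per-frame features, an LSTM that carries context from the previous frame, then a confidence head and a bounding-box head) can be illustrated with a minimal sketch. This is not the thesis implementation: the class names `LSTMCell` and `DetectTrackSketch`, the feature and hidden sizes, and the random untrained weights are all assumptions made for illustration, and a plain list of numbers stands in for the feature vector that GoogLeNet would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Return W @ x + b for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """A single LSTM cell. Weights are random placeholders, not trained values."""

    def __init__(self, rng, input_size, hidden_size):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x, h_prev].
        self.w = {g: random_matrix(rng, hidden_size, input_size + hidden_size) for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation of current features and previous hidden state
        i = [sigmoid(v) for v in linear(self.w["i"], self.b["i"], z)]    # input gate
        f = [sigmoid(v) for v in linear(self.w["f"], self.b["f"], z)]    # forget gate
        o = [sigmoid(v) for v in linear(self.w["o"], self.b["o"], z)]    # output gate
        g = [math.tanh(v) for v in linear(self.w["g"], self.b["g"], z)]  # candidate state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


class DetectTrackSketch:
    """Hypothetical pipeline: frame features -> LSTM -> (confidence, box)."""

    def __init__(self, feature_size=8, hidden_size=6, seed=0):
        rng = random.Random(seed)
        self.lstm = LSTMCell(rng, feature_size, hidden_size)
        self.cls_w = random_matrix(rng, 1, hidden_size)  # classifier head
        self.reg_w = random_matrix(rng, 4, hidden_size)  # regressor head (4 box values)
        self.reset()

    def reset(self):
        """Clear the temporal state, e.g. at the start of a new video."""
        n = self.lstm.hidden_size
        self.h, self.c = [0.0] * n, [0.0] * n

    def process_frame(self, features):
        """Update the LSTM state with one frame's features and predict."""
        self.h, self.c = self.lstm.step(features, self.h, self.c)
        confidence = sigmoid(linear(self.cls_w, [0.0], self.h)[0])
        box = linear(self.reg_w, [0.0] * 4, self.h)
        return confidence, box


if __name__ == "__main__":
    model = DetectTrackSketch()
    frame_rng = random.Random(1)
    # Stand-in feature vectors for three consecutive frames.
    frames = [[frame_rng.random() for _ in range(8)] for _ in range(3)]
    for t, feats in enumerate(frames):
        conf, box = model.process_frame(feats)
        print(t, round(conf, 3), [round(v, 3) for v in box])
```

Because the hidden and cell states persist between calls to `process_frame`, the prediction for each frame depends on the frames before it, which is the mechanism the abstract credits for more consistent boxes under occlusion and clutter. Calling `reset()` between videos keeps context from leaking across unrelated sequences.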

Rights Notes
  • Copyright © 2018 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
Date Created
  • 2018

