Unsupervised Text Mining Techniques for Forecasting Crude Oil

  • While news articles have been shown to influence the rationality of investors' decisions, their effect on commodity prices such as crude oil remains uncertain. I explored Natural Language Processing (NLP) techniques to extract textual features from news articles and then ran a "horse race" among econometric and tree-based machine learning methods to forecast weekly crude oil prices. I obtained two types of textual features, latent topics and sentiment probabilities, using two state-of-the-art NLP models, respectively: Latent Dirichlet Allocation (LDA) and a version of Bidirectional Encoder Representations from Transformers (BERT) pre-trained on a financial corpus. I also introduced a novel forecasting strategy for calculating the out-of-sample (OoS) performance metrics of competing models. The evidence shows that textual features can improve oil price forecasts; however, textual features from news are not, on their own, sufficient for high forecasting accuracy.
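The abstract does not describe the thesis's novel OoS strategy, so the following is only a generic point of reference for what "out-of-sample performance metrics of competing models" usually means: a conventional expanding-window, one-step-ahead evaluation. Everything in it is an assumption for illustration, not the author's method: the helper names (`ols_fit`, `expanding_window_oos`, `rmse`, `oos_r2`), the single-feature linear model standing in for a text-augmented forecaster, the historical-mean benchmark, and the synthetic data.

```python
import math
from typing import List, Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0  # guard against a constant feature
    return my - b * mx, b


def expanding_window_oos(
    feature: Sequence[float], target: Sequence[float], min_train: int
) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical expanding-window evaluation.

    At each t >= min_train, fit only on observations [0, t) and forecast
    target[t]. feature[t] is assumed to be known before target[t] is realized
    (e.g. a lagged weekly news-sentiment score), so no future data leaks in.
    """
    model_preds, bench_preds, actuals = [], [], []
    for t in range(min_train, len(target)):
        a, b = ols_fit(feature[:t], target[:t])
        model_preds.append(a + b * feature[t])
        bench_preds.append(sum(target[:t]) / t)  # historical-mean benchmark
        actuals.append(target[t])
    return model_preds, bench_preds, actuals


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared forecast error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def oos_r2(model_pred, bench_pred, actual) -> float:
    """Out-of-sample R^2 relative to the benchmark; > 0 means the model wins."""
    sse_model = sum((a - p) ** 2 for p, a in zip(model_pred, actual))
    sse_bench = sum((a - p) ** 2 for p, a in zip(bench_pred, actual))
    return 1.0 - sse_model / sse_bench


if __name__ == "__main__":
    # Synthetic data only: the target loads on the feature plus a small wobble.
    feature = [math.sin(i) for i in range(60)]
    target = [0.5 * f + 0.1 * math.cos(3 * i) for i, f in enumerate(feature)]
    m, bch, act = expanding_window_oos(feature, target, min_train=20)
    print("model RMSE:", rmse(m, act), "benchmark RMSE:", rmse(bch, act))
    print("OoS R^2:", oos_r2(m, bch, act))
```

Refitting only on past data at every step is what keeps the metrics genuinely out-of-sample. In a horse race, each competing model (econometric or tree-based, with or without textual features) would be scored against the same benchmark on the same forecast dates.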

Rights Notes
  • Copyright © 2022 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
Date Created
  • 2022