Unsupervised Text Mining Techniques for Forecasting Crude Oil

  • While news articles have been shown to influence the rationality of investors' decisions, the effect of news on commodity prices such as crude oil remains uncertain. I explored Natural Language Processing (NLP) techniques to extract textual features from news articles and then constructed a "horse-race" among economic and tree-based machine learning methods to forecast weekly crude oil prices. I obtained two types of textual features, latent topics and sentiment probabilities, using two state-of-the-art NLP models: Latent Dirichlet Allocation (LDA) and a version of Bidirectional Encoder Representations from Transformers (BERT) pre-trained on a financial corpus. I also introduced a novel forecasting strategy for calculating the out-of-sample (OoS) performance metrics of competing models. The evidence shows that textual features can improve forecasts of oil prices; however, textual features from news alone are not sufficient for high forecasting accuracy.
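
    For readers unfamiliar with out-of-sample evaluation, the sketch below shows a generic expanding-window scheme: each model is refit on all data up to week t, forecasts week t+1, and the errors are pooled into an RMSE. This is not the forecasting strategy introduced in the paper, whose details are not given in this summary. The function names, the two toy forecasters, and the price series are all illustrative assumptions.

```python
import math
from typing import Callable, Sequence

# A forecaster maps the observed history to a one-step-ahead prediction.
Forecaster = Callable[[Sequence[float]], float]


def no_change(history: Sequence[float]) -> float:
    """Random-walk style benchmark: next value equals the last observed value."""
    return history[-1]


def historical_mean(history: Sequence[float]) -> float:
    """Benchmark that predicts the mean of everything observed so far."""
    return sum(history) / len(history)


def expanding_window_rmse(
    series: Sequence[float], forecaster: Forecaster, min_train: int
) -> float:
    """RMSE of one-step-ahead forecasts over an expanding training window.

    At each step t >= min_train, the forecaster sees only series[:t],
    so no future information leaks into the prediction of series[t].
    """
    if not 1 <= min_train < len(series):
        raise ValueError("min_train must be in [1, len(series) - 1]")
    squared_errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])
        squared_errors.append((series[t] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up weekly prices, for illustration only (not real crude oil data).
prices = [70.0, 72.0, 71.0, 74.0, 76.0, 75.0, 78.0, 80.0]

rmse_no_change = expanding_window_rmse(prices, no_change, min_train=4)
rmse_mean = expanding_window_rmse(prices, historical_mean, min_train=4)
print(f"no-change RMSE: {rmse_no_change:.3f}")
print(f"historical-mean RMSE: {rmse_mean:.3f}")
```

    In a comparison of competing models, each candidate (with or without textual features) would be passed in as a forecaster and ranked on the same out-of-sample window.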

  2022


