Abstract
This paper considers the new thrust of Statistical Analysis and Operations Research in the area of so-called Big Data. It reviews the general principles underlying good statistical modelling, particularly from the perspective of Pidd (2009), and discusses how some initiatives in the Big Data area may not have applied these principles correctly. In particular, it notes that demonstrations of techniques purported to be appropriate for Big Data frequently use data sets of stock market share prices and derivatives, because of the large volume of high-frequency share price data that is available. The paper goes on to critique the frequent use of such historical price data to forecast stock market prices by means of time series analysis. It details the limitations of this practice and suggests that the large body of work done towards this end by statisticians and financial analysts should be treated with circumspection. It is seen as unfortunate that a large contingent of extremely able students is directed into areas that encourage time series modelling of stock market data with the promise of forecasting what is essentially unforecastable. The paper also considers which approaches may appropriately be applied to model and understand the process of share price determination, and discusses the contributions of the Nobel Prize-winning economists Fama and Shiller.
The paper concludes by suggesting that the Big Data initiative should be treated with some caution, and echoes the sentiments of Pidd, namely that the focus in Operations Research and Statistics should remain firmly on creative modelling rather than on the singular pursuit of large amounts of data.
Read full paper here: Kantor 2018 – Big Data