Posted on 2019-09-17, 08:57, authored by Chris Brooks, Andreas Hoepner, Dave McMillan, Andrew Vivian, Chardin Wese Simen
Financial data science and econometrics are highly complementary. They share an equivalent research process, with the former’s intellectual point of departure being statistical inference and the latter’s being the data sets themselves. Two challenges arise, however, from digitalisation. First, ever-increasing computational power allows researchers to experiment with an extremely large number of generated test subjects (i.e. p-hacking). We argue that p-hacking can be mitigated through adjustments for multiple hypothesis testing where appropriate, but that it can only truly be addressed via a strong focus on integrity (e.g. pre-registration, actual out-of-sample periods). Second, the extremely large number of observations available in big data sets provides so much statistical power that common statistical significance levels are barely relevant. This challenge can be addressed in two ways. First, researchers can use more stringent statistical significance levels such as 0.1% and 0.5% instead of 1% and 5%, respectively. Second, and more importantly, researchers can use criteria such as economic significance, economic relevance and statistical relevance to assess the robustness of statistically significant coefficients. Statistical relevance seems especially crucial, since it is far from impossible for an individual coefficient to be statistically significant while its actual statistical relevance (i.e. incremental explanatory power) is extremely small.
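The following is a minimal, hypothetical sketch (not taken from the paper) that illustrates two of the points above on simulated data: (i) a Holm adjustment for multiple hypothesis testing as one way to temper p-hacking, and (ii) a coefficient that is highly statistically significant in a very large sample yet has negligible statistical relevance, measured here as incremental R². All variable names, sample sizes and effect sizes are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# (i) Multiple testing: regress a pure-noise outcome on 50 pure-noise
# predictors; a few look "significant" at 5% by chance, the Holm
# adjustment removes (almost) all of them.
n_small = 2_000
y_noise = rng.standard_normal(n_small)
pvals = []
for _ in range(50):
    x = sm.add_constant(rng.standard_normal(n_small))
    pvals.append(sm.OLS(y_noise, x).fit().pvalues[1])
naive_hits = sum(p < 0.05 for p in pvals)
holm_hits = multipletests(pvals, alpha=0.05, method="holm")[0].sum()
print(f"'Significant' noise predictors: naive={naive_hits}, Holm-adjusted={holm_hits}")

# (ii) Statistical significance vs. statistical relevance in a big sample:
# x2 has a tiny true effect (0.005), so with one million observations its
# coefficient is typically significant well beyond the 0.1% level, yet it
# adds almost nothing to the explanatory power of the model.
n_big = 1_000_000
x1 = rng.standard_normal(n_big)
x2 = rng.standard_normal(n_big)
y = 1.0 * x1 + 0.005 * x2 + rng.standard_normal(n_big)

base = sm.OLS(y, sm.add_constant(np.column_stack([x1]))).fit()
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(f"p-value on x2: {full.pvalues[2]:.2e}")
print(f"incremental R^2 from x2: {full.rsquared - base.rsquared:.6f}")
```

Under these assumed parameters, part (ii) typically reports a p-value for x2 far below 0.001 while the incremental R² is on the order of 0.00001, which is the kind of gap between statistical significance and statistical relevance the abstract describes.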
This is an Accepted Manuscript of an article published by Taylor & Francis in The European Journal of Finance on 17 September 2019, available online: http://www.tandfonline.com/10.1080/1351847X.2019.1662822.