<div>‘Big Data’ is typically said to contain undesirable imperfections, usually described with terms such as ‘messy’, ‘untidy’ or ‘ragged’, that require ‘cleaning’ in preparation for analysis. A vast literature explores how best to proceed once the data has been cleaned. This pejorative terminology implies that imperfect data is what hinders analysis, rather than recognising that the encapsulated knowledge is simply presented in a state that is inconvenient for the chosen analytical tools. That framing in turn leads to a presumption that desktop computers are unsuitable for the task. As there is no universally accepted definition of ‘Big Data’, this inconvenient starting state is described here as ‘nascent data’, a term that carries none of the baggage associated with popular usage. This leads to the primary research question: can an empirical theory of the knowledge extraction process be developed that guides the creation of tools that gather, transform and analyse nascent data? A secondary, pragmatic question follows naturally from the first: will data stakeholders use these tools?</div>
Funding
DTP 2018-19 Loughborough University
Engineering and Physical Sciences Research Council