Data collection surrounds our modern lives. By 2020 there will be an estimated 30 billion devices collecting data around the world, and the volume, structure and scale of that data pose one of the biggest challenges for modern statistics. These devices may be smartphones packed full of sensors measuring your health, location and the weather, or connected cars driving past roadside monitors that measure traffic volumes or air quality.


Consider also the thousands of sensors on individual oil wells, taking measurements to maintain safety and security. In the world of ‘big data’, massive streams of information are now being generated and collected at an unprecedented scale. Being able to interpret and take advantage of all of this data will lead to great economic and societal benefits, providing advances in areas such as e-health and communications and enabling more of us to lead healthier and more productive lives. At present, however, the available tools do not allow us to analyse this data to the quality we need on an appropriate time scale.


‘StatScale: Statistical Scalability for Streaming Data’ is a £3.4m programme developing new ways to interpret the data being generated continuously all around us. A collaboration between Lancaster University and the University of Cambridge, the six-year programme involves research with industrial partners including AstraZeneca, BT and Shell UK, and includes £2.8m of funding from EPSRC. StatScale benefits from its significant partnerships with industry: these collaborations help inform its research agenda and help the research deliver direct economic and societal benefit.

Our industrial partners and the Office for National Statistics have agreed to trial the new methods that emerge from the programme, so that they can be rapidly tested and refined in real-world situations. StatScale’s theoretical and methodological work will form the basis of the next generation of scalable statistical algorithms. This is essential to ensure the UK’s competitive edge in science and technological industries. From smart watches to instrumented oil fields, StatScale will have individual, national and global effects.

The programme is led by Professors Idris Eckley and Paul Fearnhead at Lancaster University, and Professor Richard Samworth and Dr Rajen Shah at the University of Cambridge.

More information about Lancaster University’s Data Science Institute 

More information about the University of Cambridge's Statistical Laboratory