In part two of this short introductory series Jim Vine explains how Big Data might be analysed in the housing sector.
In the first part of this series I tried to give an indication of how you might know whether your data are Big (with a capital “B”) or not. But of course identifying or obtaining some data that are Big is only the first part of the story. To get from Big Data to something usable you need to do some analysis to generate insights from the data, and to get to real outcomes you need to use those insights to inform your decision making or optimise your processes.
The process of analysing Big Data goes by several names, among them machine learning, data science and data mining. I will not dwell on the distinctions between the different perspectives on what each of these entails. Instead I will draw on the way each of these emerging disciplines thinks, to flag up some of the key issues that could arise in any Big Data project.
The first lesson comes from the Data Science community, about the broad range of skills that are needed to be successful in the field. Drew Conway produced this diagram of the skills that are needed for a successful data science project:
The Data Science Venn Diagram (source: Drew Conway)
For the sorts of Big Data analysis we are talking about you certainly do need the statistical skills, some coding ability and subject knowledge. But I think we could go further and add at least two more. The first is visualisation / communications skills, because the best insights in the world will not make a jot of difference if they are not communicated to colleagues in a way that is compelling and actionable. The second is legal / procedural skills, because, quite rightly, compliance with Data Protection law and the security of people's information are an absolute legal prerequisite.
The next lesson comes from Data Mining, where there is a defined standard called 'CRISP-DM'. It describes the various stages of the data mining process and, crucially, includes arrows that point backwards, to illustrate that the process is iterative.
The phases of a CRISP-DM data mining process (source: Wikimedia Commons)
If anything, I would say that the CRISP-DM process diagram needs a few more backwards arrows, because in practice iteration might be needed at more or less any point in the process.
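To make the point about backwards arrows concrete, here is a minimal sketch that treats the six CRISP-DM phases as a directed graph. The transitions listed are my reading of the commonly reproduced diagram (the outer "cycle" arrow around the whole process is not modelled). The names `TRANSITIONS`, `backward_arrows` and `with_any_backtrack` are purely illustrative and are not part of any CRISP-DM tooling.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, numbered in their nominal forward order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Illustrative assumption: the arrows roughly as drawn in the usual diagram.
TRANSITIONS = {
    (Phase.BUSINESS_UNDERSTANDING, Phase.DATA_UNDERSTANDING),
    (Phase.DATA_UNDERSTANDING, Phase.BUSINESS_UNDERSTANDING),
    (Phase.DATA_UNDERSTANDING, Phase.DATA_PREPARATION),
    (Phase.DATA_PREPARATION, Phase.MODELING),
    (Phase.MODELING, Phase.DATA_PREPARATION),
    (Phase.MODELING, Phase.EVALUATION),
    (Phase.EVALUATION, Phase.BUSINESS_UNDERSTANDING),
    (Phase.EVALUATION, Phase.DEPLOYMENT),
}


def backward_arrows(transitions):
    """Return the transitions that lead back to an earlier phase, sorted."""
    return sorted((src, dst) for src, dst in transitions if dst < src)


def with_any_backtrack(transitions):
    """Add an arrow from every phase back to every earlier phase,
    i.e. allow iteration at more or less any point in the process."""
    extra = {(src, dst) for src in Phase for dst in Phase if dst < src}
    return transitions | extra


if __name__ == "__main__":
    for src, dst in backward_arrows(TRANSITIONS):
        print(f"{src.name} -> {dst.name}")
```

The standard diagram yields only three backwards arrows; allowing a return from any phase to any earlier one raises that to fifteen, which is closer to how iteration tends to play out in practice.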
The other major school of thought that Housing Big Data draws upon is machine learning. In the next part of this series I will introduce the broad types of machine learning analysis that can be used to generate insights. In the meantime, do please feel free to get in touch if you would like to be considered for inclusion when we open the project up to wider participation: email firstname.lastname@example.org.
> Next: Part 3: Machine learning
HACT’s Housing Big Data project has been generously supported by the Nominet Trust, the UK’s only dedicated Tech for Good funder that invests in the use of technology to transform the way we address social challenges.