Phil Goddard, one of HACT's Data Scientists, blogs about how promoting consistent data standards across the housing sector could be a real game-changer for effective working.
Over the past few years, HACT has been pioneering the application of data science in the housing sector. The ambition of such a task could hardly be overstated, but because of the potential benefits that data-driven business decision making can provide, it continues to be a worthy investment of resource.
HACT’s initial ‘Big Data’ project took data from around 20 housing providers, and focussed its investigation on rent arrears. It soon became apparent from the project that the major hurdles involved when pooling data from multiple organisations were data quality and consistency. This result in itself isn’t too surprising; anecdotally, a data scientist will spend 80% of his or her time cleaning a data set. Beyond this, training a predictive model on a well-defined, modestly sized, and clean dataset is a relatively straightforward process (subject to having the know-how and experience!).
Moving forward from this initial project, HACT is currently engaged in an investigation into bottom line costs, and how community insight activities can affect these. The Big Data project taught many valuable lessons, and HACT went out armed with a defined data schema, asking for information in a consistent manner. For example, think about how you would define a house. If one housing provider defines a certain asset by the code 3BSD (3-bedroom semi-detached), this is no good if another uses the code HSD3B2F (house, semi-detached, 3 bedroom, two floors). These entries in fact break a simple rule of ‘clean’ data: they store more than one piece of information in a single field. A more consistent approach might be to ask the housing provider to define the property to be a house, and in a separate field define the number of bedrooms, and in another field define the construction date, and so on.
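To make the ‘one piece of information per field’ idea concrete, here is a minimal Python sketch. Only the two codes, 3BSD and HSD3B2F, come from the example above; the field names, the lookup tables and the way each code is assumed to be laid out are hypothetical, and this is not HACT’s actual schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables: only the 'H' and 'SD' meanings come from the
# example codes in the text; a real provider would have many more entries.
PROPERTY_TYPES = {"H": "house"}
BUILD_FORMS = {"SD": "semi-detached"}


@dataclass(frozen=True)
class PropertyRecord:
    """Illustrative shared schema: one piece of information per field."""
    property_type: str
    build_form: str
    bedrooms: int
    floors: Optional[int] = None  # None when the provider does not record it


def parse_provider_a(code: str) -> PropertyRecord:
    """Assumed layout: <bedrooms>B<build form>, e.g. '3BSD'."""
    match = re.fullmatch(r"(\d+)B([A-Z]+)", code)
    if match is None or match.group(2) not in BUILD_FORMS:
        raise ValueError(f"Unrecognised provider A code: {code!r}")
    # Assumption: provider A only uses this code family for houses.
    return PropertyRecord(
        property_type="house",
        build_form=BUILD_FORMS[match.group(2)],
        bedrooms=int(match.group(1)),
    )


def parse_provider_b(code: str) -> PropertyRecord:
    """Assumed layout: <type><build form><bedrooms>B<floors>F, e.g. 'HSD3B2F'."""
    match = re.fullmatch(r"([A-Z])([A-Z]+)(\d+)B(\d+)F", code)
    if (
        match is None
        or match.group(1) not in PROPERTY_TYPES
        or match.group(2) not in BUILD_FORMS
    ):
        raise ValueError(f"Unrecognised provider B code: {code!r}")
    return PropertyRecord(
        property_type=PROPERTY_TYPES[match.group(1)],
        build_form=BUILD_FORMS[match.group(2)],
        bedrooms=int(match.group(3)),
        floors=int(match.group(4)),
    )


# Both providers' records now share the same fields and can sit in one list.
pooled = [parse_provider_a("3BSD"), parse_provider_b("HSD3B2F")]
```

Once both codes have been mapped this way, the two records agree on property type, build form and bedroom count, and differ only in that the first provider never recorded the number of floors. The gap is visible as a missing value instead of being hidden inside a code.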
The value in this exercise is that suddenly we find ourselves in a position where data is consistent, regardless of where it came from. And consistency means data can be pooled together, so sample sizes, and therefore the quality of any quantitative investigation, will benefit.
It doesn’t take much of a stretch of the imagination to arrive at the question: ‘why is there no defined, consistent way of holding data in the sector?’ A sector-wide standard would certainly obviate the need to map data to a consistent format before it can be sent off for analysis. Consistent data would make sharing between organisations for comparison, analysis and benchmarking a relatively straightforward task.
There is no single explanation for the lack of data consistency in the sector. Possibly one of the main underlying causes is that, until recently, the sector has not needed to be innovative and investigate the value that data insight can offer. Now, in times of increasingly constrained budgets, interest in locating areas where savings can be made has peaked, but unfortunately a legacy of slack, inaccurate record keeping presents itself as a major barrier to be overcome.
The value of a consistent data standard adopted across the sector cannot be overstated. Attempts have been made before, such as the National Register of Social Housing, but the sector as a whole needs to embrace and push for the adoption of standards and consistency. This task would not come without challenges: what data should be collected, and what shouldn’t? How should it be stored? Where are the gaps where valuable data is missed, and where are we wasting memory by storing redundant information? What are we collecting which is of interest, but so poorly and inconsistently that its value is eroded?
Clearly, key areas of interest are assets and repairs, and rent arrears: two of the largest sources of expenditure housing providers encounter. They are therefore two candidate areas where data standards would be of immense value. Consistent data would allow simple communication between housing providers; for example, knowledge of stock locations could lead to stock swaps to allow more efficient management. Knowledge of repair and maintenance costs could pave the way to smart benchmarking, and lead to more informed decisions about which stock should be sold to raise capital for new developments. Uniform definitions of rent arrears and tenancy demographics could naturally form the basis of a predictive model to determine the risk of new tenancies falling into rent arrears.
Consistent data and standards could pave the way towards significant savings and streamlining of businesses through the ease of access to data insight that they would make available. What is needed is a sector-wide push and commitment; only through united resolve and dedication will such a standard ever come to fruition.