Zoopla's journey of improving Property Data by using UPRNs
Zoopla has long been recognised as the place to go to find out details about a home's value. We've prided ourselves on having the best automated estimates and being able to tell a story about listings and sale history. This data, which we call Property Data, forms the basis of a lot of what we do at Zoopla.
For a property website it's essential to know where a home is, what its address is, and how to uniquely refer to it. This allows data from multiple sources to be brought together and for Property Data products to be built. When Zoopla launched in 2008, the engineers created their solution using Royal Mail's Postcode Address File (PAF). They took Royal Mail's data, shaped it to Zoopla's needs and wants, and produced something we call the Zoopla Property ID. This identifier and its associated software served us well for many years, and formed the basis of Zoopla's famous estimates and property detail pages. It allowed us to connect listings, bedroom and bathroom counts, price estimates, and historical sale price data together for most homes across the UK.
Under the hood, this was a SQL- and Perl-based ETL system that brought in data from the Land Registry, Registers of Scotland, Zoopla's listing history, our customised PAF-based data, and a selection of other data sources. Various build scripts and databases then turned this into a Property Data model. Access to the data was provided via a variety of classes in our legacy stack, and these classes powered Zoopla.co.uk for many years.
This was a deeply customised solution that could not be shared with anyone else. In its day it was extremely effective, providing a means of connecting and simplifying complex problems of geography, addressing, and property matching. However, as with many custom solutions, as things evolved it became difficult to connect new data sources, and this held engineering and product teams back from innovating.
The challenge was that the Zoopla Property ID was our key, and no one else used it. Every time we tried to connect to a data source that used a different identifier, we effectively had to disregard that identifier and try to match the records ourselves. This made data matching difficult, and integrating each new data set required a large amount of work. We wanted to match new data sets so that we could provide them to homeowners and use them elsewhere in our tech stack, but we were getting slower and slower.
So in 2021, we decided to rebuild this Property Data model from scratch. Rather than continuing to use our custom solution, we decided to go with Ordnance Survey's AddressBase Premium product, as we felt it would give us the greatest coverage and flexibility. It is built around the Unique Property Reference Number (UPRN), the standard identifier that government organisations and many other companies have chosen to use. This decision immediately opened up possibilities for us to share data internally more effectively, to connect to external data sources, and to begin expanding our property set towards our goal of having accurate data for every home in the UK.
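To illustrate why a shared identifier helps, here is a minimal sketch in Python of combining records from several sources once each one carries a UPRN. It is not Zoopla's actual code: the function name merge_by_uprn, the field names, and the sample UPRNs and values are all hypothetical, chosen only to show that the join becomes a key lookup rather than an address-matching exercise.

```python
from collections import defaultdict


def merge_by_uprn(*sources):
    """Combine records from several sources into one dict per UPRN.

    Each source is an iterable of dicts that include a "uprn" key.
    If two sources supply the same field, the first value seen is kept.
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            uprn = record["uprn"]
            for field, value in record.items():
                # setdefault keeps an existing value and only fills gaps.
                merged[uprn].setdefault(field, value)
    return dict(merged)


# Hypothetical sample data: three sources that share only the UPRN.
addresses = [
    {"uprn": 100000000001, "address": "1 Example Street"},
    {"uprn": 100000000002, "address": "2 Example Street"},
]
sales = [{"uprn": 100000000001, "last_sale_price": 250000}]
listings = [{"uprn": 100000000002, "bedrooms": 3}]

homes = merge_by_uprn(addresses, sales, listings)
```

A real model would also need to handle repeated records per home (for example, several sales over time), which this sketch deliberately ignores.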
Deciding to use UPRNs and the AddressBase Premium data set allowed us to make bold technical decisions aimed at increasing coverage, improving accuracy, and simplifying our software architecture so that we could move faster. We moved from MySQL and Solr to PostgreSQL, DynamoDB, and OpenSearch. We broke functionality out of the legacy monolith and, using the Adapter pattern, began to switch the calls in our APIs from the legacy code to the new stack. We rebuilt our algorithms using data science, and migrated parts of the website to UPRNs in thin slivers.
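For readers unfamiliar with the Adapter pattern, the following Python sketch shows one shape such a migration could take. It assumes callers still ask for a property by its legacy ID while the data now lives in a UPRN-keyed service. Every name here (PropertySummary, UprnPropertyService, LegacyToUprnAdapter, get_summary, the ID mapping) is hypothetical and not taken from Zoopla's codebase.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PropertySummary:
    uprn: int
    address: str
    bedrooms: Optional[int]


class UprnPropertyService:
    """Hypothetical new service: looks up raw records by UPRN."""

    def __init__(self, records: Mapping[int, dict]):
        self._records = records

    def fetch(self, uprn: int) -> Optional[dict]:
        return self._records.get(uprn)


class LegacyToUprnAdapter:
    """Keeps the legacy-style call (lookup by legacy property ID)
    but answers it from the UPRN-keyed service."""

    def __init__(self, service: UprnPropertyService,
                 id_to_uprn: Mapping[int, int]):
        self._service = service
        self._id_to_uprn = id_to_uprn

    def get_summary(self, legacy_property_id: int) -> Optional[PropertySummary]:
        # Translate the legacy key to a UPRN, then delegate.
        uprn = self._id_to_uprn.get(legacy_property_id)
        if uprn is None:
            return None
        record = self._service.fetch(uprn)
        if record is None:
            return None
        return PropertySummary(
            uprn=uprn,
            address=record["address"],
            bedrooms=record.get("bedrooms"),
        )


# Hypothetical wiring: existing callers keep calling get_summary().
service = UprnPropertyService(
    {100000000001: {"address": "1 Example Street", "bedrooms": 3}}
)
legacy_api = LegacyToUprnAdapter(service, id_to_uprn={42: 100000000001})
summary = legacy_api.get_summary(42)
```

The appeal of this approach is that each call site can be moved across one at a time, which fits a migration done in thin slivers rather than in one large cutover.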
Working with Ordnance Survey, we then improved our usage of their data to find and display homes that were simply unavailable to us in our own custom model based on PAF, bringing the Zoopla Home experience to many more homeowners than at the beginning of 2021.
AddressBase Premium now forms the central part of our Property Data model, and UPRNs are the mechanism by which we tie together the numerous data sets we need for our products. Our aim is to continue this journey throughout 2022, complete the migration internally, and bring in more and more data against UPRNs at pace to deliver a better product for our homeowners.