Zoopla's Trusted Data Layer
Two years ago we initiated a major data programme at Zoopla. The Data & Analytics team’s mission is to make everyone in the company data literate, to create new opportunities for Zoopla through data and analytics solutions, and to build a world-class data environment.
Data is at the core of Zoopla’s business. We have over a million property listings for people to browse and sophisticated tools to help people find exactly what they’re looking for. That data also powers our Customer Relationship Management (CRM) tools used by thousands of estate agents around the UK.
Core to Zoopla’s Data Engineering strategy is using the latest AWS tools, and using them as they were designed to be used, so we can focus on getting the value out of the data we have rather than spending time developing custom tooling. This also makes it easy for us to get help from our AWS partners when we need it!
We work with a mixture of raw, unprocessed data and structured data, stored as Parquet files to make processing easier. We process these files with Amazon Redshift and AWS Glue, and use Amazon Athena for querying. This gives us the best mix of scalability, cost and performance. We call this platform the Trusted Data Layer (TDL). We also use Glue to create a data catalogue to help developers find what they’re looking for.
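As a rough illustration of what querying the TDL through Athena can look like, here is a minimal Python sketch. The database name, table name, column names and results bucket are all hypothetical, made up for this example rather than taken from our real setup. The request is built separately from the call itself, so the shape of the request can be inspected without AWS credentials.

```python
from typing import Any, Dict

# Hypothetical names, for illustration only.
DATABASE = "tdl_example"
OUTPUT_LOCATION = "s3://example-athena-results/queries/"


def build_athena_request(
    sql: str,
    database: str = DATABASE,
    output_location: str = OUTPUT_LOCATION,
) -> Dict[str, Any]:
    """Build the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str) -> str:
    """Submit the query and return its execution id.

    Needs boto3 installed and AWS credentials configured.
    """
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("athena")
    response = client.start_query_execution(**build_athena_request(sql))
    return response["QueryExecutionId"]


# Assumed table and columns: "listings" with a "postcode" column.
request = build_athena_request(
    "SELECT postcode, COUNT(*) AS listings "
    "FROM listings GROUP BY postcode LIMIT 10"
)
```

Athena runs queries asynchronously, so a real caller would poll for completion with the returned execution id before reading the results from the output location.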
Like most modern online businesses, we use a variety of third-party Software-as-a-Service solutions to run our business. We process data from these services, usually in some type of event format. We build our property data from a variety of sources, including the Postcode Address File (PAF). We get listing data from estate agents and valuation data from Hometrack. Then there is a wide variety of internal systems that augment this data with more insight.
Some of our data, particularly that sourced from older internal systems, is still processed using old-school Extract, Transform and Load (ETL) processes. These are relatively expensive to maintain and are inflexible. We’ve just started moving to an event-based system. Again we’re going with modern AWS tools, in this case EventBridge, because its rules let us direct and control events. We’ve built the core infrastructure and defined our standards, and we are processing the first events right now. This is just the beginning of the journey to transform Zoopla’s systems into an event-driven architecture.
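To give a feel for how rules direct events, here is a small Python sketch of an EventBridge-style event pattern and a simplified matcher. The source, detail-type and detail field names are invented for this example and are not our real event standards. The matcher covers only exact-value matching on nested fields, a subset of what EventBridge supports (it leaves out prefix, numeric and anything-but matching, among others).

```python
from typing import Any, Dict

# Hypothetical rule pattern: every name and value here is made up.
LISTING_PATTERN: Dict[str, Any] = {
    "source": ["example.listings"],
    "detail-type": ["ListingCreated", "ListingUpdated"],
    "detail": {"channel": ["sale"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if the event satisfies every field in the pattern.

    Leaf values in the pattern are lists of allowed values; nested
    dicts are matched recursively. Fields absent from the pattern
    are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # An array-valued event field matches if any element is allowed.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in candidates):
                return False
    return True


sale_event = {
    "source": "example.listings",
    "detail-type": "ListingCreated",
    "detail": {"channel": "sale", "listing_id": "abc-123"},
}
rental_event = {
    "source": "example.listings",
    "detail-type": "ListingCreated",
    "detail": {"channel": "rent", "listing_id": "def-456"},
}

sale_matches = matches(LISTING_PATTERN, sale_event)
rental_matches = matches(LISTING_PATTERN, rental_event)
```

In this sketch the sale event satisfies the rule and the rental event does not. On a real event bus the same pattern would be serialised to JSON and attached to a rule, and EventBridge itself would do the matching and deliver the event to the rule's targets.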
There are many different use cases for accessing the data in the TDL, so we support a variety of access mechanisms. Business customers will often work directly from reports created by our Data Analysts in Tableau. Our Data Scientists use SageMaker to create their models, and other engineering teams query the data using Athena to create new features for the website or customer reporting.
It’s an exciting time here at Zoopla. We’ve done some of the hard work and investment necessary to create the platform we need. There’s more still to do, and we’re still innovating on our EventBridge implementation as AWS adds new features, but our product delivery teams now have all the tools they need to be successful.