How can we build an environment to gather, capture, process and expose all available data?

The project requirements

The Department of Care & Health was looking for a partner that could design the architecture of a new data platform and assist in building it. The existing data platform relied entirely on Snowflake. Although Snowflake is well suited to storing and transforming data, the client's needs went far beyond its functionality. The department needed a cost-efficient cloud platform that would make the data available both to the Snowflake processes and to external parties such as hospitals, schools and nursing homes.

In addition, the department wanted to improve the general flow of data through the organization. For example, there were no data pipelines that could deliver data at the right time intervals or in (near) real time, depending on how the data would be used. Without data pipelines, the organization could not move and consolidate data from its many data sources. The new data pipelines also improve data quality throughout the organization, which is now able to anticipate possible data loss.

Agency Care & Health

The Agency Care and Health is a service of the Flemish Government. It supports and regulates a range of care and health initiatives: from clean drinking water to healthy food, from cancer to infectious diseases and from preventive organizations to palliative care institutions. During this project, the Department was mainly focused on COVID-19-related topics.

Setting up a data platform in AWS

In general, our solution consisted of setting up a data platform in AWS. We advised the department on the tooling and technology to implement and on what the architecture of the data platform should look like in order to solve their business challenges.

We were also responsible for the following elements:

  • Setting up data pipelines in Apache NiFi
  • Setting up a Data Takeaway API: an open API with a front-end service that makes it easy for users to upload and download data. The front end serves as an upload zone for hospitals, schools, etc.; other external parties integrate with the API directly
  • Setting up AWS DMS (Database Migration Service) pipelines
  • Setting up a Data Lake: a centralized collection of all raw data on which transformations take place. These transformations make the data easy to use.

High availability of data for internal and external users

The first steps towards a data platform have been taken, but there is still room for improvement. In the future, we will make the platform more scalable so that it can handle more actions and transformations. The platform will then take over some of the work currently performed by Snowflake, while the two continue to work alongside each other. We will continue to work with the Department to enhance all data services.

This project was carried out at the height of the COVID-19 pandemic. Out of necessity, decisions had to be made quickly. For example, NiFi pipelines were set up in record time, because a large volume of data (hotels, restaurants, schools, contact tracing, field agents, vaccinations, etc.) needed to be processed and made available to external parties.