You are working as an analytics developer for a government agency such as the CDC. You have collected close to 200 TB of medical health records, and the CDC needs all of the existing data converted to another format, such as Parquet, within three weeks. Converting one terabyte to Parquet takes three days, so converting 200 TB one terabyte at a time would take about 600 days. The traditional approach therefore cannot meet the three-week (21-day) deadline.
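The gap between the serial estimate and the deadline can be quantified with a short calculation. This is a minimal sketch that assumes ideal linear scaling: each parallel worker converts data at the stated rate of three days per terabyte, the data splits evenly, and coordination overhead is ignored. The function names are illustrative only.

```python
import math

# Figures stated in the assignment.
TOTAL_TB = 200       # approximate volume of health records
DAYS_PER_TB = 3      # conversion time for 1 TB with the traditional approach
DEADLINE_DAYS = 21   # three weeks


def serial_days(total_tb: float, days_per_tb: float) -> float:
    """Time to convert everything one terabyte after another."""
    return total_tb * days_per_tb


def min_parallel_workers(total_tb: float, days_per_tb: float, deadline_days: float) -> int:
    """Smallest number of workers, each as fast as the traditional approach,
    that finish by the deadline if the work splits evenly (ideal scaling)."""
    return math.ceil(serial_days(total_tb, days_per_tb) / deadline_days)


if __name__ == "__main__":
    print(serial_days(TOTAL_TB, DAYS_PER_TB))                           # 600
    print(min_parallel_workers(TOTAL_TB, DAYS_PER_TB, DEADLINE_DAYS))   # 29
```

Under these assumptions at least 29 workers are needed; a real design would provision more to absorb uneven file sizes, retries, and cluster startup time.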

You are required to use a Big Data architecture on a cloud platform. Design the system architecture and write a one-page summary explaining how the job will be completed within three weeks.
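One part of such an architecture is splitting the input into partitions and spreading them across workers so that no single worker becomes the bottleneck. The sketch below models only that planning step, using greedy longest-processing-time-first scheduling with the stdlib `heapq` module. It is not a cloud or Spark API; the partition names, sizes, and worker count are hypothetical, and it assumes conversion time grows linearly with partition size at three days per terabyte per worker.

```python
import heapq
from typing import Dict, List, Tuple

DAYS_PER_TB = 3  # rate stated in the assignment, assumed to hold per worker


def assign_partitions(sizes_tb: Dict[str, float], workers: int) -> Tuple[List[List[str]], float]:
    """Greedy longest-processing-time-first assignment of input partitions
    to workers. Returns the per-worker partition lists and the estimated
    makespan in days (the finish time of the most heavily loaded worker)."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (assigned TB, worker index): the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(workers)]
    # Place the largest partitions first, each on the least-loaded worker.
    for name, size in sorted(sizes_tb.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append(name)
        heapq.heappush(heap, (load + size, w))
    max_load = max(load for load, _ in heap)
    return plan, max_load * DAYS_PER_TB


if __name__ == "__main__":
    # Hypothetical layout: 200 partitions of 1 TB each, 40 workers.
    parts = {f"part-{i:03d}": 1.0 for i in range(200)}
    plan, days = assign_partitions(parts, 40)
    print(days)  # 15.0
```

In the usage example each of the 40 hypothetical workers receives 5 TB, giving an estimated 15 days, which leaves slack inside the 21-day window. In practice a distributed engine running on a managed cloud cluster would handle this scheduling itself; the sketch only shows why adding workers shortens the job.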


Deliverables:
  • A detailed architecture diagram.
  • A one- or two-page summary document.
  • A screenshot of the SAS script output.
