INDUSTRY - TECHNOLOGY

About The Client

Although the Customer already had a robust analytical system, they believed it would not be able to satisfy the company’s future needs. The Customer was therefore looking for a future-focused, innovative solution. The new system had to cope with a continuously growing amount of data, analyse big data faster and enable comprehensive advertising channel analysis.

Project Overview

For the new analytical system, the Customer’s architects selected the following frameworks:

• Apache Hadoop – for data storage

• Apache Hive – for data aggregation, query and analysis

• Apache Spark – for data processing

Amazon Web Services and Microsoft Azure were selected as cloud computing platforms. At the Customer’s request, the old system and the new one ran in parallel during the migration.

The system was supplied with raw data from multiple sources, such as TV views, mobile device browsing history, website visit data and surveys. To enable the system to process more than 1,000 different types of raw data (archives, XLS, TXT, etc.), data preparation included the following stages, coded in Python:

• Data transformation

• Data parsing

• Data merging

• Data loading into the system
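
As an illustration only, the preparation stages listed above can be sketched in Python as follows. The function names, the sample columns (user_id, channel, page) and the in-memory list standing in for the storage layer are assumptions made for this sketch; they are not taken from the Customer's actual code, which handled far more input formats.

```python
import csv
import io
from collections import defaultdict


def parse_delimited(raw_text, delimiter="\t"):
    """Parsing: turn delimited text (e.g. a TXT export) into row dicts."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    return [dict(row) for row in reader]


def transform(row, source_name):
    """Transformation: normalize column names and values, tag the source."""
    cleaned = {key.strip().lower(): value.strip() for key, value in row.items()}
    cleaned["source"] = source_name
    return cleaned


def merge_by_key(rows, key="user_id"):
    """Merging: group rows from all sources under a shared key.

    The key name "user_id" is hypothetical.
    """
    merged = defaultdict(list)
    for row in rows:
        merged[row[key]].append(row)
    return dict(merged)


def load(merged, sink):
    """Loading: stand-in for writing to the real storage layer (assumed)."""
    for key, rows in merged.items():
        sink.append({"user_id": key, "events": rows})
    return len(merged)


# Two tiny, made-up inputs with different delimiters and header styles.
tv_raw = "User_ID\tChannel\n u1 \tnews\nu2\tsports\n"
web_raw = "user_id,page\nu1,/pricing\n"

rows = [transform(r, "tv") for r in parse_delimited(tv_raw)]
rows += [transform(r, "web") for r in parse_delimited(web_raw, delimiter=",")]

warehouse = []  # in-memory placeholder for the target system
loaded = load(merge_by_key(rows), warehouse)
```

Keeping each stage as a separate function means a new raw data type would only need its own parser, while transformation, merging and loading stay shared.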

Challenges

End Results and Client

The collaboration between Deltacubes and the Customer yielded several concrete benefits, resulting in substantial enhancements to the project’s success.

Conclusion

At the project closing stage, the new system was able to process several queries up to 100 times faster than the outdated solution. Using the insights gained from analysing almost 30,000 attributes, the Customer was able to carry out comprehensive advertising channel analysis for different markets.

Technologies and Tools

Apache Hadoop, Apache Hive, Apache Spark, Python (ETL), Scala (Spark, ETL), SQL (ETL), Amazon Web Services (Cloud storage), Microsoft Azure (Cloud storage), .NET, WPF, C#.
