Data · Nov 17, 2020

Analytics Workbench Part 1: 8 Critical Capabilities for Advanced Analytics Platforms

Vikalp Jain

We talk a lot about how companies create meaningful engagement for customers and end users. However, a critical piece to an overall digital engagement strategy is strategic measurement of the experience through thoughtful capture and analysis of data. Earlier this year, we discussed why companies should invest in running their own analytics. In this article, we’ll cover the "how" behind running a successful analytics program and will attempt to demystify several of the technical building blocks required to enable advanced analytics.

For the remainder of the article, we’ll walk through the “what” and “why” of the eight key capabilities below:

  1. Data extraction

  2. Data storage

  3. Identity resolution

  4. Data modeling

  5. Data transformation

  6. Data warehousing

  7. Flattening the data

  8. Visualization

1. Data Extraction

What it is: Data extraction is the machinery that pulls data from external platforms on a schedule (as often as every hour) and lands it in a secure location for storage and processing.

Why it matters: Data extraction is the most fundamental aspect of building an analytics platform, because it is the process of collecting data from multiple places and making it accessible for future transformation and analysis. Without data extraction, you won’t have the data you need for insights.
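A minimal extraction job can be sketched as a function that pulls one batch of records and lands it as a timestamped raw file. The `fetch` callable and `landing_dir` path below are illustrative placeholders, not any specific platform's API:

```python
import json
import pathlib
from datetime import datetime, timezone

def extract(fetch, landing_dir):
    """Pull one batch of records from a source and land it as raw JSON.

    `fetch` is any callable returning a list of dicts (e.g. a thin
    wrapper around a marketing platform's export endpoint), and
    `landing_dir` is the secure landing zone. Both are illustrative.
    """
    records = fetch()
    landing = pathlib.Path(landing_dir)
    landing.mkdir(parents=True, exist_ok=True)
    # Timestamped filename so hourly pulls never overwrite each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    out_path = landing / f"extract_{stamp}.json"
    out_path.write_text(json.dumps(records))
    return out_path
```

In practice this loop is wrapped in a scheduler and per-source credentials, but the shape stays the same: fetch, then land the raw payload untouched.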

2. Data Storage

What it is: Data is typically stored in a layer referred to as a data lake. A data lake is used to store raw, unstructured data that can be used for a multitude of purposes.

Why it matters: It may seem simplistic to call this piece out, but it is a critical piece of the puzzle: we are talking about enormous volumes of data (think petabytes) that need to be stored quickly in their native format, retained securely for long periods, and retrieved on demand.
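One common way to keep a lake of raw files retrievable is to partition it by source and ingestion date. The layout below (`raw/<source>/<year>/<month>/<day>/`) is just one illustrative convention:

```python
import pathlib
from datetime import date

def lake_path(root, source, day):
    """Return a partitioned data-lake path such as raw/ads/2020/11/17.

    Partitioning raw files by source system and ingestion date keeps
    the lake browsable and makes reprocessing a single day cheap.
    The exact scheme here is illustrative, not a standard.
    """
    return (pathlib.Path(root) / "raw" / source
            / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}")
```

The same convention maps directly onto object-store prefixes (e.g. S3 or Azure Data Lake paths) when the lake lives in the cloud.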

3. Identity Resolution

What it is: Almost all of the data we can access from external marketing platforms is anonymized, meaning it does not contain any personally identifiable information (PII). Identity resolution is the means by which an organization stitches together multiple data elements into a cohesive view of a single user.

Why it matters: Identity resolution is a key step in data enrichment, as it allows organizations to build a more comprehensive view of user behavior. For example, while we do not know that a single data element belongs to John Smith, aged 42, living in San Francisco, we can link it to other data elements that belong to the same user to enable targeted personalization and people-based measurement.
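The stitching itself is often a graph problem: if a cookie ID and a hashed email were ever observed together, they belong to the same user. A minimal sketch using union-find, with made-up identifier formats:

```python
def resolve_identities(pairs):
    """Group linked identifiers (cookie IDs, hashed emails, device IDs)
    into clusters, one per inferred user.

    `pairs` is any iterable of (id_a, id_b) links observed across
    sources. Uses union-find with path compression; the identifier
    formats in the test below are illustrative.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())
```

Production identity graphs add probabilistic matching and confidence scores, but the transitive-linking core is the same.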

4. Data Modeling

What it is: Data modeling is often the hardest part of the process, as it involves transforming the various data sources into a single, cohesive structure. This is typically done through exploratory data analysis on the raw data in order to find key identifiers that help stitch data from different sources.

Why it matters: While data modeling can seem abstract, it should reflect business objectives and business needs. A properly built and maintained common data model will equip the organization to query the data for actionable, relevant insights into user behavior.
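To make "common data model" concrete, here is one hypothetical target structure: every source (email, web, paid media) is mapped onto the same minimal set of fields, chosen to reflect the business questions being asked. The field names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Engagement:
    """One illustrative common record shape for user engagement events.

    Whatever the source platform calls its fields, the modeling step
    decides how they land in this shared structure.
    """
    user_id: str      # resolved identity, not a platform-specific ID
    channel: str      # e.g. "email", "web", "paid_search"
    event_type: str   # e.g. "impression", "click", "conversion"
    occurred_at: datetime
```

The exploratory analysis described above is what tells you which fields like these actually exist across all sources and which identifiers can populate `user_id`.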

5. Data Transformation

What it is: Data transformation is the automated process of converting raw data into the structure of the defined common data model each time a data feed is pulled from any source. This layer needs to be resilient, meaning that errors are auto-corrected and rarely require human intervention.

Why it matters: Data transformation enables organizations to have organized and structured data at scale, while also maintaining the quality and integrity of data.
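A resilient transform tends to auto-correct what it can and quarantine the rest rather than failing the whole run. A minimal sketch, with made-up source field names (`uid`, `event`):

```python
def transform(raw_records):
    """Map raw source records onto a common shape.

    Recoverable issues (missing channel, stray whitespace, mixed case)
    are auto-corrected; records missing required fields are quarantined
    for review instead of crashing the pipeline. Field names here are
    illustrative, not any specific platform's schema.
    """
    clean, quarantined = [], []
    for rec in raw_records:
        try:
            clean.append({
                "user_id": str(rec["uid"]),
                "channel": rec.get("channel", "unknown").lower(),
                "event_type": rec["event"].strip().lower(),
            })
        except (KeyError, AttributeError):
            quarantined.append(rec)  # escalate only if volume spikes
    return clean, quarantined
```

Monitoring the quarantine rate is what lets this layer run with "rarely any human intervention" in practice.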

6. Data Warehousing

What it is: Now that we have a common data model and a way to transform the data to a common model, we need a place to store the transformed data. A data warehouse allows organizations to store large amounts of data in an organized way.

Why it matters: Data warehouses provide a central location to store transformed and structured data that has a clear purpose and intent for analysis, making this data accessible for data scientists, analysts, decision makers, etc.

7. Flattening the Data

What it is: Data warehouses typically store data in a normalized fashion (i.e., in multiple tables that are joined together through relationships between relevant data). “Flattening the data” means storing all of that data in a single or small handful of larger tables. A common approach is through fact tables, which are essentially pre-built query results. Fact tables are typically part of a pattern called the Star Schema and can live in the warehouse or a separate reporting database called a data mart.

Why it matters: While normalized data helps maintain the integrity of data and minimize redundancy, it is often much less efficient to query against. Flattening the data greatly increases efficiency and speed of generating insights from the data.
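The normalized-versus-flattened contrast fits in a few lines of SQL. The sketch below uses an in-memory SQLite database as a stand-in for a real warehouse, with made-up table and column names: two normalized tables are pre-joined and pre-aggregated into one wide fact table that a dashboard can read without any joins at query time:

```python
import sqlite3

# In-memory SQLite as a stand-in for a warehouse, purely illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized storage: one table per entity, linked by user_id.
    CREATE TABLE users  (user_id TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE events (user_id TEXT, event_type TEXT);
    INSERT INTO users  VALUES ('u1', 'west'), ('u2', 'east');
    INSERT INTO events VALUES ('u1', 'click'), ('u1', 'click'), ('u2', 'view');

    -- Flatten: pre-join and pre-aggregate into a fact table so
    -- reporting reads one wide table instead of joining per query.
    CREATE TABLE fact_engagement AS
    SELECT u.region, e.event_type, COUNT(*) AS event_count
    FROM events e JOIN users u ON u.user_id = e.user_id
    GROUP BY u.region, e.event_type;
""")
rows = conn.execute(
    "SELECT region, event_type, event_count FROM fact_engagement ORDER BY region"
).fetchall()
```

In a star schema, `fact_engagement` would be the fact table and `users` one of its dimensions; the same pattern scales from this toy example to a reporting data mart.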

8. Visualization

What it is: Once we have the data summarized in fact tables, we are ready to integrate a data visualization tool such as Tableau or Power BI. These tools make it easy to present the data through tables and charts and incorporate them into a dashboard.

Why it matters: Data visualization is the key to illustrating complex data in a way that is both digestible and meaningful to stakeholder groups. Visualizing data in thoughtful formats is the final step to making data universally available and understandable, leading to real action.

Let’s Talk Data

We know we have only scratched the surface of this topic, but we hope this walkthrough has been helpful in demystifying some of the technical complexities that underpin advanced analytics platforms.

In next week’s article, we’ll discuss how we use Azure and AWS to build all of these foundational elements. In the meantime, please feel free to reach out to us at findoutmore@credera.com to learn more or discuss how to turn your data into actionable insights.
