By Divya Bhargava
Published on September 20, 2024
Demand for data analysts is on the rise. In fact, the Bureau of Labor Statistics projects that employment for operations research analysts will grow by 23% between 2021 and 2031, much faster than the average for all occupations in the US. But what does a data analyst do? Data analysts are responsible for collecting, cleansing, analyzing, and visualizing data, and they play a vital role in creating KPIs (Key Performance Indicators) to measure the effectiveness of business initiatives.
In this blog, we’ll offer a primer on the key talents, tools, and training a data analyst needs to excel in this position in the 21st century, with a few real-world examples to demonstrate. Let’s dive in!
A good data analyst is naturally curious and detail-oriented, with a strong knack for problem-solving. From a business perspective, they have the ability to turn raw data into meaningful insights that drive decision-making. Key skills include proficiency in tools like Excel, SQL, and data visualization software (e.g., Tableau or Power BI), along with a solid understanding of statistics. Communication is just as important as technical expertise—a good data analyst can explain complex findings to non-technical stakeholders clearly and concisely.
The day-to-day activities of a data analyst can vary depending on the industry, but typically include:
Data collection and cleansing: Ensuring data accuracy by identifying and correcting inconsistencies.
Data analysis: Using statistical methods or specialized software to uncover trends or insights.
Reporting: Creating dashboards and visualizations to communicate findings to management.
Collaboration: Working with different departments to understand their data needs and offer solutions.
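The collection-and-cleansing step above can be sketched in a few lines of pandas; the dataset and column names below are invented for illustration:

```python
import pandas as pd

# Hypothetical raw transactions with common data-quality issues:
# duplicate records, missing values, and inconsistent codes
raw = pd.DataFrame({
    "order_id": [101, 101, 102, 103],
    "amount": ["49.99", "49.99", None, "15.00"],
    "region": ["US", "US", "us ", "EU"],
})

clean = (
    raw.drop_duplicates(subset="order_id")  # remove duplicate records
       .assign(
           amount=lambda d: pd.to_numeric(d["amount"]),            # enforce numeric type
           region=lambda d: d["region"].str.strip().str.upper(),   # normalize codes
       )
       .dropna(subset=["amount"])           # drop rows missing key values
)
print(clean)
```

Each step maps to an inconsistency named above: duplicates, type mismatches, stray whitespace, and missing values.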
At a bank, a data analyst’s day might start by reviewing daily financial transactions to identify unusual patterns, which helps prevent fraud. They may spend the next few hours analyzing customer data to find trends in spending or loan behavior, providing insights to the marketing team for targeted campaigns.
After lunch, the analyst may meet with compliance officers to ensure the bank is adhering to regulatory requirements, using data to generate reports that show risk exposure. Finally, the day could end with building a new dashboard for the finance team to track key performance metrics, ensuring data is easily accessible for decision-making.
As you may surmise, the core activity of the data analyst is: “Hunting for answers from people, documents, and code.”
Sometimes, answering one of these questions takes a day; other times, it may take several months. Answering even a simple question may require the analyst to untangle a network of dependencies on people, documentation, and processes.
Here's a real-world example. I was a data analyst for a retail customer, and the CTO asked two questions: How many customers added a product to their cart but didn’t buy it? And how many couldn’t find what they were searching for? My team needed to build a dashboard to answer these.
It took a week to discover there was an Omniture feed from the E-commerce team loading Hadoop tables, but there was no documentation, data dictionary, or SLA to clarify the data. It took nearly a month to identify the right fields. Another week was spent understanding the warehouse tables, followed by two weeks to build the dashboard. With a data catalog, this could have taken just a week instead of two months.
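Once the right fields were finally identified, the metrics themselves were straightforward to compute. Here is a sketch of the cart-abandonment logic in pandas, using invented session events in place of the real Omniture feed:

```python
import pandas as pd

# Hypothetical clickstream events; in the real project, the field
# names had to be reverse-engineered from undocumented Hadoop tables
events = pd.DataFrame({
    "session_id": [1, 1, 2, 2, 3, 4],
    "event": ["add_to_cart", "purchase", "add_to_cart",
              "search_no_results", "add_to_cart", "purchase"],
})

# Collect the set of events per session, then count the two CTO questions
per_session = events.groupby("session_id")["event"].apply(set)
abandoned = sum("add_to_cart" in s and "purchase" not in s for s in per_session)
no_results = sum("search_no_results" in s for s in per_session)
print(abandoned, no_results)
```

The hard two months were spent discovering which raw fields correspond to `add_to_cart` and `search_no_results`, not writing this logic.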
Several tools in the market support the data analyst community to help with analysis in a more automated manner. Tools like dbt and Fivetran help build data lineage and showcase how key data is transformed. However, such tools are often limited to answering technical questions related to the data.
Much of the work an analyst does is to answer questions like:
What are the various business initiatives at my organization? What is the end goal of each initiative? What is the collective goal?
What are the existing KPIs, and how are they calculated?
What data do I have? What do I need to collect?
Is the data trustworthy? How was it collected? Why was it collected? What is its purpose?
What processes or transformations are performed on the data by other analysts or source systems?
What is the quality of the data?
Are there any policies on data that restrict its use? How?
To whom should I reach out if I have a question?
How do I find the Data Dictionary to understand the data?
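Many of these questions ultimately reduce to how a KPI is defined and computed. A toy example, with a made-up metric definition and figures:

```python
# Toy KPI: conversion rate = orders / sessions, broken out by channel.
# Figures are invented; in practice both numbers come from governed sources.
sessions = {"email": 2000, "search": 5000}
orders = {"email": 120, "search": 200}

conversion = {ch: orders[ch] / sessions[ch] for ch in sessions}
print(conversion)  # {'email': 0.06, 'search': 0.04}
```

The calculation is trivial; the analyst's real work is confirming that "orders" and "sessions" mean the same thing to everyone citing the KPI.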
Perhaps the most critical values a data analyst offers are direction and clarity. What is an organization's most vital goal? What steps will they take to achieve this goal? How will they measure success? By providing clear, data-backed answers to such questions, data analysts inform strategy. They help to focus human efforts on key tasks so people are not diverted by irrelevant activities.
A data analyst’s role relies heavily on the use of various tools to gather, process, analyze, and present data. These help analysts derive insights, ensure data quality, and support decision-making across organizations. Here are the key categories of tools central to a data analyst's work:
Data catalogs
Purpose: To find and understand governed, trusted data, complete with metadata details such as relevant policies, queries, and subject matter experts.
Examples:
Alation: A leading data catalog platform that supports data discovery, allowing analysts to locate and understand data assets across an organization.
Data visualization tools
Purpose: To transform raw data into visual formats such as charts, graphs, and dashboards, making complex data more accessible for decision-makers.
Examples:
Tableau: A popular tool for building interactive and real-time dashboards.
Power BI: Known for its integration with the Microsoft ecosystem and robust reporting capabilities.
Statistical analysis tools
Purpose: To perform complex statistical analysis and predictive modeling, essential for identifying trends and making data-driven predictions.
Examples:
R: A widely used tool for statistical computing and graphics.
SPSS: Known for its ease of handling complex data manipulation and conducting statistical procedures.
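Even without R or SPSS, the kind of descriptive statistics these tools produce can be illustrated with Python's standard library (the sales figures are invented):

```python
import statistics as st

# Toy daily-sales series an analyst might summarize before deeper modeling
daily_sales = [120, 135, 128, 150, 110, 145, 160]

mean = st.mean(daily_sales)     # central tendency
stdev = st.stdev(daily_sales)   # sample standard deviation (spread)
print(f"mean={mean:.1f}, stdev={stdev:.1f}")
```

Dedicated statistical tools add the heavier machinery: hypothesis tests, regression diagnostics, and model fitting on top of summaries like these.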
Data wrangling tools
Purpose: To clean, structure, and prepare raw data for analysis, ensuring that datasets are accurate, complete, and consistent.
Examples:
Trifacta: A tool designed for data profiling and transformation, making data preparation more efficient.
Alteryx: Automates workflows for data blending and cleaning, allowing analysts to quickly prepare data for analysis.
Database management systems (DBMS)
Purpose: To store, retrieve, and manage large datasets, allowing analysts to perform queries and extract valuable insights from relational databases.
Examples:
SQL Server: A widely used relational database that supports structured query language (SQL) operations.
MySQL: An open-source DBMS popular for managing relational data in web applications.
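The bread-and-butter relational query an analyst runs against such systems can be demonstrated with Python's built-in sqlite3 module, keeping the snippet self-contained; the table and columns are hypothetical:

```python
import sqlite3

# In-memory database standing in for a production SQL Server or MySQL instance
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "US", 50.0), (2, "US", 30.0), (3, "EU", 20.0)],
)

# Aggregate revenue per region -- a typical analyst query
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 20.0), ('US', 80.0)]
```

The same `GROUP BY` pattern works unchanged on SQL Server or MySQL; only the connection details differ.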
ETL tools
Purpose: To extract data from various sources, transform it into the desired format, and load it into a destination system for analysis.
Examples:
Apache NiFi: Automates the extraction and transformation of data for easier analysis.
Talend: Provides data integration and transformation, enabling more streamlined analytics.
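The extract-transform-load pattern these tools automate looks like this when sketched in plain Python; the source data and validation rule are invented:

```python
import csv
import io
import json

# Extract: read raw records from a CSV source (inlined here for brevity)
raw_csv = "user,amount\nalice,10\nbob,NaN\ncarol,25\n"
records = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: coerce types and drop rows that fail validation
def valid(r):
    try:
        float(r["amount"])
    except ValueError:
        return False
    return r["amount"].lower() != "nan"

cleaned = [{"user": r["user"], "amount": float(r["amount"])}
           for r in records if valid(r)]

# Load: write JSON lines to a buffer standing in for a warehouse table
destination = io.StringIO()
for row in cleaned:
    destination.write(json.dumps(row) + "\n")
print(destination.getvalue())
```

Tools like NiFi and Talend add what this sketch lacks: scheduling, connectors, error handling, and monitoring at scale.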
Machine learning and AI tools
Purpose: To build and deploy predictive models, automate data analysis processes, and derive deeper insights from datasets.
Examples:
Python (with libraries like scikit-learn): A versatile programming language ideal for machine learning and AI tasks.
RapidMiner: Offers a user-friendly platform for data mining and machine learning.
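A minimal predictive-modeling sketch with scikit-learn, as mentioned above; the churn dataset is synthetic and the feature names are invented:

```python
# Tiny classifier: predict customer churn from engagement signals
from sklearn.linear_model import LogisticRegression

# Features: [monthly_visits, support_tickets]; label: 1 = churned
X = [[1, 5], [2, 4], [1, 6], [9, 0], [8, 1], [10, 0]]
y = [1, 1, 1, 0, 0, 0]

model = LogisticRegression().fit(X, y)
pred = model.predict([[1, 5], [9, 1]])  # low-engagement user flagged as churn risk
print(pred)
```

Real projects add train/test splits, feature engineering, and evaluation metrics; the fit/predict workflow stays the same.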
Collaboration and reporting tools
Purpose: To share insights, reports, and dashboards with stakeholders, fostering collaboration and promoting data-driven decision-making across teams.
Examples:
Google Data Studio: A simple tool for creating shareable reports.
Jupyter Notebooks: Combines code and narrative, allowing analysts to share findings in an interactive and collaborative format.
These tools enable data analysts to work efficiently, from discovering data to visualizing insights and creating reports that drive actionable business decisions.
Aligning teams on their purpose and key goals requires public content all can access and reference. As an example, Alation’s “Documents Hub” allows the community to share details about the organization's business initiatives and interlink them to support contextual comprehension. These initiatives can be documented as business products and then associated with the respective business owners, KPIs, and technical metadata for ease of understanding.
A data analyst can use the information stored in Documents Hub to understand:
The overall goal of a given initiative
Available data assets and their sources
Relevant data quality details and policies
Ownership
Definitions and key information
Data that still needs to be sourced
Existing KPIs
This significantly reduces the work of an analyst trying to understand what’s already there and prevents them from reinventing the wheel.
Suppose you want to put all your organization’s data products in one location. To do this, you’d create a document hub called “Data Product” and add a document for each business product.
You would then associate the following assets:
Logical metadata – Business attributes, metrics, critical data elements, terms
Physical Assets – Tables, BI reports, columns
Stakeholders – Owners, information architects, subject matter experts
Policies – ROPA, GDPR, CCPA
Data Quality – Rules as policies/documents
Workflow – Restrict access to data products as needed
By aggregating data products in one location, and adding relevant metadata, you can empower other data analysts and data users to find, understand, and leverage data more effectively and efficiently.
On average, an analyst spends 30-40% of their time finding and understanding the data they need. McKinsey estimates that the average employee spends 1.8 hours every day searching for and gathering information. With the help of Alation, an analyst can:
Search and discover the data available across the organization
Understand the data with the help of the data dictionary
Understand the policies associated with the data to adhere to the data usage policies
Review owners, data quality, samples, and profiling information before requesting access to a data asset
Understand the data lineage, how data is transformed, and where it is used
Perform impact analysis and communicate the results to the business
Start conversations with the appropriate owners of the data
Reducing search and discovery time has proven to increase not just the productivity of the analyst but also that of the relevant SME, who may have spent an hour every day answering the same questions for the last five years or more!
Additionally, analysts enjoy productivity benefits from the catalog due to features including:
Business Lineage (Compound layout) – Understand how business dashboards are populated and where the data is sourced from
Chrome extension – Makes catalog data available to the wider analyst community for analyzing and visualizing data in other tools
Connectors like dbt and Fivetran – Enhance data lineage with helpful transformation details
Data analysts typically need a combination of formal education, technical skills, and continuous learning to excel in their role. While a degree in fields such as computer science, statistics, data science, or mathematics is often beneficial, hands-on experience and practical training are equally important. Here are key areas of training for aspiring data analysts:
Technical skills
Proficiency in data analysis tools such as Excel, SQL, R, Python, and visualization tools like Tableau or Power BI is essential. Online courses, certifications, and bootcamps are excellent ways to gain these skills.
Statistical knowledge
Training in statistics, probability, and data modeling helps analysts interpret and analyze data effectively. Courses in statistical analysis and predictive modeling are valuable for this reason.
Data wrangling and ETL
Understanding how to clean, structure, and manipulate data is crucial. Learning ETL (Extract, Transform, Load) processes and data wrangling tools like Alteryx or Trifacta is key to preparing data for analysis.
Communication and problem-solving
Effective communication and problem-solving are critical for translating data insights into business strategies. Training in presentation and storytelling with data helps analysts convey their findings clearly.
Continuous learning
The field of data analysis evolves rapidly, so staying up-to-date with new tools, methodologies, and industry trends through continued learning, certifications, and advanced courses is essential.
As a data analyst, understanding the overall design and concept of data products in Alation will help you visualize and implement them.
Making the most of Alation’s features will reduce the time you spend searching for information. Here are some Alation University courses to consider as a data analyst:
Understanding technical lineage and business lineage in Alation
Asking a question
Curious to learn more? Explore this expert talk on Data Products in Alation.
In many scenarios, the knots in the thread are the processes themselves. Simple, transparent processes can save significant effort across business units.
Curating data may initially seem like a lot of work, but a top-down approach that keeps the end goal in mind will help you prioritize the work and bring structure to the overall implementation.
Becoming a successful data analyst requires a mix of technical proficiency, analytical thinking, and effective communication skills. From mastering key tools like SQL, Tableau, and Python to developing a deep understanding of data wrangling and statistical analysis, the role offers both challenges and opportunities for growth. With the right training and a passion for data-driven insights, aspiring analysts can play a crucial role in helping organizations make informed decisions, optimize processes, and drive innovation. Whether you’re just starting out or looking to advance your career, the path of a data analyst is both dynamic and rewarding.
A data catalog is a critical tool in a data analyst’s stack. With a catalog, analysts can support a business implementation and associate the content around it, helping the wider data community find the information and answers they need efficiently. A data catalog like Alation will also help data leaders:
Reduce the curation scope for stewards and SMEs, who can focus only on the CDEs and KPIs that feed into business products.
Establish a platform where business and technical users can co-exist and share information to improve and enhance the current processes.
Reduce duplication of effort by making all the data, terms, and KPIs available to the users.
Remember: a happy data analyst is the mark of a mature, data-driven organization. Curious to see how a data catalog can help your data analyst community? Book a demo with us to learn more.