The competency model for data analysts was developed using job/task analysis, industry research, and expert interviews. Because expectations of data analysts vary from organization to organization, this list is not exhaustive; we aimed to capture the most common core data competencies a data analyst is expected to demonstrate to be effective in their role.
Data Management: The process of gathering, storing, and using data. Includes data collection, data cleaning and transformation, data storage, data quality, and data security. 
Associate (Level 1) 
Professional (Level 2) 
Perform standard data extraction, joining and aggregation tasks using SQL
 Aggregate numeric, categorical variables and dates by groups using PostgreSQL.
 Interpret a database schema and combine multiple tables by rows or columns using PostgreSQL.
 Extract data based on different conditions using PostgreSQL.
 Use subqueries to reference a second table (e.g. a different table, an aggregated table) within a query in PostgreSQL.
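
The tasks above (joining, conditional extraction, grouped aggregation, and subqueries) can be sketched in a small runnable example. It is a minimal illustration, not a reference solution: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the `customers` and `orders` tables and their contents are hypothetical. The SQL shown is standard enough that it should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0), (4, 3, 100.0);
""")

# Join two tables, extract rows that meet a condition, and aggregate by group.
totals_by_region = conn.execute("""
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount >= 50
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Subquery: compare each order with an aggregate computed over the same table.
above_average = conn.execute("""
    SELECT id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

print(totals_by_region)
print(above_average)
```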

Perform standard data import, joining and aggregation tasks using R or Python
 Import data from flat files into R or Python.
 Import data from databases into R or Python.
 Aggregate numeric, categorical variables and dates by groups using R or Python.
 Combine multiple tables by rows or columns using R or Python.
 Filter data based on different criteria using R or Python.
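
As a rough illustration of the import, combine, filter, and aggregate tasks listed above, the following sketch uses only the Python standard library. The file contents, column names, and the 5.0 threshold are hypothetical; in practice a dataframe library such as pandas is a common choice for the same steps.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file contents; a real file would be opened with open(path, newline="").
sales_csv = io.StringIO("store_id,amount\n1,10.5\n2,4.0\n1,5.5\n3,8.0\n")
stores_csv = io.StringIO("store_id,city\n1,Leeds\n2,York\n")

# Import: each row becomes a dict keyed by the header, with values converted from text.
sales = [{"store_id": int(r["store_id"]), "amount": float(r["amount"])}
         for r in csv.DictReader(sales_csv)]
stores = {int(r["store_id"]): r["city"] for r in csv.DictReader(stores_csv)}

# Combine by columns (an inner join on store_id): keep only sales with a known store.
joined = [{**s, "city": stores[s["store_id"]]} for s in sales if s["store_id"] in stores]

# Filter: keep rows that meet a condition.
large_sales = [row for row in joined if row["amount"] >= 5.0]

# Aggregate by group: total amount per city.
total_by_city = defaultdict(float)
for row in joined:
    total_by_city[row["city"]] += row["amount"]

print(dict(total_by_city))
```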

Perform standard cleaning tasks to prepare data for analysis using SQL
 Match strings in a dataset with specific patterns using PostgreSQL.
 Convert values between data types in PostgreSQL.
 Clean categorical and text data by manipulating strings in PostgreSQL.
 Clean date and time data in PostgreSQL.


Assess data quality and perform validation tasks using SQL
 Identify and replace missing values using PostgreSQL.
 Perform different types of data validation tasks (e.g. consistency, constraints, range validation, uniqueness) using PostgreSQL.
 Identify and validate data types in a data set using PostgreSQL.
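
The validation tasks above can be sketched as a handful of queries. This is an illustrative example under stated assumptions: SQLite (through Python's sqlite3 module) stands in for PostgreSQL, the `patients` table is hypothetical, and the plausible age range of 0 to 120 is an assumed business rule.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER, age INTEGER, city TEXT);
    INSERT INTO patients VALUES
        (1, 34, 'Leeds'), (2, NULL, 'York'), (2, 51, NULL), (3, 212, 'Hull');
""")

# Missing values: count NULLs per column.
n_missing_age, n_missing_city = conn.execute("""
    SELECT SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN city IS NULL THEN 1 ELSE 0 END)
    FROM patients
""").fetchone()

# Replace missing text values at query time with COALESCE.
cities = [row[0] for row in conn.execute(
    "SELECT COALESCE(city, 'Unknown') FROM patients")]

# Uniqueness: ids that appear more than once.
duplicate_ids = [row[0] for row in conn.execute(
    "SELECT patient_id FROM patients GROUP BY patient_id HAVING COUNT(*) > 1")]

# Range validation: ages outside the assumed plausible interval of 0 to 120.
# Rows with a NULL age are not returned here; they are counted as missing above.
out_of_range = [row[0] for row in conn.execute(
    "SELECT patient_id FROM patients WHERE age NOT BETWEEN 0 AND 120")]

print(n_missing_age, n_missing_city, duplicate_ids, out_of_range)
```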


Exploratory Analysis: Use of statistics and visualizations to analyze and identify trends in data sets. Includes calculating metrics and creating data visualizations. 
Associate (Level 1) 
Professional (Level 2) 
Calculate metrics to report characteristics of data (e.g. measures of center and variability, missingness) and relationships between features (e.g. correlation) using SQL
 Calculate measures of center (e.g. mean, median, mode) for variables using PostgreSQL.
 Calculate measures of spread (e.g. range, standard deviation, variance) for variables using PostgreSQL.
 Calculate skewness for variables using PostgreSQL.
 Calculate missingness for variables and explain its influence on reporting characteristics of data and relationships in PostgreSQL.
 Calculate the correlation between variables using PostgreSQL.

Calculate metrics to effectively report characteristics of data and relationships between features using Python or R
 Calculate measures of center (e.g. mean, median, mode) for variables using R or Python.
 Calculate measures of spread (e.g. range, standard deviation, variance) for variables using R or Python.
 Calculate skewness for variables using R or Python.
 Calculate missingness for variables and explain its influence on reporting characteristics of data and relationships in R or Python.
 Calculate the correlation between variables using R or Python.
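
The metrics listed above can be computed in a few lines. The sketch below uses only the Python standard library and hypothetical `heights` and `weights` data. The `skewness` and `pearson` helpers are written out for illustration, and the skewness shown is the population (moment-based) definition, which is one of several conventions. Rows with a missing value are dropped pairwise, which is itself an analytical choice that can influence the reported statistics.

```python
import math
import statistics

# Hypothetical observations; None marks a missing value.
heights = [150.0, 160.0, None, 170.0, 160.0, 180.0]
weights = [50.0, 58.0, 61.0, 69.0, 60.0, 80.0]

# Missingness: share of missing values in a variable.
missing_rate = sum(h is None for h in heights) / len(heights)

# Drop pairs with a missing value before computing the other metrics.
pairs = [(h, w) for h, w in zip(heights, weights) if h is not None]
xs = [h for h, _ in pairs]
ys = [w for _, w in pairs]

# Measures of center.
mean_x = statistics.mean(xs)
median_x = statistics.median(xs)
mode_x = statistics.mode(xs)

# Measures of spread (sample variance and sample standard deviation).
range_x = max(xs) - min(xs)
var_x = statistics.variance(xs)
sd_x = statistics.stdev(xs)


def skewness(values):
    """Population (moment-based) skewness: m3 / m2**1.5."""
    m = statistics.mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


skew_x = skewness(xs)
corr_xy = pearson(xs, ys)

print(mean_x, median_x, mode_x, range_x, var_x)
```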

Read and analyze data visualizations to demonstrate characteristics of data
 Distinguish between different types of data visualizations (e.g. bar chart, box plot, line graph, and histogram) in demonstrating the characteristics of data.
 Interpret the data visualizations (e.g. bar chart, box plot, line graph, and histogram) and summarize the characteristics of the data.

Create data visualizations in R or Python to demonstrate the characteristics of data
 Create and customize bar charts using R or Python.
 Create and customize box plots using R or Python.
 Create and customize line graphs using R or Python.
 Create and customize histograms using R or Python.

Read and analyze data visualizations to represent the relationship between features
 Distinguish between different types of data visualizations (e.g. scatterplot, heatmap, and pivot table) in representing the relationships between features.
 Interpret the data visualizations (e.g. scatterplot, heatmap, and pivot table) and summarize the relationship between features.

Create data visualizations in R or Python to represent the relationships between features
 Create and customize scatterplots using R or Python.
 Create and customize heatmaps using R or Python.
 Create and customize pivot tables using R or Python.

Statistical Experimentation: Procedure for planning experiments so that the data obtained can be analyzed to yield valid and objective conclusions. 
Associate (Level 1) 
Professional (Level 2) 
Describe statistical concepts that underpin hypothesis testing and experimentation
 Define different statistical distributions (e.g. binomial, normal, Poisson, t-distribution, chi-square, and F-distribution).
 Explain the statistical concepts in hypothesis testing (e.g. null hypothesis, alternative hypothesis, one-tailed and two-tailed hypothesis tests).
 Explain the statistical concepts in experimental design (e.g. control group, randomization, confounding variables).
 Explain parameter estimation and confidence intervals.

Apply sampling methods to data
 Distinguish between different types of random sampling techniques and apply the methods using R or Python.
 Sample data from a statistical distribution (e.g. normal, binomial, Poisson, exponential) using R or Python.
 Calculate a probability from a statistical distribution (e.g. normal, binomial, Poisson, exponential) using R or Python.
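
A minimal sketch of the sampling tasks above, using only the Python standard library. The population of 100 unit ids, the two strata, the seed, and the distribution parameters are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)  # fixed seed so the sketch is reproducible

population = list(range(1, 101))  # hypothetical population of 100 unit ids

# Simple random sampling without replacement.
simple_sample = rng.sample(population, k=10)

# Systematic sampling: every 10th unit after a random start.
start = rng.randrange(10)
systematic_sample = population[start::10]

# Stratified sampling: draw the same number of units from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified_sample = [u for units in strata.values() for u in rng.sample(units, k=5)]

# Sample values from statistical distributions.
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
exponential_draws = [rng.expovariate(2.0) for _ in range(1000)]  # rate 2, mean 0.5

# Probability from a normal distribution: P(Z <= 1.96) for a standard normal.
p_normal = NormalDist(mu=0.0, sigma=1.0).cdf(1.96)

# Probability from a binomial distribution: P(X = 3) for n = 10 trials with p = 0.5.
n, k, p = 10, 3, 0.5
p_binomial = math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(p_normal, 3), p_binomial)
```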


Implement methods for performing statistical tests
 Run statistical tests (e.g. t-test, ANOVA test, chi-square test) using R or Python.
 Analyze the results of statistical tests from R or Python.
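
To make running a test and analyzing its result concrete, here is a hand-computed chi-square test of independence on a hypothetical 2x2 table of counts. It is a sketch only: it applies no continuity correction, and it compares the statistic with the tabulated critical value for one degree of freedom at a 0.05 significance level rather than computing a p-value. In practice a statistics library (for example `scipy.stats.chi2_contingency`) would normally be used.

```python
# Observed counts in a hypothetical 2x2 table:
# rows = group A / group B, columns = converted yes / no.
observed = [[30, 70],
            [50, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction).
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabulated critical value of the chi-square distribution for df = 1, alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
reject_null = chi_square > CRITICAL_VALUE_DF1_ALPHA_05

print(round(chi_square, 3), degrees_of_freedom, reject_null)
```

With these counts the statistic exceeds the critical value, so the null hypothesis that conversion is independent of group would be rejected at the 0.05 level.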

Data Communication: The practice of presenting data findings in a compelling and actionable way to technical and non-technical audiences. Includes data storytelling, data visualization, presentation and writing skills. 
Associate (Level 1) 
Professional (Level 2) 
Present data concepts to small, diverse audiences
 Explain findings and/or the reasoning for selecting approaches.

Frame, convey, and summarize stories using data
 Employ techniques in data storytelling to propose findings and relay solutions to business stakeholders.

Employ data visualization to support findings
 Create charts using visualization tools.
 Use visualizations that support the findings being presented.

Employ multiple tactics (written and verbal) to communicate to business leaders
 Deliver a verbal presentation addressing the business goals, outcomes and recommendations.
 Provide a written explanation of findings and/or reasoning for selecting approaches.

Business Acumen: The collection of both general and organization-specific knowledge about how things get done and why, used with the intent to positively impact the organization. 
Associate (Level 1) 
Professional (Level 2) 
N/A 
Collect relevant information, detect patterns, observe and interpret data
 Describe business goals.
 Explain how the solution addresses the business problem.
 Provide recommendations for future action based on the outcome of the work done.


Benchmark, monitor, and evaluate business processes
 Define metrics that the business can use in the future to measure success in solving the problem.
 Evaluate metrics using the existing data to provide a baseline measure for the problem.
