For Data Analysts

The competency model for data analysts was developed using job/task analysis, industry research, and expert interviews. The list is not exhaustive, because expectations of data analysts vary from organization to organization; instead, we aimed to capture the most common core data competencies a data analyst needs to demonstrate to be effective in their role.

Data Management: The process of gathering, storing, and using data. Includes data collection, data cleaning and transformation, data storage, data quality, and data security.
Associate (Level 1) Professional (Level 2)
Perform standard data extraction, joining and aggregation tasks using SQL
  • Aggregate numeric variables, categorical variables, and dates by group using PostgreSQL.
  • Interpret a database schema and combine multiple tables by rows or columns using PostgreSQL.
  • Extract data based on different conditions using PostgreSQL.
  • Use subqueries to reference a second table (e.g. a different table, an aggregated table) within a query in PostgreSQL.
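The SQL tasks above can be sketched in a few queries. This is a minimal illustration rather than a reference solution: it runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example is self-contained), and the customers and orders tables and their columns are hypothetical. The queries stick to standard SQL, so they should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption, for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0), (4, 3, 100.0);
""")

# Join two tables and aggregate a numeric column by group.
totals_by_region = conn.execute("""
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Extract rows based on a condition.
large_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE amount >= 50 ORDER BY id"
).fetchall()

# Use a subquery (an aggregated table) inside a query:
# orders whose amount is above the overall average.
above_average = conn.execute("""
    SELECT id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

conn.close()
```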
Perform standard data import, joining and aggregation tasks using R or Python
  • Import data from flat files into R or Python.
  • Import data from databases into R or Python.
  • Aggregate numeric variables, categorical variables, and dates by group using R or Python.
  • Combine multiple tables by rows or columns using R or Python.
  • Filter data based on different criteria using R or Python.
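The import, filter, aggregate, and combine steps listed above might look like the following in Python. This is a sketch under stated assumptions: the sales and stores data are hypothetical, io.StringIO stands in for real files on disk, and only the standard library is used so the example runs anywhere. A data-frame library would be the more typical choice in day-to-day work.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file contents; io.StringIO stands in for an open file.
sales_csv = io.StringIO(
    "store,product,units\n"
    "A,apple,10\n"
    "A,pear,5\n"
    "B,apple,7\n"
    "B,pear,0\n"
)
stores_csv = io.StringIO("store,city\nA,Leeds\nB,York\n")

# Import: read each flat file into a list of dictionaries.
sales = list(csv.DictReader(sales_csv))
stores = list(csv.DictReader(stores_csv))

# Convert the numeric column, which csv reads as text.
for row in sales:
    row["units"] = int(row["units"])

# Filter: keep rows that meet a condition.
nonzero_sales = [row for row in sales if row["units"] > 0]

# Aggregate: total units per store.
units_by_store = defaultdict(int)
for row in nonzero_sales:
    units_by_store[row["store"]] += row["units"]

# Combine by columns: look up each store's city (a simple key-based join).
city_by_store = {row["store"]: row["city"] for row in stores}
joined = [{**row, "city": city_by_store[row["store"]]} for row in nonzero_sales]
```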
Perform standard cleaning tasks to prepare data for analysis using SQL
  • Match strings in a dataset with specific patterns using PostgreSQL.
  • Convert values between data types in PostgreSQL.
  • Clean categorical and text data by manipulating strings in PostgreSQL.
  • Clean date and time data in PostgreSQL.
Assess data quality and perform validation tasks using SQL
  • Identify and replace missing values using PostgreSQL.
  • Perform different types of data validation tasks (e.g. consistency, constraints, range validation, uniqueness) using PostgreSQL.
  • Identify and validate data types in a data set using PostgreSQL.
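A few of the data quality checks above can be expressed as short queries. As before, this is only a sketch: SQLite (via Python's sqlite3 module) stands in for PostgreSQL, the patients table is hypothetical, and the 0 to 120 age range is an illustrative rule, not one taken from this model. COALESCE, GROUP BY ... HAVING, and BETWEEN are standard SQL and are also available in PostgreSQL.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption, for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER, name TEXT, age INTEGER);
    INSERT INTO patients VALUES
        (1, 'Ana', 34), (2, NULL, 51), (3, 'Cy', 212), (3, 'Cy', 212);
""")

# Missing values: count NULLs, then replace them in the query output.
missing_names = conn.execute(
    "SELECT COUNT(*) FROM patients WHERE name IS NULL"
).fetchone()[0]
names_filled = conn.execute(
    "SELECT COALESCE(name, 'Unknown') FROM patients ORDER BY id"
).fetchall()

# Uniqueness validation: ids that appear more than once.
duplicate_ids = conn.execute("""
    SELECT id, COUNT(*) AS n
    FROM patients
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()

# Range validation: ages outside a plausible range (illustrative bounds).
out_of_range = conn.execute(
    "SELECT DISTINCT id, age FROM patients WHERE age NOT BETWEEN 0 AND 120"
).fetchall()

conn.close()
```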
 

 

Exploratory Analysis: Use of statistics and visualizations to analyze and identify trends in data sets. Includes calculating metrics and creating data visualizations.
Associate (Level 1) Professional (Level 2)
Calculate metrics to report characteristics of data (e.g. measures of center and variability, missingness) and relationships between features (e.g. correlation) using SQL
  • Calculate measures of center (e.g. mean, median, mode) for variables using PostgreSQL.
  • Calculate measures of spread (e.g. range, standard deviation, variance) for variables using PostgreSQL.
  • Calculate skewness for variables using PostgreSQL.
  • Calculate missingness for variables and explain its influence on reporting characteristics of data and relationships in PostgreSQL.
  • Calculate the correlation between variables using PostgreSQL.
Calculate metrics to effectively report characteristics of data and relationships between features using Python or R
  • Calculate measures of center (e.g. mean, median, mode) for variables using R or Python.
  • Calculate measures of spread (e.g. range, standard deviation, variance) for variables using R or Python.
  • Calculate skewness for variables using R or Python.
  • Calculate missingness for variables and explain its influence on reporting characteristics of data and relationships in R or Python.
  • Calculate the correlation between variables using R or Python.
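The metrics listed above can be computed directly, which also makes their definitions concrete. The following is a minimal sketch with hypothetical height and weight values, using only Python's standard library. Skewness is computed here as the population moment coefficient (third central moment divided by the 1.5 power of the second), which is one of several common definitions, and the correlation is Pearson's r.

```python
import math
import statistics

# Hypothetical measurements; None marks a missing value.
heights = [150.0, 160.0, None, 170.0, 160.0, 180.0]
weights = [50.0, 58.0, 61.0, 70.0, 62.0, 80.0]

# Missingness: share of observations with no value.
missing_share = sum(h is None for h in heights) / len(heights)

# Drop pairs with a missing height before computing the other metrics.
pairs = [(h, w) for h, w in zip(heights, weights) if h is not None]
xs = [h for h, _ in pairs]
ys = [w for _, w in pairs]

# Measures of center.
mean_x = statistics.mean(xs)
median_x = statistics.median(xs)
mode_x = statistics.mode(xs)

# Measures of spread (sample variance and sample standard deviation).
range_x = max(xs) - min(xs)
var_x = statistics.variance(xs)
sd_x = statistics.stdev(xs)

# Skewness: population moment coefficient m3 / m2**1.5.
n = len(xs)
m2 = sum((x - mean_x) ** 2 for x in xs) / n
m3 = sum((x - mean_x) ** 3 for x in xs) / n
skew_x = m3 / m2 ** 1.5

# Pearson correlation between the two variables.
mean_y = statistics.mean(ys)
cov_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
pearson_r = cov_sum / math.sqrt(ss_x * ss_y)
```

Dropping incomplete rows, as done here, is only one way to handle missingness; it changes the sample the other metrics describe, which is the influence the missingness bullet asks analysts to explain.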
Read and analyze data visualizations to demonstrate characteristics of data
  • Distinguish between different types of data visualizations (e.g. bar chart, box plot, line graph, and histogram) in demonstrating the characteristics of data.
  • Interpret the data visualizations (e.g. bar chart, box plot, line graph, and histogram) and summarize the characteristics of the data.
Create data visualizations in R or Python to demonstrate the characteristics of data
  • Create and customize bar charts using R or Python.
  • Create and customize box plots using R or Python.
  • Create and customize line graphs using R or Python.
  • Create and customize histograms using R or Python.
Read and analyze data visualizations to represent the relationship between features
  • Distinguish between different types of data visualizations (e.g. scatterplot, heatmap, and pivot table) in representing the relationships between features. 
  • Interpret the data visualizations (e.g. scatterplot, heatmap, and pivot table) and summarize the relationship between features.
Create data visualizations in R or Python to represent the relationships between features
  • Create and customize scatterplots using R or Python.
  • Create and customize heatmaps using R or Python.
  • Create and customize pivot tables using R or Python.

 

Statistical Experimentation: Procedure for planning experiments so that the data obtained can be analyzed to yield valid and objective conclusions.
Associate (Level 1) Professional (Level 2)
Describe statistical concepts that underpin hypothesis testing and experimentation
  • Define different statistical distributions (e.g. binomial, normal, Poisson, t-distribution, chi-square, and F-distribution).
  • Explain the statistical concepts in hypothesis testing (e.g. null hypothesis, alternative hypothesis, one-tailed and two-tailed hypothesis tests).
  • Explain the statistical concepts in experimental design (e.g. control group, randomization, confounding variables).
  • Explain parameter estimation and confidence intervals.
Apply sampling methods to data
  • Distinguish between different types of random sampling techniques and apply the methods using R or Python.
  • Sample data from a statistical distribution (e.g. normal, binomial, Poisson, exponential) using R or Python.
  • Calculate a probability from a statistical distribution (e.g. normal, binomial, Poisson, exponential) using R or Python.
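The sampling tasks above can be illustrated with Python's standard library. This is a sketch under assumptions: the population of 100 unit IDs, the two strata, and the distribution parameters are all hypothetical, and the seed is fixed only to make the run reproducible.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)  # fixed seed so the sketch is reproducible

population = list(range(1, 101))  # hypothetical population of 100 unit IDs

# Simple random sampling without replacement.
simple_sample = rng.sample(population, k=10)

# Systematic sampling: random start, then every 10th unit.
start = rng.randrange(10)
systematic_sample = population[start::10]

# Stratified sampling: draw the same number of units from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified_sample = [
    unit for units in strata.values() for unit in rng.sample(units, k=5)
]

# Sample data from statistical distributions.
normal_draws = [rng.gauss(0, 1) for _ in range(1000)]
exponential_draws = [rng.expovariate(2.0) for _ in range(1000)]

# Calculate probabilities from statistical distributions.
p_normal = NormalDist(mu=0, sigma=1).cdf(1.96)       # P(Z <= 1.96), Z ~ N(0, 1)
p_binomial = math.comb(10, 3) * 0.5**3 * 0.5**7      # P(X = 3), X ~ Binomial(10, 0.5)
p_poisson = math.exp(-2) * 2**0 / math.factorial(0)  # P(X = 0), X ~ Poisson(2)
```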
Implement methods for performing statistical tests
  • Run statistical tests (e.g. t-test, ANOVA test, chi-square test) using R or Python. 
  • Analyze the results of statistical tests from R or Python.
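As one example of running a test and analyzing its result, the sketch below computes a chi-square test of independence by hand for a 2x2 table. The counts are hypothetical A/B test data, and the 0.05 significance level is an illustrative choice. The p-value shortcut used here is specific to 1 degree of freedom; larger tables would need a general chi-square distribution function from a statistics library.

```python
import math

# Hypothetical A/B test results: rows are groups,
# columns are (converted, not converted).
observed = [[30, 70],
            [45, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under the null hypothesis of independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction).
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# A 2x2 table has 1 degree of freedom, and the chi-square(1)
# upper-tail probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Analyze the result against an illustrative significance level.
alpha = 0.05
reject_null = p_value < alpha
```

Here the statistic is 4.8 and the p-value falls below 0.05, so the null hypothesis of equal conversion rates would be rejected at that level. Library routines may apply a continuity correction to 2x2 tables by default (SciPy's chi2_contingency does), in which case their output will differ from this uncorrected calculation.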



Data Communication: The practice of presenting data findings in a compelling and actionable way to technical and non-technical audiences. Includes data storytelling, data visualization, presentation and writing skills.
Associate (Level 1) Professional (Level 2)
Present data concepts to small, diverse audiences
  • Explain findings and/or the reasoning for selecting approaches.
Frame, convey, and summarize stories using data
  • Employ data storytelling techniques to present findings and relay solutions to business stakeholders.
Employ data visualization to support findings
  • Create charts using visualization tools.
  • Use visualizations that support the findings being presented.
Employ multiple tactics (written and verbal) to communicate to business leaders
  • Deliver a verbal presentation addressing the business goals, outcomes, and recommendations.
  • Provide a written explanation of findings and/or the reasoning for selecting approaches.



Business Acumen: The collection of both general and organization-specific knowledge about how things get done and why, used with the intent to positively impact the organization.
Associate (Level 1): N/A
Professional (Level 2):
Collect relevant information, detect patterns, observe and interpret data
  • Describe business goals.
  • Explain how the solution addresses the business problem.
  • Provide recommendations for future action based on the outcome of the work done.
Benchmark, monitor, and evaluate business processes
  • Define metrics that the business can use in the future to measure success in solving the problem.
  • Evaluate metrics using the existing data to provide a baseline measure for the problem.