Data7 Exploratory Data Analysis in SQL

By Greg T. Chism in Quarto

September 14, 2022

Exploratory data analysis (EDA) is an essential first step towards determining the validity of your data and should be performed throughout the data pipeline. However, EDA is often performed too late or not at all.

SQL (Structured Query Language) is a programming language for database management, which lets you store, retrieve, manage, and manipulate data tables within databases. Although SQL has limited mathematical capabilities, it can be used to perform EDA. A major disadvantage, however, is that SQL cannot produce statistical graphics or other data visualizations. For these, I recommend either the R programming language, specifically through the RStudio IDE and ggplot2 from the tidyverse suite of packages, or Python, specifically the seaborn library.

Here, I use MySQL to conduct preliminary exploratory data analysis aimed at diagnosing major issues with an imported data set. I introduce a clean and straightforward methodology for detecting outliers, identifying missing data, and producing summary statistics.
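To give a flavor of the three checks named above, here is a minimal sketch rather than the workshop's actual material. It assumes a hypothetical table `measurements` with a numeric column `value`, and it runs the queries through Python's built-in `sqlite3` module so that it works without a database server. The SQL itself sticks to standard aggregates and a common table expression, so it should also run on MySQL 8.0 or later. The two-standard-deviation cutoff for outliers is an illustrative choice, not a rule.

```python
import sqlite3

# Hypothetical toy data: the table name, column names, and values are
# illustrative assumptions, not taken from the workshop data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
rows = [(1, 10.0), (2, 12.0), (3, 11.0), (4, None),
        (5, 13.0), (6, 95.0), (7, 9.0), (8, None)]
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Missing data: COUNT(*) counts every row, COUNT(value) skips NULLs,
# so the difference is the number of missing values.
missing_sql = """
SELECT COUNT(*) AS n_rows,
       COUNT(*) - COUNT(value) AS n_missing
FROM measurements
"""
n_rows, n_missing = conn.execute(missing_sql).fetchone()

# Summary statistics: aggregates ignore NULLs automatically.
summary_sql = """
SELECT MIN(value), MAX(value), AVG(value)
FROM measurements
"""
min_value, max_value, mean_value = conn.execute(summary_sql).fetchone()

# Outliers: flag rows more than 2 standard deviations from the mean.
# The population variance is computed as E[x^2] - (E[x])^2 so that no
# engine-specific STDDEV function is needed; squaring both sides of
# |x - mean| > 2 * sd avoids needing a square root as well.
outlier_sql = """
WITH stats AS (
    SELECT AVG(value) AS mean,
           AVG(value * value) - AVG(value) * AVG(value) AS variance
    FROM measurements
)
SELECT m.id, m.value
FROM measurements AS m
CROSS JOIN stats AS s
WHERE (m.value - s.mean) * (m.value - s.mean) > 4 * s.variance
ORDER BY m.id
"""
outliers = conn.execute(outlier_sql).fetchall()

print("rows:", n_rows, "missing:", n_missing)
print("min:", min_value, "max:", max_value, "mean:", mean_value)
print("outliers:", outliers)
```

On MySQL, the built-in `STDDEV_POP` or `STDDEV_SAMP` aggregates could replace the hand-computed variance. Note that a single extreme value inflates both the mean and the standard deviation, so on real data a quantile-based rule may be more robust than this cutoff.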

Categories:
Quarto
Tags:
Statistics SQL
See Also:
Data7 Exploratory Data Analysis in Python Book
Data7 Exploratory Data Analysis in Unix Shell
Data7 Exploratory Data Analysis in R Workshop Series