My Data Analytics Journey | Describing data for statistical analysis
Sources of Data
Descriptive Statistics
When confronted with a large data set, how can we quickly convey its key characteristics to someone who has never seen or worked with it before? One way to achieve this is through descriptive statistics, which allow us to display and describe data. Displaying data can take many shapes and forms depending on the type of data.
Displaying Data
Describing data
1) Measures of Central Tendency
- Mean: The Numeric "Average"
- Median: The "Middle" Number
- Mode: The "Most Common" Number
2) Measures of Dispersion
- Range
- Variance
- Standard Deviation
- Coefficient of Variation
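For readers who like to see the numbers, here is a minimal sketch of the measures listed above, using only Python's standard library. The data set is made up for illustration, and I am assuming the data is a sample rather than a whole population, so the variance and standard deviation divide by n - 1 (the same convention as Excel's VAR.S and STDEV.S).

```python
import statistics

# Hypothetical sample, made up purely for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)          # Excel: =AVERAGE(data)
median = statistics.median(data)      # Excel: =MEDIAN(data)
mode = statistics.mode(data)          # Excel: =MODE.SNGL(data)

# Measures of dispersion (sample versions, dividing by n - 1)
data_range = max(data) - min(data)    # Excel: =MAX(data)-MIN(data)
variance = statistics.variance(data)  # Excel: =VAR.S(data)
std_dev = statistics.stdev(data)      # Excel: =STDEV.S(data)

# Coefficient of variation: standard deviation relative to the mean
cv = std_dev / mean

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance}")
print(f"std dev={std_dev:.2f}, coefficient of variation={cv:.2f}")
```

For this made-up sample, the mean is 6, the median is 5, the mode is 4, the range is 9 and the sample variance is 10. The coefficient of variation is handy when comparing the spread of data sets measured on different scales, since it has no units.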
3) Measures of Shape
- Skewness
In Excel, the skewness of a dataset can be calculated with the formula =SKEW(data).
- Kurtosis
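Kurtosis measures the heaviness of the tails when compared to a normal distribution, which scores 0 on this scale. High kurtosis (> 0) tells us that the dataset is more prone to outliers than a normal distribution and is a source of concern, while kurtosis close to 0 tells us that the tails behave much like those of a normal distribution. Negative kurtosis (< 0) tells us that the dataset is less prone to outliers than a normal distribution.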
When kurtosis is <0, the tails are thinner and shorter.
When kurtosis is >0, the tails are longer and fatter.
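To make the two measures of shape more concrete, here is a rough sketch of how they can be calculated by hand in Python. I am assuming the sample formulas that Excel documents for =SKEW and =KURT (kurtosis reported relative to a normal distribution, so 0 means "normal-like" tails). The function names and the data sets are my own, made up for illustration.

```python
import statistics


def sample_skewness(values):
    """Sample skewness, following the formula documented for Excel's SKEW."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def sample_excess_kurtosis(values):
    """Sample excess kurtosis, following the formula documented for Excel's KURT."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


# Hypothetical data sets, made up for illustration.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 2, 3, 10]

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(with_outlier), sample_excess_kurtosis(with_outlier))
```

The evenly spread data set has a skewness of 0 and a negative kurtosis (thin tails), while the data set with one large value has a positive skewness (a long right tail) and a positive kurtosis (fat tails).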
That's a lot of work needed to describe data! Now that we understand the significance of the terms that can be used, is there an easy way to calculate them? Of course, through the power of Excel!
Excel Step-By-Step
Step 1: Click on the Data tab in Excel
Step 2: Click on Data Analysis and select Descriptive Statistics
Step 3: Select your input range, then make sure "Labels in first row" and "Summary statistics" are checked before clicking on OK.
Results from Excel
Inferential Statistics
The other form of statistics is inferential statistics, which allows us to infer characteristics of the larger population based on the sample we have. This is important as it is often neither economical nor feasible to collect data from every single unit in our target population.
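As a small illustration of the idea, the sketch below simulates a population, draws a random sample from it, and compares the sample mean with the population mean. Everything here is an assumption made for the example: the population is generated artificially, and the sizes (10,000 units, sample of 100) are arbitrary.

```python
import random
import statistics

# Fixed seed so the example is reproducible.
rng = random.Random(42)

# Hypothetical population: 10,000 simulated values centred around 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# In practice we rarely see the whole population, only a sample of it.
sample = rng.sample(population, 100)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean will not match the population mean exactly, but with a reasonably sized random sample it should land close to it, which is what makes inference from a sample worthwhile.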
Congrats on getting through another information-heavy post! Now that we know how to describe the data, we can get our hands dirty and start cleaning data for our data analysis. For that and more, I will see you in the next blog post :)











