My Data Analytics Journey | Correlation
Linear Equation
Ahhhh yes... the fabled Y = MX + C that most of us are all too familiar with. This equation is important as it allows us to predict a value based on existing data, kinda like what we do for Machine Learning (wink wink). In fact, linear regression is often used as a baseline model to compare against more complex machine learning models, thanks to its simplicity and surprising accuracy in certain scenarios. More on those in a later post, so stay tuned!
This linear equation also comes in handy when we are trying to draw a best fit line through various data points, that is, the line that minimises the mean squared error between the predicted and actual y values. However, how do we know if x and y are actually correlated? This is where the Pearson Correlation Coefficient (r) comes in. It ranges from -1 to 1 and tells us both the strength (how close r is to -1 or 1) and the direction (the sign of r) of the linear correlation.
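To make the best fit line and r a little more concrete, here is a minimal sketch in plain Python. The data (`hours`, `scores`) and the helper names `fit_line` and `pearson_r` are made up for illustration, and in practice you would more likely reach for a statistics library.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope that minimises the squared errors
    c = mean_y - m * mean_x    # the line passes through (mean_x, mean_y)
    return m, c

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

m, c = fit_line(hours, scores)
r = pearson_r(hours, scores)
print(f"y = {m:.2f}x + {c:.2f}, r = {r:.3f}")
```

A positive slope together with an r close to 1 suggests a strong positive linear relationship in this toy dataset.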
Summary of r
However, it is important to note that just looking at r values can be misleading!
Even though r > 0.7 for all 4 graphs above, we can clearly see that the datasets do not necessarily have a very strong linear relationship between the two variables. In fact, as the graph below shows, even when r = 0 the dataset can still have a strong relationship (such as a quadratic one), albeit not a linear one.
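The r = 0 case is easy to reproduce. The sketch below uses a made-up, perfectly quadratic dataset (y = x squared, with x symmetric around zero) and a hand-rolled `pearson_r` helper, which is an illustrative name and not taken from any library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# y depends on x exactly, but the dependence is quadratic, not linear
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_quadratic = pearson_r(xs, ys)
print(f"r = {r_quadratic:.3f}")
```

The positive and negative halves of the curve cancel each other out, so r comes out as zero even though y is completely determined by x.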
As such, prior to calculating the r value, it is good practice to plot the two variables on a scatterplot as a visual sense check that the relationship between them is indeed linear in nature.
As it is often prohibitively expensive to gather data on everyone in the target population, researchers often carry out representative sampling and extrapolate the findings from their sample back to the target population.
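Because a correlation seen in a sample can arise purely by chance, do remember to check the p-value before using any variable X as a predictor of variable Y. The p-value tells us how likely it is to observe a correlation at least this strong in the sample if there were actually no correlation in the target population.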
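One way to get a feel for what the p-value measures is a permutation test: shuffle the y values many times to break any real link with x, and count how often the shuffled data gives a correlation at least as strong as the observed one. This is only a sketch under stated assumptions. It is not the t-distribution based p-value that most statistics packages report, and the data and function names (`permutation_p_value`, `pearson_r`) are hypothetical.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for r by shuffling ys."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # The +1 counts the observed arrangement itself, so p is never exactly 0
    return (extreme + 1) / (n_permutations + 1)

# Made-up samples: one clearly related to x, one not
xs = [1, 2, 3, 4, 5, 6, 7, 8]
related = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
unrelated = [5, 3, 8, 1, 7, 2, 6, 4]

p_related = permutation_p_value(xs, related)
p_unrelated = permutation_p_value(xs, unrelated)
print(f"related: p = {p_related:.4f}, unrelated: p = {p_unrelated:.4f}")
```

A small p-value (commonly below 0.05) suggests the correlation is unlikely to be a fluke of the sample, while a large one means the sample gives little evidence of any correlation in the population.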
Another thing to note is that correlation does NOT equate to causation. In simpler terms, just because two events occur in tandem does NOT mean one causes the other. Take, for instance, an increase in ice-cream sales coinciding with an increase in the number of drownings. Common sense tells us that neither event causes the other, and that it is most likely a third variable, such as hot weather, that brings more people out to swim as well as to buy ice-cream.
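The ice-cream example can be simulated to see the effect of a third variable. Everything in the sketch below is invented for illustration: the temperature range, the coefficients and the noise levels are assumptions, not real figures. Both ice-cream sales and drownings are generated from temperature only, and the partial correlation (the correlation left over once temperature is accounted for) is computed with the standard first-order formula.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Simulated (not real) data: temperature drives both variables independently
temperature = [rng.uniform(20, 35) for _ in range(500)]
ice_cream_sales = [10 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 1) for t in temperature]

# Raw correlation between the two "effects"
r_raw = pearson_r(ice_cream_sales, drownings)

# Partial correlation, controlling for temperature
r_it = pearson_r(ice_cream_sales, temperature)
r_dt = pearson_r(drownings, temperature)
r_partial = (r_raw - r_it * r_dt) / sqrt((1 - r_it ** 2) * (1 - r_dt ** 2))

print(f"raw r = {r_raw:.2f}, r controlling for temperature = {r_partial:.2f}")
```

In this simulation the raw correlation is strong, yet it mostly vanishes once temperature is controlled for, which is what we would expect when a third variable drives both.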
Now, do you feel that this post has caused you to understand these concepts better, or is it merely correlated with it? ;)






