My Data Analytics Journey | Data Modelling for beginners
What is Data Modelling?
Data Modelling involves visualising entire information systems or its subparts to illustrate how data points and structures are connected and can include the data types used and stored in the system. An example of this is an Entity Relationship Diagram (ERD) drawn using the Crow's Feet Notation.
What is an entity?
Before we move on, let me clarify what an entity is. An entity is an object that has certain characteristics known as attributes which define the entity. An example of this is the for each dog entity, it can have different attributes such as tag no, age, weight and size which defines this dog entity. Collectively, all the dog entities gives us the Dog Entity Set.
Concept of Primary Key
Within the table, we can see that tag no. is the primary key, as it is the attributes that is able to uniquely identify the entity. In other words, if I were to ask for information regarding the dog entity with tag no 1, there would be no confusion as to which dog I am referring to. In contrast, I will not be able to uniquely identify a dog through the "Age" attribute as many dogs can be of the same age.
Now that you have a better idea of the terms used in data modelling, let us work on an actual data set to introduce more advanced concepts.
Data Dictionary
A data dictionary contains the meta data for our dataset, which is basically data containing data. For instance, a company might have organised giveaways over the years in an attempt to boost sales, and have recorded down the demographics of the winner of the giveaways in the table below.
However, it might be difficult for someone who has not worked on the data prior to understand what the columns mean and what the data should look like. As such, a data dictionary such as the one below should be included in order to explain the attributes and requirements needed for the data.
Issues with storing data in Unnormalised form (UNF)
Even though is possible to store the data in this unnormalised form, it is highly discouraged as there is a lot of redundancy and makes it hard to update records. For instance, the CustomerID, CustomerName, MemberSince columns have missing values whenever we are updating it with new giveaway records due to the fact that it is nested. As such, there is a need to normalise the data. The diagram below shows the steps that are needed to move the dataset we have from the unnormalised form to the third normalised form (3NF).
Functional Dependencies & Transitive Dependencies
B is functionally dependent on A if each instance of A is associated with exactly one value of B. For instance, each dog tag no in our earlier example is associated with just one value of the the attribute "Age". In other words, there will be no ambiguity as to what the "Age" value for dog tag no 1 will be.

Whew! That was a lot of information to digest, but I'm proud of you for having made it this far! Now that we know how to model data in a way the reduces redundancy, we'll be learning more about how we can describe data to quickly convey the main patterns found within in the next post. If that sounds exciting to you, do subscribe and check by frequently and I'll see you in the next post :)
C is transitively dependent on A if through A, we can find the value of C. For instance, if we have someone's identification number, we will be able to know their address, which allows us to know one's block number. We say that the block number attribute is transitively dependent on the identification number attribute.

0 Normal Form (0NF)--> First Normal Form (1NF):
Remove nesting
First Normal Form (1NF) --> Second Normal Form (2 NF):
Remove Partial Functional Dependencies
Second Normal Form (2NF) --> Third Normal Form (3NF):
Remove Transitive Functional Dependencies
Summary
0NF --> 3NF:
Splitting 1 nested table into 4 different tables
After achieving 3NF, we are not ready to combine these tables together using an ERD and Crow's Feet Notation. First, let us review the below important concepts.
Maximum Cardinality
The maximum number of entity instances that can participate in a relationship instance
Minimum Cardinality/Optionality
The minimum number of entity instances that can participate in a relationship instance
For instance, imagine we have another table of dog owners. In this case, each dog owner has to have at least one dog (Minimum cardinality of one) but may also have more than one dog (Maximum cardinality of many).
This two-way relationship can be illustrated using an Entity Relationship Diagram (ERD) drawn using the Crow's Feet Notation.
Completed ERD diagram
Using the tables in 3NF above, we can then proceed to use create the below ERD diagram. Since the "ThemeName" attribute can only exist if the "GiveawayCountryTheme" attribute exists, the association between is weak and hence denoted by a dotted line.
Whew! That was a lot of information to digest, but I'm proud of you for having made it this far! Now that we know how to model data in a way the reduces redundancy, we'll be learning more about how we can describe data to quickly convey the main patterns found within in the next post. If that sounds exciting to you, do subscribe and check by frequently and I'll see you in the next post :)













Comments
Post a Comment