Diamond Exploratory Analysis
In this article, I explore the diamonds dataset and create visualizations using Python.
I am still relatively new to data science, so this will be a learning process for me as well. To start, we will load the data, take a peek at it, clean it if necessary and then perform some basic analysis to draw out some insights. First, we import the libraries.
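A minimal sketch of this first step might look as follows. The file name in the comment is only a placeholder, since the location of the data is not given here, and the handful of rows below are made-up values that stand in for the real data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# With a local copy of the data you would load it along these lines
# (the file name is a placeholder, not taken from this article):
# df = pd.read_csv("diamonds.csv")

# Made-up stand-in rows using the column names discussed in this article
df = pd.DataFrame({
    "carat": [0.30, 0.45, 0.70, 1.00, 1.20, 1.50],
    "depth": [61.5, 62.0, 60.8, 63.1, 61.9, 62.4],
    "price": [450, 900, 2400, 5200, 7100, 10500],
    "x": [4.30, 4.90, 5.70, 6.40, 6.80, 7.30],
    "y": [4.32, 4.93, 5.72, 6.43, 6.83, 7.35],
    "z": [2.65, 3.05, 3.47, 4.05, 4.22, 4.57],
})

print(df.shape)  # (rows, columns)
```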
df.head(): this shows the first 5 rows of the dataset.
df.info(): let’s check whether there are any missing values that we need to deal with.
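As a rough illustration of these two calls, here is a sketch on a small made-up frame (the values are placeholders, not rows from the real dataset). The extra `isnull().sum()` line is my own addition to make the missing-value count explicit.

```python
import pandas as pd

# Made-up stand-in rows; the real dataset is far larger
df = pd.DataFrame({
    "carat": [0.30, 0.45, 0.70, 1.00, 1.20, 1.50, 2.00],
    "depth": [61.5, 62.0, 60.8, 63.1, 61.9, 62.4, 61.0],
    "price": [450, 900, 2400, 5200, 7100, 10500, 16000],
})

print(df.head())   # first 5 rows by default
df.info()          # prints column dtypes and non-null counts

# Number of missing values in each column
missing = df.isnull().sum()
print(missing)
```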
This is a great start: there are no missing values to deal with in this dataset, which is rare in practice.
Next, we continue exploring the data through plots.
To simplify this data set a bit more, let’s combine the ‘x’, ‘y’ and ‘z’ columns into a ‘volume’ column. We’ll also remove any outliers.
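One way this step could be sketched is shown below. The outlier rule is not spelled out above, so both filters here (dropping zero volumes, then a 1.5 × IQR cutoff) are assumptions of mine, and the rows are made-up values chosen so that each filter has something to remove.

```python
import pandas as pd

# Made-up stand-in rows; the last two are deliberately suspicious
df = pd.DataFrame({
    "price": [500, 650, 900, 1800, 3200, 5000, 4000, 6000],
    "x": [4.0, 4.2, 4.5, 5.0, 5.5, 6.0, 0.0, 8.0],
    "y": [4.0, 4.2, 4.5, 5.0, 5.5, 6.0, 6.5, 30.0],
    "z": [2.5, 2.6, 2.8, 3.1, 3.4, 3.7, 4.0, 5.0],
})

# Combine the three dimensions into a single volume column
df["volume"] = df["x"] * df["y"] * df["z"]
df = df.drop(columns=["x", "y", "z"])

# Assumed rule 1: a zero dimension (hence zero volume) is a recording error
df = df[df["volume"] > 0]

# Assumed rule 2: keep volumes within 1.5 * IQR of the quartiles
q1, q3 = df["volume"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = df[df["volume"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(len(clean), "rows kept")
```

A fixed threshold on volume would work just as well; the IQR rule is simply a common default when no domain cutoff is known.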
Let’s now look at the distributions of the carat, depth, price and volume columns using matplotlib. We will use Seaborn later to try out both libraries a bit.
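The four histograms could be drawn roughly like this. To keep the snippet runnable without the data file, it generates synthetic columns with NumPy; the distribution shapes and parameters are my own placeholders, not values estimated from the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (placeholder shapes, not the real dataset)
rng = np.random.default_rng(0)
n = 500
carat = rng.lognormal(mean=-0.4, sigma=0.5, size=n)
df = pd.DataFrame({
    "carat": carat,
    "depth": rng.normal(61.7, 1.4, size=n),
    "price": 5000 * carat ** 1.7 * rng.lognormal(0.0, 0.2, size=n),
    "volume": 160 * carat * rng.normal(1.0, 0.03, size=n),
})

# One histogram per column on a 2 x 2 grid
columns = ["carat", "depth", "price", "volume"]
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, col in zip(axes.ravel(), columns):
    ax.hist(df[col], bins=30, edgecolor="black")
    ax.set_title(f"Distribution of {col}")
    ax.set_xlabel(col)
    ax.set_ylabel("count")
fig.tight_layout()
plt.show()
```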
As can be seen from the second plot above, the diamond depth is approximately normally distributed. The majority of diamonds have a depth of roughly 60–65%. This is consistent with our intuition regarding diamonds, since a near-optimum depth is needed to fit a diamond into a piece of jewellery such as a ring.
The carat weight distribution is right-skewed, which means that the majority of diamonds in this dataset have a low carat weight.
Similarly, the price of diamonds is skewed to the right, again consistent with an intuitive observation.
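Skewness can also be checked numerically rather than by eye. The sketch below uses pandas' `skew()`, where a clearly positive value indicates a right skew and a value near zero a roughly symmetric shape. The data is synthetic (placeholder parameters of my own), so the printed numbers only illustrate the method and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (placeholder shapes, not the real dataset)
rng = np.random.default_rng(0)
n = 500
carat = rng.lognormal(mean=-0.4, sigma=0.5, size=n)
df = pd.DataFrame({
    "carat": carat,
    "depth": rng.normal(61.7, 1.4, size=n),
    "price": 5000 * carat ** 1.7 * rng.lognormal(0.0, 0.2, size=n),
})

# Sample skewness per column: > 0 means a longer right tail
skewness = df[["carat", "depth", "price"]].skew()
print(skewness.round(2))
```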
Next, let’s look at the correlations of the different variables in this data set.
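A correlation matrix and a Seaborn heatmap are one plausible way to do this; the heatmap styling below is my own choice. The data is again synthetic, with price and volume built from carat on purpose, so the strong correlations it shows are baked in by construction.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data (placeholder shapes, not the real dataset)
rng = np.random.default_rng(0)
n = 500
carat = rng.lognormal(mean=-0.4, sigma=0.5, size=n)
df = pd.DataFrame({
    "carat": carat,
    "depth": rng.normal(61.7, 1.4, size=n),
    "price": 5000 * carat ** 1.7 * rng.lognormal(0.0, 0.2, size=n),
    "volume": 160 * carat * rng.normal(1.0, 0.03, size=n),
})

# Pearson correlation between every pair of numeric columns
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

# Heatmap with the coefficient written in each cell
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
plt.show()
```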
We will split the data and hold out the last portion as our ‘test’ set. The remaining rows are used to train the model, with the ‘price’ column separated out as the target, and the trained model then predicts prices for the test set.
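A positional split of this kind could be sketched as below. The 80/20 proportion is an assumption, as no ratio is stated at this point, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (placeholder values, not the real dataset)
rng = np.random.default_rng(0)
n = 200
carat = rng.uniform(0.2, 2.5, size=n)
df = pd.DataFrame({
    "carat": carat,
    "volume": 160 * carat * rng.normal(1.0, 0.03, size=n),
    "price": 7500 * carat + 500 + rng.normal(0, 800, size=n),
})

# Assumed 80/20 split: the last 20% of rows become the test set
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

# Separate the target column from the features
X_train, y_train = train.drop(columns="price"), train["price"]
X_test, y_test = test.drop(columns="price"), test["price"]

print(X_train.shape, X_test.shape)
```

If the rows happen to be sorted in some way, for example by price, shuffling before a positional split is usually advisable so the test set is not systematically different from the training set.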
Let’s break the data into training and test datasets and look for the split ratio that gives the best accuracy. The model we use is linear regression with a single predictor (univariate regression).
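Here is a hedged sketch of that search using scikit-learn. Several things are assumptions on my part: carat as the single predictor, the candidate test sizes, the fixed `random_state`, and the use of R² (what `LinearRegression.score` returns) as the measure of accuracy. The data is synthetic and linear by construction, so the scores are not indicative of the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (placeholder values, not the real dataset)
rng = np.random.default_rng(0)
n = 500
carat = rng.uniform(0.2, 2.5, size=n)
df = pd.DataFrame({
    "carat": carat,
    "price": 7500 * carat + 500 + rng.normal(0, 800, size=n),
})

X = df[["carat"]]   # single predictor, kept 2-D for scikit-learn
y = df["price"]

# Fit one model per candidate test-set fraction and record its R^2
scores = {}
for test_size in [0.1, 0.2, 0.3, 0.4]:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    model = LinearRegression().fit(X_train, y_train)
    scores[test_size] = model.score(X_test, y_test)

best = max(scores, key=scores.get)
for test_size, r2 in scores.items():
    print(f"test_size={test_size:.1f}  R^2={r2:.3f}")
print("best test_size:", best)
```

Differences between ratios on a single random split are often small and noisy, so repeating the split with several seeds, or using cross-validation, gives a steadier picture.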
We can now see how each of these variables relates to price.
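One simple way to view these relationships is a scatter plot of each variable against price. The sketch below uses matplotlib on synthetic data (placeholder parameters of my own), so the shapes it shows are illustrative only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (placeholder shapes, not the real dataset)
rng = np.random.default_rng(0)
n = 500
carat = rng.lognormal(mean=-0.4, sigma=0.5, size=n)
df = pd.DataFrame({
    "carat": carat,
    "depth": rng.normal(61.7, 1.4, size=n),
    "volume": 160 * carat * rng.normal(1.0, 0.03, size=n),
    "price": 5000 * carat ** 1.7 * rng.lognormal(0.0, 0.2, size=n),
})

# One scatter plot per feature, all sharing the price axis
features = ["carat", "depth", "volume"]
fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, col in zip(axes, features):
    ax.scatter(df[col], df["price"], s=8, alpha=0.5)
    ax.set_xlabel(col)
    ax.set_title(f"price vs {col}")
axes[0].set_ylabel("price")
fig.tight_layout()
plt.show()
```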
Thanks for reading this article. I hope it’s helpful to you all! If you spot a problem, feel free to contact me. Bye bye!
Images: https://www.scientificamerican.com/article/diamond-smartphones/