Nike Sneakers Analysis

I found a small dataset of Nike shoe reviews and ratings from April 2020,
so I ran a simple analysis to find the most popular sneakers and see how they are rated.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
In [2]:
df = pd.read_csv('nike2020.csv')

For simplicity, I dropped the columns this analysis doesn't need.

In [3]:
df.columns
Out[3]:
Index(['URL', 'Product Name', 'Product ID', 'Listing Price', 'Sale Price',
       'Discount', 'Brand', 'Description', 'Rating', 'Reviews', 'Images',
       'Last Visited'],
      dtype='object')
In [4]:
df.drop(columns=['URL', 'Listing Price', 'Sale Price', 'Discount', 'Brand', 'Images', 'Last Visited'], inplace=True)
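Selecting the columns to keep, rather than listing the ones to drop, gives the same result and reads more directly when only a handful of columns matter. A minimal sketch on a hypothetical one-row frame that has the same 12 column names as `nike2020.csv`:

```python
import pandas as pd

# Hypothetical one-row frame with the same 12 columns as nike2020.csv
cols = ['URL', 'Product Name', 'Product ID', 'Listing Price', 'Sale Price',
        'Discount', 'Brand', 'Description', 'Rating', 'Reviews', 'Images',
        'Last Visited']
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

# Keep-list approach: name the columns the analysis uses
keep = ['Product Name', 'Product ID', 'Description', 'Rating', 'Reviews']
slim = df[keep].copy()

# Drop-list approach used in the notebook, for comparison
dropped = df.drop(columns=['URL', 'Listing Price', 'Sale Price', 'Discount',
                           'Brand', 'Images', 'Last Visited'])
print(slim.columns.equals(dropped.columns))
```

Both approaches leave the same five columns in the same order.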
In [5]:
df.head()
Out[5]:
   Product Name                    Product ID  Description                                        Rating  Reviews
0  Nike Air Force 1 '07 Essential  CJ1646-600  Let your shoe game shimmer in the Nike Air For...     0.0        0
1  Nike Air Force 1 '07            CT4328-101  The legend lives on in the Nike Air Force 1 '0...     0.0        0
2  Nike Air Force 1 Sage Low LX    CI3482-200  Taking both height and craft to new levels, th...     0.0        0
3  Nike Air Max Dia SE             CD0479-200  Designed for a woman's foot, the Nike Air Max ...     0.0        0
4  Nike Air Max Verona             CZ6156-101  Pass on the good vibes in the Nike Air Max Ver...     0.0        0
In [6]:
df.isna().sum()
Out[6]:
Product Name    0
Product ID      0
Description     3
Rating          0
Reviews         0
dtype: int64
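Before collapsing duplicates, it can help to check how many product names actually repeat. This sketch assumes duplicates show up as the same `Product Name` listed under different `Product ID`s, which is what grouping on `Product Name` relies on; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows: one model listed under two Product IDs
df = pd.DataFrame({
    "Product Name": ["Shoe A", "Shoe A", "Shoe B"],
    "Product ID": ["ID-1", "ID-2", "ID-3"],
})

# Count rows whose Product Name already appeared earlier in the frame
n_dupes = df["Product Name"].duplicated().sum()

# All rows that share a name with some other row, for manual inspection
repeated = df[df["Product Name"].duplicated(keep=False)]
print(n_dupes, len(repeated))
```

`keep=False` marks every member of a duplicated group, not just the second and later occurrences, so `repeated` shows both listings of "Shoe A".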

Since some products appear more than once, I collapsed the duplicates with groupby: for each duplicated product I took the mean of its ratings and the sum of its reviews.

In [7]:
nike = df.groupby('Product Name').agg({'Rating':'mean', 'Reviews':'sum'}).reset_index()
In [8]:
nike.head()
Out[8]:
   Product Name                         Rating  Reviews
0  Air Jordan 1 Jester XX Low Laced       5.00        2
1  Air Jordan 1 Jester XX Low Laced SE    0.00        0
2  Air Jordan 1 Low                       4.60       68
3  Air Jordan 1 Low SE                    0.00        0
4  Air Jordan 1 Mid                       2.25       61
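One caveat with the plain mean: a listing with no reviews is stored as Rating 0.0 with 0 Reviews (as in several rows above), so averaging it in can pull a product's rating down. A possible alternative, not the method this notebook uses, is a review-weighted mean: the sum of Rating × Reviews divided by the sum of Reviews, which gives unreviewed listings zero weight. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data: Shoe A has one reviewed and one unreviewed listing,
# Shoe C has no reviews at all. Names and numbers are made up.
df = pd.DataFrame({
    "Product Name": ["Shoe A", "Shoe A", "Shoe B", "Shoe C"],
    "Rating": [4.0, 0.0, 3.0, 0.0],
    "Reviews": [40, 0, 10, 0],
})

# Helper column so a plain named aggregation can do the weighting
df["Rating x Reviews"] = df["Rating"] * df["Reviews"]

nike = (
    df.groupby("Product Name")
    .agg(
        MeanRating=("Rating", "mean"),            # plain mean, as in the notebook
        RatingReviews=("Rating x Reviews", "sum"),
        Reviews=("Reviews", "sum"),
    )
    .reset_index()
)

# Review-weighted mean; products with zero total reviews become NaN instead of 0.0
nike["Rating"] = nike["RatingReviews"] / nike["Reviews"].where(nike["Reviews"] > 0)
print(nike[["Product Name", "MeanRating", "Rating", "Reviews"]])
```

Here Shoe A gets a plain mean of 2.0 but a weighted rating of 4.0, and Shoe C shows as NaN (unrated) rather than as a 0.0 rating.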

First, I ranked the products by number of reviews rather than by rating (rating only breaks ties) and kept the top 10.

In [9]:
top10reviews = nike.sort_values(by=['Reviews', 'Rating'], ascending=False)[:10]
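pandas also has `DataFrame.nlargest`, which takes the top n rows ordered by one or more columns in descending order. It should select the same rows as `sort_values(..., ascending=False)[:n]`, though the handling of rows tied on every column may differ. A small sketch with hypothetical products, where B and C tie on reviews so rating decides their order:

```python
import pandas as pd

# Hypothetical toy data; B and C tie on Reviews, so Rating breaks the tie
nike = pd.DataFrame({
    "Product Name": ["A", "B", "C", "D"],
    "Rating": [4.0, 3.5, 4.8, 5.0],
    "Reviews": [120, 80, 80, 2],
})

# Top 3 by Reviews, then Rating, both descending
top3 = nike.nlargest(3, columns=["Reviews", "Rating"])

# Same selection via the sort-and-slice approach used above
top3_sorted = nike.sort_values(by=["Reviews", "Rating"], ascending=False)[:3]
print(top3["Product Name"].tolist())  # ['A', 'C', 'B']
```

Note that D, with a perfect 5.0 from only 2 reviews, is excluded, which is the point of ranking by review count first.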
In [10]:
fig = px.bar(top10reviews, x='Reviews', y='Product Name', 
             orientation='h', color='Rating', width=980, height=400,
             color_continuous_scale="Blues")

fig.update_layout(template='none', yaxis={'categoryorder':'total ascending'},
                 title='Top 10 Most-Reviewed Nike Shoes (April 2020)')
fig.update_xaxes(automargin=True, title_text='Number of Reviews', 
                 showline=True, linewidth=2, mirror=True, showgrid=False)
fig.update_yaxes(automargin=True, title_text='', showline=True, linewidth=2, mirror=True)
fig.show()

Air Jordan 10 Retro has both the most reviews and the highest rating in this group. Nike Air Force 1 '07 has the second-most reviews but the lowest rating of the ten most-reviewed products. The Flyknit model also appears to be rated very well overall.
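Readings like these can be checked numerically rather than by eye from the bar colors. The sketch below uses a hypothetical stand-in for `top10reviews` (names and numbers are made up) and pulls out the highest- and lowest-rated products plus the mean rating of names containing "Flyknit":

```python
import pandas as pd

# Hypothetical stand-in for top10reviews; names and numbers are made up
top10reviews = pd.DataFrame({
    "Product Name": ["Shoe X", "Shoe Y Flyknit", "Shoe Z"],
    "Rating": [4.7, 4.5, 3.6],
    "Reviews": [150, 90, 130],
})

# Product with the highest and the lowest rating among the selected rows
best = top10reviews.loc[top10reviews["Rating"].idxmax(), "Product Name"]
worst = top10reviews.loc[top10reviews["Rating"].idxmin(), "Product Name"]

# Average rating of products whose name mentions Flyknit (case-insensitive)
is_flyknit = top10reviews["Product Name"].str.contains("Flyknit", case=False)
flyknit_mean = top10reviews.loc[is_flyknit, "Rating"].mean()
print(best, worst, flyknit_mean)
```

Running the same three lookups on the real `top10reviews`, or on the full `nike` table for the Flyknit check, would put exact numbers behind the observations above.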

Interestingly, the Air Jordan 10 Retro was discontinued from NIKEiD, while the Air Force 1 remained available.

Because this is a very small dataset, confirming these patterns would require more data and a more in-depth analysis.