I found a small dataset of Nike shoe reviews and ratings from April 2020.
So I ran a simple analysis to find out the most popular sneakers and their ratings.
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df = pd.read_csv('nike2020.csv')
Dropping the columns I don't need, for simplicity.
df.columns
df.drop(columns=['URL', 'Listing Price', 'Sale Price', 'Discount', 'Brand', 'Images', 'Last Visited'], inplace=True)
df.head()
df.isna().sum()
Since some products appear more than once, I collapsed the duplicates with groupby. For each duplicated product, I took the mean of its ratings and the sum of its reviews.
nike = df.groupby('Product Name').agg({'Rating':'mean', 'Reviews':'sum'}).reset_index()
nike.head()
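The groupby step above takes a plain mean of the ratings, so a duplicate row with 1 review counts as much as one with 100. As a possible alternative, here is a minimal sketch of a review-weighted mean. The `toy` dataframe and the `weighted_rating` helper are hypothetical and the values are made up for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Toy data standing in for nike2020.csv (made-up values, for illustration only).
toy = pd.DataFrame({
    'Product Name': ['Shoe A', 'Shoe A', 'Shoe B'],
    'Rating': [5.0, 3.0, 4.0],
    'Reviews': [1, 9, 10],
})

def weighted_rating(df):
    """Collapse duplicate products, weighting each rating by its review count."""
    tmp = df.assign(weighted=df['Rating'] * df['Reviews'])
    out = tmp.groupby('Product Name').agg(
        weighted=('weighted', 'sum'),
        Reviews=('Reviews', 'sum'),
    ).reset_index()
    # Products with zero total reviews get NaN rather than a division by zero.
    out['Rating'] = out['weighted'] / out['Reviews'].where(out['Reviews'] > 0)
    return out[['Product Name', 'Rating', 'Reviews']]

nike_weighted = weighted_rating(toy)
print(nike_weighted)
```

In the toy data, 'Shoe A' has a plain mean of 4.0 but a weighted mean of 3.2, because the 3.0 rating is backed by nine of its ten reviews.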
First, I sorted the dataframe by the number of reviews rather than the rating, using the rating only to break ties, and kept the top 10.
top10reviews = nike.sort_values(by=['Reviews', 'Rating'], ascending=False)[:10]
fig = px.bar(top10reviews, x='Reviews', y='Product Name',
             orientation='h', color='Rating', width=980, height=400,
             color_continuous_scale="Blues")
fig.update_layout(template='none', yaxis={'categoryorder':'total ascending'},
                  title='Top 10 Nike Shoes in April 2020')
fig.update_xaxes(automargin=True, title_text='Number of Reviews',
                 showline=True, linewidth=2, mirror=True, showgrid=False)
fig.update_yaxes(automargin=True, title_text='', showline=True, linewidth=2, mirror=True)
fig.show()
Air Jordan 10 Retro has the most reviews and the highest rating. Nike Air Force 1 '07 has the second-most reviews but the lowest rating among the ten most-reviewed products. The Flyknit model also seems to be rated very well overall.
Interestingly, Air Jordan 10 Retro was discontinued from NIKEiD, while Air Force 1 was kept.
Since this is a very small dataset, I will need more data and a more in-depth analysis to confirm these observations.
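As a first step toward checking the Flyknit observation, one option is to summarize every product whose name contains a keyword instead of reading it off the chart. This is only a sketch: `toy_nike` is a made-up stand-in for the aggregated `nike` dataframe, and `family_summary` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the aggregated `nike` dataframe (made-up values).
toy_nike = pd.DataFrame({
    'Product Name': ['Nike Flyknit X', 'Nike Flyknit Y', 'Nike Other Z'],
    'Rating': [4.5, 4.0, 3.0],
    'Reviews': [10, 30, 20],
})

def family_summary(df, keyword):
    """Summarize products whose name contains `keyword` (case-insensitive)."""
    mask = df['Product Name'].str.contains(keyword, case=False, regex=False, na=False)
    subset = df[mask]
    total_reviews = int(subset['Reviews'].sum())
    # Weight each product's rating by its review count; NaN if there are no reviews.
    if total_reviews:
        weighted = (subset['Rating'] * subset['Reviews']).sum() / total_reviews
    else:
        weighted = float('nan')
    return {
        'products': len(subset),
        'reviews': total_reviews,
        'weighted_rating': weighted,
    }

flyknit = family_summary(toy_nike, 'Flyknit')
print(flyknit)
```

On the real data, the call would be `family_summary(nike, 'Flyknit')`, and the same helper could be reused for other keywords to compare product families.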