Video Game Analysis

Table of Contents

1. Data Cleaning

2. Data Analysis and Visualization

2.1 Data Information

  • 2.1.a Correlations
  • 2.1.b Histograms

2.2 Best Video Games

  • 2.2.a All Sales
  • 2.2.b Sales by Regions
  • 2.2.c Sales by Genres
  • 2.2.d Sales by Platforms

2.3 Game Trend by Time

  • 2.3.a Total Sales by Generations
  • 2.3.b The Most Famous Genres in Each Generation
  • 2.3.c The Most Famous Games in Each Generation

2.4 Video Games by Regions

  • 2.4.a The Most Popular Game Genres by Regions
  • 2.4.b The Most Popular Platforms in Each Region

2.5 Video Games by Publishers

  • 2.5.a Publishers with the Most Games
  • 2.5.b Publishers with the Most High Ranking Games
  • 2.5.c Publishers with the Highest Sales

3. Machine Learning

  • 3.a Linear Regression
  • 3.b K-Nearest Neighbors
  • 3.c MLPClassifier
  • 3.d TensorFlow

4. Conclusion



Data Information

Column Description
Rank Ranking of overall sales
Name The game's name
Platform Platform of the game's release (e.g. PC, PS4)
Year Year of the game's release
Genre Genre of the game
Publisher Publisher of the game
NA_Sales Sales in North America (in millions)
EU_Sales Sales in Europe (in millions)
JP_Sales Sales in Japan (in millions)
Other_Sales Sales in the rest of the world (in millions)
Global_Sales Total worldwide sales (in millions)
In [225]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from collections import Counter
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
import tensorflow as tf
In [2]:
vg_2016 = pd.read_csv('vgsales.csv')
vg_2019 = pd.read_csv('vgsales_2019.csv')
ps4 = pd.read_csv('ps4.csv', header=0, encoding='unicode_escape')
xbox = pd.read_csv('xboxone.csv', header=0, encoding='unicode_escape')
In [3]:
vg_2016.head()
Out[3]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37
In [4]:
vg_2016.describe()
Out[4]:
Rank Year NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
count 16598.000000 16327.000000 16598.000000 16598.000000 16598.000000 16598.000000 16598.000000
mean 8300.605254 2006.406443 0.264667 0.146652 0.077782 0.048063 0.537441
std 4791.853933 5.828981 0.816683 0.505351 0.309291 0.188588 1.555028
min 1.000000 1980.000000 0.000000 0.000000 0.000000 0.000000 0.010000
25% 4151.250000 2003.000000 0.000000 0.000000 0.000000 0.000000 0.060000
50% 8300.500000 2007.000000 0.080000 0.020000 0.000000 0.010000 0.170000
75% 12449.750000 2010.000000 0.240000 0.110000 0.040000 0.040000 0.470000
max 16600.000000 2020.000000 41.490000 29.020000 10.220000 10.570000 82.740000
In [5]:
vg_2016.columns
Out[5]:
Index(['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'NA_Sales',
       'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales'],
      dtype='object')
In [6]:
vg_2016.corr()
Out[6]:
Rank Year NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
Rank 1.000000 0.178814 -0.401362 -0.379123 -0.267785 -0.332986 -0.427407
Year 0.178814 1.000000 -0.091402 0.006014 -0.169316 0.041058 -0.074735
NA_Sales -0.401362 -0.091402 1.000000 0.767727 0.449787 0.634737 0.941047
EU_Sales -0.379123 0.006014 0.767727 1.000000 0.435584 0.726385 0.902836
JP_Sales -0.267785 -0.169316 0.449787 0.435584 1.000000 0.290186 0.611816
Other_Sales -0.332986 0.041058 0.634737 0.726385 0.290186 1.000000 0.748331
Global_Sales -0.427407 -0.074735 0.941047 0.902836 0.611816 0.748331 1.000000

1. Data Cleaning

In [7]:
vg_2016.isna().sum()
Out[7]:
Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64

Lots of NULL values in Year and Publisher. Let's check.

In [8]:
vg_2019.isna().sum()
Out[8]:
Rank                  0
Name                  0
basename              0
Genre                 0
ESRB_Rating       32169
Platform              0
Publisher             0
Developer            17
VGChartz_Score    55792
Critic_Score      49256
User_Score        55457
Total_Shipped     53965
Global_Sales      36377
NA_Sales          42828
PAL_Sales         42603
JP_Sales          48749
Other_Sales       40270
Year                979
Last_Update       46606
url                   0
status                0
Vgchartzscore     54993
img_url               0
dtype: int64

We can estimate the missing years from the platform. For example, if the PS2 was released in 2000 and the PS3 in 2006, a PS2 game was most likely released between 2000 and 2006. So we will fill each missing year with the average year of all games on that platform.

In [9]:
platformYear = vg_2016.groupby('Platform').agg({'Year':'mean'}).reset_index().round(0)
platformYear['Year'] = platformYear['Year'].astype(int)
platformYear.head()
Out[9]:
Platform Year
0 2600 1982
1 3DO 1995
2 3DS 2013
3 DC 2000
4 DS 2008
In [10]:
fillDict = platformYear.set_index('Platform')['Year'].to_dict()
In [11]:
vg_2016['Year'] = vg_2016['Year'].replace(np.nan, vg_2016['Platform'].map(fillDict))
In [12]:
vg_2019['Year'] = vg_2019['Year'].replace(np.nan, vg_2019['Platform'].map(fillDict))

Because the 2019 dataset is large, filling the null values this way takes a very long time. I will need to find a faster way to fill them in the future.
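One likely speed-up, sketched below but not timed here: fillna accepts an index-aligned Series, so mapping each row's Platform to its mean year and filling in a single vectorized pass should avoid the slow element-wise replace.

# Sketch of a vectorized fill: map() builds a Series of platform mean years
# aligned to vg_2019's index, and fillna applies it in one pass.
vg_2019['Year'] = vg_2019['Year'].fillna(vg_2019['Platform'].map(fillDict))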

In [13]:
vg_2016['Year'].isna().sum()
Out[13]:
0
In [14]:
vg_2019['Year'].isna().sum()
Out[14]:
241
In [15]:
vg_2019['Year'] = vg_2019['Year'].replace(np.nan, 0)
vg_2016['Year'] = vg_2016['Year'].astype(int)
vg_2019['Year'] = vg_2019['Year'].astype(int)

Unfortunately, there is not enough data to fill in the missing publishers, and unlike Year there is no good way to guess the publisher from the other columns, so we will have to leave it as null for now.

In [16]:
vg_2016.loc[vg_2016['Publisher'].isna() == True].head()
Out[16]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
470 471 wwe Smackdown vs. Raw 2006 PS2 2005 Fighting NaN 1.57 1.02 0.0 0.41 3.00
1303 1305 Triple Play 99 PS 1998 Sports NaN 0.81 0.55 0.0 0.10 1.46
1662 1664 Shrek / Shrek 2 2-in-1 Gameboy Advance Video GBA 2007 Misc NaN 0.87 0.32 0.0 0.02 1.21
2222 2224 Bentley's Hackpack GBA 2005 Misc NaN 0.67 0.25 0.0 0.02 0.93
3159 3161 Nicktoons Collection: Game Boy Advance Video V... GBA 2004 Misc NaN 0.46 0.17 0.0 0.01 0.64

I will drop the columns with many null values, along with columns that are not needed.

In [17]:
vg_2019.drop(columns=['basename', 'ESRB_Rating', 'Developer', 'Critic_Score', 'User_Score', 'Total_Shipped', 
                      'VGChartz_Score', 'PAL_Sales', 'Last_Update', 'url', 'status', 'Vgchartzscore', 
                      'img_url'], inplace=True)
In [18]:
vg_2019.head()
Out[18]:
Rank Name Genre Platform Publisher Global_Sales NA_Sales JP_Sales Other_Sales Year
0 1 Wii Sports Sports Wii Nintendo NaN NaN NaN NaN 2006
1 2 Super Mario Bros. Platform NES Nintendo NaN NaN NaN NaN 1985
2 3 Mario Kart Wii Racing Wii Nintendo NaN NaN NaN NaN 2008
3 4 PlayerUnknown's Battlegrounds Shooter PC PUBG Corporation NaN NaN NaN NaN 2017
4 5 Wii Sports Resort Sports Wii Nintendo NaN NaN NaN NaN 2009

Because the 2016 and 2019 data use two different rankings, we will bin each dataset's ranks into [Very High, High, Medium, Low, Very Low]. The two dataframes do not contain the same number of games, so the classes will not be perfectly comparable, but we will continue with this method until I find a better way.

In [19]:
vg_2016['Class'] = pd.cut(vg_2016['Rank'], bins=5, labels=['Very High', 'High', 'Medium', 'Low', 'Very Low'])
vg_2019['Class'] = pd.cut(vg_2019['Rank'], bins=5, labels=['Very High', 'High', 'Medium', 'Low', 'Very Low'])
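As a quick sanity check on the binning (a sketch, nothing more), the class counts can be compared; since Rank is roughly uniform, each class should cover about a fifth of each dataset.

# Each class should hold roughly a fifth of each dataset, since Rank is uniform.
print(vg_2016['Class'].value_counts(sort=False))
print(vg_2019['Class'].value_counts(sort=False))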
In [20]:
vg_2016.head()
Out[20]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Class
0 1 Wii Sports Wii 2006 Sports Nintendo 41.49 29.02 3.77 8.46 82.74 Very High
1 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08 3.58 6.81 0.77 40.24 Very High
2 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85 12.88 3.79 3.31 35.82 Very High
3 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75 11.01 3.28 2.96 33.00 Very High
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37 Very High
In [21]:
vg_2019.head()
Out[21]:
Rank Name Genre Platform Publisher Global_Sales NA_Sales JP_Sales Other_Sales Year Class
0 1 Wii Sports Sports Wii Nintendo NaN NaN NaN NaN 2006 Very High
1 2 Super Mario Bros. Platform NES Nintendo NaN NaN NaN NaN 1985 Very High
2 3 Mario Kart Wii Racing Wii Nintendo NaN NaN NaN NaN 2008 Very High
3 4 PlayerUnknown's Battlegrounds Shooter PC PUBG Corporation NaN NaN NaN NaN 2017 Very High
4 5 Wii Sports Resort Sports Wii Nintendo NaN NaN NaN NaN 2009 Very High

Create a new dataframe by appending the 2019 data to the 2016 data.

In [22]:
vg = vg_2016.append(vg_2019)
vg.head()
Out[22]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Class
0 1 Wii Sports Wii 2006 Sports Nintendo 41.49 29.02 3.77 8.46 82.74 Very High
1 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08 3.58 6.81 0.77 40.24 Very High
2 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85 12.88 3.79 3.31 35.82 Very High
3 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75 11.01 3.28 2.96 33.00 Very High
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37 Very High
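Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on newer pandas the same merge can be written with pd.concat, as in this equivalent sketch:

# pd.concat is the modern replacement for DataFrame.append.
vg = pd.concat([vg_2016, vg_2019])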

Now we will drop any duplicates using name, platform, year and genre to avoid any mistakes.

In [23]:
vg = vg.drop_duplicates(subset=['Name', 'Platform', 'Year', 'Genre'])
In [24]:
vg.shape
Out[24]:
(60045, 12)
In [25]:
vg.head()
Out[25]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Class
0 1 Wii Sports Wii 2006 Sports Nintendo 41.49 29.02 3.77 8.46 82.74 Very High
1 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08 3.58 6.81 0.77 40.24 Very High
2 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85 12.88 3.79 3.31 35.82 Very High
3 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75 11.01 3.28 2.96 33.00 Very High
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37 Very High

Now we will add two more datasets: PS4 and Xbox One video games. I will clean these datasets for analysis.

In [26]:
ps4.head()
Out[26]:
Game Year Genre Publisher North America Europe Japan Rest of World Global
0 Grand Theft Auto V 2014.0 Action Rockstar Games 6.06 9.71 0.60 3.02 19.39
1 Call of Duty: Black Ops 3 2015.0 Shooter Activision 6.18 6.05 0.41 2.44 15.09
2 Red Dead Redemption 2 2018.0 Action-Adventure Rockstar Games 5.26 6.21 0.21 2.26 13.94
3 Call of Duty: WWII 2017.0 Shooter Activision 4.67 6.21 0.40 2.12 13.40
4 FIFA 18 2017.0 Sports EA Sports 1.27 8.64 0.15 1.73 11.80
In [27]:
xbox.head()
Out[27]:
Pos Game Year Genre Publisher North America Europe Japan Rest of World Global
0 1 Grand Theft Auto V 2014.0 Action Rockstar Games 4.70 3.25 0.01 0.76 8.72
1 2 Call of Duty: Black Ops 3 2015.0 Shooter Activision 4.63 2.04 0.02 0.68 7.37
2 3 Call of Duty: WWII 2017.0 Shooter Activision 3.75 1.91 0.00 0.57 6.23
3 4 Red Dead Redemption 2 2018.0 Action-Adventure Rockstar Games 3.76 1.47 0.00 0.54 5.77
4 5 MineCraft 2014.0 Misc Microsoft Studios 3.23 1.71 0.00 0.49 5.43

First we will rename the columns to match the video game data.

In [28]:
ps4.rename(columns={'Game':'Name', 'North America':'NA_Sales', 'Europe':'EU_Sales',
                   'Japan':'JP_Sales', 'Rest of World':'Other_Sales', 'Global':'Global_Sales'}, inplace=True)
xbox.rename(columns={'Pos':'Rank', 'Game':'Name', 'North America':'NA_Sales', 'Europe':'EU_Sales',
                   'Japan':'JP_Sales', 'Rest of World':'Other_Sales', 'Global':'Global_Sales'}, inplace=True)

Also, the ps4 data does not have a rank column, but we can assume it is already sorted by rank: in the xbox data, the Pos column is the rank, so that is a reasonable guess here. We will then derive the class level from this rank.

In [29]:
ps4['Rank'] = ps4.index + 1

We will also need to add a Platform column to both datasets.

In [30]:
ps4['Platform'] = 'PS4'
xbox['Platform'] = 'XOne'

We will fill the NULL values in Year using the platform year. Unfortunately, we cannot guess the publisher, so we will leave it as null for now.

In [31]:
ps4.isna().sum()
Out[31]:
Name              0
Year            209
Genre             0
Publisher       209
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
Rank              0
Platform          0
dtype: int64
In [32]:
xbox.isna().sum()
Out[32]:
Rank              0
Name              0
Year            108
Genre             0
Publisher       108
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
Platform          0
dtype: int64
In [33]:
ps4['Year'] = ps4['Year'].replace(np.nan, ps4['Platform'].map(fillDict))
xbox['Year'] = xbox['Year'].replace(np.nan, xbox['Platform'].map(fillDict))
In [34]:
ps4['Year'] = ps4['Year'].astype(int)
xbox['Year'] = xbox['Year'].astype(int)

Now we will add the class to each dataset using the same method as for the video game sales dataset.

In [35]:
ps4['Class'] = pd.cut(ps4['Rank'], bins=5, labels=['Very High', 'High', 'Medium', 'Low', 'Very Low'])
xbox['Class'] = pd.cut(xbox['Rank'], bins=5, labels=['Very High', 'High', 'Medium', 'Low', 'Very Low'])
In [36]:
ps4.head()
Out[36]:
Name Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Rank Platform Class
0 Grand Theft Auto V 2014 Action Rockstar Games 6.06 9.71 0.60 3.02 19.39 1 PS4 Very High
1 Call of Duty: Black Ops 3 2015 Shooter Activision 6.18 6.05 0.41 2.44 15.09 2 PS4 Very High
2 Red Dead Redemption 2 2018 Action-Adventure Rockstar Games 5.26 6.21 0.21 2.26 13.94 3 PS4 Very High
3 Call of Duty: WWII 2017 Shooter Activision 4.67 6.21 0.40 2.12 13.40 4 PS4 Very High
4 FIFA 18 2017 Sports EA Sports 1.27 8.64 0.15 1.73 11.80 5 PS4 Very High
In [37]:
xbox.head()
Out[37]:
Rank Name Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Platform Class
0 1 Grand Theft Auto V 2014 Action Rockstar Games 4.70 3.25 0.01 0.76 8.72 XOne Very High
1 2 Call of Duty: Black Ops 3 2015 Shooter Activision 4.63 2.04 0.02 0.68 7.37 XOne Very High
2 3 Call of Duty: WWII 2017 Shooter Activision 3.75 1.91 0.00 0.57 6.23 XOne Very High
3 4 Red Dead Redemption 2 2018 Action-Adventure Rockstar Games 3.76 1.47 0.00 0.54 5.77 XOne Very High
4 5 MineCraft 2014 Misc Microsoft Studios 3.23 1.71 0.00 0.49 5.43 XOne Very High

First, let's combine these two datasets.

In [38]:
consoles = ps4.append(xbox)
consoles.head()
Out[38]:
Name Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Rank Platform Class
0 Grand Theft Auto V 2014 Action Rockstar Games 6.06 9.71 0.60 3.02 19.39 1 PS4 Very High
1 Call of Duty: Black Ops 3 2015 Shooter Activision 6.18 6.05 0.41 2.44 15.09 2 PS4 Very High
2 Red Dead Redemption 2 2018 Action-Adventure Rockstar Games 5.26 6.21 0.21 2.26 13.94 3 PS4 Very High
3 Call of Duty: WWII 2017 Shooter Activision 4.67 6.21 0.40 2.12 13.40 4 PS4 Very High
4 FIFA 18 2017 Sports EA Sports 1.27 8.64 0.15 1.73 11.80 5 PS4 Very High
In [39]:
consoles.shape
Out[39]:
(1647, 12)

Now for the total dataset.

In [40]:
total = vg.append(consoles)
total.head()
Out[40]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Class
0 1 Wii Sports Wii 2006 Sports Nintendo 41.49 29.02 3.77 8.46 82.74 Very High
1 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08 3.58 6.81 0.77 40.24 Very High
2 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85 12.88 3.79 3.31 35.82 Very High
3 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75 11.01 3.28 2.96 33.00 Very High
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37 Very High
In [41]:
total = total.drop_duplicates(subset=['Name', 'Platform', 'Year', 'Genre'])

The final dataset is here!

In [42]:
total.head()
Out[42]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Class
0 1 Wii Sports Wii 2006 Sports Nintendo 41.49 29.02 3.77 8.46 82.74 Very High
1 2 Super Mario Bros. NES 1985 Platform Nintendo 29.08 3.58 6.81 0.77 40.24 Very High
2 3 Mario Kart Wii Wii 2008 Racing Nintendo 15.85 12.88 3.79 3.31 35.82 Very High
3 4 Wii Sports Resort Wii 2009 Sports Nintendo 15.75 11.01 3.28 2.96 33.00 Very High
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37 Very High
In [43]:
total.shape
Out[43]:
(60489, 12)
In [44]:
total.isna().sum()
Out[44]:
Rank                0
Name                0
Platform            0
Year                0
Genre               0
Publisher         288
NA_Sales        39113
EU_Sales        43449
JP_Sales        40334
Other_Sales     37680
Global_Sales    35816
Class               0
dtype: int64
In [122]:
percent_missing = total.isnull().sum() * 100 / len(total)
missing_values = pd.DataFrame({'column_name': total.columns,
                               'percent_missing': percent_missing})
missing_values
Out[122]:
column_name percent_missing
Rank Rank 0.000000
Name Name 0.000000
Platform Platform 0.000000
Year Year 0.000000
Genre Genre 0.000000
Publisher Publisher 0.476120
NA_Sales NA_Sales 64.661343
EU_Sales EU_Sales 71.829589
JP_Sales JP_Sales 66.679892
Other_Sales Other_Sales 62.292318
Global_Sales Global_Sales 59.210766
Class Class 0.000000
Generation Generation 0.000000
In [45]:
total.sort_values(['Global_Sales', 'Rank'], ascending=[False, True], inplace=True)

There are still lots of null values in half of the columns, but for now, we will continue.

2. Data Analysis and Visualization

1. Data Information

a) Correlations

In [78]:
corr = total.corr()
In [92]:
fig = go.Figure(data=go.Heatmap(z=corr, x=corr.index, y=corr.columns, 
                                hoverongaps=False, colorscale="Reds"))
fig.update_layout(title='Correlations Between Columns')
fig.show()

b) Histograms

In [101]:
fig = px.histogram(total, x="Platform",
                   title='Histogram of Platform',
                   opacity=0.8,
                   color_discrete_sequence=['indianred'])
fig.update_layout(template=None)
fig.show()
In [106]:
fig = px.histogram(total, x="Year",
                   title='Histogram of Platform',
                   opacity=0.8,
                   color_discrete_sequence=['paleturquoise'])
fig.update_layout(template=None)
fig.update_xaxes(range=[1960, 2020])
fig.show()
In [109]:
fig = px.histogram(total, x="Genre",
                   title='Histogram of Genre',
                   opacity=0.8,
                   color_discrete_sequence=['mediumpurple'])
fig.update_layout(template=None)
fig.show()

2. Best Video Games

a) All sales

In [46]:
top10all = total[:10]
In [47]:
fig = px.bar(top10all, x='Global_Sales', y='Name', orientation='h', color='Global_Sales',
            color_continuous_scale="RdBu",  labels={'Global_Sales':'Global Sales'})

fig.update_layout(template='none', yaxis={'categoryorder':'total ascending'},
                 title='Top 10 Video Games')
fig.update_xaxes(automargin=True, title_text='Global Sales', 
                 showline=True, linewidth=2, mirror=True, showgrid=False)
fig.update_yaxes(automargin=True, title_text='', showline=True, linewidth=2, mirror=True)
fig.show()

b) Sales by Regions

In [238]:
fig = go.Figure()

fig.add_trace(go.Violin(y=total['NA_Sales'], name='North America'))
fig.add_trace(go.Violin(y=total['EU_Sales'], name='Europe'))
fig.add_trace(go.Violin(y=total['JP_Sales'], name='Japan'))
fig.add_trace(go.Violin(y=total['Other_Sales'], name='Others'))

fig.update_traces(meanline_visible=True)
fig.update_layout(template=None, violingap=0, violinmode='overlay',
                 title='Sales by Regions')
fig.show()
In [49]:
NA_top5 = total.sort_values(by='NA_Sales', ascending=False)[:5]
EU_top5 = total.sort_values(by='EU_Sales', ascending=False)[:5]
JP_top5 = total.sort_values(by='JP_Sales', ascending=False)[:5]
OT_top5 = total.sort_values(by='Other_Sales', ascending=False)[:5]
In [50]:
fig = make_subplots(rows=2, cols=2, shared_yaxes=True,
                   subplot_titles=("North America", "Europe", "Japan", "Others"))

fig.add_trace(go.Bar(x=NA_top5['Name'], y=NA_top5['NA_Sales'],
                    marker=dict(color=NA_top5['NA_Sales'], coloraxis="coloraxis")),
              row=1, col=1)

fig.add_trace(go.Bar(x=EU_top5['Name'], y=EU_top5['EU_Sales'],
                    marker=dict(color=EU_top5['EU_Sales'], coloraxis="coloraxis")),
              row=1, col=2)

fig.add_trace(go.Bar(x=JP_top5['Name'], y=JP_top5['JP_Sales'],
                    marker=dict(color=JP_top5['JP_Sales'], coloraxis="coloraxis")),
              row=2, col=1)

fig.add_trace(go.Bar(x=OT_top5['Name'], y=OT_top5['Other_Sales'],
                    marker=dict(color=OT_top5['Other_Sales'], coloraxis="coloraxis")),
              row=2, col=2)


fig.update_layout(coloraxis=dict(colorscale='Purples'), 
                  showlegend=False, template=None, height=600,
                  title="Top 5 Video Games By Reigons")
fig.show()

c) By genre

In [51]:
genre_top5 = total.groupby('Genre')[['Name', 'Global_Sales']].apply(lambda x: x.nlargest(5, columns=['Global_Sales']))\
            .reset_index().drop(columns='level_1')
In [52]:
fig = px.bar(genre_top5, x="Genre", y="Global_Sales", color="Name",
            color_discrete_sequence=px.colors.qualitative.Safe)
fig.update_layout(template=None, xaxis={'categoryorder':'total descending'},
                 title='Top 5 Games by Genre')
fig.update_xaxes(title=None)
fig.update_yaxes(title=None)
fig.show()

d) By platform

In [53]:
top10_platforms = total['Platform'].value_counts()[:10].index.to_list()
top10_platforms_vg = total[total['Platform'].isin(top10_platforms)]
top5_genre = total['Genre'].value_counts()[1:6].index.to_list()
top10_platforms_vg = top10_platforms_vg[top10_platforms_vg['Genre'].isin(top5_genre)]
top5_vgs = top10_platforms_vg.groupby(['Platform','Genre']).apply(lambda x: x.nlargest(1, columns='Global_Sales')\
                           [['Name', 'Global_Sales']]).reset_index().drop(columns='level_2')
In [54]:
fig = px.sunburst(top5_vgs, path=['Platform', 'Genre', 'Name'], values='Global_Sales',
                  color='Global_Sales', color_continuous_scale='RdBu', 
                  color_continuous_midpoint=np.average(top5_vgs['Global_Sales'], weights=top5_vgs['Global_Sales']),
                  height=800)
fig.update_layout(title='Top 5 Games by Platform and Genre')
fig.show()

3. Game Trend By Time (1973 - 2020)

We will refer to a generation chart I found online to split up the time periods.

Generation By Year

Generation Name Births Start Births End
Generation X 1965 1979
Xennials 1975 1985
Millennials 1980 1994
Gen Z 1995 2012
Gen Alpha 2013 2025
In [55]:
def set_gen(row):
    value = '0'
    if 1965 <= row['Year'] <= 1979:
        value = 'Generation X'
    elif (1980 <= row['Year'] <= 1985):
        value = 'Xennials'
    elif (1986 <= row['Year'] <= 1994):
        value = 'Millennials'
    elif (1995 <= row['Year'] <= 2012):
        value = 'Gen Z'
    elif (2013 <= row['Year'] <= 2025):
        value = 'Gen Alpha'
    return value
In [56]:
total['Generation'] = total.apply(set_gen, axis=1)
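apply with a Python function runs row by row, which is slow on a frame this size. A vectorized sketch with pd.cut, using the same boundaries as set_gen (years outside 1965-2025, including the 0 placeholder, come out as NaN and are filled with '0'):

# Sketch: bin edges mirror set_gen; pd.cut uses right-closed intervals,
# so (1964, 1979] covers 1965-1979, and so on.
bins = [1964, 1979, 1985, 1994, 2012, 2025]
labels = ['Generation X', 'Xennials', 'Millennials', 'Gen Z', 'Gen Alpha']
gen = pd.cut(total['Year'], bins=bins, labels=labels)
total['Generation'] = gen.astype(object).fillna('0')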

a) Total sales by generations

In [57]:
generation_sales = total.groupby('Generation').agg({'Global_Sales':'sum'}).reset_index()
In [58]:
generation_sales = generation_sales.reindex([3, 5, 4, 2, 1, 0])
generation_sales.reset_index(drop=True, inplace=True)
In [59]:
colors = ['lightslategray'] * 6
colors[3] = 'crimson'  # highlight Gen Z, the top-selling generation

fig = go.Figure(data=[go.Bar(x=generation_sales['Generation'],
                             y=generation_sales['Global_Sales'],
                             marker_color=colors, 
                             hovertemplate='Years: %{text} <br>Total Sales:%{y}',
                             text=['1965-1979', '1980-1985', '1986-1994', '1995-2012', '2013-', 'Unknown'],
                             name='')])

fig.update_layout(title_text='Total Sales By Generations', template=None)

b) The most famous genre in each generation

In [60]:
grouped = total.groupby(['Generation', 'Genre'])['Rank'].count().to_frame()
generations_genres = grouped.loc[grouped.groupby(level='Generation')['Rank'].idxmax()]
generations_genres = generations_genres.reset_index().reindex([3, 5, 4, 2, 1, 0])
generations_genres.reset_index(drop=True, inplace=True)
In [239]:
fig = px.sunburst(generations_genres, path=['Generation', 'Genre'],
                  color='Generation', color_discrete_sequence=px.colors.qualitative.Dark2,
                  height=700)
fig.update_layout(title='The most famous genre in each generation')
fig.show()

c) Most Famous Game in Each Generation

In [62]:
grouped_games = total.groupby(['Generation', 'Name']).agg({'Global_Sales':'max'}).sort_values(by='Global_Sales', 
                                                                                              ascending=False)
generations_games = grouped_games.loc[grouped_games.groupby(level='Generation')['Global_Sales'].idxmax()]\
                    .reset_index().reindex([3, 5, 4, 2, 1, 0])
generations_games.reset_index(drop=True, inplace=True)
In [240]:
fig = px.sunburst(generations_games, path=['Generation', 'Name', 'Global_Sales'],
                  color='Generation', color_discrete_sequence=px.colors.qualitative.Dark2,
                  height=700)
fig.update_layout(title='The most famous game in each generation')
fig.show()

4. Video Games by Regions

In [64]:
top600NA = total.sort_values(by='NA_Sales', ascending=False)[:600]
top600NA = top600NA.groupby('Genre', as_index=False).agg({'NA_Sales':'mean'})\
            .sort_values(by='NA_Sales', ascending=False)[:1].reset_index(drop=True)
top600EU = total.sort_values(by='EU_Sales', ascending=False)[:600]
top600EU = top600EU.groupby('Genre', as_index=False).agg({'EU_Sales':'mean'})\
            .sort_values(by='EU_Sales', ascending=False)[:1].reset_index(drop=True)
top600JP = total.sort_values(by='JP_Sales', ascending=False)[:600]
top600JP = top600JP.groupby('Genre', as_index=False).agg({'JP_Sales':'mean'})\
            .sort_values(by='JP_Sales', ascending=False)[:1].reset_index(drop=True)
In [65]:
top_genre_region_dt = {'Region':['North America', 'Europe', 'Japan'],
                    'Genre':[top600NA['Genre'][0], top600EU['Genre'][0], top600JP['Genre'][0]],
                    'Sales':[top600NA['NA_Sales'][0], top600EU['EU_Sales'][0], top600JP['JP_Sales'][0]]}
top_genre_region = pd.DataFrame(data=top_genre_region_dt)
top_genre_region['Sales'] = top_genre_region['Sales'].round(2)
In [66]:
fig = go.Figure(data=[go.Table(header=dict(values=list(top_genre_region.columns),
                                           fill_color='white',
                                           line_color='darkslategray',
                                           align=['right', 'center', 'center'],
                                           font=dict(color='black', size=15)),
                               cells=dict(values=[top_genre_region.Region, top_genre_region.Genre, top_genre_region.Sales],
                                          fill_color = [['lightcoral', 'peachpuff', 'lightyellow']],
                                          align = ['right', 'center', 'center'],
                                          line_color='darkslategray',
                                          font = dict(color = 'darkslategray', size = 13)))])
fig.update_layout(width=800, height=300, title="The Most Popular Genre In Each Region")
fig.show()
In [67]:
NA_platforms = total.groupby('Platform', as_index=False).agg({'NA_Sales':'sum'})\
                .sort_values(by='NA_Sales', ascending=False)[:5].reset_index(drop=True)
EU_platforms = total.groupby('Platform', as_index=False).agg({'EU_Sales':'sum'})\
                .sort_values(by='EU_Sales', ascending=False)[:5].reset_index(drop=True)
JP_platforms = total.groupby('Platform', as_index=False).agg({'JP_Sales':'sum'})\
                .sort_values(by='JP_Sales', ascending=False)[:5].reset_index(drop=True)
In [68]:
fig = make_subplots(1, 3, specs=[[{'type':'domain'}, {'type':'domain'}, {'type':'domain'}]],
                    subplot_titles=['North America', 'Europe', 'Japan'])

fig.add_trace(go.Pie(labels=NA_platforms['Platform'], values=NA_platforms['NA_Sales'], 
                     hole=.3, pull=[0.2, 0, 0, 0, 0]), 1, 1)
fig.add_trace(go.Pie(labels=EU_platforms['Platform'], values=EU_platforms['EU_Sales'], 
                     hole=.3, pull=[0.2, 0, 0, 0, 0]), 1, 2)
fig.add_trace(go.Pie(labels=JP_platforms['Platform'], values=JP_platforms['JP_Sales'], 
                     hole=.3, pull=[0.2, 0, 0, 0, 0]), 1, 3)

fig.update_traces(hoverinfo='label', textinfo='value', textfont_size=15,
                  marker=dict(line=dict(color='#000000', width=2)))

fig.update_layout(title_text='The Most Popular Platform in Each Region',
                 annotations=[dict(text='NA', x=0.145, y=0.465, font_size=20, showarrow=False),
                              dict(text='EU', x=0.50, y=0.465, font_size=20, showarrow=False), 
                              dict(text='JP', x=0.858, y=0.465, font_size=20, showarrow=False)])
fig.show()

5. Video Games by Publishers

a) Publishers with the most games

In [69]:
top10_pub = total.groupby('Publisher').count()['Name'].sort_values(ascending=False)[1:11]\
            .to_frame().reset_index().rename(columns={'Name':'Number of Games'})
In [70]:
fig = px.bar(top10_pub, x='Number of Games', y='Publisher', orientation='h', color='Number of Games',
            color_continuous_scale="PuBu")

fig.update_layout(template='none', yaxis={'categoryorder':'total ascending'},
                 title='Top 10 Publishers with the Most Games')
fig.update_xaxes(automargin=True, title_text='Number of Games', 
                 showline=True, linewidth=2, mirror=True, showgrid=False)
fig.update_yaxes(automargin=True, title_text='', showline=True, linewidth=2, mirror=True)
fig.show()

b) Publishers with the most high ranking games

In [71]:
very_highs = total.loc[total['Class'] == 'Very High']
In [72]:
pub_highs = very_highs.groupby(['Publisher'], as_index=False).agg({'Class':'count'})\
           .sort_values(by='Class', ascending=False)[:10].reset_index(drop=True)\
           .rename(columns={'Class':'Number of High Ranking Games'})
In [73]:
fig = px.bar(pub_highs, x='Number of High Ranking Games', y='Publisher', orientation='h', color='Number of High Ranking Games',
            color_continuous_scale="Burg")

fig.update_layout(template='none', yaxis={'categoryorder':'total ascending'},
                 title='Top 10 Publishers with the Most High Ranking Games')
fig.update_xaxes(automargin=True, title_text='Number of Games', 
                 showline=True, linewidth=2, mirror=True, showgrid=False)
fig.update_yaxes(automargin=True, title_text='', showline=True, linewidth=2, mirror=True)
fig.show()

c) Publishers with the highest sales

In [74]:
pub_high_sales = total.groupby('Publisher', as_index=False).agg({'Global_Sales':'sum'})\
                .sort_values(by='Global_Sales', ascending=False)[:10].reset_index(drop=True)
In [75]:
fig = px.bar(pub_high_sales, x='Global_Sales', y='Publisher', orientation='h', color='Global_Sales',
            color_continuous_scale="Mint")

fig.update_layout(template='none', yaxis={'categoryorder':'total ascending'},
                 title='Top 10 Publishers with the Highest Sales')
fig.update_xaxes(automargin=True, title_text='Sales in Millions', 
                 showline=True, linewidth=2, mirror=True, showgrid=False)
fig.update_yaxes(automargin=True, title_text='', showline=True, linewidth=2, mirror=True)
fig.show()

3. Machine Learning

First, let's try to find outliers.

In [111]:
def outlier(data, attributes):
    # Flag rows that fall outside 1.5 * IQR in the given columns.
    outlier_list = []
    
    for item in attributes:
        Q1 = np.percentile(data[item], 25)
        Q3 = np.percentile(data[item], 75)
        IQR = Q3 - Q1
        outlier_step = 1.5 * IQR
        outlier_det = data[(data[item] < Q1 - outlier_step) | (data[item] > Q3 + outlier_step)].index
        outlier_list.extend(outlier_det)
    
    # Keep only rows flagged as an outlier in more than two columns.
    outlier_list = Counter(outlier_list)
    multiple_outliers = list(i for i, v in outlier_list.items() if v > 2)
    
    return multiple_outliers
In [120]:
total.loc[outlier(total, ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"])]
C:\Users\Muffin\Anaconda3\lib\site-packages\numpy\lib\function_base.py:3826: RuntimeWarning:

Invalid value encountered in percentile

Out[120]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Class Generation

The sales columns still contain NaN values, which is what triggers the percentile warning, so the detection comes back empty here; and with so many games recording 0 sales, the IQR rule would not be very informative anyway.
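A variant that sidesteps the warning (a sketch, assuming we want quartiles over observed values only) swaps np.percentile for np.nanpercentile, which ignores NaN:

def outlier_nan(data, attributes):
    # Same 1.5 * IQR rule as outlier(), but np.nanpercentile skips NaN,
    # avoiding the RuntimeWarning above.
    outlier_list = []
    for item in attributes:
        Q1 = np.nanpercentile(data[item], 25)
        Q3 = np.nanpercentile(data[item], 75)
        step = 1.5 * (Q3 - Q1)
        outlier_list.extend(data[(data[item] < Q1 - step) | (data[item] > Q3 + step)].index)
    counts = Counter(outlier_list)
    return [i for i, v in counts.items() if v > 2]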

Let's run a linear regression between global sales and NA sales.

a) Linear Regression

In [124]:
fig = px.scatter(total, x="Global_Sales", y="NA_Sales")
fig.update_layout(template=None)
fig.show()
In [127]:
total.dropna(subset=['NA_Sales', 'Global_Sales'], inplace=True)
In [128]:
x = total.NA_Sales.values.reshape(-1, 1)
Y = total.Global_Sales.values.reshape(-1, 1)
In [129]:
linear = LinearRegression()
linear.fit(x, Y)
Out[129]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [131]:
x_ = np.arange(min(x), max(x), 0.1).reshape(-1,1)
pred = linear.predict(x_)

Yhat = linear.predict(x)
print("r score: ", r2_score(Y, Yhat))
r score:  0.8758973646431796
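Note that this R² is computed on the same rows the model was fit on. A held-out check is easy with the train_test_split already imported above (a sketch, not tuned):

# Out-of-sample R²: hold out 25% of the rows before fitting.
x_tr, x_te, y_tr, y_te = train_test_split(x, Y, random_state=0)
lr = LinearRegression().fit(x_tr, y_tr)
print("held-out r2:", r2_score(y_te, lr.predict(x_te)))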
In [144]:
fig = px.scatter(total, x="NA_Sales", y="Global_Sales", trendline="ols",
                trendline_color_override="red")
fig.update_layout(template=None, title='Linear Regression')
fig.show()
In [137]:
results = px.get_trendline_results(fig)
results.px_fit_results.iloc[0].summary()
Out[137]:
OLS Regression Results
Dep. Variable: y R-squared: 0.876
Model: OLS Adj. R-squared: 0.876
Method: Least Squares F-statistic: 1.509e+05
Date: Sun, 13 Dec 2020 Prob (F-statistic): 0.00
Time: 14:43:48 Log-Likelihood: -15636.
No. Observations: 21376 AIC: 3.128e+04
Df Residuals: 21374 BIC: 3.129e+04
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 0.0594 0.004 16.354 0.000 0.052 0.066
x1 1.7942 0.005 388.400 0.000 1.785 1.803
Omnibus: 15479.825 Durbin-Watson: 1.747
Prob(Omnibus): 0.000 Jarque-Bera (JB): 49218291.111
Skew: 1.865 Prob(JB): 0.00
Kurtosis: 238.045 Cond. No. 1.51


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

b) K-Nearest Neighbors

In [180]:
top10pub = total.groupby(["Publisher"]).sum().filter(["Global_Sales"]).sort_values(by=["Global_Sales"],ascending=False).head(10)
pub_dict = {top10pub.index.values[i]:i for i in range(len(top10pub.index.values))}
vg = total.copy()
vg["Publisher"].replace(pub_dict, inplace=True)
In [183]:
top10val = [i for i in range(len(top10pub.index.values))]
vg = vg.drop(vg.query("Publisher not in @top10val").index)
vg = vg.reset_index(drop=True)
In [192]:
unique_platforms = vg["Platform"].unique()
plat_dict = {unique_platforms[i]:i for i in range(len(unique_platforms))}
vg["Platform"].replace(plat_dict, inplace=True)
In [194]:
unique_genres = vg["Genre"].unique()
genre_dict = {unique_genres[i]:i for i in range(len(unique_genres))}
vg["Genre"].replace(genre_dict, inplace=True)
vg.dropna(inplace=True)
In [197]:
def normalization(column):
  return (column - column.mean()) / column.std()
In [198]:
vg['Year'] = normalization(vg['Year'])
vg['NA_Sales'] = normalization(vg['NA_Sales'])
vg['EU_Sales'] = normalization(vg['EU_Sales'])
vg['JP_Sales'] = normalization(vg['JP_Sales'])
vg['Other_Sales'] = normalization(vg['Other_Sales'])
vg['Global_Sales'] = normalization(vg['Global_Sales'])
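The same z-score standardization is available in scikit-learn as StandardScaler, which remembers the training mean and std so new rows can be transformed identically at prediction time. A sketch (StandardScaler divides by the population std, ddof=0, so values differ negligibly from pandas' std):

# Equivalent standardization with a reusable, fitted scaler.
from sklearn.preprocessing import StandardScaler

num_cols = ['Year', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']
scaler = StandardScaler()
vg[num_cols] = scaler.fit_transform(vg[num_cols])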
In [200]:
vg.head()
Out[200]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Class Generation
0 1 Wii Sports 0 -0.076117 0 0 37.381479 42.255474 10.176579 36.930921 39.402882 Very High Gen Z
1 2 Super Mario Bros. 1 -3.861328 1 0 26.104502 4.956395 18.595182 3.098865 18.987323 Very High Xennials
2 3 Mario Kart Wii 0 0.284380 2 0 14.082390 18.591671 10.231964 14.273562 16.864105 Very High Gen Z
3 4 Wii Sports Resort 0 0.464628 0 0 13.991520 15.849955 8.819633 12.733742 15.509472 Very High Gen Z
4 6 Tetris 2 -3.140335 3 0 20.761341 3.021065 11.422754 2.262962 14.193269 Very High Millennials
In [202]:
X = vg.reset_index().filter(["Year", "Genre", "Publisher", "NA_Sales", "EU_Sales", 
                             "JP_Sales", "Other_Sales", "Global_Sales"]).to_numpy()
y = vg.reset_index().filter(["Platform"]).to_numpy()
In [204]:
X_train, X_test, y_train, y_test = train_test_split(X, y)
print("Separation from {} elements to : train = {} ; test = {}.".format(X.shape[0], X_train.shape[0], X_test.shape[0]))
Separation from 6820 elements to : train = 5115 ; test = 1705.

We will try K-Nearest Neighbors.

In [206]:
knn = KNeighborsClassifier(len(unique_platforms))
knn.fit(X_train, y_train.ravel())
score = knn.score(X_test, y_test)
print("Score :", score)
Score : 0.33548387096774196
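n_neighbors was set above to the number of platform labels, which is an arbitrary choice. A quick sweep over k (a sketch scored on the test split, not a proper cross-validation) shows how sensitive the score is:

# Try several k values and keep the best test-set score.
best_k, best_score = None, 0.0
for k in range(1, 21):
    s = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train.ravel()).score(X_test, y_test)
    if s > best_score:
        best_k, best_score = k, s
print("best k:", best_k, "score:", best_score)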

Let's do a random test with Nintendo as the publisher.

In [218]:
test = np.array([5, 1, 0, 1, 2, 2, 1, 2])
test = test.reshape((1, 8))

print("For " + str(top10pub.index.values[test[0, 2]]) + " as a publisher and " 
      + str(unique_genres[test[0,1]]) + " as a genre, the platform predicted is : \n" 
      + str(unique_platforms[np.argmax(knn.predict(test))]))
For Nintendo as a publisher and 1 as a genre, the platform predicted is : 
Wii

c) MLPClassifier

According to Wikipedia, a multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). Additionally, the Deep Learning book by Goodfellow states: "A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input."

In [221]:
model_sk = MLPClassifier(hidden_layer_sizes=(8, 16, 32, 64))
model_sk.fit(X_train, y_train.ravel())
score = model_sk.score(X_test, y_test)
print("Score :", score)
Score : 0.5102639296187683
C:\Users\Muffin\Anaconda3\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py:566: ConvergenceWarning:

Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
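The ConvergenceWarning means the optimizer hit its iteration cap before the loss settled. Raising max_iter (the default is 200) is the usual remedy; a sketch with a fresh model so the fitted one above is untouched:

# Give the stochastic optimizer more iterations to converge.
model_sk_long = MLPClassifier(hidden_layer_sizes=(8, 16, 32, 64), max_iter=1000)
model_sk_long.fit(X_train, y_train.ravel())
print("Score :", model_sk_long.score(X_test, y_test))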

Let's run the same test as above.

In [222]:
test = np.array([5, 1, 0, 1, 2, 2, 1, 2])
test = test.reshape((1, 8))

print("For " + str(top10pub.index.values[test[0, 2]]) + " as a publisher and " 
      + str(unique_genres[test[0,1]]) + " as a genre, the platform predicted is : \n" 
      + str(unique_platforms[np.argmax(model_sk.predict(test))]))
For Nintendo as a publisher and 1 as a genre, the platform predicted is : 
Wii

d) TensorFlow

In [223]:
X_train = np.array(X_train).astype('float32')
X_test = np.array(X_test).astype('float32')
In [232]:
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(8,)))
model.add(tf.keras.layers.Dense(8, activation="relu"))
model.add(tf.keras.layers.Dense(16, activation="relu"))
model.add(tf.keras.layers.Dense(32, activation="relu"))
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(len(unique_platforms), activation="softmax"))
model.build(X[0].shape)
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_4 (Dense)              (None, 8)                 72        
_________________________________________________________________
dense_5 (Dense)              (None, 16)                144       
_________________________________________________________________
dense_6 (Dense)              (None, 32)                544       
_________________________________________________________________
dense_7 (Dense)              (None, 64)                2112      
_________________________________________________________________
dense_8 (Dense)              (None, 33)                2145      
=================================================================
Total params: 5,017
Trainable params: 5,017
Non-trainable params: 0
_________________________________________________________________
In [241]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model.compile(loss=loss_fn, optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=150, validation_data=(X_test, y_test))
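150 epochs is a guess; an alternative sketch lets Keras stop once validation loss plateaus and restore the best weights:

# Stop when val_loss stops improving for 10 epochs; keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                              restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=150,
                    validation_data=(X_test, y_test), callbacks=[early_stop])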
In [234]:
loss_curve = history.history["loss"]
acc_curve = history.history["accuracy"]
loss_val_curve = history.history["val_loss"]
acc_val_curve = history.history["val_accuracy"]
In [236]:
plt.plot(loss_curve,label="Train")
plt.plot(loss_val_curve,label="Validation")
plt.legend(loc='upper right')
plt.title("Loss")
plt.show()
plt.plot(acc_curve,label="Train")
plt.plot(acc_val_curve,label="Validation")
plt.legend(loc='lower right')
plt.title("Accuracy")
plt.show()

Let's do the test again with TensorFlow.

In [237]:
test = np.array([5, 1, 0, 1, 2, 2, 1, 2])
test = test.reshape((1, 8))

print("For " + str(top10pub.index.values[test[0, 2]]) + " as a publisher and " 
      + str(unique_genres[test[0,1]]) + " as a genre, the platform predicted is : \n" 
      + str(unique_platforms[np.argmax(model.predict(test))]))
For Nintendo as a publisher and 1 as a genre, the platform predicted is : 
XOne

From these results, I will need to find a better model for prediction. There are so many different machine learning models, and even after going over tutorials, I found it very difficult to choose the right one. If I continue to learn, I hope to build good models! (Thanks to the tutorial at https://www.kaggle.com/aurbcd/simple-look-at-vgsales-data-viz-science)

4. Conclusion

Completing this project took a while, but I had lots of fun doing this analysis. As a person who loves video games and online games, it surprised me that not all of my favorite games made the top lists.

Also, I combined several datasets from different time periods, but I wish I had a fuller dataset with fewer missing values. In some parts, I realized I need to use better functions to shorten the processing time. Since the data was pretty big, I need to find better ways to clean and process it.

Learning new machine learning models was also a fun part of this project. Although I do not have in-depth knowledge of machine learning, this project helped me understand the models better. It was challenging but also very interesting.

In conclusion, I still have a lot to learn and improve, but I'm very excited to finish this project and move on to new ones!