Pokémon was created in 1996 by Satoshi Tajiri, a Japanese game designer. It was originally released as a video game called "Pocket Monsters: Red and Green" for the Nintendo Game Boy. The game achieved explosive popularity in Japan, and then internationally, due to the adorable design of the Pokémon and the games' themes of exploration, friendship, and childlike freedom and joy. Shortly after the original game was released, the franchise rebranded to Pokémon and the collectable card game was introduced. In the years since, a multitude of additional video games, comics, animated shows, and movies have been released.
The premise of the Pokémon franchise is that humans share a world with creatures called Pokémon that, by their nature, want to be captured and tamed, and to fight battles against other Pokémon. Those who travel to befriend and capture Pokémon are called trainers. The player takes on that role in the video games and the collectable card game.
Both games are based largely on probability, strategy, and general game theory. For example, in both types of games, Pokémon have an elemental type or a combination of types, such as Grass, Water, Fire, Steel, or Psychic. Additionally, Pokémon have health points and various attacks that can be used. It is important to note that the types, health points, and attack strengths of various Pokémon differ between the card game and the video games.
Given those differences, health points act as a basic metric of game balance and Pokémon strength. Furthermore, for each type of game, a mathematical assessment could be made of the comparative strengths and advantages of each type of Pokémon. The card game in particular is very dependent on probability, due to the composition of decks and the game rules. Decks consist of 60 cards, which are drawn at random, and must be built with a balance of Pokémon, trainer, and energy cards in mind.
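To make the role of probability in deck building concrete, the sketch below computes the chance of drawing at least one card of a given kind from a shuffled 60-card deck. The hand size of 7 and the count of 12 Basic Pokémon are assumptions chosen for illustration, and `prob_at_least_one` is a hypothetical helper, not part of the analysis that follows.

```python
from math import comb

def prob_at_least_one(deck_size: int, copies: int, hand_size: int) -> float:
    """Probability that a random hand contains at least one of `copies` target cards."""
    # Hypergeometric complement: 1 - P(the hand contains none of the target cards)
    return 1 - comb(deck_size - copies, hand_size) / comb(deck_size, hand_size)

# Assumed values for illustration: a 7-card opening hand and 12 Basic Pokémon in the deck
p = prob_at_least_one(deck_size=60, copies=12, hand_size=7)
print(f"P(at least one Basic Pokémon in the opening hand) = {p:.3f}")
```

Changing `copies` shows how quickly the chance of an unplayable opening hand grows as fewer Pokémon are included in the deck.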
For this project, we will take data from two different sources: one for the card game and the other for the video games. The sources for each dataset can be found at the links below.
Card Game: https://www.kaggle.com/datasets/adampq/pokemon-tcg-all-cards-1999-2023?rvi=1
Video Game: https://data.world/data-society/pokemon-with-stats
import numpy as np
import pandas as pd
import requests
import io
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import csv
import ast
from itertools import cycle, islice
# function for reading csv from a url
def read_csv_from_url(url):
    '''
    Reads csv file into a Pandas DataFrame given a url
    input: url for csv file
    output: Pandas DataFrame
    '''
    response = requests.get(url)
    response.raise_for_status()  # stop early if the download failed
    return pd.read_csv(io.StringIO(response.content.decode('utf-8')))
# csv files containing the data
dataloc= 'https://unixweb.kutztown.edu/~ibeck950/csc223/'
card_url = dataloc + 'pokemontcg.csv'
video_url = dataloc + 'Pokemon.csv'
card_df = read_csv_from_url(card_url)
video_df = read_csv_from_url(video_url)
display(card_df.head())
display(video_df.head())
id | set | series | publisher | generation | release_date | artist | name | set_num | types | ... | retreatCost | convertedRetreatCost | rarity | flavorText | nationalPokedexNumbers | legalities | resistances | rules | regulationMark | ancientTrait | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | base1-1 | Base | Base | WOTC | First | 1/9/1999 | Ken Sugimori | Alakazam | 1 | ['Psychic'] | ... | ['Colorless', 'Colorless', 'Colorless'] | 3.0 | Rare Holo | Its brain can outperform a supercomputer. Its ... | [65] | {'unlimited': 'Legal'} | NaN | NaN | NaN | NaN |
1 | base1-2 | Base | Base | WOTC | First | 1/9/1999 | Ken Sugimori | Blastoise | 2 | ['Water'] | ... | ['Colorless', 'Colorless', 'Colorless'] | 3.0 | Rare Holo | A brutal Pokémon with pressurized water jets o... | [9] | {'unlimited': 'Legal'} | NaN | NaN | NaN | NaN |
2 | base1-3 | Base | Base | WOTC | First | 1/9/1999 | Ken Sugimori | Chansey | 3 | ['Colorless'] | ... | ['Colorless'] | 1.0 | Rare Holo | A rare and elusive Pokémon that is said to bri... | [113] | {'unlimited': 'Legal'} | [{'type': 'Psychic', 'value': '-30'}] | NaN | NaN | NaN |
3 | base1-4 | Base | Base | WOTC | First | 1/9/1999 | Mitsuhiro Arita | Charizard | 4 | ['Fire'] | ... | ['Colorless', 'Colorless', 'Colorless'] | 3.0 | Rare Holo | Spits fire that is hot enough to melt boulders... | [6] | {'unlimited': 'Legal'} | [{'type': 'Fighting', 'value': '-30'}] | NaN | NaN | NaN |
4 | base1-5 | Base | Base | WOTC | First | 1/9/1999 | Ken Sugimori | Clefairy | 5 | ['Colorless'] | ... | ['Colorless'] | 1.0 | Rare Holo | Its magical and cute appeal has many admirers.... | [35] | {'unlimited': 'Legal'} | [{'type': 'Psychic', 'value': '-30'}] | NaN | NaN | NaN |
5 rows × 29 columns
number | name | type1 | type2 | total | hp | attack | defense | sp_attack | sp_defense | speed | generation | legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 3 | Gigantamax Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
In the Pokémon card game, there are three primary types of cards: Trainers, Energy, and Pokémon. For this project, we will be focusing on just the Pokémon, so we will remove any trainer or energy card entries.
Several columns of card_df are not relevant to the analysis being performed, so they will be dropped. Several columns will be dropped from video_df for the same reason.
Some of the values in card_df were stored as strings and must be converted back to their intended datatypes. Many items also had unnecessary quotes around them. There is also a slight discrepancy in the type names between the card game and the video games: Colorless Pokémon from the card game are equivalent to Normal Pokémon from the video games, and the same is true of the Metal and Steel types, the Darkness and Dark types, and the Lightning and Electric types. For this analysis, we will use the video game type names.
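As a small illustration of the cleanup described above, the sketch below parses one string-formatted type list and renames card-game types to their video-game equivalents. `split_types` and `TYPE_NAME_MAP` are hypothetical names, and the example assumes each entry looks like `"['Darkness', 'Metal']"`.

```python
import ast

# Card-game type names mapped to the video-game names used in this notebook
TYPE_NAME_MAP = {'Colorless': 'Normal', 'Metal': 'Steel',
                 'Darkness': 'Dark', 'Lightning': 'Electric'}

def split_types(types_str):
    """Return (type1, type2) from a string such as "['Darkness', 'Metal']"."""
    # Missing values (NaN) are not strings, so they are treated as an empty list
    types = ast.literal_eval(types_str) if isinstance(types_str, str) else []
    types = [TYPE_NAME_MAP.get(t, t) for t in types]
    # Pad with 'None' so that every card yields exactly two entries
    return tuple((types + ['None', 'None'])[:2])

print(split_types("['Colorless']"))
print(split_types("['Darkness', 'Metal']"))
```

Parsing with `ast.literal_eval` removes the brackets, quotes, and spaces in one step, so no separate quote-stripping pass is needed.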
# removes the trainer and energy cards
card_df = card_df.drop(card_df[card_df['supertype'] == 'Trainer'].index)
card_df = card_df.drop(card_df[card_df['supertype'] == 'Energy'].index)
# in the dataset, the national pokedex number was stored as a string formatted like a list (e.g. '[65]'), which is
# an interesting choice on the part of the data set creator, since the pokedex number is a unique integer identifier.
# the following lines replace that column with the first number inside the string-formatted list
card_df['nationalPokedexNumbers'] = card_df['nationalPokedexNumbers'].fillna('[]')
card_df['nationalPokedexNumbers'] = card_df['nationalPokedexNumbers'].apply(lambda x: int(x.strip('[]').split(',')[0]) if x.strip('[]') else None)
# each pokemon has a primary type, and possibly a secondary type. This splits the list into two separate columns,
# stripping the surrounding spaces and quotes from each entry (the second entry is preceded by a space)
card_df['type1'] = card_df['types'].apply(lambda x: x.strip('[]').split(',')[0].strip(" '") if x.strip('[]') else 'None')
card_df['type2'] = card_df['types'].apply(lambda x: x.strip('[]').split(',')[1].strip(" '") if len(x.strip('[]').split(',')) > 1 else 'None')
card_df = card_df.drop(['types'], axis=1)
# match the typing between games
type_name_map = {'Colorless': 'Normal', 'Metal': 'Steel', 'Darkness': 'Dark', 'Lightning': 'Electric'}
card_df['type1'] = card_df['type1'].replace(type_name_map)
card_df['type2'] = card_df['type2'].replace(type_name_map)
# removes unneeded categories
card_df = card_df.drop(columns=['id', 'set', 'series', 'publisher', 'release_date', 'artist', 'set_num', 'rarity',
                                'regulationMark', 'ancientTrait', 'evolvesFrom', 'evolvesTo', 'supertype', 'flavorText',
                                'rules', 'retreatCost', 'level', 'legalities', 'abilities', 'attacks', 'weaknesses',
                                'convertedRetreatCost', 'resistances'])
video_df = video_df.drop(columns=['legendary', 'total', 'attack', 'defense', 'sp_attack', 'sp_defense', 'speed'])
# fill NaN with appropriate values
video_df['type2'] = video_df['type2'].fillna('None')
#sets the index to the name of the pokemon
card_df = card_df.set_index(card_df['name'])
video_df = video_df.set_index(video_df['name'])
display(card_df.head())
display(video_df.head())
name | generation | name | subtypes | hp | nationalPokedexNumbers | type1 | type2 |
---|---|---|---|---|---|---|---|
Alakazam | First | Alakazam | ['Stage 2'] | 80.0 | 65.0 | Psychic | None |
Blastoise | First | Blastoise | ['Stage 2'] | 100.0 | 9.0 | Water | None |
Chansey | First | Chansey | ['Basic'] | 120.0 | 113.0 | Normal | None |
Charizard | First | Charizard | ['Stage 2'] | 120.0 | 6.0 | Fire | None |
Clefairy | First | Clefairy | ['Basic'] | 40.0 | 35.0 | Normal | None |
name | number | name | type1 | type2 | hp | generation |
---|---|---|---|---|---|---|
Bulbasaur | 1 | Bulbasaur | Grass | Poison | 45 | 1 |
Ivysaur | 2 | Ivysaur | Grass | Poison | 60 | 1 |
Venusaur | 3 | Venusaur | Grass | Poison | 80 | 1 |
Mega Venusaur | 3 | Mega Venusaur | Grass | Poison | 80 | 1 |
Gigantamax Venusaur | 3 | Gigantamax Venusaur | Grass | Poison | 80 | 1 |
Each of the 1025 species of Pokémon has its own unique identification number, for easy reference in the Pokédex, a handheld device used by the player to identify Pokémon in the video games. In real life, it can be found online at https://pokemondb.net/pokedex/national and can be used as a tool when building a card game deck or for looking up facts about a Pokémon. It should be noted that there are variations of each species that may have different attack strengths and health points, and may even be different types, while still sharing the same identification number.
In this cell, the data is sorted by number in ascending order, and then is multiindexed, first by number and second by name. This allows for easy lookup.
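The lookup pattern described here can be tried on a toy table first. The sketch below uses a handful of rows taken from the video game data in this notebook; `toy` and `indexed` are illustrative names, and `set_index` is used as a shorthand that should be equivalent to building the MultiIndex by hand.

```python
import pandas as pd

# A few rows from the video game data, deliberately out of order
toy = pd.DataFrame({
    'number': [52, 1, 52, 52],
    'name': ['Meowth', 'Bulbasaur', 'Alolan Meowth', 'Galarian Meowth'],
    'type1': ['Normal', 'Grass', 'Dark', 'Steel'],
    'hp': [40, 45, 40, 50],
})

# Index first by number and second by name, then sort so lookups are efficient
indexed = toy.set_index(['number', 'name']).sort_index()

# All variations that share Pokédex number 52
meowths = indexed.loc[[52]]
print(meowths)

# A single variation, addressed by (number, name)
one = indexed.loc[(52, 'Alolan Meowth')]
print(one['hp'])
```

Selecting with a list (`[[52]]`) keeps both index levels in the result, which is why every variation of the species remains visible by name.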
# sort the cards by pokedex number, in ascending order
card_df_num = card_df.sort_values(by=['nationalPokedexNumbers'])
video_df_num = video_df.sort_values(by=['number'])
# multiindex both arrays based on the name and pokedex numbers
tc1 = card_df_num['nationalPokedexNumbers'].values
tc2 = card_df_num['name'].values
indx_card = pd.MultiIndex.from_arrays([tc1, tc2],names=['number', 'name'])
tv1 = video_df_num['number'].values
tv2 = video_df_num['name'].values
indx_video = pd.MultiIndex.from_arrays([tv1, tv2],names=['number', 'name'])
#drop columns used for indexing
card_df_num = card_df_num.drop(['nationalPokedexNumbers'], axis=1)
card_df_num = card_df_num.drop(['name'], axis=1)
video_df_num = video_df_num.drop(['number'], axis=1)
video_df_num = video_df_num.drop(['name'], axis=1)
card_df_num.index = indx_card
video_df_num.index = indx_video
# these can be used to locate any Pokemon by number
display(video_df_num.loc[[52]])
display(card_df_num.loc[[52]])
number | name | type1 | type2 | hp | generation |
---|---|---|---|---|---|
52 | Meowth | Normal | None | 40 | 1 |
52 | Alolan Meowth | Dark | None | 40 | 7 |
52 | Galarian Meowth | Steel | None | 50 | 8 |
number | name | generation | subtypes | hp | type1 | type2 |
---|---|---|---|---|---|---|
52.0 | Meowth | Third | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | Sixth | ['Basic'] | 70.0 | Normal | None |
52.0 | Alolan Meowth | Seventh | ['Basic'] | 60.0 | Dark | None |
52.0 | Meowth | Fourth | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Seventh | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Sixth | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Second | ['Basic'] | 50.0 | Normal | None |
52.0 | Alolan Meowth | Other | ['Basic'] | 60.0 | Dark | None |
52.0 | Team Rocket's Meowth | First | ['Basic'] | 40.0 | Normal | None |
52.0 | Galarian Meowth | Eighth | ['Basic'] | 70.0 | Steel | None |
52.0 | Galarian Meowth | Eighth | ['Basic'] | 70.0 | Steel | None |
52.0 | Meowth δ | Third | ['Basic'] | 50.0 | Dark | 'Metal |
52.0 | Meowth | First | ['Basic'] | 40.0 | Normal | None |
52.0 | Alolan Meowth | Other | ['Basic'] | 70.0 | Dark | None |
52.0 | Meowth | Sixth | ['Basic'] | 60.0 | Normal | None |
52.0 | Galarian Meowth | Eighth | ['Basic'] | 60.0 | Steel | None |
52.0 | Meowth | Other | ['Basic'] | 60.0 | Normal | None |
52.0 | Alolan Meowth | Seventh | ['Basic'] | 60.0 | Dark | None |
52.0 | Meowth | First | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | First | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | Seventh | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Fourth | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Fourth | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | Sixth | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Sixth | ['Basic'] | 60.0 | Normal | None |
52.0 | Galarian Meowth | Eighth | ['Basic'] | 70.0 | Steel | None |
52.0 | Meowth | Fifth | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Third | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | Seventh | ['Basic'] | 70.0 | Normal | None |
52.0 | Meowth | Third | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | First | ['Basic'] | 50.0 | Normal | None |
52.0 | Rocket's Meowth | Third | ['Basic'] | 60.0 | Dark | None |
52.0 | Meowth | Fifth | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Third | ['Basic'] | 50.0 | Normal | None |
52.0 | Giovanni's Meowth | First | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth δ | Third | ['Basic'] | 40.0 | Dark | None |
52.0 | Alolan Meowth | Seventh | ['Basic'] | 70.0 | Dark | None |
52.0 | Alolan Meowth | Seventh | ['Basic'] | 60.0 | Dark | None |
52.0 | Meowth | Third | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | Second | ['Basic'] | 40.0 | Normal | None |
52.0 | Giovanni's Meowth | First | ['Basic'] | 40.0 | Normal | None |
52.0 | Meowth | Sixth | ['Basic'] | 60.0 | Normal | None |
52.0 | Meowth | Eighth | ['Basic'] | 70.0 | Normal | None |
52.0 | Galarian Meowth | Eighth | ['Basic'] | 70.0 | Steel | None |
52.0 | Meowth | Fifth | ['Basic'] | 60.0 | Normal | None |
52.0 | Galarian Meowth | Eighth | ['Basic'] | 60.0 | Steel | None |
52.0 | Meowth VMAX | Eighth | ['VMAX'] | 300.0 | Normal | None |
52.0 | Meowth V | Eighth | ['Basic', 'V'] | 180.0 | Normal | None |
52.0 | Alolan Meowth | Seventh | ['Basic'] | 70.0 | Dark | None |
52.0 | Meowth | Ninth | ['Basic'] | 70.0 | Normal | None |
52.0 | Meowth | Second | ['Basic'] | 50.0 | Normal | None |
52.0 | Meowth | Fifth | ['Basic'] | 70.0 | Normal | None |
Each Pokémon has a primary type, and many, but not all, have a secondary type. We will study the distribution of types in the video games, counting both the primary and secondary types. We will add a None category for Pokémon that do not have a secondary type.
# create the labels, corresponding colors, and count of each type, for the video game
labels_video = ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Flying','Ghost','Grass', 'Ground', 'Ice', 'Normal', 'Poison', 'Psychic', 'Rock', 'Steel', 'Water']
colors_video = ['olivedrab','darkslategrey', 'gold', 'yellow', 'pink', 'brown', 'red', 'skyblue','indigo','green', 'brown', 'cyan','lightgrey', 'magenta','purple', 'khaki','grey', 'blue']
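# count each type once per column (types that never occur in a column count as 0)
primary_counts = video_df['type1'].value_counts()
secondary_counts = video_df['type2'].value_counts()
primary_video_type_data = [primary_counts.get(t, 0) for t in labels_video]
secondary_video_type_data = [secondary_counts.get(t, 0) for t in labels_video]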
video_type_data = [primary + secondary for primary, secondary in zip(primary_video_type_data, secondary_video_type_data)]
#create a plot of each type
fig, ax_video = plt.subplots()
# 'None' is not in labels_video, so it is already excluded from the total
vtotal = sum(video_type_data)
ax_video.set_title('Distribution of Types: Pokémon Video Games', loc='center')
patches, _ = ax_video.pie(video_type_data, colors = colors_video)
legend_entries_video = [f'{vlabel} ({vcount/vtotal*100:.1f}%)' for vlabel, vcount in zip(labels_video, video_type_data)]
_ = ax_video.legend(legend_entries_video, loc='upper right', bbox_to_anchor=(1.5, 1.075))
The type distribution in the video games is much more uniform than in the card game. This is likely because the cards are collectibles, so many different versions of each Pokémon are made, often with unique traits to appeal to collectors. Additionally, the video game types have been more or less consistent since the first Pokémon game, whereas the card game has gone through type classification changes with each generation of cards published.
Since many Pokémon in the video games also have a secondary type (subtype), we will analyze which secondary type is most common for each primary type.
labels_w_none = ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Flying','Ghost','Grass', 'Ground', 'Ice', 'None', 'Normal', 'Poison', 'Psychic', 'Rock', 'Steel', 'Water']
cmapv = ListedColormap(['olivedrab','darkslategrey', 'gold', 'yellow', 'pink', 'brown', 'red', 'skyblue','indigo','green', 'brown', 'cyan','whitesmoke','lightgrey', 'magenta','purple', 'khaki','grey', 'blue'])
type_counts = {type1: {type2: 0 for type2 in labels_w_none} for type1 in labels_video}
# Iterate over each row in the dataframe
for index, row in video_df.iterrows():
    type1 = row['type1']
    type2 = row['type2']
    # Increment the count for the corresponding type2 under type1
    type_counts[type1][type2] += 1
# Convert the dictionary to a list of lists
type_counts_list = [[type_counts[type1][type2] for type2 in labels_w_none] for type1 in labels_video]
type_counts_df = pd.DataFrame(type_counts_list, columns=labels_w_none, index=labels_video).transpose()
type_percents_list = [[type_counts[type1][type2]/sum(type_counts_df[type1])*100 for type2 in labels_w_none] for type1 in labels_video]
type_percents_df = pd.DataFrame(type_percents_list, columns=labels_w_none, index=labels_video)
transp = type_percents_df.transpose()
transp.columns.name = 'Primary Type'
transp.index.name = 'Secondary Type'
display(transp)
# draw on an explicit axes so that no empty extra figure is created
type_percent_fig, ax_percent = plt.subplots()
type_percents_df.plot(kind='bar', stacked=True, title='Percentage of Subtype per Primary Type', colormap=cmapv, ax=ax_percent)
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.ylabel('Percentage of Total Subtypes')
plt.show()
Secondary Type \ Primary Type | Bug | Dark | Dragon | Electric | Fairy | Fighting | Fire | Flying | Ghost | Grass | Ground | Ice | Normal | Poison | Psychic | Rock | Steel | Water |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bug | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.477612 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.263158 | 0.000000 | 2.380952 | 0.00 | 3.333333 | 0.000000 | 1.459854 |
Dark | 0.000000 | 0.000000 | 0.000000 | 3.174603 | 0.000000 | 6.666667 | 1.492537 | 0.000000 | 2.325581 | 3.157895 | 7.142857 | 0.000000 | 0.000000 | 11.904762 | 1.25 | 3.333333 | 0.000000 | 5.109489 |
Dragon | 0.000000 | 8.333333 | 0.000000 | 3.174603 | 0.000000 | 0.000000 | 2.985075 | 22.222222 | 4.651163 | 7.368421 | 4.761905 | 0.000000 | 0.854701 | 9.523810 | 1.25 | 3.333333 | 7.692308 | 2.189781 |
Electric | 4.819277 | 0.000000 | 2.439024 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.380952 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 5.000000 | 0.000000 | 1.459854 |
Fairy | 2.409639 | 8.333333 | 2.439024 | 3.174603 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.325581 | 5.263158 | 0.000000 | 2.631579 | 4.273504 | 2.380952 | 13.75 | 5.000000 | 10.256410 | 2.919708 |
Fighting | 4.819277 | 4.166667 | 4.878049 | 0.000000 | 0.000000 | 0.000000 | 10.447761 | 0.000000 | 0.000000 | 3.157895 | 0.000000 | 0.000000 | 3.418803 | 4.761905 | 3.75 | 1.666667 | 2.564103 | 2.189781 |
Fire | 2.409639 | 6.250000 | 2.439024 | 1.587302 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.976744 | 0.000000 | 2.380952 | 2.631579 | 0.000000 | 4.761905 | 1.25 | 5.000000 | 0.000000 | 0.000000 |
Flying | 18.072289 | 12.500000 | 14.634146 | 9.523810 | 8.695652 | 4.444444 | 11.940299 | 0.000000 | 6.976744 | 7.368421 | 9.523810 | 5.263158 | 23.076923 | 7.142857 | 10.00 | 10.000000 | 5.128205 | 5.109489 |
Ghost | 1.204819 | 4.166667 | 7.317073 | 1.587302 | 0.000000 | 2.222222 | 2.985075 | 0.000000 | 0.000000 | 1.052632 | 9.523810 | 2.631579 | 0.000000 | 0.000000 | 5.00 | 0.000000 | 10.256410 | 1.459854 |
Grass | 7.228916 | 4.166667 | 0.000000 | 1.587302 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 25.581395 | 0.000000 | 0.000000 | 0.000000 | 1.709402 | 0.000000 | 2.50 | 3.333333 | 0.000000 | 2.189781 |
Ground | 2.409639 | 0.000000 | 17.073171 | 0.000000 | 0.000000 | 0.000000 | 4.477612 | 0.000000 | 4.651163 | 1.052632 | 0.000000 | 7.894737 | 0.854701 | 4.761905 | 0.00 | 10.000000 | 5.128205 | 7.299270 |
Ice | 0.000000 | 4.166667 | 7.317073 | 3.174603 | 0.000000 | 2.222222 | 0.000000 | 0.000000 | 0.000000 | 3.157895 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.25 | 3.333333 | 0.000000 | 3.649635 |
None | 22.891566 | 29.166667 | 31.707317 | 53.968254 | 86.956522 | 66.666667 | 50.746269 | 44.444444 | 34.883721 | 46.315789 | 42.857143 | 50.000000 | 62.393162 | 40.476190 | 55.00 | 25.000000 | 33.333333 | 53.284672 |
Normal | 0.000000 | 10.416667 | 0.000000 | 3.174603 | 0.000000 | 0.000000 | 2.985075 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.50 | 0.000000 | 0.000000 | 0.000000 |
Poison | 14.457831 | 0.000000 | 0.000000 | 6.349206 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 11.627907 | 16.842105 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 1.666667 | 0.000000 | 2.189781 |
Psychic | 3.614458 | 4.166667 | 9.756098 | 1.587302 | 0.000000 | 6.666667 | 2.985075 | 0.000000 | 0.000000 | 2.105263 | 4.761905 | 10.526316 | 2.564103 | 4.761905 | 0.00 | 3.333333 | 17.948718 | 4.379562 |
Rock | 3.614458 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.492537 | 0.000000 | 0.000000 | 0.000000 | 7.142857 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 7.692308 | 4.379562 |
Steel | 8.433735 | 4.166667 | 0.000000 | 6.349206 | 4.347826 | 6.666667 | 1.492537 | 22.222222 | 0.000000 | 3.157895 | 9.523810 | 5.263158 | 0.000000 | 0.000000 | 2.50 | 6.666667 | 0.000000 | 0.729927 |
Water | 3.614458 | 0.000000 | 0.000000 | 1.587302 | 0.000000 | 4.444444 | 1.492537 | 11.111111 | 0.000000 | 0.000000 | 0.000000 | 7.894737 | 0.854701 | 7.142857 | 0.00 | 10.000000 | 0.000000 | 0.000000 |
As can be seen in Percentage of Subtype per Primary Type, it is fairly common for Pokémon in the video games not to have a secondary type. The Fairy type, in particular, very rarely has a secondary type. Water has the most potential subtypes, at 15, followed by the Rock type with 14 and the Electric type with 13. Additionally, every primary type other than Flying itself includes Pokémon with a Flying subtype.
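The percentage table above can also be built without an explicit row loop. The following is a minimal sketch using `pd.crosstab` on a small made-up DataFrame; the rows in `toy` are hypothetical and only stand in for the `type1` and `type2` columns of the video game data.

```python
import pandas as pd

# Hypothetical stand-in for the type1/type2 columns of the video game data
toy = pd.DataFrame({
    'type1': ['Grass', 'Grass', 'Fire', 'Fire', 'Water', 'Fire'],
    'type2': ['Poison', 'None', 'Flying', 'None', 'None', 'None'],
})

# Rows: primary type, columns: secondary type,
# values: percentage of each secondary type within the primary type
percents = pd.crosstab(toy['type1'], toy['type2'], normalize='index') * 100
print(percents.round(1))
```

With `normalize='index'` each row sums to 100, which matches the per-primary-type percentages plotted in the stacked bar chart.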
Unlike the video games, where many Pokémon have both a primary type and a secondary type, in Pokémon: The Card Game most Pokémon cards have only one type, with a handful of exceptions. With over 10,000 different Pokémon cards produced since 1999, it is worth studying the distribution of the 11 types available in the card game.
# create the labels, corresponding colors, and count of each type, for the card game
labels_card = ['Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Grass', 'Normal', 'Psychic', 'Steel', 'Water']
colors_card = ['darkslategrey', 'gold', 'yellow', 'pink', 'brown', 'red', 'green', 'lightgrey', 'purple', 'grey', 'blue']
card_type_counts = card_df['type1'].value_counts()
card_type_data = [card_type_counts[t] for t in labels_card]
#create a plot of each type
fig, ax_card = plt.subplots()
ctotal = sum(card_type_data)
ax_card.set_title('Distribution of Types: Pokémon: The Card Game', loc='center')
patches, _ = ax_card.pie(card_type_data, colors = colors_card)
legend_entries_card = [f'{clabel} ({ccount/ctotal*100:.1f}%)' for clabel, ccount in zip(labels_card, card_type_data)]
_ = ax_card.legend(legend_entries_card, loc='upper right', bbox_to_anchor=(1.5, .825))
As can be seen in the figure titled Distribution of Types: Pokémon: The Card Game, the most common type of Pokémon card is the Water type, followed by Grass, Psychic, Normal, Fighting, and then Fire. I was surprised that the Fire type did not make up a higher percentage, since it was one of the original three types of Pokémon. The high percentage of Psychic type cards makes sense, given that the past few releases of cards have been Psychic and Dark type focused. The low percentage of Fairy type cards makes sense as well, given that Fairy type cards were only released from 2013 until 2020, when the type was discontinued.
The sheer number of cards or video game characters designed for a type of Pokémon is not the only important statistic, since the health points of each Pokémon play a large role in determining its strength in-game.
card_grouped_type = card_df.groupby(by='type1')
# aggregate by name; passing np.mean/np.median to agg is deprecated in recent pandas
hp_type_card = card_grouped_type['hp'].agg(['mean', 'median'])
hp_type_card = hp_type_card.rename(columns={'mean': 'mean hp', 'median': 'median hp'})
hp_type_card.index.names = ['type']
display(hp_type_card)
# isolate the mean and median hp
mean_hp = hp_type_card['mean hp']
median_hp = hp_type_card['median hp']
# create the bar plot
fig, ax = plt.subplots()
x = np.arange(len(labels_card)) # x-axis positions
width = 0.35 # width of the bars
# plot the mean HP bars
ax.bar(x - width/2, mean_hp, width, label='Mean HP', color=colors_card)
# plot the median HP bars
ax.bar(x + width/2, median_hp, width, label='Median HP', color=colors_card)
# set the labels, title, and legend
ax.set_xlabel('Pokémon Type')
ax.set_ylabel('HP')
ax.set_title('Mean and Median HP by Type for Pokémon: The Card Game')
ax.set_xticks(x)
ax.set_xticklabels(labels_card, rotation=45)
# show the plot
plt.tight_layout()
plt.show()
hp_type_card['mean hp'].plot(kind='box', title='Spread of Mean HP (Card Game)')
type | mean hp | median hp |
---|---|---|
Dark | 117.568134 | 100.0 |
Dragon | 152.840647 | 150.0 |
Electric | 103.268634 | 80.0 |
Fairy | 106.160338 | 90.0 |
Fighting | 106.896770 | 90.0 |
Fire | 109.961598 | 90.0 |
Grass | 93.222936 | 80.0 |
Normal | 94.812634 | 80.0 |
Psychic | 100.600616 | 80.0 |
Steel | 125.968379 | 110.0 |
Water | 102.744539 | 80.0 |
In Mean and Median HP by Type for Pokémon: The Card Game, the left bar of each pair is the mean hp and the right bar is the median hp. It can be clearly seen that Dragon Pokémon have the highest mean and median health points and are an outlier in Spread of Mean HP (Card Game). This likely explains why they are among the least common types of card. Grass Pokémon, on the other hand, are slightly weaker on average than all of the other types, but not enough so to be an outlier. Aside from the Dragon type, the mean hps fall within a range of just under 33 hp, from Grass at about 93 hp to Steel at about 126 hp. Based on my player experience, a difference of 50 hp or more usually guarantees a loss for the weaker Pokémon, so a gap of 33 hp or less does not constitute a decisive difference in strength between non-outlier types. However, the difference of nearly 60 hp between the mean hp of the Dragon type and the minimum mean hp, that of the Grass type, does constitute a significant difference. It demonstrates a Dragon type advantage in health points over the lowest section of types: Grass, Normal, Psychic, and Water.
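The outlier claim can be checked numerically. The sketch below applies the usual box plot rule, assuming the default whiskers at 1.5 times the interquartile range and linearly interpolated quartiles, to the mean hp values from the table above. The helper names `quantile` and `tukey_outliers` are illustrative.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def tukey_outliers(values_by_label):
    """Labels whose value lies more than 1.5 * IQR outside the quartiles."""
    vals = sorted(values_by_label.values())
    q1, q3 = quantile(vals, 0.25), quantile(vals, 0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sorted(label for label, v in values_by_label.items() if v < low or v > high)

# Mean hp per type, copied from the card game table above
mean_hp_card = {
    'Dark': 117.568134, 'Dragon': 152.840647, 'Electric': 103.268634,
    'Fairy': 106.160338, 'Fighting': 106.896770, 'Fire': 109.961598,
    'Grass': 93.222936, 'Normal': 94.812634, 'Psychic': 100.600616,
    'Steel': 125.968379, 'Water': 102.744539,
}
outliers = tukey_outliers(mean_hp_card)
print(outliers)
```

With these values, only the Dragon type falls outside the fences, while Steel and Grass stay inside them.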
# groupby type1 and type2, find the means
video_grouped_type1 = video_df.groupby(by=['type1'])
health_points_type1 = video_grouped_type1['hp'].agg('mean')
video_grouped_type2 = video_df.groupby(by=['type2'])
health_points_type2 = video_grouped_type2['hp'].agg('mean')
# calculate weights for each type
weight_1 = sum(primary_video_type_data)/sum(video_type_data)
weight_2 = sum(secondary_video_type_data)/sum(video_type_data)
# combine them into a list, and generate a dataframe
hp_type = [health_points_type1[t]*weight_1 + health_points_type2[t]*weight_2 for t in labels_video]
hp_type_video = pd.DataFrame(hp_type, columns=['mean hp'], index=labels_video)
display(hp_type_video)
# create the bar plot
fig, ax = plt.subplots()
x = np.arange(len(labels_video)) # x-axis positions
width = .9 # width of the bars
# plot the mean HP bar
ax.bar(x, hp_type, width, label='Mean HP', color=colors_video)
# set the labels, title, and legend
ax.set_xlabel('Pokémon Type')
ax.set_ylabel('HP')
ax.set_title('Mean HP by Type for Pokémon Video Games')
ax.set_xticks(x)
ax.set_xticklabels(labels_video, rotation=45)
# show the plot
plt.tight_layout()
plt.show()
hp_type_video['mean hp'].plot(kind='box', title='Spread of Mean HP (Video Games)', ylabel='mean hp')
plt.show()
type | mean hp |
---|---|
Bug | 58.665122 |
Dark | 73.579744 |
Dragon | 87.597242 |
Electric | 67.126120 |
Fairy | 69.807957 |
Fighting | 76.466587 |
Fire | 72.254509 |
Flying | 72.378524 |
Ghost | 65.245092 |
Grass | 67.502494 |
Ground | 73.994854 |
Ice | 79.109103 |
Normal | 72.515669 |
Poison | 70.278922 |
Psychic | 72.749649 |
Rock | 68.664699 |
Steel | 71.477437 |
Water | 68.751725 |
As can be seen in Mean HP by Type for Pokémon Video Games, the Pokémon in the video games generally have less hp than their card game counterparts. By comparing the bar chart to Spread of Mean HP (Video Games), we can see that, just as in the card game, the Dragon type is an outlier with very high hp. We can also see that the Bug type is considerably weaker than the other types. The non-outlier types fall within a range of approximately 14 hp which, in my player experience, is a deficit that is challenging but does not necessarily guarantee a loss for the weaker Pokémon. On the other hand, a difference of more than 20 hp usually results in a victory for the Pokémon with more health. That means that the Dragon type has a considerable advantage over the Bug, Electric, Ghost, and Grass types. The Ice type likewise has a definite advantage over the Bug type, with the Fighting type being very close to having that advantage as well.
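The calculation above blends the primary-type and secondary-type means using two global weights. As a point of comparison only, and not the method used for the reported table, the sketch below averages hp over every Pokémon that carries a type in either slot, which amounts to using per-type weights. `mean_hp_by_type` is a hypothetical helper and `sample` holds a few rows from the video game data.

```python
from collections import defaultdict

def mean_hp_by_type(pokemon):
    """Mean hp per type, counting a Pokémon under both of its types.

    pokemon: iterable of (type1, type2, hp); type2 may be the string 'None'.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for type1, type2, hp in pokemon:
        for t in (type1, type2):
            if t != 'None':  # skip the placeholder for a missing secondary type
                totals[t] += hp
                counts[t] += 1
    return {t: totals[t] / counts[t] for t in totals}

# Bulbasaur, Ivysaur, Venusaur, and Meowth from the video game data
sample = [('Grass', 'Poison', 45), ('Grass', 'Poison', 60),
          ('Grass', 'Poison', 80), ('Normal', 'None', 40)]
result = mean_hp_by_type(sample)
print(result)
```

The two approaches agree when a type is split between the primary and secondary slots in the same proportion as the data set as a whole, and can differ otherwise.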
It should be noted that since not all types from the video games are in the card game, the extra types have been dropped for this comparison.
# create a dataframe that merges the Pokemon from the games into one table, to analyze the mean hitpoints of shared typed
hp_type_card = hp_type_card.drop(['median hp'], axis=1)
hp_merged = pd.merge(hp_type_video, hp_type_card, left_index=True, right_index=True)
hp_merged.rename(columns = {'mean hp_x':'mean hp: video game'}, inplace = True)
hp_merged.rename(columns = {'mean hp_y':'mean hp: card game'}, inplace = True)
hp_merged.index.names = ['type']
# add a column of weighted means; each game is weighted by its share of all Pokémon that have a primary type
count_video = video_df["type1"].count()
count_card = card_df["type1"].count()
totalcount = count_video + count_card
weightv = count_video / totalcount
weightc = count_card / totalcount
# vectorized, so rows are aligned by type rather than by list position
hp_merged['overall weighted mean hp'] = hp_merged['mean hp: video game']*weightv + hp_merged['mean hp: card game']*weightc
display(hp_merged)
# plot the figure
fig, ax = plt.subplots()
x = np.arange(len(labels_card)) # x-axis positions
width = 0.9 # width of the bars
# plot the mean HP bars
ax.bar(x, hp_merged['overall weighted mean hp'], width, label='Mean HP', color=colors_card)
# set the labels and title
ax.set_xlabel('Pokémon Type')
ax.set_ylabel('Mean HP')
ax.set_title('Mean Health Points by Type for all Pokémon Games')
ax.set_xticks(x)
ax.set_xticklabels(labels_card, rotation=45)
# show the plot
plt.tight_layout()
plt.show()
hp_merged['overall weighted mean hp'].plot(kind='box', title='Spread of Mean HP for all Pokémon Games', ylabel='mean hp')
plt.show()
type | mean hp: video game | mean hp: card game | overall weighted mean hp |
---|---|---|---|
Dark | 73.579744 | 117.568134 | 114.539323 |
Dragon | 87.597242 | 152.840647 | 148.348327 |
Electric | 67.126120 | 103.268634 | 100.780049 |
Fairy | 69.807957 | 106.160338 | 103.657303 |
Fighting | 76.466587 | 106.896770 | 104.801507 |
Fire | 72.254509 | 109.961598 | 107.365284 |
Grass | 67.502494 | 93.222936 | 91.451961 |
Normal | 72.515669 | 94.812634 | 93.277381 |
Psychic | 72.749649 | 100.600616 | 98.682944 |
Steel | 71.477437 | 125.968379 | 122.216418 |
Water | 68.751725 | 102.744539 | 100.403972 |
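The values of `weightv` and `weightc` are not printed, but because the same two weights are applied to every type, they can be backed out from any row of the table. The sketch below (the helper name `implied_card_weight` is introduced here for illustration) solves each row for the card game weight; every row gives roughly 0.93, which suggests the overall column tracks the card game far more closely than the video games.

```python
# (video mean, card mean, overall weighted mean) per type, copied from the table.
rows = {
    "Dark": (73.579744, 117.568134, 114.539323),
    "Dragon": (87.597242, 152.840647, 148.348327),
    "Electric": (67.126120, 103.268634, 100.780049),
    "Fairy": (69.807957, 106.160338, 103.657303),
    "Fighting": (76.466587, 106.896770, 104.801507),
    "Fire": (72.254509, 109.961598, 107.365284),
    "Grass": (67.502494, 93.222936, 91.451961),
    "Normal": (72.515669, 94.812634, 93.277381),
    "Psychic": (72.749649, 100.600616, 98.682944),
    "Steel": (71.477437, 125.968379, 122.216418),
    "Water": (68.751725, 102.744539, 100.403972),
}


def implied_card_weight(video, card, overall):
    """Solve overall = video * (1 - w) + card * w for the card weight w."""
    return (overall - video) / (card - video)


weights = {t: implied_card_weight(*v) for t, v in rows.items()}
for t, w in weights.items():
    print(f"{t:<9} implied card weight = {w:.4f}")
```

Weighting each type by its own counts in the two games would be a different design choice and would generally give different numbers; the table above uses one global pair of weights.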
As can be seen in Mean Health Points by Type for all Pokémon Games and the box plot, the Dragon type is once again an outlier in HP among the types shared by both games. The non-outlier types span a range of about 31 HP, and the Dragon type has on average 26 more HP than the second-highest type, Steel.
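The outlier status can be reproduced numerically. This is a minimal sketch assuming the default rule that matplotlib box plots use, where points more than 1.5 × IQR beyond the quartiles are drawn as outliers; `iqr_outliers` is a helper name introduced here.

```python
import numpy as np

# Overall weighted mean HP per type, copied from the table above.
overall = {
    "Dark": 114.539323, "Dragon": 148.348327, "Electric": 100.780049,
    "Fairy": 103.657303, "Fighting": 104.801507, "Fire": 107.365284,
    "Grass": 91.451961, "Normal": 93.277381, "Psychic": 98.682944,
    "Steel": 122.216418, "Water": 100.403972,
}


def iqr_outliers(values_by_label, k=1.5):
    """Return labels whose value lies more than k * IQR outside the quartiles."""
    labels = list(values_by_label)
    values = np.array([values_by_label[label] for label in labels])
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [label for label, v in zip(labels, values) if v < low or v > high]


outliers = iqr_outliers(overall)
inliers = [v for t, v in overall.items() if t not in outliers]

print("outliers:", outliers)
print("non-outlier range:", round(max(inliers) - min(inliers), 2))
print("Dragon minus next highest:", round(overall["Dragon"] - max(inliers), 2))
```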
The smaller the percentage of all Pokémon a type makes up, the rarer it is for a player or card collector to find a Pokémon of that type. Additionally, it has been shown that some advantage exists for types with a higher mean HP. Therefore, it is hypothesized that the rarer a Pokémon type is, the higher its mean health points will be.
# percentage of all video game Pokémon that each type makes up
video_pct = np.array([float(p/vtotal*100) for p in video_type_data])
plt.scatter(video_pct, hp_type_video['mean hp'])
# least-squares line; for degree 1, np.polyfit returns [slope, intercept]
theta = np.polyfit(video_pct, hp_type_video['mean hp'], 1)
y_line = theta[1] + theta[0] * video_pct
plt.plot(video_pct, y_line, 'r')
plt.title('Correlation Between HP and Type Distribution (Video Games)')
plt.xlabel('Type Distribution %')
plt.ylabel('Mean HP')
plt.show()
There is a loose negative correlation between type distribution and mean health points in the video games. However, the type distributions do not have a large spread, meaning that even the rarer types that may have higher HP are not much rarer than the more common types that may have lower HP.
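A word like "loose" can be made concrete with the Pearson correlation coefficient. The sketch below is illustrative only: the two example lists are hypothetical placeholders, not values from the datasets, and would be replaced by `video_pct` and `hp_type_video['mean hp']` in the notebook. `pearson_r` is a helper name introduced here.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(x, y).
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical placeholder values, NOT taken from the datasets.
example_pct = [3.0, 4.5, 5.0, 6.5, 8.0, 12.0]
example_hp = [80.0, 70.0, 76.0, 68.0, 73.0, 69.0]

r = pearson_r(example_pct, example_hp)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

A value of r near -1 would indicate a strong negative linear relationship, while a value near 0 would indicate little linear relationship at all.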
# percentage of all cards that each type makes up
card_pct = np.array([float(p/ctotal*100) for p in card_type_data])
plt.scatter(card_pct, hp_type_card['mean hp'])
# least-squares line; for degree 1, np.polyfit returns [slope, intercept]
theta = np.polyfit(card_pct, hp_type_card['mean hp'], 1)
y_line = theta[1] + theta[0] * card_pct
plt.plot(card_pct, y_line, 'r')
plt.title('Correlation Between HP and Type Distribution (Card Game)')
plt.xlabel('Type Distribution %')
plt.ylabel('Mean HP')
plt.show()
There is a strong negative correlation between a type's share of all cards and its mean HP; in other words, the rarer a type is, the higher its mean HP tends to be. This makes sense from a game balance standpoint. However, the point that corresponds to the Dragon type is still approximately 23 HP higher than the linear fit predicts.
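How far a single type sits above or below the fitted line is its residual, which can be computed from the same `np.polyfit` result. The following sketch uses hypothetical placeholder data and labels (not the card dataset) to show the calculation; `fit_residuals` is a name introduced here.

```python
import numpy as np


def fit_residuals(x, y):
    """Least-squares line y = slope * x + intercept, plus per-point residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest degree first
    residuals = y - (slope * x + intercept)  # observed minus predicted
    return slope, intercept, residuals


# Hypothetical placeholder values, NOT taken from the card dataset.
labels = ["A", "B", "C", "D", "E"]
pct = [2.0, 6.0, 10.0, 14.0, 18.0]
hp = [150.0, 118.0, 110.0, 101.0, 95.0]

slope, intercept, res = fit_residuals(pct, hp)
for label, r in zip(labels, res):
    print(f"{label}: {r:+.1f} HP relative to the fit")
```

One caveat worth keeping in mind: a point with a large residual is itself part of the fit and pulls the line toward it, so refitting without that point would likely show an even larger gap.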
Overall Type Distribution: The video games have a more even distribution of types across Pokémon than the card game, which has certain types that are substantially rarer than others. The most common type in each game is Water. The least common video game type is Ice, and the least common card game type is Dragon.
Type to Sub-Type Distribution: In the video games, every primary type has at least one Pokémon with a secondary Flying type. The Water type has the greatest variety of potential subtypes with 14, and the Fairy type has just 2. Additionally, more Pokémon lack a second type than I originally thought.
Mean Health Points by Type Comparison: In the card and video games, both individually and combined, the Dragon type has the highest average health points, to the point of being a statistical outlier and having an advantage over several weaker types. The Bug type has the lowest average health points in the video games and is also a statistical outlier, at a disadvantage to the Dragon and Ice types and nearly so to the Fighting type. The type with the lowest health in the card game is the Grass type, which also has the lowest overall weighted mean of the types shared by both games. In both games, it is at a disadvantage only to the Dragon type.
Correlation Between Mean Health Points and Type Distribution: For the video games, there is a loose negative linear correlation between a type's percentage of the overall type distribution and its mean HP, with a smaller range of type distributions than in the card game. The card game has a strong negative linear correlation between the two. However, in the card game, the Dragon type's mean health points are more than 20 points higher than the value expected from the linear fit. This was interpreted to mean that, despite making up only a small percentage of all cards made, the Dragon type holds a statistically backed advantage in health points.