Background Information¶

Pokémon was created in 1996 by a Japanese man named Satoshi Tajiri. It was originally released as a video game called "Pocket Monsters: Red and Green" for the Nintendo GameBoy. The game achieved explosive popularity in Japan, and then internationally, due to the adorable design of the Pokémon and themes of exploration, friendship, child-like freedom and joy in the games. Shortly after the orginial game was released, the franchise rebranded to Pokémon and the collectable card game was introduced. In the years since, a multitude of additional video games, comics, animated shows, and movies have been released.

The premise of the Pokémon franchise is that humans share a world with creatures called Pokémon that, by their nature, want to be captured, tamed, and fight battles against other Pokémon. Those who travel to befriend and capture Pokémon are called trainers. The player takes on that role in the video games and collectable card game.

Both games are based largely on probability, strategy, and general game theory. For example, in both types of games, Pokémon have an elemental type or a combination of types, such as Grass, Water, Fire, Steel, Psychic, etc. Additionally, Pokémon have health points, and various attacks that can be used. It is important to note that there are differences in the types, health points, and attack strenghts of various Pokémon between the card game and the video games.

Given those differences, health points act as a basic metric of game balance and Pokémon strength. Furthermore, for each type of game, a mathematical assesement could be made of the comparitive strengths and advantages for each type of Pokémon. The card game in particular is very dependent on probability, due to the composition of decks and the game rules. Decks are comprised of 60 cards, which can be drawn at random, and must be built with a balance of Pokémon, trainer, and energy cards in mind.

Importing the Data¶

For this project, we will take data from two different sources, one for the card game, and the other for the video games. The sources for each dataset can be found here at the links below.

Card Game: https://www.kaggle.com/datasets/adampq/pokemon-tcg-all-cards-1999-2023?rvi=1

Video Game: https://data.world/data-society/pokemon-with-stats

In [1]:
import numpy as np
import pandas as pd
import requests
import io
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import csv
import ast
from itertools import cycle, islice

# function for reading csv from a url 
def read_csv_from_url(url):
    '''
    Reads csv file into a Pandas DataFrame given a url
    
    input: url for csv file
    output: Pandas DataFrame
    '''
    s=requests.get(url).content
    return pd.read_csv(io.StringIO(s.decode('utf-8')))
    
# csv files containing the data
dataloc= 'https://unixweb.kutztown.edu/~ibeck950/csc223/'
card_url = dataloc + 'pokemontcg.csv'
video_url = dataloc + 'Pokemon.csv'

card_df = read_csv_from_url(card_url)
video_df = read_csv_from_url(video_url)

display(card_df.head())
display(video_df.head())
id set series publisher generation release_date artist name set_num types ... retreatCost convertedRetreatCost rarity flavorText nationalPokedexNumbers legalities resistances rules regulationMark ancientTrait
0 base1-1 Base Base WOTC First 1/9/1999 Ken Sugimori Alakazam 1 ['Psychic'] ... ['Colorless', 'Colorless', 'Colorless'] 3.0 Rare Holo Its brain can outperform a supercomputer. Its ... [65] {'unlimited': 'Legal'} NaN NaN NaN NaN
1 base1-2 Base Base WOTC First 1/9/1999 Ken Sugimori Blastoise 2 ['Water'] ... ['Colorless', 'Colorless', 'Colorless'] 3.0 Rare Holo A brutal Pokémon with pressurized water jets o... [9] {'unlimited': 'Legal'} NaN NaN NaN NaN
2 base1-3 Base Base WOTC First 1/9/1999 Ken Sugimori Chansey 3 ['Colorless'] ... ['Colorless'] 1.0 Rare Holo A rare and elusive Pokémon that is said to bri... [113] {'unlimited': 'Legal'} [{'type': 'Psychic', 'value': '-30'}] NaN NaN NaN
3 base1-4 Base Base WOTC First 1/9/1999 Mitsuhiro Arita Charizard 4 ['Fire'] ... ['Colorless', 'Colorless', 'Colorless'] 3.0 Rare Holo Spits fire that is hot enough to melt boulders... [6] {'unlimited': 'Legal'} [{'type': 'Fighting', 'value': '-30'}] NaN NaN NaN
4 base1-5 Base Base WOTC First 1/9/1999 Ken Sugimori Clefairy 5 ['Colorless'] ... ['Colorless'] 1.0 Rare Holo Its magical and cute appeal has many admirers.... [35] {'unlimited': 'Legal'} [{'type': 'Psychic', 'value': '-30'}] NaN NaN NaN

5 rows × 29 columns

number name type1 type2 total hp attack defense sp_attack sp_defense speed generation legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 Mega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 3 Gigantamax Venusaur Grass Poison 525 80 82 83 100 100 80 1 False

Cleaning the Data¶

In the Pokémon card game, there are three primary types of cards: Trainers, Energy, and Pokémon. For this project, we will be focusing on just the Pokémon, so we will remove any trainer or energy card entries.

Several of the columns are not needed, so they will be dropped from card_df, since they are not relevant to the analysis being performed. Additionally, several columns will be dropped from video_df for the same reasons.

Some of the datatypes in card_df were converted to strings and must be returned to their intended datatypes. Many items also had unnecessary quotes around them. There is a slight discrepancy in the type names between the card game and the video games. Colorless Pokémon from the card game are equivalent to the Normal Pokémon from the video games, the same is true of the Metal and Steel types, and the Electric and Lightning types. For this analysis, we will use the video game type names.

In [2]:
# removes the trainer and energy cards
card_df = card_df.drop(card_df[card_df['supertype'] == 'Trainer'].index)
card_df = card_df.drop(card_df[card_df['supertype'] == 'Energy'].index)

# in the dataset, the national pokedex number was stored as a string, formated like a list of floats which is certainly
# an interesting choice on the part of the data set creator, since the pokedex number is a unique integer identifier.
# the following lines replace that column with the value of the float inside the string formatted list
card_df['nationalPokedexNumbers'] = card_df['nationalPokedexNumbers'].fillna('[]')
card_df['nationalPokedexNumbers'] = card_df['nationalPokedexNumbers'].apply(lambda x: int(x.strip('[]').split(',')[0]) if x.strip('[]') else None)

# each pokemon has a primary type, and also possibly a secondary type. This splits the list into two seperate columns
card_df['type1'] = card_df['types'].apply(lambda x: (x.strip('[]').split(',')[0]) if x.strip('[]') else 'None')
card_df['type2'] = card_df['types'].apply(lambda x: x.strip('[]').split(',')[1].strip('\'') if (x.strip('[]') and len(x.strip('[]').split(',')) > 1) else 'None')
card_df = card_df.drop(['types'], axis=1)

# remove unneeded quotes
card_df['type1'] = card_df['type1'].apply(lambda x: x.strip('\'\''))
card_df['type2'] = card_df['type2'].apply(lambda x: x.strip('\'\''))
# match the typing between games.
card_df['type1'] = card_df['type1'].apply(lambda x:'Normal' if x == 'Colorless' else x)
card_df['type2'] = card_df['type2'].apply(lambda x:'Normal' if x == 'Colorless' else x)
card_df['type1'] = card_df['type1'].apply(lambda x:'Steel' if x == 'Metal' else x)
card_df['type2'] = card_df['type2'].apply(lambda x:'Steel' if x == 'Metal' else x)
card_df['type1'] = card_df['type1'].apply(lambda x:'Dark' if x == 'Darkness' else x)
card_df['type2'] = card_df['type2'].apply(lambda x:'Dark' if x == 'Darkness' else x)
card_df['type1'] = card_df['type1'].apply(lambda x:'Electric' if x == 'Lightning' else x)
card_df['type2'] = card_df['type2'].apply(lambda x:'Electric' if x == 'Lightning' else x)

# removes unneeded categories
card_df = card_df.drop(['id'], axis=1)
card_df = card_df.drop(['set'], axis=1)
card_df = card_df.drop(['series'], axis=1)
card_df = card_df.drop(['publisher'], axis=1)
card_df = card_df.drop(['release_date'], axis=1)
card_df = card_df.drop(['artist'], axis=1)
card_df = card_df.drop(['set_num'], axis=1)
card_df = card_df.drop(['rarity'], axis=1)
card_df = card_df.drop(['regulationMark'], axis=1)
card_df = card_df.drop(['ancientTrait'], axis=1)
card_df = card_df.drop(['evolvesFrom'], axis=1)
card_df = card_df.drop(['evolvesTo'], axis=1)
card_df = card_df.drop(['supertype'], axis=1)
card_df = card_df.drop(['flavorText'], axis=1)
card_df = card_df.drop(['rules'], axis=1)
card_df = card_df.drop(['retreatCost'], axis=1)
card_df = card_df.drop(['level'], axis=1)
card_df = card_df.drop(['legalities'], axis=1)
card_df = card_df.drop(['abilities'], axis=1)
card_df = card_df.drop(['attacks'], axis=1)
card_df = card_df.drop(['weaknesses'], axis=1)
card_df = card_df.drop(['convertedRetreatCost'], axis=1)
card_df = card_df.drop(['resistances'], axis=1)

video_df = video_df.drop(['legendary'], axis=1)
video_df = video_df.drop(['total'], axis=1)
video_df = video_df.drop(['attack'], axis=1)
video_df = video_df.drop(['defense'], axis=1)
video_df = video_df.drop(['sp_attack'], axis=1)
video_df = video_df.drop(['sp_defense'], axis=1)
video_df = video_df.drop(['speed'], axis=1)

# fill NaN with appropriate values
video_df['type2'] = video_df['type2'].fillna('None')

#sets the index to the name of the pokemon
card_df = card_df.set_index(card_df['name'])
video_df = video_df.set_index(video_df['name'])

display(card_df.head())
display(video_df.head())
generation name subtypes hp nationalPokedexNumbers type1 type2
name
Alakazam First Alakazam ['Stage 2'] 80.0 65.0 Psychic None
Blastoise First Blastoise ['Stage 2'] 100.0 9.0 Water None
Chansey First Chansey ['Basic'] 120.0 113.0 Normal None
Charizard First Charizard ['Stage 2'] 120.0 6.0 Fire None
Clefairy First Clefairy ['Basic'] 40.0 35.0 Normal None
number name type1 type2 hp generation
name
Bulbasaur 1 Bulbasaur Grass Poison 45 1
Ivysaur 2 Ivysaur Grass Poison 60 1
Venusaur 3 Venusaur Grass Poison 80 1
Mega Venusaur 3 Mega Venusaur Grass Poison 80 1
Gigantamax Venusaur 3 Gigantamax Venusaur Grass Poison 80 1

Pokédex Number Indexing¶

Each of the 1025 species of Pokémon has its own unique identification number assigned to it, for easy reference in the Pokédex, a handheld device used by the player to identify Pokémon in the video games. In real life, it can be found online at https://pokemondb.net/pokedex/national and can be used as a tool when building a card game deck or for looking up facts about a Pokémon. It should be noted that there are variations of each species that may have different attack strenghts, health points, and may even be different types, while still sharing the same identification number.

In this cell, the data is sorted by number in ascending order, and then is multiindexed, first by number and second by name. This allows for easy lookup.

In [3]:
# sort the cards by pokedex number, in ascending order
card_df_num = card_df.sort_values(by=['nationalPokedexNumbers'])
video_df_num = video_df.sort_values(by=['number'])

# multiindex both arrays based on the name and pokedex numbers
tc1 = card_df_num['nationalPokedexNumbers'].values
tc2 = card_df_num['name'].values
indx_card = pd.MultiIndex.from_arrays([tc1, tc2],names=['number', 'name'])

tv1 = video_df_num['number'].values
tv2 = video_df_num['name'].values
indx_video = pd.MultiIndex.from_arrays([tv1, tv2],names=['number', 'name'])

#drop columns used for indexing
card_df_num = card_df_num.drop(['nationalPokedexNumbers'], axis=1)
card_df_num = card_df_num.drop(['name'], axis=1)
video_df_num = video_df_num.drop(['number'], axis=1)
video_df_num = video_df_num.drop(['name'], axis=1)

card_df_num.index = indx_card
video_df_num.index = indx_video
In [4]:
# these can be used to locate any Pokemon by number
display(video_df_num.loc[[52]])
display(card_df_num.loc[[52]])
type1 type2 hp generation
number name
52 Meowth Normal None 40 1
Alolan Meowth Dark None 40 7
Galarian Meowth Steel None 50 8
generation subtypes hp type1 type2
number name
52.0 Meowth Third ['Basic'] 50.0 Normal None
Meowth Sixth ['Basic'] 70.0 Normal None
Alolan Meowth Seventh ['Basic'] 60.0 Dark None
Meowth Fourth ['Basic'] 60.0 Normal None
Meowth Seventh ['Basic'] 60.0 Normal None
Meowth Sixth ['Basic'] 60.0 Normal None
Meowth Second ['Basic'] 50.0 Normal None
Alolan Meowth Other ['Basic'] 60.0 Dark None
Team Rocket's Meowth First ['Basic'] 40.0 Normal None
Galarian Meowth Eighth ['Basic'] 70.0 Steel None
Galarian Meowth Eighth ['Basic'] 70.0 Steel None
Meowth δ Third ['Basic'] 50.0 Dark 'Metal
Meowth First ['Basic'] 40.0 Normal None
Alolan Meowth Other ['Basic'] 70.0 Dark None
Meowth Sixth ['Basic'] 60.0 Normal None
Galarian Meowth Eighth ['Basic'] 60.0 Steel None
Meowth Other ['Basic'] 60.0 Normal None
Alolan Meowth Seventh ['Basic'] 60.0 Dark None
Meowth First ['Basic'] 50.0 Normal None
Meowth First ['Basic'] 50.0 Normal None
Meowth Seventh ['Basic'] 60.0 Normal None
Meowth Fourth ['Basic'] 60.0 Normal None
Meowth Fourth ['Basic'] 50.0 Normal None
Meowth Sixth ['Basic'] 60.0 Normal None
Meowth Sixth ['Basic'] 60.0 Normal None
Galarian Meowth Eighth ['Basic'] 70.0 Steel None
Meowth Fifth ['Basic'] 60.0 Normal None
Meowth Third ['Basic'] 50.0 Normal None
Meowth Seventh ['Basic'] 70.0 Normal None
Meowth Third ['Basic'] 50.0 Normal None
Meowth First ['Basic'] 50.0 Normal None
Rocket's Meowth Third ['Basic'] 60.0 Dark None
Meowth Fifth ['Basic'] 60.0 Normal None
Meowth Third ['Basic'] 50.0 Normal None
Giovanni's Meowth First ['Basic'] 50.0 Normal None
Meowth δ Third ['Basic'] 40.0 Dark None
Alolan Meowth Seventh ['Basic'] 70.0 Dark None
Alolan Meowth Seventh ['Basic'] 60.0 Dark None
Meowth Third ['Basic'] 50.0 Normal None
Meowth Second ['Basic'] 40.0 Normal None
Giovanni's Meowth First ['Basic'] 40.0 Normal None
Meowth Sixth ['Basic'] 60.0 Normal None
Meowth Eighth ['Basic'] 70.0 Normal None
Galarian Meowth Eighth ['Basic'] 70.0 Steel None
Meowth Fifth ['Basic'] 60.0 Normal None
Galarian Meowth Eighth ['Basic'] 60.0 Steel None
Meowth VMAX Eighth ['VMAX'] 300.0 Normal None
Meowth V Eighth ['Basic', 'V'] 180.0 Normal None
Alolan Meowth Seventh ['Basic'] 70.0 Dark None
Meowth Ninth ['Basic'] 70.0 Normal None
Meowth Second ['Basic'] 50.0 Normal None
Meowth Fifth ['Basic'] 70.0 Normal None

Analysis of Type Distribution in Pokémon Video Games¶

Each Pokémon has a primary type, and many, but not all, have a secondary type. We will study the distribution of type in the video games, counting both the primary and secondary types. We will add a None category for those who do not have a secondary type.

In [5]:
# create the labels, corresponding colors, and count of each type, for the video game
labels_video = ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Flying','Ghost','Grass', 'Ground', 'Ice', 'Normal', 'Poison', 'Psychic', 'Rock', 'Steel', 'Water']
colors_video = ['olivedrab','darkslategrey', 'gold', 'yellow', 'pink', 'brown', 'red', 'skyblue','indigo','green', 'brown', 'cyan','lightgrey', 'magenta','purple', 'khaki','grey', 'blue']
primary_video_type_data = [video_df['type1'].value_counts()[type] for type in labels_video]
secondary_video_type_data = [video_df['type2'].value_counts()[type] for type in labels_video]

video_type_data = [primary + secondary for primary, secondary in zip(primary_video_type_data, secondary_video_type_data)]

#create a plot of each type
fig, ax_video = plt.subplots()
vtotal = sum(video_type_data) - video_df['type2'].value_counts()['None']
ax_video.set_title('Distribution of Types: Pokémon Video Games', loc='center')
patches, _ = ax_video.pie(video_type_data, colors = colors_video)
legend_entries_video = [f'{vlabel} ({vcount/vtotal*100:.1f}%)' for vlabel, vcount in zip(labels_video, video_type_data)]
_ = ax_video.legend(legend_entries_video, loc='upper right', bbox_to_anchor=(1.5, 1.075))

The type distribution in the video games is much more uniform than the card game. This is likely due to that fact that since the cards are collectibles, many different versions of each Pokémon are made, often with unique traits to appeal to collectors. Additionally, the video games types have been more or less consistent since the first Pokémon game, whereas the card game has gone through type classification changes with each generation of cards published.

Type to Subtype Distribution in Pokémon Video Games¶

Since most Pokémon in the video games also have a subtype, we will perform an analysis of what sub-type is most common for each primary type.

In [6]:
labels_w_none = ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Flying','Ghost','Grass', 'Ground', 'Ice', 'None', 'Normal', 'Poison', 'Psychic', 'Rock', 'Steel', 'Water']
cmapv = ListedColormap(['olivedrab','darkslategrey', 'gold', 'yellow', 'pink', 'brown', 'red', 'skyblue','indigo','green', 'brown', 'cyan','whitesmoke','lightgrey', 'magenta','purple', 'khaki','grey', 'blue'])

type_counts = {type1: {type2: 0 for type2 in labels_w_none} for type1 in labels_video}

# Iterate over each row in the dataframe

for index, row in video_df.iterrows():
    type1 = row['type1']
    type2 = row['type2']
    
    # Increment the count for the corresponding type2 under type1
    type_counts[type1][type2] += 1

# Convert the dictionary to a list of lists
type_counts_list = [[type_counts[type1][type2] for type2 in labels_w_none] for type1 in labels_video]
type_counts_df = pd.DataFrame(type_counts_list, columns=labels_w_none, index=labels_video).transpose()

type_percents_list = [[type_counts[type1][type2]/sum(type_counts_df[type1])*100 for type2 in labels_w_none] for type1 in labels_video]
type_percents_df = pd.DataFrame(type_percents_list, columns=labels_w_none, index=labels_video)
transp = type_percents_df.transpose()
transp.columns.name = 'Primary Type'
transp.index.name = 'Secondary Type'
display(transp)

type_percent_fig = plt.figure()
type_percents_df.plot(kind='bar', stacked=True, title='Percentage of Subtype per Primary Type', colormap=cmapv)
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.ylabel('Percentage of Total Subtypes')
plt.show()
Primary Type Bug Dark Dragon Electric Fairy Fighting Fire Flying Ghost Grass Ground Ice Normal Poison Psychic Rock Steel Water
Secondary Type
Bug 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 4.477612 0.000000 0.000000 0.000000 0.000000 5.263158 0.000000 2.380952 0.00 3.333333 0.000000 1.459854
Dark 0.000000 0.000000 0.000000 3.174603 0.000000 6.666667 1.492537 0.000000 2.325581 3.157895 7.142857 0.000000 0.000000 11.904762 1.25 3.333333 0.000000 5.109489
Dragon 0.000000 8.333333 0.000000 3.174603 0.000000 0.000000 2.985075 22.222222 4.651163 7.368421 4.761905 0.000000 0.854701 9.523810 1.25 3.333333 7.692308 2.189781
Electric 4.819277 0.000000 2.439024 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.380952 0.000000 0.000000 0.000000 0.00 5.000000 0.000000 1.459854
Fairy 2.409639 8.333333 2.439024 3.174603 0.000000 0.000000 0.000000 0.000000 2.325581 5.263158 0.000000 2.631579 4.273504 2.380952 13.75 5.000000 10.256410 2.919708
Fighting 4.819277 4.166667 4.878049 0.000000 0.000000 0.000000 10.447761 0.000000 0.000000 3.157895 0.000000 0.000000 3.418803 4.761905 3.75 1.666667 2.564103 2.189781
Fire 2.409639 6.250000 2.439024 1.587302 0.000000 0.000000 0.000000 0.000000 6.976744 0.000000 2.380952 2.631579 0.000000 4.761905 1.25 5.000000 0.000000 0.000000
Flying 18.072289 12.500000 14.634146 9.523810 8.695652 4.444444 11.940299 0.000000 6.976744 7.368421 9.523810 5.263158 23.076923 7.142857 10.00 10.000000 5.128205 5.109489
Ghost 1.204819 4.166667 7.317073 1.587302 0.000000 2.222222 2.985075 0.000000 0.000000 1.052632 9.523810 2.631579 0.000000 0.000000 5.00 0.000000 10.256410 1.459854
Grass 7.228916 4.166667 0.000000 1.587302 0.000000 0.000000 0.000000 0.000000 25.581395 0.000000 0.000000 0.000000 1.709402 0.000000 2.50 3.333333 0.000000 2.189781
Ground 2.409639 0.000000 17.073171 0.000000 0.000000 0.000000 4.477612 0.000000 4.651163 1.052632 0.000000 7.894737 0.854701 4.761905 0.00 10.000000 5.128205 7.299270
Ice 0.000000 4.166667 7.317073 3.174603 0.000000 2.222222 0.000000 0.000000 0.000000 3.157895 0.000000 0.000000 0.000000 0.000000 1.25 3.333333 0.000000 3.649635
None 22.891566 29.166667 31.707317 53.968254 86.956522 66.666667 50.746269 44.444444 34.883721 46.315789 42.857143 50.000000 62.393162 40.476190 55.00 25.000000 33.333333 53.284672
Normal 0.000000 10.416667 0.000000 3.174603 0.000000 0.000000 2.985075 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.50 0.000000 0.000000 0.000000
Poison 14.457831 0.000000 0.000000 6.349206 0.000000 0.000000 0.000000 0.000000 11.627907 16.842105 0.000000 0.000000 0.000000 0.000000 0.00 1.666667 0.000000 2.189781
Psychic 3.614458 4.166667 9.756098 1.587302 0.000000 6.666667 2.985075 0.000000 0.000000 2.105263 4.761905 10.526316 2.564103 4.761905 0.00 3.333333 17.948718 4.379562
Rock 3.614458 0.000000 0.000000 0.000000 0.000000 0.000000 1.492537 0.000000 0.000000 0.000000 7.142857 0.000000 0.000000 0.000000 0.00 0.000000 7.692308 4.379562
Steel 8.433735 4.166667 0.000000 6.349206 4.347826 6.666667 1.492537 22.222222 0.000000 3.157895 9.523810 5.263158 0.000000 0.000000 2.50 6.666667 0.000000 0.729927
Water 3.614458 0.000000 0.000000 1.587302 0.000000 4.444444 1.492537 11.111111 0.000000 0.000000 0.000000 7.894737 0.854701 7.142857 0.00 10.000000 0.000000 0.000000
<Figure size 640x480 with 0 Axes>

As can be seen in Percentage of Subtype per Primary Type, it is fairly common for Pokémon in the videogames to not have a secondary type. The Fairy type, in particular, very rarely have a secondary type. Water has the most potential subtypes, at 15, followed by Rock type with 14, and the Electric type has 13 possible subtypes. Additionally, every primary type has a Flying subtype.

Analysis of Type Distribution in Pokémon: The Card Game¶

Unlike the video game, where most Pokémon have a primary type and a second type, in Pokémon: The Card Game most Pokémon cards only have one type, with a handful of exceptions. Given the over 10,000 different Pokémon cards produced since 1999, it would be interesting to study the distribution of each of the 11 types available in the card game.

In [7]:
# create the labels, corresponding colors, and count of each type, for the card game
labels_card = ['Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Grass', 'Normal', 'Psychic', 'Steel', 'Water']
colors_card = ['darkslategrey', 'gold', 'yellow', 'pink', 'brown', 'red', 'green', 'lightgrey', 'purple', 'grey', 'blue']
card_type_data = [card_df['type1'].value_counts()[type] for type in labels_card]

#create a plot of each type
fig, ax_card = plt.subplots()
ctotal = sum(card_type_data)
ax_card.set_title('Distribution of Types: Pokémon: The Card Game', loc='center')
patches, _ = ax_card.pie(card_type_data, colors = colors_card)
legend_entries_card = [f'{clabel} ({ccount/ctotal*100:.1f}%)' for clabel, ccount in zip(labels_card, card_type_data)]
_ = ax_card.legend(legend_entries_card, loc='upper right', bbox_to_anchor=(1.5, .825))

As can be seen in the figure titled Distribution of Pokémon Types, the most common type of Pokémon Card is the Water type, followed by Grass, Psychic, Normal, Fighting, and then Fire. I was surprised that Fire type did not make up a higher percentage, since it was one of the original three types of Pokémon. The high percentage of Psychic type cards makes sense, given that the past few releases of cards have been Psychic and Dark Type focused. The low percetage of Fairy type cards makes sense as well, given that Fairy type cards were only released from 2013 until 2020, when the type was discontinued.

Measure of Health Points by Type in Pokémon¶

The sheer number of cards or video game charecters designed for a type of Pokémon is not the only important statistic, since the health points of each Pokémon play a large role in determining it's strength in-game.

HP by Type in Pokémon: The Card Game¶

In [8]:
card_grouped_type = card_df.groupby(by='type1')
hp_type_card = card_grouped_type['hp'].agg([np.mean, np.median])
hp_type_card.rename(columns = {'mean':'mean hp'}, inplace = True)
hp_type_card.rename(columns = {'median':'median hp'}, inplace = True)
hp_type_card.index.names = ['type']
display(hp_type_card)

# isolate the mean and media hp
mean_hp = hp_type_card['mean hp']
median_hp = hp_type_card['median hp']

# create the bar plot
fig, ax = plt.subplots()
x = np.arange(len(labels_card))  # x-axis positions
width = 0.35  # width of the bars

# plot the mean HP bars
ax.bar(x - width/2, mean_hp, width, label='Mean HP', color=colors_card)

# plot the median HP bars
ax.bar(x + width/2, median_hp, width, label='Median HP', color=colors_card)

# set the labels, title, and legend
ax.set_xlabel('Pokémon Type')
ax.set_ylabel('HP')
ax.set_title('Mean and Median HP by Type for Pokémon: The Card Game')
ax.set_xticks(x)
ax.set_xticklabels(labels_card, rotation=45)

# show the plot
plt.tight_layout()
plt.show()

hp_type_card['mean hp'].plot(kind='box', title='Spread of Mean HP (Card Game)')
mean hp median hp
type
Dark 117.568134 100.0
Dragon 152.840647 150.0
Electric 103.268634 80.0
Fairy 106.160338 90.0
Fighting 106.896770 90.0
Fire 109.961598 90.0
Grass 93.222936 80.0
Normal 94.812634 80.0
Psychic 100.600616 80.0
Steel 125.968379 110.0
Water 102.744539 80.0
Out[8]:
<Axes: title={'center': 'Spread of Mean HP (Card Game)'}>

In Mean and Median HP by Type for Pokémon: The Card Game, the left bar is the mean hp and the median hp is the right bar. It can be clearly seen that Dragon Pokémon have the highest mean and median health points and are outliers in Spread of Mean HP (Card Game). This likely explains why they are among the least common types of card. Grass Pokémon on the other hand, are slightly weaker, on average, than all of the other types but not enough so to be an outlier. Aside from thhe Dragon type, the mean hps fall into a range of just over 24 hp. In the context of the card game, 24 hp does not constitute a significant differance in strength between non-outlier types. However, the difference of nearly 60 hp from the mean hp of Dragon type and the minimum mean hp of Grass type does constituted a signifigcant difference. Based on my player experience, a difference of 50 hp or more usually gaurentees a loss for the weaker Pokémon, demonstrating a Dragon type advantage in health points over the lowest section of types: Grass, Normal, Psychic, and Water.

HP by Type in Pokémon Video Games¶

In [9]:
# groupby type1 and type2, find the means
video_grouped_type1 = video_df.groupby(by=['type1'])
health_points_type1 = video_grouped_type1['hp'].agg('mean')
video_grouped_type2 = video_df.groupby(by=['type2'])
health_points_type2 = video_grouped_type2['hp'].agg('mean')

# calculate weights for each type
weight_1 = sum(primary_video_type_data)/sum(video_type_data)
weight_2 = sum(secondary_video_type_data)/sum(video_type_data)

# combine them into a list, and generate a dataframe
hp_type = [health_points_type1[t]*weight_1 + health_points_type2[t]*weight_2 for t in labels_video]
hp_type_video = pd.DataFrame(hp_type, columns=['mean hp'], index=labels_video)
display(hp_type_video)

# create the bar plot
fig, ax = plt.subplots()
x = np.arange(len(labels_video))  # x-axis positions
width = .9  # width of the bars

# plot the mean HP bar
ax.bar(x, hp_type, width, label='Mean HP', color=colors_video)

# set the labels, title, and legend
ax.set_xlabel('Pokémon Type')
ax.set_ylabel('HP')
ax.set_title('Mean HP by Type for Pokémon Video Games')
ax.set_xticks(x)
ax.set_xticklabels(labels_video, rotation=45)

# show the plot
plt.tight_layout()
plt.show()

hp_type_video['mean hp'].plot(kind='box', title='Spread of Mean HP (Video Games)', ylabel='mean hp')
plt.show()
mean hp
Bug 58.665122
Dark 73.579744
Dragon 87.597242
Electric 67.126120
Fairy 69.807957
Fighting 76.466587
Fire 72.254509
Flying 72.378524
Ghost 65.245092
Grass 67.502494
Ground 73.994854
Ice 79.109103
Normal 72.515669
Poison 70.278922
Psychic 72.749649
Rock 68.664699
Steel 71.477437
Water 68.751725

As can be seen in Mean HP by Type for Pokémon Video Games, the Pokémon, in general, have less HP then their card game counterparts. By comparing the bar chart to Spread of Mean HP (Video Games) we can see that just like the card game, Dragon type is an outlier with very high hp. We also can see that the Bug type is considerably weaker than the other types. The non-outlier types fall within a range of approximately 14 hp which, given my player experience, is a deficit that is challenging but does not necessarily guarantee a loss for the weaker Pokémon. On the other hand, a difference of more than 20 hp usually results in a victory for the Pokémon with more health. That means that the Dragon type has a considerable advantage over the Bug, Electric, Ghost, and Grass types. This also means that the Dragon and Ice types have a definite advantage over the Bug type, with the Fighting type being very close to having the advantage as well.

HP by Type in both Pokémon Games¶

It should be noted that since not all types from the video games are in the card game, the extra types have been dropped for this comparison.

In [10]:
# create a dataframe that merges the Pokemon from the games into one table, to analyze the mean hitpoints of shared typed
hp_type_card = hp_type_card.drop(['median hp'], axis=1)
hp_merged = pd.merge(hp_type_video, hp_type_card, left_index=True, right_index=True)
hp_merged.rename(columns = {'mean hp_x':'mean hp: video game'}, inplace = True)
hp_merged.rename(columns = {'mean hp_y':'mean hp: card game'}, inplace = True)
hp_merged.index.names = ['type']

# add columns of weighted means
weightv = sum(video_df["type1"].value_counts())
weightc = sum(card_df["type1"].value_counts())
totalcount = weightv + weightc
weightv = weightv/totalcount
weightc = weightc/totalcount
hp_merged['overall weighted mean hp'] = [hp_merged.at[tp, 'mean hp: video game']*weightv + hp_merged.at[tp, 'mean hp: card game']*weightc for tp in labels_card]

display(hp_merged)

# plot the figure
fig, ax = plt.subplots()
x = np.arange(len(labels_card))  # x-axis positions
width = 0.9  # width of the bars

# plot the mean HP bars
ax.bar(x, hp_merged['overall weighted mean hp'], width, label='Mean HP', color=colors_card)


# set the labels, title, and legend
ax.set_xlabel('Pokémon Type')
ax.set_ylabel('Mean HP')
ax.set_title('Mean Health Points by Type for all Pokémon Games')
ax.set_xticks(x)
ax.set_xticklabels(labels_card, rotation=45)

# show the plot
plt.tight_layout()
plt.show()

hp_merged['overall weighted mean hp'].plot(kind='box', title='Spread of Mean HP for all Pokémon Games', ylabel='mean hp')
plt.show()
mean hp: video game mean hp: card game overall weighted mean hp
type
Dark 73.579744 117.568134 114.539323
Dragon 87.597242 152.840647 148.348327
Electric 67.126120 103.268634 100.780049
Fairy 69.807957 106.160338 103.657303
Fighting 76.466587 106.896770 104.801507
Fire 72.254509 109.961598 107.365284
Grass 67.502494 93.222936 91.451961
Normal 72.515669 94.812634 93.277381
Psychic 72.749649 100.600616 98.682944
Steel 71.477437 125.968379 122.216418
Water 68.751725 102.744539 100.403972

As can be seen in Mean Health Points by Type for all Pokémon Games and the box plot, Dragon type is once again an outlier in hp for the types in both games. The non-outlier types have a range of about 31 hp. In fact, the Dragon type have on average 26 more hp then the second highest type, Steel.

Potential Correlation Between Health Points and Type Distribution¶

The smaller the percentage of all Pokémon each type makes up, the rarer it is for a player or card collector to find a Pokémon of that type. Additionally, it has been shown that some advantage exists for types with a higher mean hp. Therefore, it is hypothosized that the rarer a Pokémon type is, the higher its mean health points will be.

In [11]:
video_pct = np.array([float(p/vtotal*100) for p in video_type_data])

plt.scatter(video_pct, hp_type_video['mean hp'])
theta = np.polyfit(video_pct, hp_type_video['mean hp'], 1)
y_line = theta[1] + theta[0] * video_pct
plt.plot(video_pct, y_line, 'r')
plt.title('Correlation Between HP and Type Distribution (Video Games)')
plt.xlabel('Type Distribution %')
plt.ylabel('Mean HP')
Out[11]:
Text(0, 0.5, 'Mean HP')

There is a loose correlation between type distribution and mean health points in the video games. However, the type distirubtions do not have a large spread, meaning that even the rarer types that may have a higher hp are not that much more rare then the more common types that may have lower hp.

In [12]:
card_pct = np.array([float(p/ctotal*100) for p in card_type_data])

plt.scatter(card_pct, hp_type_card['mean hp'])
theta = np.polyfit(card_pct, hp_type_card['mean hp'], 1)
y_line = theta[1] + theta[0] * card_pct
plt.plot(card_pct, y_line, 'r')
plt.title('Correlation Between HP and Type Distribution (Card Game)')
plt.xlabel('Type Distribution %')
plt.ylabel('Mean HP')
Out[12]:
Text(0, 0.5, 'Mean HP')

There is a strong correlation between mean hp and and the rarity of a type. This makes sense, from a game balence standpoint. However, we can see the point that corelates to the Dragon type is still approximately 23 hp higher then it should be, given a linear fit.

Conclusions¶

Overall Type Distribution: The video games have a more even distribution of types for each Pokémon. The card game has certain types that are substantially rarer than others. The most common type in each game is Water. The least common video game type is Ice, and the least common card game type is Dragon.

Type to Sub-Type Distribution: In the video games, the every primary type has a Pokémon with a secondary Flying Type. Water type has the greatest variety of potential subtypes with 14, and the Fairy type has just 2 potential subtypes. Additionally, more Pokémon do not have a second type than I originally thought.

Mean Health Points by Type Comparison: In the card and video games, both individually and combined, Dragon type has the highest average health points, to the point of being a statistical outlier and having an advantage over several weaker types. Additionally, the Bug type has the lowest average health points in the video games, also a statistical outlier at a disadvatage to three of the more powerful types, including Dragon type. The type with the lowest heath in the card game is the Grass type, which has the lowest health points of any type shared by both games. It is only at a disadvantage to the Dragon type, in both games.

Correlation Between Mean Health Points and Type Distribution: For the video games, there is a loose negative linear correlation between the percentage of overall type distribution and the mean hp of a type, with a smaller range of type distributions than the card game. The card game has a strong negative linear correlation between overall type distribution and the mean hp of a type. However, in the card game, the Dragon type has a greater mean health points over 20 points higher than the expected value from the linear fit. This was interpreted to mean that in spite of the small percentage of Dragon type cards for all cards made, it holds a statistically-backed advantage in health points.

In [ ]: