To calculate monthly lightning strikes from 2016-2018, the second structured dataset is used, calculating percentages of total lightning strikes that occurred for each year in a given month.
Like the first dataset, the date column needs to be converted.
View Code
# Load dataset
df_2 = pd.read_csv('lightning_strikes_dataset_2.csv')
# Datetime conversion
df_2['date'] = pd.to_datetime(df_2['date'])
df_2.head()
Results:
| date | number_of_strikes | center_point_geom |
| 2016-01-04 | 55 | POINT(-83.2 21.1) |
| 2016-01-04 | 33 | POINT(-83.1 21.1) |
| 2016-01-05 | 46 | POINT(-77.5 22.1) |
| 2016-01-05 | 28 | POINT(-76.8 22.3) |
| 2016-01-05 | 28 | POINT(-77 22.1) |
The 2016-2017 dataframe is combined with the 2018 dataframe using concat() so that data can be aggregated across all years.
The week and weekday columns that were created earlier are dropped for this operation only to simplify the results.
View Code
# Create new 2016-2018 dataframe
union_df = pd.concat([df.drop(['weekday', 'week'], axis=1), df_2], ignore_index=True)
union_df.head()
Results:
| date | number_of_strikes | center_point_geom |
| 2018-01-03 | 194 | POINT(-75 27) |
| 2018-01-03 | 41 | POINT(-78.4 29) |
| 2018-01-03 | 33 | POINT(-73.9 27) |
| 2018-01-03 | 38 | POINT(-73.8 27) |
| 2018-01-03 | 92 | POINT(-79 28) |
Three new columns need to be created, year, month number, and month name, to assist with naming bars on the bar plot.
View Code
# Add 3 new columns
union_df['year'] = union_df.date.dt.year
union_df['month'] = union_df.date.dt.month
union_df['month_txt'] = union_df.date.dt.month_name()
union_df.head()
Results:
| date | number_of_strikes | center_point_geom | year | month | month_txt |
| 2018-01-03 | 194 | POINT(-75 27) | 2018 | 1 | January |
| 2018-01-03 | 41 | POINT(-78.4 29) | 2018 | 1 | January |
| 2018-01-03 | 33 | POINT(-73.9 27) | 2018 | 1 | January |
| 2018-01-03 | 38 | POINT(-73.8 27) | 2018 | 1 | January |
| 2018-01-03 | 92 | POINT(-79 28) | 2018 | 1 | January |
With this new dataframe, the total number of strikes per year can be called
View Code
union_df[['year', 'number_of_strikes']].groupby(['year']).sum()
Results:
| year | number_of_strikes |
| 2016 | 41582229 |
| 2017 | 35095195 |
| 2018 | 44600989 |
A new dataframe is created called lightning_by_month highlighting percentages of total lightning strikes that occurred in a given month for each year.
To avoid any future errors with agg(), sum is passed as a string since that is what Pandas expects for built-ins (sum, mean, max etc.)
View Code
# Lightning by month
lightning_by_month = union_df.groupby(['month_txt', 'year']).agg(
number_of_strikes = pd.NamedAgg(column = 'number_of_strikes', aggfunc='sum')
).reset_index()
lightning_by_month.head()
Results:
| month_txt | year | number_of_strikes |
| April | 2016 | 2636427 |
| April | 2017 | 3819075 |
| April | 2018 | 1524339 |
| August | 2016 | 7250442 |
| August | 2017 | 6021702 |
With the monthly total strike count in hand, agg() is used to calculate the same yearly totals from before into a dataframe that can be merged downstream.
View Code
# lightning by year
lightning_by_year = union_df.groupby(['year']).agg(
year_strikes = pd.NamedAgg(column='number_of_strikes', aggfunc='sum')
).reset_index()
lightning_by_year.head()
Results:
| year | year_strikes |
| 2016 | 41582229 |
| 2017 | 35095195 |
| 2018 | 44600989 |
With both lightning_by_month and lightning_by_year, a new dataframe called percentage_lightning can be created.
merge() offers this capability by merging both lightning_by_month and lightning_by_year into a single dataframe, specifying the merge on the year column. Wherever year contains the same value in both dataframes, a row is created in the merged dataframe with all other columns.
percentage_lightning will add a new column called year_strikes representing total number of strikes for each year.
View Code
# Combine
percentage_lightning = lightning_by_month.merge(lightning_by_year, on='year')
percentage_lightning.head()
Results:
| month_txt | year | number_of_strikes | year_strikes |
| April | 2016 | 2636427 | 41582229 |
| April | 2017 | 3819075 | 35095195 |
| April | 2018 | 1524339 | 44600989 |
| August | 2016 | 7250442 | 41582229 |
| August | 2017 | 6021702 | 35095195 |
To get the percentage of total lightning strikes that occurred during each month, the number_of_strikes column is divided by the year_strikes column and multiplied by 100.
View Code
percentage_lightning['percentage_lightning_per_month'] = (percentage_lightning.number_of_strikes/
percentage_lightning.year_strikes * 100.0)
percentage_lightning.head()
Results:
| month_txt | year | number_of_strikes | year_strikes | percentage_lightning_per_month |
| April | 2016 | 2636427 | 41582229 | 6.340273 |
| April | 2017 | 3819075 | 35095195 | 10.882045 |
| April | 2018 | 1524339 | 44600989 | 3.417725 |
| August | 2016 | 7250442 | 41582229 | 17.436396 |
| August | 2017 | 6021702 | 35095195 | 17.158195 |
Now the percentages by month for each year can be plotted
View Code
# Plot
plt.figure(figsize=(10,6));
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December']
sns.barplot(
data = percentage_lightning,
x = 'month_txt',
y = 'percentage_lightning_per_month',
hue = 'year',
order = month_order );
plt.xlabel("Month");
plt.ylabel("% of lightning strikes");
plt.title("% of lightning strikes each Month (2016-2018)");