NOAA Lightning Strike Analysis (Date String Manipulations)

by Alfred Rico

Project Background

The data used for this project represents cloud-to-ground lightning activity recorded by Vaisala's National Lightning Detection Network (NLDN), a system that tracks lightning strikes across the United States.

To make analysis of the data easier, the National Centers for Environmental Information (NCEI) aggregate individual lightning strikes into a spatial grid, where all strikes that occur within a 0.1° latitude by 0.1° longitude tile are grouped and summarized daily. Each grid cell is represented by a pair of coordinates corresponding to the center of that tile.

In this way, instead of working with millions of individual strike events, the data is structured by location and time with each row representing total strikes recorded within a tile on a given day.
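As a rough illustration of this gridding idea, a strike's coordinates can be snapped to the center of its tile. The exact NCEI/NLDN gridding convention is an assumption here; this sketch simply assumes each cell center lies on a multiple of 0.1°, matching the POINT values seen in the data.

```python
def tile_center(lat, lon, cell=0.1):
    """Snap coordinates to the center of their grid cell.

    Illustrative only -- the exact NCEI gridding convention may differ;
    cell centers are assumed to lie on multiples of 0.1 degrees.
    """
    return (round(round(lat / cell) * cell, 1),
            round(round(lon / cell) * cell, 1))

# Strikes at slightly different points fall into the same tile
print(tile_center(24.72, -101.54))  # (24.7, -101.5)
print(tile_center(24.68, -101.46))  # (24.7, -101.5)
```

Under this scheme, summing strikes per (tile, day) pair yields exactly the row structure described above.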

Due to the substantial size of the CSV file used in this project, even converting it to Parquet format did not reduce it enough to upload to the repository. The data is instead available through the bigquery-public-data.noaa_lightning.lightning_strikes public table, accessible through Google BigQuery.

Project Goal

Geotemporal aggregation makes the data practical to explore, especially when working with date features, which are the focus of this project.

Data collected by the National Oceanic and Atmospheric Administration (NOAA), holding records of 2016-2018 lightning strikes, will be used to calculate weekly and quarterly lightning strike totals, which are then visualized.

Technical Implementation

Data Loading & Inspection

Before getting started, all required libraries and extensions are imported, the dataset is loaded, and a brief overview is obtained using .head() while .info() is used to assess data types and data shape.

# Import
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
  
# Load dataset
df = pd.read_csv('lightning_strikes_dataset.csv')
df.head()
df.info()
Results:

df.head()

date number_of_strikes center_point_geom
0 2016-08-05 16 POINT(-101.5 24.7)
1 2016-08-05 16 POINT(-85 34.3)
2 2016-08-05 16 POINT(-89 41.4)
3 2016-08-05 16 POINT(-89.8 30.7)
4 2016-08-05 16 POINT(-86.2 37.9)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10479003 entries, 0 to 10479002
Data columns (total 3 columns):
 #   Column             Dtype 
---  ------             ----- 
 0   date               object
 1   number_of_strikes  int64 
 2   center_point_geom  object
dtypes: int64(1), object(2)
memory usage: 239.8+ MB
Summary

As is common with CSV imports, the date column has been read in as an object (string) type and, as the target feature, will be converted to datetime format so that the required temporal components can be extracted.

.info() reveals the dataset contains ~10.5M rows and 3 columns, a sample of which is previewed with .head().

Feature Engineering

At this phase of the project, the date column is converted to a proper datetime format using pd.to_datetime().

Four new columns, week, month, quarter, and year, are created using the .dt.strftime() accessor with the associated format codes; the quarter column additionally uses .dt.to_period('Q').

# Convert date strings to datetime (required before using the .dt accessor)
df['date'] = pd.to_datetime(df['date'])

# Create columns
df['week'] = df['date'].dt.strftime('%Y-W%V')
df['month'] = df['date'].dt.strftime('%Y-%m')
df['quarter'] = df['date'].dt.to_period('Q').dt.strftime('%Y-Q%q')
df['year'] = df['date'].dt.strftime('%Y')

# Check
df.head(10)

Results:

df.head(10)

   date        number_of_strikes  center_point_geom    week      month    quarter  year
0  2016-08-05  16                 POINT(-101.5 24.7)   2016-W31  2016-08  2016-Q3  2016
1  2016-08-05  16                 POINT(-85 34.3)      2016-W31  2016-08  2016-Q3  2016
2  2016-08-05  16                 POINT(-89 41.4)      2016-W31  2016-08  2016-Q3  2016
3  2016-08-05  16                 POINT(-89.8 30.7)    2016-W31  2016-08  2016-Q3  2016
4  2016-08-05  16                 POINT(-86.2 37.9)    2016-W31  2016-08  2016-Q3  2016
5  2016-08-05  16                 POINT(-97.8 38.9)    2016-W31  2016-08  2016-Q3  2016
6  2016-08-05  16                 POINT(-81.9 36)      2016-W31  2016-08  2016-Q3  2016
7  2016-08-05  16                 POINT(-90.9 36.7)    2016-W31  2016-08  2016-Q3  2016
8  2016-08-05  16                 POINT(-106.6 26.1)   2016-W31  2016-08  2016-Q3  2016
9  2016-08-05  16                 POINT(-108 31.6)     2016-W31  2016-08  2016-Q3  2016
Summary

The date column was converted from a string to a datetime format to enable time-based calculations. New temporal features (week, month, quarter and year) were then extracted using datetime methods and formatting, allowing easier aggregation and analysis of geotemporal patterns.
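The format codes can be sanity-checked on a tiny hand-made sample (illustrative dates, not the project dataset). One caveat worth noting: %V is the ISO week number, so pairing it with the calendar year %Y can mislabel dates near a year boundary; %G is the matching ISO year.

```python
import pandas as pd

# Two illustrative dates, the second chosen to sit on a year boundary
sample = pd.DataFrame({'date': pd.to_datetime(['2016-08-05', '2018-12-31'])})
sample['week'] = sample['date'].dt.strftime('%Y-W%V')
sample['quarter'] = sample['date'].dt.to_period('Q').dt.strftime('%Y-Q%q')

print(sample['week'].tolist())     # ['2016-W31', '2018-W01']
print(sample['quarter'].tolist())  # ['2016-Q3', '2018-Q4']

# 2018-12-31 falls in ISO week 1 of 2019, so the ISO year %G differs:
print(sample['date'].dt.strftime('%G-W%V').tolist()[1])  # 2019-W01
```

For this project's 2016-2018 range the mismatch only affects a handful of boundary days, but it is worth keeping in mind when grouping by week.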

2018 Weekly Strikes Visualization

The dataset is filtered to 2018 and aggregated with groupby() and sum() before plotting.

With pandas 2.x+, numeric_only=True must be set in sum() during aggregation, or a TypeError will be thrown when the non-numeric columns are summed.
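The numeric_only behavior can be seen on a toy frame (hypothetical values): with numeric_only=True, string columns are simply dropped from the aggregation rather than raising an error.

```python
import pandas as pd

# Toy data mimicking the project's column types (hypothetical values)
toy = pd.DataFrame({
    'week':    ['2018-W01', '2018-W01', '2018-W02'],
    'geom':    ['POINT(-85 34.3)', 'POINT(-89 41.4)', 'POINT(-85 34.3)'],
    'strikes': [3, 4, 5],
})

# The string column 'geom' is excluded; only 'strikes' is summed
out = toy.groupby('week').sum(numeric_only=True).reset_index()
print(out)  # 2018-W01 -> 7 strikes, 2018-W02 -> 5 strikes
```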

# 2018 data summed by week
df_by_week_2018 = df[df['year'] == '2018'].groupby(['week']).sum(numeric_only=True).reset_index()
df_by_week_2018.head()

# Plot
plt.bar(x = df_by_week_2018['week'], height = df_by_week_2018['number_of_strikes'])
plt.xlabel("Week number")
plt.ylabel("Number of lightning strikes")
plt.title("Number of lightning strikes per week (2018)")
plt.show()

# Adjusting
plt.figure(figsize = (20, 5)) # Increase output size.
plt.bar(x = df_by_week_2018['week'], height = df_by_week_2018['number_of_strikes'])
plt.xlabel("Week number")
plt.ylabel("Number of lightning strikes")
plt.title("Number of lightning strikes per week (2018)")
plt.xticks(rotation = 45, fontsize = 8) # Rotate x-axis labels and decrease font size.
plt.show()

Results: [bar chart: number of lightning strikes per week, 2018]
Summary

The dataset was filtered for 2018 and grouped by week using groupby() to calculate total weekly lightning strikes. The results were visualized using bar(), with plt.figure(figsize=(20, 5)) and plt.xticks(rotation=45, fontsize=8) applied to improve readability.

The visualization shows relatively low activity early in the year, followed by a sharp increase during mid-year, with peak lightning activity occurring in late summer before declining toward the end of the year.

2016-2018 Quarterly Strikes Visualization

Analysis now shifts focus to quarterly lightning strikes across the full date range of available data. Because the totals are easiest to read in millions, .div(1000000) will be used during aggregation before plotting.

A number_of_strikes_formatted column is added representing the number in millions.
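The formatting chain can be previewed on a couple of sample totals: divide by one million, round to one decimal, cast to string, and append an 'M' suffix.

```python
import pandas as pd

# Sample quarterly totals (illustrative values)
totals = pd.Series([2683798, 21843820])

# Divide by 1e6, round to one decimal place, then build the label string
formatted = totals.div(1000000).round(1).astype(str) + 'M'
print(formatted.tolist())  # ['2.7M', '21.8M']
```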

A labeling function is used to annotate each bar in the graph with its corresponding number_of_strikes_formatted text. The pyplot function uses positional arguments x, y, and s for x-axis, y-axis, and text respectively.

# 2016-2018 data by quarter and sum
df_by_quarter = df.groupby(['quarter']).sum(numeric_only=True).reset_index()

# Text converted to millions
df_by_quarter['number_of_strikes_formatted'] = df_by_quarter['number_of_strikes'].div(1000000).round(1).astype(str) + 'M'

df_by_quarter.head()

# Labeling function: plt.text takes positional arguments x, y, and s
def addlabels(x, y, labels):
    for i in range(len(x)):
        plt.text(i, y[i], labels[i], ha = 'center', va = 'bottom')

# Plot
plt.figure(figsize = (15, 5))
plt.bar(x = df_by_quarter['quarter'], height = df_by_quarter['number_of_strikes'])
addlabels(df_by_quarter['quarter'], df_by_quarter['number_of_strikes'], df_by_quarter['number_of_strikes_formatted'])
plt.xlabel('Quarter')
plt.ylabel('Number of lightning strikes')
plt.title('Number of lightning strikes per quarter (2016-2018)')
plt.show()
Results:

df_by_quarter.head()

   quarter  number_of_strikes  number_of_strikes_formatted
0  2016-Q1  2683798            2.7M
1  2016-Q2  15084857           15.1M
2  2016-Q3  21843820           21.8M
3  2016-Q4  1969754            2.0M
4  2017-Q1  2444279            2.4M
Summary

The quarterly visualization reveals consistent seasonal trends, with peak lightning strikes occurring in Q2 and Q3 across all years, while also highlighting variation in intensity between years.

Year-over-Year Quarterly Strike Visualization

For a grouped analysis, two new columns, quarter_number and year, are created by extracting substrings from the quarter column to enable grouped comparisons.

A grouped bar chart is created to highlight lightning activity across the same quarter in different years.

# Create two new columns
df_by_quarter['quarter_number'] = df_by_quarter['quarter'].str[-2:]
df_by_quarter['year'] = df_by_quarter['quarter'].str[:4]

# Check
df_by_quarter.head()

# Plot
plt.figure(figsize = (15, 5))
p = sns.barplot(
    data = df_by_quarter,
    x = 'quarter_number',
    y = 'number_of_strikes',
    hue = 'year')
for b in p.patches:
    p.annotate(str(round(b.get_height()/1000000, 1))+'M', 
                   (b.get_x() + b.get_width() / 2., b.get_height() + 1.2e6), 
                   ha = 'center', va = 'bottom', 
                   xytext = (0, -12), 
                   textcoords = 'offset points')
plt.xlabel("Quarter")
plt.ylabel("Number of lightning strikes")
plt.title("Number of lightning strikes per quarter (2016-2018)")
plt.show()
Results:

df_by_quarter.head()

   quarter  number_of_strikes  number_of_strikes_formatted  quarter_number  year
0  2016-Q1  2683798            2.7M                         Q1              2016
1  2016-Q2  15084857           15.1M                        Q2              2016
2  2016-Q3  21843820           21.8M                        Q3              2016
3  2016-Q4  1969754            2.0M                         Q4              2016
4  2017-Q1  2444279            2.4M                         Q1              2017
Summary

Two new features, quarter_number and year, were extracted from the quarter column to enable grouped comparison. A grouped bar chart was then created with each bar being annotated with its respective value (in millions) to improve readability. Ultimately, the bar chart now highlights a grouped version of quarterly lightning activity across years.

Project Overview

This project focuses on transforming and extracting meaningful features from date string data to enable structured temporal analysis. Using Python, raw string-based date values are converted into datetime format and expanded into multiple time-based components such as week, month, quarter, and year. These engineered features are then used to aggregate and visualize lightning strike activity across different time scales, revealing seasonal patterns and year-over-year trends. The project highlights how effective date manipulation serves as a critical step in preparing data for temporal analysis and visualization.