Quantile Estimation of Portugese Hydro Power Seasonality¶

In this example we'll use power output data from Portugese hydro-plants to demonstrate how the quantile LOWESS model can be used.

Imports¶

import pandas as pd

import matplotlib.pyplot as plt

from moepy import lowess, eda

Loading Data¶

We'll start by reading in the Portugese hydro output data

df_portugal_hydro = pd.read_csv('../data/lowess_examples/portugese_hydro.csv')

df_portugal_hydro.index = pd.to_datetime(df_portugal_hydro['datetime'])
df_portugal_hydro = df_portugal_hydro.drop(columns='datetime')

df_portugal_hydro['day_of_the_year'] = df_portugal_hydro.index.dayofyear
df_portugal_hydro = df_portugal_hydro.resample('D').mean()
df_portugal_hydro = df_portugal_hydro.rename(columns={'power_MW': 'average_power_MW'})

df_portugal_hydro.head()

datetime	average_power_MW	day_of_the_year
2015-01-01	698.5	1
2015-01-02	1065.75	2
2015-01-03	905.125	3
2015-01-04	795.708	4
2015-01-05	1141.62	5

Quantile LOWESS¶

We now just need to feed this data into our quantile_model wrapper

# Estimating the quantiles
df_quantiles = lowess.quantile_model(df_portugal_hydro['day_of_the_year'].values,
                                     df_portugal_hydro['average_power_MW'].values,
                                     frac=0.4, num_fits=40)

# Cleaning names and sorting for plotting
df_quantiles.columns = [f'p{int(col*100)}' for col in df_quantiles.columns]
df_quantiles = df_quantiles[df_quantiles.columns[::-1]]

df_quantiles.head()

100% 9/9 [00:16<00:02, 1.73s/it]

x	p90	p80	p70	p60	p50	p40	p30	p20	p10
1	1885.08	1400.78	1006.97	910.769	795.475	693.001	604.221	498.096	407.17
2	1885.93	1406.29	1015.76	917.074	800.255	697.121	607.521	500.673	409.021
3	1886.8	1411.81	1024.54	923.37	805.008	701.225	610.814	503.239	410.866
4	1887.68	1417.32	1033.31	929.659	809.738	705.317	614.105	505.797	412.695
5	1888.57	1422.84	1042.08	935.952	814.456	709.409	617.404	508.359	414.485

We can then visualise the estimated quantile fits of the data

fig, ax = plt.subplots(dpi=150)

ax.scatter(df_portugal_hydro['day_of_the_year'], df_portugal_hydro['average_power_MW'], s=1, color='k', alpha=0.5)
df_quantiles.plot(cmap='viridis', legend=False, ax=ax)

eda.hide_spines(ax)
ax.legend(frameon=False, bbox_to_anchor=(1, 0.8))
ax.set_xlabel('Day of the Year')
ax.set_ylabel('Hydro Power Average (MW)')
ax.set_xlim(0, 365)
ax.set_ylim(0)

(0.0, 2620.8375)

png

We can also ask questions like: "what day of a standard year would the lowest power output be recorded?"

scenario = 'p50'

print(f'In a {scenario} year the lowest hydro power output will most likely fall on day {df_quantiles[scenario].idxmin()}')

In a p50 year the lowest hydro power output will most likely fall on day 228

We can also identify the peridos when our predictions will have the greatest uncertainty

s_80pct_pred_intvl = df_quantiles['p90'] - df_quantiles['p10']

print(f'Day {s_80pct_pred_intvl.idxmax()} is most likely to have the greatest variation in hydro power output')

# Plotting
fig, ax = plt.subplots(dpi=150)

s_80pct_pred_intvl.plot(ax=ax)

eda.hide_spines(ax)
ax.set_xlabel('Day of the Year')
ax.set_ylabel('Hydro Power Output 80%\nPrediction Interval Size (MW)')
ax.set_xlim(0, 365)
ax.set_ylim(0)

Day 115 is most likely to have the greatest variation in hydro power output





(0.0, 1724.0724938300584)

png