Skip to content

Wind Power Curve Fitting & Cleaning

Binder

In this notebook we'll look at how we can use our LOWESS methods to first fit a power curve for a wind turbine, then estimate the uncertainty in our results. We'll then utilise a feature of the quantile LOWESS fits to demonstrate how these techniques can also be used for cleaning raw turbine data.


Imports

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from moepy import lowess, eda


Power Curve Fitting

We'll start by loading in some clean turbine output and wind speed data, this was sourced from the Power Curve Working Group Analysis example data repository.

df_pcwga = (pd
            .read_csv('../data/lowess_examples/turbine_power_wind_speed_clean.csv')
            [['TimeStamp', 'Turbine Wind Speed Mean', 'Turbine Power']]
            .replace(-99.99, np.nan)
            .dropna()
           )

df_pcwga.head()
Unnamed: 0 TimeStamp Turbine Wind Speed Mean Turbine Power
0 07/10/2011 12:50 15.51 1996.91
1 07/10/2011 13:00 15.7101 1987.74
2 07/10/2011 13:10 16.6708 1991.9
3 07/10/2011 13:20 15.2098 1987.7
4 07/10/2011 13:30 15.44 1991.03


We'll then fit a standard LOWESS and visualise the results

%%time

x = df_pcwga['Turbine Wind Speed Mean'].values
y = df_pcwga['Turbine Power'].values

lowess_model = lowess.Lowess()
lowess_model.fit(x, y, frac=0.2, num_fits=100)

x_pred = np.linspace(0, 25, 101)
y_pred = lowess_model.predict(x_pred)

# Plotting
fig, ax = plt.subplots(dpi=150)

ax.plot(x_pred, y_pred, label='Robust LOWESS', color='r', linewidth=1)
ax.scatter(x, y, label='Observations', s=0.5, color='k', linewidth=0, alpha=1)

eda.hide_spines(ax)
ax.set_xlabel('Wind Speed (m/s)')
ax.set_ylabel('Power Output (MW)')
ax.set_xlim(0, 26)
ax.set_ylim(-25)

lgnd = ax.legend(frameon=False) 
lgnd.legendHandles[1]._sizes = [10]
lgnd.legendHandles[1].set_alpha(1)
Wall time: 512 ms

png


This looks good but we can do more. In this next step we'll estimate the upper and lower quantiles of the power curve fit that represent a prediction interval of 68%.

# Estimating the quantiles
df_quantiles = lowess.quantile_model(x, y, frac=0.2, qs=[0.16, 0.84], num_fits=40)

# Cleaning names and sorting for plotting
df_quantiles.columns = [f'p{int(col*100)}' for col in df_quantiles.columns]
df_quantiles = df_quantiles[df_quantiles.columns[::-1]]

df_quantiles.head()
100% 2/2 [00:06<00:03, 3.19s/it]
x p84 p16
0.323938 -16.2569 -8.34796
0.463498 -14.4719 -8.53898
0.475784 -14.3153 -8.55634
0.503802 -13.9573 -8.59603
0.546645 -13.4059 -8.65677


We'll visualise this fit within the domain where the quantiles do not cross

valid_pred_intvl_idx = (df_quantiles['p84']-df_quantiles['p16']).pipe(lambda s: s[s>0]).index

# Plotting
fig, ax = plt.subplots(dpi=150)

ax.scatter(x, y, s=0.5, color='k', linewidth=0, alpha=1, label='Observations')
ax.fill_between(valid_pred_intvl_idx, df_quantiles.loc[valid_pred_intvl_idx, 'p16'], df_quantiles.loc[valid_pred_intvl_idx, 'p84'], color='r', alpha=0.25, label='68% Prediction Interval')

eda.hide_spines(ax)
ax.legend(frameon=False)
ax.set_xlabel('Wind Speed (m/s)')
ax.set_ylabel('Power Output (MW)')
ax.set_xlim(0, 25)
ax.set_ylim(0)

lgnd = ax.legend(frameon=False) 
lgnd.legendHandles[0]._sizes = [20]
lgnd.legendHandles[0].set_alpha(1)

png


With the prediction interval we've looked at the likely range of power output for given wind speeds, but what if instead of the range of the underlying values we wanted to know the range in our estimate of the average power curve? For this we can use confidence intervals, which express the certainty we have in the particular statistical parameter we're calculating.

In order to estimate this uncertainty we'll first bootstrap our model.

df_bootstrap = lowess.bootstrap_model(x, y, num_runs=500, frac=0.2, num_fits=30)

df_bootstrap.head()
100% 500/500 [00:55<00:00, 0.11s/it]
x 0 1 2 3 4 5 6 7 8 9 ... 490 491 492 493 494 495 496 497 498 499
15.51 1977.78 1978.05 1978.95 1978.33 1979.43 1978.33 1978.78 1979.79 1979.47 1979 ... 1978.14 1978.5 1977.23 1977.66 1976.37 1976.28 1978.61 1978.84 1977.94 1977.22
15.7101 1978.48 1978.6 1979.52 1979.05 1979.96 1979.06 1979.39 1980.34 1980.1 1979.74 ... 1978.92 1979.14 1977.82 1978.38 1977.14 1977.14 1979.45 1979.56 1978.61 1977.96
16.6708 1981.24 1981.17 1981.83 1981.57 1981.87 1981.95 1981.61 1982.23 1982.3 1982.31 ... 1981.98 1981.56 1980.13 1981.42 1980.3 1980.6 1982.85 1981.93 1981.15 1980.83
15.2098 1976.43 1977.04 1977.8 1977.1 1978.41 1976.9 1977.58 1978.67 1978.21 1977.58 ... 1976.64 1977.27 1976.05 1976.45 1974.94 1974.57 1977.02 1977.52 1976.61 1975.82
15.44 1977.49 1977.84 1978.72 1978.03 1979.21 1978.03 1978.52 1979.56 1979.2 1978.69 ... 1977.83 1978.23 1976.99 1977.4 1976.06 1975.93 1978.28 1978.54 1977.66 1976.92


From the bootstrapped results we can then extract the confidence intervals, in our case we'll look at the range covering 95% of our estimates.

df_conf_intvl = lowess.get_confidence_interval(df_bootstrap, conf_pct=0.95)

# Plotting
fig, ax = plt.subplots(dpi=150)

ax.scatter(x, y, s=0.5, color='k', linewidth=0, alpha=1, label='Observations')
ax.fill_between(df_conf_intvl.index, df_conf_intvl['min'], df_conf_intvl['max'], color='r', alpha=1, label='95% Confidence')

eda.hide_spines(ax)
ax.legend(frameon=False)
ax.set_xlabel('Wind Speed (m/s)')
ax.set_ylabel('Power Output (MW)')
ax.set_xlim(0, 25)
ax.set_ylim(0)

lgnd = ax.legend(frameon=False) 
lgnd.legendHandles[0]._sizes = [20]
lgnd.legendHandles[0].set_alpha(1)

png


We'll now visualise how the width of the confidence interval changes with wind speed. Interestingly the two troughs in the confidence interval width appear to correspond to the cut-in and start of the rated power wind speeds.

fig, ax = plt.subplots(dpi=150)

df_conf_intvl.diff(axis=1)['max'].plot(ax=ax, color='r', linewidth=1)

eda.hide_spines(ax)
ax.legend(frameon=False)
ax.set_xlabel('Wind Speed (m/s)')
ax.set_ylabel('95% Confidence\nInterval Width (MW)')
ax.set_xlim(0, 25)
ax.set_ylim(0)
ax.get_legend().remove()

png

# take the peak, then get the troughs on either side and print some info
# would be interesting to compare against these fits - https://www.hindawi.com/journals/jen/2016/8519785/tab1/


Power Curve Cleaning

We'll start by loading the raw turbine wind speed and output data in

df_raw_pc = pd.read_csv('../data/lowess_examples/turbine_power_wind_speed_raw.csv')

df_raw_pc.head(3)
Unnamed: 0 UTC active_power wind_speed
0 30/09/2017 18:26 14 4.02
1 30/09/2017 18:56 14 4.02
2 30/09/2017 19:26 14 4.02


We'll then try and fit a LOWESS estimate for the power curve.

%%time

x = df_raw_pc['wind_speed'].values
y = df_raw_pc['active_power'].values

lowess_model = lowess.Lowess()
lowess_model.fit(x, y, frac=0.2, num_fits=100)

x_pred = np.linspace(0, 25, 101)
y_pred = lowess_model.predict(x_pred)

# Plotting
fig, ax = plt.subplots(dpi=150)

ax.plot(x_pred, y_pred, label='Robust LOWESS', color='r', linewidth=1)
ax.scatter(x, y, label='Observations', s=0.5, color='k', linewidth=0, alpha=1)

eda.hide_spines(ax)
ax.set_xlabel('Wind Speed (m/s)')
ax.set_ylabel('Power Output (MW)')
ax.set_xlim(0, 26)
ax.set_ylim(-25)

lgnd = ax.legend(frameon=False) 
lgnd.legendHandles[1]._sizes = [10]
lgnd.legendHandles[1].set_alpha(1)
Wall time: 788 ms

png


Unfortunately the fit is thrown by the large number of occurences where the farm is under-powered or set to output 0, we want to remove these so that we can estimate the 'standard' power curve. We'll create a quantile LOWESS fit to see if that helps us understand the data any better.

# Estimating the quantiles
df_quantiles = lowess.quantile_model(x, y, frac=0.2, qs=np.linspace(0.025, 0.975, 41), num_fits=40)

# Cleaning names and sorting for plotting
df_quantiles.columns = [f'p{int(col*100)}' for col in df_quantiles.columns]
df_quantiles = df_quantiles[df_quantiles.columns[::-1]]

df_quantiles.head()
100% 41/41 [02:52<00:05, 4.20s/it]
x p97 p95 p92 p90 p88 p85 p83 p80 p78 p76 ... p23 p21 p19 p16 p14 p12 p9 p7 p4 p2
0.25 -31.1389 -36.0637 -48.7875 -50.2252 -54.3752 -61.8818 -64.9263 -67.3705 -68.176 -69.0324 ... -78.7881 -73.9447 -69.322 -63.1938 -58.9408 -54.0337 -50.6578 -49.2366 -50.801 -51.8248
0.33 -24.4137 -30.6108 -43.6044 -45.5205 -49.913 -57.5452 -60.7292 -63.275 -64.2354 -65.2338 ... -77.3919 -72.8946 -68.5794 -62.7402 -58.6804 -53.9613 -50.6087 -49.1445 -50.546 -51.5789
0.41 -17.8378 -25.26 -38.5274 -40.8969 -45.5215 -53.2747 -56.5928 -59.2317 -60.3441 -61.4871 ... -76.014 -71.8585 -67.8522 -62.3073 -58.4534 -53.9134 -50.5746 -49.0702 -50.2914 -51.3353
0.49 -11.3659 -19.9748 -33.5237 -36.3257 -41.1744 -49.0455 -52.4947 -55.22 -56.4821 -57.7727 ... -74.6419 -70.8232 -67.1261 -61.8803 -58.2436 -53.8769 -50.5466 -49.0072 -50.0365 -51.0934
0.57 -4.9476 -14.7154 -28.557 -31.7751 -36.8429 -44.8302 -48.4098 -51.2177 -52.6278 -54.069 ... -73.2619 -69.7745 -66.3859 -61.4438 -58.0338 -53.8375 -50.5151 -48.9481 -49.7806 -50.8524


Plotting these we can see an interesting relationship, where for many of the lower quantiles the estimate peaks then quickly drops to around 0. It is the area below and to the right of these peaks that we want to remove from our power curve estimate.

fig, ax = plt.subplots(dpi=150)

ax.scatter(x, y, s=0.1, color='k', alpha=1)
df_quantiles.plot(cmap='viridis', legend=False, ax=ax)

eda.hide_spines(ax)
ax.set_xlabel('Wind Speed (m/s)')
ax.set_ylabel('Power Output (MW)')
ax.set_xlim(0, 26)
ax.set_ylim(0)
(0.0, 2557.302509884304)

png


We'll first identify the ratio between the peak value and the final value (above speeds of 25 m/s). We'll then remove points after the peak where the ratio between their peak and final values exceeds a defined threshold (in our case 2).

N.b. there's probably a much nicer way to do this where the for loop isn't needed, this would be handy to implement as it would be good to have a vector containing all of the indexes that have been removed.

exceeded_quantiles = ((df_quantiles.max()/df_quantiles.iloc[-1].clip(0.1) > 2)
                      .replace(False, np.nan)
                      .dropna()
                      .index
                     )

s_maxs = df_quantiles[exceeded_quantiles].max()
s_idxmaxs = df_quantiles[exceeded_quantiles].idxmax()

cleaned_x = x
cleaned_y = y

for exceeded_quantile in exceeded_quantiles:
    min_x = s_idxmaxs[exceeded_quantile]
    max_y = s_maxs[exceeded_quantile]

    idxs_to_remove = (cleaned_x > min_x) & (cleaned_y < max_y)

    cleaned_x = cleaned_x[~idxs_to_remove]
    cleaned_y = cleaned_y[~idxs_to_remove]


Visualising the results we can clearly see that the unwanted periods have been removed

fig, axs = plt.subplots(dpi=250, ncols=2, figsize=(10, 4))

axs[0].scatter(x, y, s=0.1, color='k', alpha=1)
axs[1].scatter(cleaned_x, cleaned_y, s=0.1, color='k', alpha=1)

axs[0].set_title('Original')
axs[1].set_title('Cleaned')

for ax in axs:
    eda.hide_spines(ax)
    ax.set_xlabel('Wind Speed (m/s)')
    ax.set_ylabel('Power Output (MW)')

png


We're now ready to make our power curve LOWESS estimate again

%%time

lowess_model = lowess.Lowess()
lowess_model.fit(cleaned_x, cleaned_y, frac=0.2, num_fits=100)

x_pred = np.linspace(0, 25, 101)
y_pred = lowess_model.predict(x_pred)

# Plotting
fig, ax = plt.subplots(dpi=150)

ax.plot(x_pred, y_pred, label='Robust LOWESS', color='r', linewidth=1)
ax.scatter(cleaned_x, cleaned_y, label='Observations', s=0.5, color='k', linewidth=0, alpha=1)

eda.hide_spines(ax)
ax.set_xlabel('Wind Speed (m/s)')
ax.set_ylabel('Power Output (MW)')
ax.set_xlim(0, 26)
ax.set_ylim(-25)

lgnd = ax.legend(frameon=False) 
lgnd.legendHandles[1]._sizes = [10]
lgnd.legendHandles[1].set_alpha(1)
Wall time: 728 ms

png