Estimating Remaining Useful Life (RUL) of Commercial Jet Engines

"Pratt & Witney Jet Engine" by Erwyn van der Meer is marked with CC BY-NC-ND 2.0. To view the terms, visit https://creativecommons.org/licenses/by-nd-nc/2.0/jp/?ref=openverse

Project Overview¶

Expensive, critical equipment such as commercial jet engines is nowadays fitted with sensors that collect condition-monitoring data. We can use this data to build models that estimate the remaining useful life (RUL) of the equipment. Using these estimates, equipment with low estimated RUL can be serviced or replaced before a catastrophic failure occurs.

In this project we will use the Commercial Modular Aero-Propulsion System Simulation (CMAPSS) data set to estimate the remaining useful life (RUL) of each engine unit.

Problem Statement¶

The data set consists of timeseries sensor readings for each engine unit and each flight. We will train a regression model that takes the sensor readings as input features and produces an estimate of the remaining life of the engine, in cycles.

Metrics¶

We will be using root-mean-square error (RMSE) to measure the accuracy of our model. RMSE is given by the following equation:

$\large RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\Big(y_i - \hat{y}_i\Big)^2} $

where $y_i$ is the actual RUL in units of engine cycles and $\hat{y}_i$ is the predicted RUL for sample $i$.
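As a quick illustration of the metric, here is a minimal sketch with made-up RUL values (not part of the modelling pipeline):

In [ ]:
import numpy as np

# Hypothetical actual and predicted RUL values in engine cycles, for illustration only
y_true = np.array([60.0, 45.0, 30.0, 12.0])
y_hat = np.array([55.0, 50.0, 28.0, 15.0])
print(np.sqrt(np.mean((y_true - y_hat) ** 2)))  # RMSE in cycles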

Imports¶

In [2]:
import os
import h5py
import time
import math
import random

import numpy as np
import pandas as pd

import xgboost as xgb

import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline

from sklearn.preprocessing import StandardScaler

# PyTorch is used for the 1D CNN model later in the notebook
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
In [3]:
random.seed(13)
np.random.seed(13)
torch.manual_seed(13)

Data Set Description¶

We will use the prognostics benchmark data set N-CMAPSS, prepared by Arias Chao, M. et al. [3]. The data set contains collections of multivariate timeseries sensor readings for a set of turbofan engines under various flight conditions. The full data set contains 8 such collections, but in this project we will focus on a single collection due to time constraints. Each collection is stored in Hierarchical Data Format version 5 (.h5).

For example, data set number 2 (DS02) in the collection contains the following: the input timeseries comprises 14 sensor readings, 4 flight-scenario variables, and 4 auxiliary variables for 9 different turbofan units, and the complete timeseries is about 6.5M time stamps long. The target variable is the Remaining Useful Life (RUL) in units of engine cycles, which we predict from the multivariate sensor readings.

The zipped data set is around 15 GB and can be downloaded from https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/, under the section Turbofan Engine Degradation Simulation Data Set-2.
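Before loading everything, it can help to peek at the structure of one of the .h5 files. A minimal sketch with h5py, assuming the DS02 file has been downloaded to data_set/:

In [ ]:
import h5py

# List the top-level datasets in the file along with their shapes
# (e.g. the dev/test splits of W, X_s, Y, A and their variable-name arrays)
with h5py.File('data_set/N-CMAPSS_DS02-006.h5', 'r') as hdf:
    for key in hdf.keys():
        print(key, hdf[key].shape)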

Loading the Data¶

In [4]:
def load_data(file_name):
    with h5py.File(file_name, 'r') as hdf:
        # Development set
        W_dev = np.array(hdf.get('W_dev'))             # W
        X_s_dev = np.array(hdf.get('X_s_dev'))         # X_s
        Y_dev = np.array(hdf.get('Y_dev'))             # RUL  
        A_dev = np.array(hdf.get('A_dev'))             # Auxiliary

        # Test set
        W_test = np.array(hdf.get('W_test'))           # W
        X_s_test = np.array(hdf.get('X_s_test'))       # X_s
        Y_test = np.array(hdf.get('Y_test'))           # RUL  
        A_test = np.array(hdf.get('A_test'))           # Auxiliary

        # Variable names
        W_var = np.array(hdf.get('W_var'))
        X_s_var = np.array(hdf.get('X_s_var'))  
        A_var = np.array(hdf.get('A_var'))
        W_var = list(np.array(W_var, dtype='str'))
        X_s_var = list(np.array(X_s_var, dtype='str'))  
        A_var = list(np.array(A_var, dtype='str'))
        
    column_names = {
        'X_s_var' : X_s_var,
        'W_var' : W_var,
        'A_var' : A_var
    }
    df_dev = pd.DataFrame(np.concatenate((X_s_dev, W_dev, A_dev, Y_dev), axis = 1), columns = X_s_var + W_var + A_var + ['y'])
    df_test = pd.DataFrame(np.concatenate((X_s_test, W_test, A_test, Y_test), axis = 1), columns = X_s_var + W_var + A_var + ['y'])
    
    df_dev.columns = df_dev.columns.astype('str')
    df_dev = df_dev.astype({'unit': 'int32'})
    df_dev = df_dev.astype({'cycle': 'int32'})
    df_dev = df_dev.astype({'Fc': 'int32'})

    df_test.columns = df_test.columns.astype('str')
    df_test = df_test.astype({'unit': 'int32'})
    df_test = df_test.astype({'cycle': 'int32'})
    df_test = df_test.astype({'Fc': 'int32'})
    
    # Delete temp arrays to save memory
    del X_s_dev
    del W_dev
    del X_s_test
    del W_test
    
    return (df_dev, df_test, A_dev, A_test, column_names)
In [5]:
df_dev, df_test, A_dev, A_test, column_names = load_data('data_set/N-CMAPSS_DS02-006.h5')

Data Exploration¶

We will first explore how many flights in total are in the development and test datasets.

In [6]:
df_dev[['unit','cycle']].groupby(['unit'], as_index=False).max()
Out[6]:
   unit  cycle
0     2     75
1     5     89
2    10     82
3    16     63
4    18     71
5    20     66

In the development data set, we observe that a total of 6 engine units (units 2, 5, 10, 16, 18 and 20) are flown for between 63 and 89 cycles each, giving a total of 446 cycles in the development data set.
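This total can be checked directly from the frame above (assuming the maximum cycle number per unit equals its number of flights):

In [ ]:
# Sum of the per-unit maximum cycle counts; should equal the 446 cycles quoted above
print(df_dev.groupby('unit')['cycle'].max().sum())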

In [7]:
df_test[['unit','cycle']].groupby(['unit'], as_index=False).max()
Out[7]:
   unit  cycle
0    11     59
1    14     76
2    15     67

In the test data set, we see three engine units (units 11, 14 and 15), flown for between 59 and 76 cycles each.

Now we will plot a typical flight profile.

In [8]:
# We plot the altitude and speed of the unit
def plot_flight_profile(data, unit, cycle):
    flight = data[(data.unit == unit) & (data.cycle == cycle)].reset_index()
    
    fig, (alt_plot, speed_plot) = plt.subplots(1, 2, figsize=(12,5))
    fig.suptitle(f"Unit {int(unit)} - Cycle {int(cycle)} - {len(flight)} Total flight data points")
    
    alt_plot.plot(flight.alt)
    alt_plot.title.set_text("Altitude")
    
    # Speed (Mach)
    speed_plot.plot(flight.Mach)
    speed_plot.title.set_text("Speed (Mach)")
In [9]:
plot_flight_profile(df_dev, 10, 1)

We observe the aircraft climbing to an altitude of around 30,000 feet and cruising at more than Mach 0.7, which is typical for a commercial jet aircraft.

Here we also note that a sensor reading is taken every second, so 11764 flight data points correspond to 11764 seconds, or about 3.3 hours.

Next, we explore the various sensor readings. The data set contains physical sensor measurements (X_s) and virtual sensor measurements (X_v) for each time step. For this analysis we will use only the physically measurable sensor readings (X_s), as virtual sensor readings would not be available in a real-world setting.

In [10]:
print(f"The sensors in the data set are {column_names['X_s_var']}")
The sensors in the data set are ['T24', 'T30', 'T48', 'T50', 'P15', 'P2', 'P21', 'P24', 'Ps30', 'P40', 'P50', 'Nf', 'Nc', 'Wf']

The definitions of the sensors are given by Arias Chao, M. et al. [3].

The locations of the sensors in the engine are shown in the same paper.

We will box-plot each sensor reading, comparing readings from engines with high RUL (more than 50 cycles remaining) against readings from engines with low RUL (fewer than 5 cycles remaining).

In [11]:
def plot_compare_sensor(data, sensor_columns):
    high_rul_unit = data[data['y'] > 50] # More than 50 cycles remaining useful life
    low_rul_unit = data[data['y'] < 5] # Fewer than 5 cycles remaining
    
    fig, _ = plt.subplots(7, 2, figsize=(12,30))
    #fig.suptitle("Sensor Values")
    
    for i, ax in enumerate(fig.axes):
        ax.boxplot([high_rul_unit[sensor_columns[i]], low_rul_unit[sensor_columns[i]]], showfliers=False, labels=['High RUL', 'Low RUL'])
        ax.title.set_text(f"{sensor_columns[i]}")
In [12]:
plot_compare_sensor(df_dev, column_names['X_s_var'])

From the aggregate sensor readings alone, we cannot reliably distinguish high-RUL engines from low-RUL engines.

Benchmark Modeling with XGBoost¶

Next, we will use a gradient boosted decision tree to get a baseline prediction.

In [13]:
X_dev = df_dev[column_names['X_s_var'] + column_names['W_var']]
y_dev = df_dev['y']

X_test = df_test[column_names['X_s_var'] + column_names['W_var']]
y_test = df_test['y']

dtrain = xgb.DMatrix(X_dev, label = y_dev)
dtest = xgb.DMatrix(X_test)
In [14]:
param = {
    'max_depth': 12, 
    'eta':0.1, 
    'objective':'reg:squarederror' 
}

num_boost_round = 100

bst = xgb.train(param, dtrain, num_boost_round)
y_pred = bst.predict(dtest)
In [15]:
def rmse(y, y_pred):
    return np.sqrt(np.mean((y - y_pred)**2))
In [16]:
rmse(y_test, y_pred)
Out[16]:
8.749051872719608

We obtain a benchmark model RMSE of 8.75 using the gradient boosted model.

In [36]:
xgb.plot_importance(bst)
Out[36]:
<AxesSubplot:title={'center':'Feature importance'}, xlabel='F score', ylabel='Features'>

Next, we move to a more complex model: a 1-dimensional convolutional neural network (1D CNN).

Capturing Timeseries Patterns with 1-Dimensional Convolutional Neural Network¶

A 1D CNN slides learned convolution kernels along the time axis of the multivariate signal. Each Conv1d layer treats the 18 input variables (14 sensors plus 4 flight-condition variables) as channels and extracts local temporal patterns from short stretches of the flight; stacking several such layers (here with residual connections) lets the network combine local patterns into features spanning longer time ranges, which a fully connected head then maps to a single RUL estimate.
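A minimal sketch of the tensor shapes involved (channels-first layout, which PyTorch's Conv1d expects):

In [ ]:
# A batch of 4 windows, each with 18 channels (14 sensors + 4 flight conditions) and 100 time steps
x = torch.randn(4, 18, 100)
conv = nn.Conv1d(in_channels=18, out_channels=18, kernel_size=3, padding=1)
print(conv(x).shape)  # padding=1 preserves the time dimension: torch.Size([4, 18, 100])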

Preparing Data for the 1D CNN¶

For input into the 1D CNN, we first need to prepare the data so that every sample has a fixed length. Each flight is cut into overlapping windows of a fixed number of time steps (with 50% overlap), and every window inherits the flight's RUL label.

In [16]:
# PyTorch Dataset wrapping a list of (window, RUL) pairs
class CMAPSSDataset(Dataset):
    def __init__(self, training_data):
        self.training_data = training_data

    def __len__(self):
        return len(self.training_data)

    def __getitem__(self, idx):
        # Transpose each window from (time steps, channels) to (channels, time steps), the layout Conv1d expects
        return np.transpose(self.training_data[idx][0]).astype(np.float32), self.training_data[idx][1].astype(np.float32)
In [17]:
def groupby_unit_and_cycle(df, unit, cycle):
    # For each unit and cycle, get the sensor and W readings
    X_s_uc_sensor = df[(df.unit == unit) & (df.cycle == cycle)][column_names['X_s_var'] + column_names['W_var']]
    y_uc = df[(df.unit == unit) & (df.cycle == cycle)]['y']
    
    X_s_uc = pd.concat([X_s_uc_sensor, y_uc], axis=1)
    return X_s_uc

flights_dev = []
for unit in df_dev.unit.unique():
    for cycle in df_dev[df_dev.unit == unit]['cycle'].unique():
        flights_dev.append(groupby_unit_and_cycle(df_dev, unit, cycle))
        
flights_test = []
for unit in df_test.unit.unique():
    for cycle in df_test[df_test.unit == unit]['cycle'].unique():
        flights_test.append(groupby_unit_and_cycle(df_test, unit, cycle))
        
# Fit the Standard Scaler
scaler = StandardScaler()
scaler.fit(pd.concat(flights_dev, axis = 0).drop('y', axis = 1))
        
print(f'Dev Data Set Flights: {len(flights_dev)}')
print(f'Test Data Set Flights: {len(flights_test)}')
Dev Data Set Flights: 446
Test Data Set Flights: 202
In [18]:
# Define the input length
input_length = 100

# Define the validation data percent
val_data_percent = 0.05 # 5 percent
In [19]:
def create_fixed_length_data(X, y, input_length = 100):
    data = []
    i = 0
    while True:
        # Sliding window: advance by input_length/2 each step (50% overlap)
        truncated_X = X[i * int(input_length/2): i * int(input_length/2) + input_length]
        if truncated_X.shape[0] < input_length:
            # Final window: take the last input_length steps of the flight
            truncated_X = X[-input_length:]
            data.append((truncated_X, y.iloc[0]))
            break
        data.append((truncated_X, y.iloc[0]))
        i += 1
    return data
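As a quick check of the windowing logic, we can apply it to a single development flight (a sketch; the number of windows depends on the flight length):

In [ ]:
example_flight = flights_dev[0]
example_windows = create_fixed_length_data(
    scaler.transform(example_flight.drop('y', axis=1)),
    example_flight['y'],
    input_length)
# Each entry is a (window, RUL) pair, with window shape (input_length, 18)
print(len(example_windows), example_windows[0][0].shape)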
In [20]:
train_data = []
val_data = []

for flight in flights_dev:
    flight_y = flight['y']
    flight_x = scaler.transform(flight.drop('y', axis=1))
    fixed_length_data = create_fixed_length_data(flight_x, flight_y, input_length)
    if random.random() >= val_data_percent:
        train_data += fixed_length_data
    else:
        val_data += fixed_length_data

print(f'Total training flight segments: {len(train_data)}')
print(f'Total validation flight segments: {len(val_data)}')
Total training flight segments: 99683
Total validation flight segments: 5381
In [21]:
cmapss_train_dataset = CMAPSSDataset(train_data)
cmapss_val_dataset = CMAPSSDataset(val_data)
train_dataloader = DataLoader(cmapss_train_dataset, batch_size=100, shuffle=True)
val_dataloader = DataLoader(cmapss_val_dataset, batch_size=100, shuffle=True)
In [22]:
cmapss_train_dataset[0][0].shape
Out[22]:
(18, 100)

Define PyTorch CNN Network¶

In [25]:
class CMAPSS1DCNN(nn.Module):
    def __init__(self, input_channels, input_length, hyper_params):
        super(CMAPSS1DCNN, self).__init__()
                
        self.one_by_one_conv = nn.Sequential(
            nn.Conv1d(input_channels, input_channels, 1),
            nn.ReLU())
        
        self.conv_layers = nn.ModuleList(
            [nn.Sequential(
                nn.Conv1d(input_channels, input_channels, hyper_params['conv_kernel_size'], padding = int((hyper_params['conv_kernel_size'] - 1) / 2)),
                nn.ReLU()) for i in range(hyper_params['num_conv_layers'])]
        )
        
        self.flatten = nn.Flatten()

        self.linear_relu_stack = nn.Sequential(
              nn.Linear(input_channels * input_length, hyper_params['linear_layer_size']),
              nn.Tanh(),
              nn.Linear(hyper_params['linear_layer_size'], 1))

    def forward(self, x):
        residual = self.one_by_one_conv(x)
        y = residual
                
        for i, conv_layer in enumerate(self.conv_layers):
            y = conv_layer(y) + residual
        
        y = self.flatten(y)
        y = self.linear_relu_stack(y)
        return torch.squeeze(y)
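As a quick sanity check of the architecture, a forward pass on a random batch should return one RUL prediction per window (a minimal sketch with small hyperparameters):

In [ ]:
sanity_model = CMAPSS1DCNN(input_channels=18, input_length=100,
                           hyper_params={'num_conv_layers': 2,
                                         'linear_layer_size': 64,
                                         'conv_kernel_size': 3})
dummy_batch = torch.randn(4, 18, 100)
print(sanity_model(dummy_batch).shape)  # expected: torch.Size([4])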
In [ ]:
# TODO: Add the Ray Tune library here to tune hyperparameters in parallel
In [ ]:
# Add Wandb
In [26]:
hyper_params = {
    'num_conv_layers' : 6,
    'linear_layer_size': 500,
    'conv_kernel_size': 3
}

input_channels = cmapss_train_dataset[0][0].shape[0]
input_length = cmapss_train_dataset[0][0].shape[1]

cnn_model = CMAPSS1DCNN(input_channels, input_length, hyper_params = hyper_params)
cnn_model.to(device)

print(cnn_model)
CMAPSS1DCNN(
  (one_by_one_conv): Sequential(
    (0): Conv1d(18, 18, kernel_size=(1,), stride=(1,))
    (1): ReLU()
  )
  (conv_layers): ModuleList(
    (0): Sequential(
      (0): Conv1d(18, 18, kernel_size=(3,), stride=(1,), padding=(1,))
      (1): ReLU()
    )
    (1): Sequential(
      (0): Conv1d(18, 18, kernel_size=(3,), stride=(1,), padding=(1,))
      (1): ReLU()
    )
    (2): Sequential(
      (0): Conv1d(18, 18, kernel_size=(3,), stride=(1,), padding=(1,))
      (1): ReLU()
    )
    (3): Sequential(
      (0): Conv1d(18, 18, kernel_size=(3,), stride=(1,), padding=(1,))
      (1): ReLU()
    )
    (4): Sequential(
      (0): Conv1d(18, 18, kernel_size=(3,), stride=(1,), padding=(1,))
      (1): ReLU()
    )
    (5): Sequential(
      (0): Conv1d(18, 18, kernel_size=(3,), stride=(1,), padding=(1,))
      (1): ReLU()
    )
  )
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=1800, out_features=500, bias=True)
    (1): Tanh()
    (2): Linear(in_features=500, out_features=1, bias=True)
  )
)

Training the Network¶

In [27]:
# Change here for CNN vs LSTM
model = cnn_model
In [81]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    train_loss = 0
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        X = X.to(device)
        y = y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)
        train_loss += loss.item()

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    train_loss /= num_batches
    print(f"Train Avg loss: {train_loss:>8f} \n")

def val_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    val_loss = 0
    error = 0

    with torch.no_grad():
        for X, y in dataloader:
            X = X.to(device)
            y = y.to(device)
            pred = model(X)
            val_loss += loss_fn(pred, y).item()
            error += torch.square(y - pred).sum()

    val_loss /= num_batches
    val_rmse = math.sqrt (error / size)
    print(f"Val Error: \n RMSE: {val_rmse:>8f}, Val Avg loss: {val_loss:>8f} \n")
In [82]:
loss_function = nn.MSELoss()
alpha_zero = 1e-3
lr_decay_rate = 0.05
optimizer = torch.optim.Adam(model.parameters(), lr=alpha_zero, weight_decay = 0.0)

start_time = time.time()
epochs = 20

for epoch in range(epochs):    
    # Learning rate decay
    lr = alpha_zero / (1 + lr_decay_rate * epoch)
    optimizer.param_groups[0]['lr'] = lr
    
    print(f"Epoch {epoch}\n-------------------------------")
    train_loop(train_dataloader, model, loss_function, optimizer)
    val_loop(val_dataloader, model, loss_function)
    
print("Done!")
print("--- %s seconds ---" % (time.time() - start_time))
Epoch 0
-------------------------------
Train Avg loss: 526.658233 

Val Error: 
 RMSE: 24.791911, Val Avg loss: 617.981371 

Epoch 1
-------------------------------
Train Avg loss: 450.268984 

Val Error: 
 RMSE: 13.000874, Val Avg loss: 183.920892 

Epoch 2
-------------------------------
Train Avg loss: 71.517646 

Val Error: 
 RMSE: 6.462801, Val Avg loss: 40.414214 

Epoch 3
-------------------------------
Train Avg loss: 51.464455 

Val Error: 
 RMSE: 7.363230, Val Avg loss: 52.876593 

Epoch 4
-------------------------------
Train Avg loss: 48.501946 

Val Error: 
 RMSE: 6.721429, Val Avg loss: 44.987722 

Epoch 5
-------------------------------
Train Avg loss: 43.767271 

Val Error: 
 RMSE: 6.564008, Val Avg loss: 42.098041 

Epoch 6
-------------------------------
Train Avg loss: 44.900655 

Val Error: 
 RMSE: 6.955630, Val Avg loss: 47.673742 

Epoch 7
-------------------------------
Train Avg loss: 43.172047 

Val Error: 
 RMSE: 7.796412, Val Avg loss: 59.687594 

Epoch 8
-------------------------------
Train Avg loss: 42.247078 

Val Error: 
 RMSE: 6.326452, Val Avg loss: 39.857030 

Epoch 9
-------------------------------
Train Avg loss: 41.959968 

Val Error: 
 RMSE: 6.360924, Val Avg loss: 40.485428 

Epoch 10
-------------------------------
Train Avg loss: 41.990871 

Val Error: 
 RMSE: 6.341487, Val Avg loss: 43.687218 

Epoch 11
-------------------------------
Train Avg loss: 40.821624 

Val Error: 
 RMSE: 5.961547, Val Avg loss: 34.723103 

Epoch 12
-------------------------------
Train Avg loss: 39.941932 

Val Error: 
 RMSE: 6.217217, Val Avg loss: 40.964421 

Epoch 13
-------------------------------
Train Avg loss: 39.640183 

Val Error: 
 RMSE: 6.840417, Val Avg loss: 45.556415 

Epoch 14
-------------------------------
Train Avg loss: 39.703224 

Val Error: 
 RMSE: 6.162441, Val Avg loss: 36.923182 

Epoch 15
-------------------------------
Train Avg loss: 39.278862 

Val Error: 
 RMSE: 6.341795, Val Avg loss: 42.118928 

Epoch 16
-------------------------------
Train Avg loss: 38.202816 

Val Error: 
 RMSE: 5.808239, Val Avg loss: 32.768902 

Epoch 17
-------------------------------
Train Avg loss: 37.914707 

Val Error: 
 RMSE: 6.034233, Val Avg loss: 35.665258 

Epoch 18
-------------------------------
Train Avg loss: 39.325867 

Val Error: 
 RMSE: 5.858417, Val Avg loss: 35.263573 

Epoch 19
-------------------------------
Train Avg loss: 37.602668 

Val Error: 
 RMSE: 6.298915, Val Avg loss: 39.675915 

Done!
--- 1763.282726764679 seconds ---

Scoring the Test Data¶

Each test flight is cut into the same overlapping fixed-length windows used in training, the model predicts an RUL for every window, and the window predictions are averaged to produce a single RUL estimate per flight.

In [ ]:
# Set the model to eval mode
model.eval()
In [84]:
for flight in flights_test:
    flight_y = flight['y'].iloc[0]
    # Drop y_pred if it already exists (errors='ignore') so this cell can be re-run safely
    flight_x = scaler.transform(flight.drop(['y', 'y_pred'], axis=1, errors='ignore'))
    y_partial_preds = []
    
    i = 0
    break_loop = False
    while True:    
        truncated_X = flight_x[i * int(input_length/2) : i * int(input_length/2) + input_length]
        
        if truncated_X.shape[0] < input_length:
            truncated_X = flight_x[-input_length:]
            break_loop = True
        
        with torch.no_grad():
            X_input = torch.from_numpy(np.transpose(truncated_X).astype(np.float32))
            X_input = X_input.to(device)
            X_input = torch.unsqueeze(X_input, 0)
            pred = model(X_input)
            y_partial_preds.append(pred.item())
        
        if break_loop:
            break
        i += 1
    
    y_pred = sum(y_partial_preds)/len(y_partial_preds)
    flight['y_pred'] = y_pred
In [85]:
flights_test_df = pd.concat(flights_test, axis = 0)
In [86]:
# Put back the flight information for further analysis
A_test_df = pd.DataFrame(A_test, columns = column_names['A_var'])
flights_test_df = pd.concat((flights_test_df, A_test_df), axis = 1)
In [87]:
test_flight_summary = flights_test_df.groupby(['unit', 'cycle', 'hs'], as_index = False)[['y','y_pred']].mean()
In [88]:
test_flight_summary['error'] = test_flight_summary['y'] - test_flight_summary['y_pred'] 
test_flight_summary['error_squared'] = test_flight_summary['error']**2
In [89]:
rmse_test = math.sqrt(sum(test_flight_summary['error_squared'])/len(test_flight_summary))
In [90]:
print(f'RMSE for the test data is {rmse_test}')
RMSE for the test data is 6.376264294260416

Exploration of test errors¶

In [67]:
def plot_RUL(df, unit):
    
    fig, (pred_plot, error_plot) = plt.subplots(1, 2, figsize=(12,5))
    fig.suptitle(f"Unit {int(unit)}")
    
    # Pred vs Y
    pred_plot.plot(df.cycle, df.y)
    pred_plot.plot(df.cycle, df.y_pred)
    pred_plot.title.set_text("Prediction vs Actual RUL")
    
    # Errors
    error_plot.plot(df.cycle, df.error, 'r')
    hs_plot = error_plot.twinx()
    hs_plot.plot(df.cycle, df.hs, 'b+', alpha=.5)    
    error_plot.title.set_text("Error")
In [69]:
for test_unit in test_flight_summary.unit.unique():
    plot_RUL(test_flight_summary[test_flight_summary.unit == test_unit], test_unit)

We observe that the CNN model has larger errors while the engines are still healthy, and smaller errors once the engines start to develop health issues.

In [ ]:
# TODO: Discussion of Result
In [ ]:
# TODO: Further research - digital twins