Time Series Tips & Tricks

Posted by John Lemaster on January 23, 2020

I just finished a Python time series project, and it took more time than I had anticipated. I thought I would share some of what I learned, along with the code, to save time for others working with time series for the first time.

There are plenty of great resources, from Medium blogs to Google search results, with all the code you need to convert to datetime with pandas, so I will not write about that code. Here are some of the tips, tricks, and functions that I think could be of value to you.

Oh, real quick: I want to make sure that I mention the pandas .get_group method. I somehow missed this great method in all the Python learning I have been doing for the last several months. It is used to build a DataFrame from a single group when using groupby. See below for an example of the code that I used:

 df2 = df.drop('Month', axis=1).groupby('City').get_group("Chicago")

 Here's the generic form of the code from pandas:  GroupBy.get_group(self, name, obj=None) 

 and the link to the pandas documentation:  https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.get_group.html
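
To make the idea concrete, here is a minimal sketch of get_group on a toy DataFrame. The column names mirror the ones above, but the rows and values are made up for illustration and are not from the Zillow data.

```python
import pandas as pd

# Toy data standing in for the real CSV; values are made up for illustration
df = pd.DataFrame({
    "City": ["Chicago", "Chicago", "Evanston", "Chicago"],
    "Month": [1, 2, 1, 3],
    "MeanValue": [250000, 252000, 310000, 255000],
})

# Drop the column we do not need, then group by city
grouped = df.drop("Month", axis=1).groupby("City")

# get_group returns the rows belonging to one group as a DataFrame
chicago = grouped.get_group("Chicago")

# A boolean mask selects the same rows
same_rows = df.drop("Month", axis=1)[df["City"] == "Chicago"]

print(chicago)
```

Both approaches pick out the same rows. get_group is handy when you already have the groupby object and want to pull out one group at a time.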

Okay, here are the functions that you can use as templates to build your own. First, I must recognize a couple of colleagues for their help:

 * James Irving, my Flatiron instructor.  
 * Terry Ollila, one of my Flatiron instructors

The first function can be used over and over to get a time series (ts) for one region, named after that region. By the way, this data, as well as all the data for these functions, comes from a Zillow CSV file that lists home prices by zip code.

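 def get_ts_from_chicago(chicago, group_name):
     # chicago needs a DatetimeIndex for resample() to work
     ts = chicago.groupby('RegionName').get_group(group_name)
     # 'MS' = month start; asfreq() leaves any missing months as NaN
     ts = ts.resample('MS').asfreq()
     # Keep only the price column, named after the group
     ts = ts['MeanValue'].rename(group_name)
     return ts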
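
If the resample('MS').asfreq() step is unclear, here is a small sketch of what it does to a series with a gap. The dates and prices are hypothetical, chosen only so that one month is missing.

```python
import pandas as pd

# Hypothetical monthly prices for one zip code, with March missing
idx = pd.to_datetime(["2019-01-01", "2019-02-01", "2019-04-01"])
ts = pd.Series([200000.0, 201500.0, 204000.0], index=idx, name="MeanValue")

# 'MS' means month start; asfreq() adds a row for every month in the range
# and leaves the months that had no data as NaN
regular = ts.resample("MS").asfreq()

print(regular)
print(regular.index.freqstr)
```

The result has an explicit monthly frequency on its index, which time series models generally expect, and the missing month shows up as NaN instead of silently disappearing.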

A function to plot Autocorrelation & Partial Autocorrelation:

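 import matplotlib as mpl
 import matplotlib.pyplot as plt
 from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

 def plot_acf_pacf(ts, figsize=(10,6), lags=15):
     fig, ax = plt.subplots(nrows=2, figsize=figsize)
     plot_acf(ts, ax=ax[0], lags=lags)
     plot_pacf(ts, ax=ax[1], lags=lags)
     plt.tight_layout()

     # Ask for at least `lags` integer ticks and draw vertical grid lines
     for a in ax:
         a.xaxis.set_major_locator(mpl.ticker.MaxNLocator(min_n_ticks=lags, integer=True))
         a.xaxis.grid()

 plot_acf_pacf(ts['MeanValue'])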

A function to get the best pdq parameters for one zip code:

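import pandas as pd
from IPython.display import display  # already available inside Jupyter notebooks

def get_best_params(ans):
    # ans is a list of rows; ans[0] holds the column names, including 'pdq' and 'AIC'
    df_result = pd.DataFrame(ans[1:], columns=ans[0])
    # Sort so that the lowest AIC comes first
    df_result = df_result.sort_values('AIC', ascending=True).reset_index()
    best_params = df_result.iloc[0]['pdq']
    display(df_result.head())

    return best_params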
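
The function above assumes a particular shape for ans: a header row followed by one row per model. Here is a rough sketch of that shape. The AIC numbers are placeholders I typed in, not real model results, and the name pdq_candidates is only for illustration; in practice each row would come from fitting a model with that order.

```python
import itertools
import pandas as pd

# Candidate (p, d, q) orders: every combination of 0 and 1
pdq_candidates = list(itertools.product(range(2), repeat=3))

# Header row first, then one [pdq, AIC] row per fitted model.
# The AIC values below are placeholders, not real results.
ans = [
    ["pdq", "AIC"],
    [(0, 0, 0), 1500.0],
    [(1, 1, 1), 1200.0],
    [(0, 1, 1), 1250.0],
]

# Same logic as get_best_params, without the notebook-only display() call
df_result = pd.DataFrame(ans[1:], columns=ans[0])
best_params = df_result.sort_values("AIC").iloc[0]["pdq"]

print(best_params)
```

Lower AIC is better, so sorting in ascending order puts the preferred order in the first row.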

The last one is a function for graphing the train, test, and prediction data along with confidence intervals:

def graph_sarimax(ts_train, ts_test, output):
    # output: fitted model results (for example, from SARIMAX(...).fit())
    start = ts_test.index[0]
    end = ts_test.index[-1]
    # Get predictions over the test period (starting from 2016 in my data) and calculate confidence intervals.
    predictions = output.get_prediction(start=start, end=end, dynamic=True)
    ts_pred = predictions.predicted_mean
    ts_conf = predictions.conf_int()  # lower and upper bounds for the confidence interval band

    ts_train.plot(label='Training Data')
    ts_test.plot(label='Test Data')
    ts_pred.plot(label='Prediction')
    # Semi-transparent band so the lines underneath stay visible
    plt.fill_between(ts_conf.index, ts_conf.iloc[:,0], ts_conf.iloc[:,1], alpha=0.3)
    plt.legend()
    plt.ylabel('Dollar Amounts')
    plt.xlabel('Years')
    plt.title('Time Values')
    plt.show()

    return ts_pred
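
graph_sarimax expects the series to be split into train and test pieces already. Here is a minimal sketch of a chronological split, assuming a monthly series and a 2016 cutoff. The series itself is a placeholder, and the cutoff is only an example.

```python
import pandas as pd

# Hypothetical monthly series; the values are placeholders
idx = pd.date_range("2014-01-01", "2017-12-01", freq="MS")
ts = pd.Series(range(len(idx)), index=idx, name="MeanValue", dtype="float64")

# Chronological split: everything before the cutoff trains the model,
# everything from the cutoff onward is held out for testing
cutoff = "2016-01-01"
ts_train = ts.loc[ts.index < cutoff]
ts_test = ts.loc[ts.index >= cutoff]

print(ts_train.index[-1], ts_test.index[0])
```

Unlike a random split, this keeps the time order intact, so the model is only ever tested on dates that come after the data it was trained on.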

I hope these examples help you build your own functions for time series coding with Python. I think they provide a nice starting point for new coders.