My Template Module Name: James M. Irving Email: james.irving.phd@gmail.com GitHub Profile: https://github.com/jirvingphd

class fsds_100719.jmi.jmi.BlockTimeSeriesSplit(n_splits=5, train_size=None, test_size=None, step_size=None, method='sliding')[source]

Bases: sklearn.model_selection._split._BaseKFold

A variant of sklearn.model_selection.TimeSeriesSplit that keeps train_size and test_size constant across folds. Requires n_splits, train_size, and test_size. train_size and test_size can be integers or float ratios.

split(X, y=None, groups=None)[source]


X ([type]): [description]
y ([type], optional): [description]. Defaults to None.
groups ([type], optional): [description]. Defaults to None.
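The constant-size folds described for this class can be illustrated without scikit-learn. This is a minimal sketch, not the class's actual implementation: the helper name sliding_block_splits and the even spacing used when step_size is None are assumptions made for illustration.

```python
def sliding_block_splits(n_samples, n_splits, train_size, test_size, step_size=None):
    """Return (train_indices, test_indices) pairs with constant sizes per fold.

    Hypothetical helper: the default step (folds spread evenly over the
    data) is an assumption, not the documented behavior of the class.
    """
    block = train_size + test_size
    if block > n_samples:
        raise ValueError("train_size + test_size exceeds n_samples")
    if step_size is None:
        # Spread the folds evenly across the available samples.
        step_size = (n_samples - block) // max(n_splits - 1, 1)
    splits = []
    for fold in range(n_splits):
        start = fold * step_size
        if start + block > n_samples:
            break  # not enough data left for another full block
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + block))
        splits.append((train, test))
    return splits


folds = sliding_block_splits(20, n_splits=3, train_size=6, test_size=2)
for train_idx, test_idx in folds:
    print(train_idx[0], train_idx[-1], test_idx)
```

Every fold has the same number of training and testing indices, which is the property that distinguishes this scheme from an expanding-window split.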

class fsds_100719.jmi.jmi.Clock(display_final_time_as_minutes=True, verbose=2)[source]

Bases: object

A clock meant to be used as a timer for functions, using local time. Clock.tic() starts the timer, .lap() adds the current lap's time to clock._list_lap_times, and .toc() stops the timer.

If verbose=0, only the start and final end times are displayed. If verbose=1, each lap's info is printed at the end of each lap. If verbose=2 (default), an instruction line is displayed and a dataframe of results is returned.

toc(label=None, summary=True)[source]

Stops the timer, displays the results, and appends label to the final _list_lap_times entry.
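The tic/lap/toc pattern can be sketched with the standard library alone. The class below is a hypothetical stand-in (SimpleClock and its lap_times attribute are illustrative names); it uses time.perf_counter rather than local datetimes and does not build the summary dataframe that Clock returns.

```python
import time


class SimpleClock:
    """Hypothetical stand-in for Clock: tic() starts, lap() records, toc() stops."""

    def __init__(self):
        self._start = None
        self._last = None
        self.lap_times = []  # list of (label, seconds) tuples

    def tic(self):
        # Start (or restart) the timer and clear any previous laps.
        self._start = self._last = time.perf_counter()
        self.lap_times = []

    def lap(self, label=None):
        if self._start is None:
            raise RuntimeError("call tic() before lap()")
        now = time.perf_counter()
        self.lap_times.append((label, now - self._last))
        self._last = now

    def toc(self, label=None):
        # The final segment is stored as a labeled lap, then the timer stops.
        self.lap(label)
        total = self._last - self._start
        self._start = None
        return total


clock = SimpleClock()
clock.tic()
sum(range(10_000))          # some work to time
clock.lap("first step")
total_seconds = clock.toc("done")
print(clock.lap_times, total_seconds)
```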

class fsds_100719.jmi.jmi.LabelLibrary[source]

Bases: object

A Multi-column version of sklearn LabelEncoder, which fits a LabelEncoder to each column of a df and stores it in the index dictionary where .index[keyword=colname] returns the fit encoder object for that column.

Example:
    lib = LabelLibrary()

    # By default, lib will fit all columns.
    lib.fit(df)
    # Can also specify columns
    lib.fit(df, columns=['A', 'B'])

    # Can then transform
    df_coded = lib.transform(df, ['A', 'B'])
    # Can also use fit_transform
    df_coded = lib.fit_transform(df, columns=['A', 'B'])

    # lib.index contains each col's encoder by col name:
    col_a_classes = lib.index['A'].classes_

fit(df, columns=None)[source]

Creates an encoder object and fits it to each column. Each fitted encoder is saved in the index dictionary under key=column_name.

fit_transform(df, columns=None)[source]
inverse_transform(df, columns=None)[source]
transform(df, columns=None)[source]
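
The one-encoder-per-column idea can be shown without pandas or scikit-learn. In this hedged sketch the "dataframe" is a plain dict of column name to list, and ColumnLabelEncoders is a hypothetical name; like sklearn's LabelEncoder, each column's classes are stored in sorted order and codes are positions in that list.

```python
class ColumnLabelEncoders:
    """Hypothetical stdlib sketch: one label mapping per column."""

    def __init__(self):
        self.index = {}  # column name -> sorted list of classes

    def fit(self, table, columns=None):
        # Default: fit every column in the table.
        for col in (columns or list(table)):
            self.index[col] = sorted(set(table[col]))
        return self

    def transform(self, table, columns=None):
        out = dict(table)
        for col in (columns or list(self.index)):
            codes = {label: i for i, label in enumerate(self.index[col])}
            out[col] = [codes[value] for value in table[col]]
        return out

    def inverse_transform(self, table, columns=None):
        out = dict(table)
        for col in (columns or list(self.index)):
            out[col] = [self.index[col][code] for code in table[col]]
        return out


table = {"A": ["b", "a", "b"], "B": ["x", "y", "x"]}
lib = ColumnLabelEncoders().fit(table)
coded = lib.transform(table)
print(coded, lib.index["A"])
```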
class fsds_100719.jmi.jmi.W2vVectorizer(w2v, glove)[source]

Bases: object

From the Learn.co Text Classification with Word Embeddings Lab. An sklearn-compatible class containing the vectors for the fit Word2Vec model.

fit(X, y)[source]
fsds_100719.jmi.jmi.add_filtered_col_to_df(df_source, df_to_add_to, list_of_exps, return_filtered_col_names=False)[source]

Takes a source dataframe and selects the columns to copy using df.filter(regex=...) for each entry in list_of_exps, a list of text expressions to find inside column names.

fsds_100719.jmi.jmi.adf_test(series, title='')[source]

Pass in a time series and an optional title; returns an ADF report. Udemy course alternative to the stationarity check.

fsds_100719.jmi.jmi.apply_stopwords(stopwords_list, text, tokenize=True, return_tokens=False, pattern="([a-zA-Z]+(?:'[a-z]+)?)")[source]

Example: df['text_stopped'] = df['content'].apply(lambda x: apply_stopwords(stopwords_list, x))
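What a call like this plausibly does per string can be sketched with the re module and the default pattern from the signature. The helper name remove_stopwords and the case-insensitive comparison are assumptions; the real function may tokenize differently (for example with nltk).

```python
import re

# Default pattern copied from the apply_stopwords signature.
PATTERN = r"([a-zA-Z]+(?:'[a-z]+)?)"


def remove_stopwords(stopwords_list, text, return_tokens=False, pattern=PATTERN):
    """Tokenize text with a regex and drop tokens found in stopwords_list."""
    stop = {word.lower() for word in stopwords_list}  # assumption: ignore case
    tokens = re.findall(pattern, text)
    kept = [tok for tok in tokens if tok.lower() not in stop]
    return kept if return_tokens else " ".join(kept)


print(remove_stopwords(["the", "a"], "The cat isn't a dog"))
```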

fsds_100719.jmi.jmi.auto_filename_time(prefix='', sep=' ', suffix='', ext='', fname_friendly=True, timeformat='%m-%d-%Y %T')[source]

Generates a filename from a base string, sep, and the current datetime formatted as timeformat: filename = f"{prefix}{sep}{suffix}{sep}{timesuffix}{ext}"
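The filename formula can be tried with a small stand-in. This sketch makes several assumptions: the name filename_with_time and the now parameter are hypothetical (now only makes the output reproducible), %T is spelled out as %H:%M:%S for portability, and fname_friendly is taken to mean replacing ':' with '-' and spaces with '_'.

```python
from datetime import datetime


def filename_with_time(prefix="", sep=" ", suffix="", ext="",
                       fname_friendly=True,
                       timeformat="%m-%d-%Y %H:%M:%S", now=None):
    """Build prefix + sep + suffix + sep + formatted time + ext."""
    now = now or datetime.now()
    timesuffix = now.strftime(timeformat)
    if fname_friendly:
        # Assumption: strip characters that are awkward in file names.
        timesuffix = timesuffix.replace(":", "-").replace(" ", "_")
    return f"{prefix}{sep}{suffix}{sep}{timesuffix}{ext}"


print(filename_with_time(prefix="model", sep="_", suffix="v1", ext=".csv"))
```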

fsds_100719.jmi.jmi.big_pandas(user_options=None, verbose=0)[source]

Changes the default pandas display settings to show all columns and all rows. The user may replace settings with a keyword dictionary matching available options.

user_options (dict): pandas size parameters for pd.set_option = {
'display': {
'max_columns': None, 'expand_frame_repr': False, 'max_rows': None, 'max_info_columns': 500, 'precision': 4,


fsds_100719.jmi.jmi.check_class_balance(df, col='delta_price_class_int', note='', as_percent=True, as_raw=True)[source]
fsds_100719.jmi.jmi.color_scale_columns(df, matplotlib_cmap='Greens', subset=None)[source]

DataFrame Styler: Takes a df, any valid matplotlib colormap (matplotlib.org/tutorials/colors/colormaps.html), and column names, and returns a dataframe with a gradient colormap applied to the column values.

Example: df_styled = color_scale_columns(df, matplotlib_cmap="YlGn", subset=['Columns', 'to', 'color'])

Parameters:
df: DataFrame containing columns to style.
subset: Names of columns to color-code.
matplotlib_cmap: Any matplotlib colormap. https://matplotlib.org/tutorials/colors/colormaps.html
Returns: styled dataframe.

DataFrame Styler: Changes text color to green if the value is True.
Example: style_df = df.style.applymap(color_true_green)
style_df  # to display
fsds_100719.jmi.jmi.compare_word_cloud(text1, label1, text2, label2)[source]

Compares the wordclouds from 2 sets of texts

fsds_100719.jmi.jmi.create_required_folders(full_filenamepath, folder_delim='/', verbose=1)[source]

Accepts a full file name path, including folders, with '/' as the default delimiter. Recursively checks for all sub-folders in the filepath and creates those that are missing.
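The same outcome (every missing folder in a file path gets created) can be sketched with os.makedirs. The helper name ensure_parent_folders is hypothetical, and it relies on os.path instead of the folder_delim argument of the documented function.

```python
import os


def ensure_parent_folders(full_filenamepath):
    """Create any missing folders leading up to the file; return the folder path."""
    folder = os.path.dirname(full_filenamepath)
    if folder:
        # exist_ok=True makes the call safe when the folders already exist.
        os.makedirs(folder, exist_ok=True)
    return folder
```

A call such as ensure_parent_folders('results/plots/fig.png') would create results/ and results/plots/ but not the file itself.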

fsds_100719.jmi.jmi.detect_outliers(df, n, features)[source]

Uses Tukey's method (values outside the interquartile-range fences) to return the indices of outliers in a dataframe.

Parameters:
df (DataFrame): DataFrame containing columns of features.
n: Multiple-outlier cutoff. Default is 0.

Returns: Index of outliers for .loc

Examples:
    Outliers_to_drop = detect_outliers(data, 2, ["col1", "col2"])
    data.loc[Outliers_to_drop]  # Show the outlier rows
    data = data.drop(Outliers_to_drop, axis=0).reset_index(drop=True)
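Tukey's fences (Q1 - 1.5*IQR and Q3 + 1.5*IQR) can be demonstrated on plain lists. This is a sketch under assumptions: tukey_outlier_rows is a hypothetical name, the data is a dict of column name to list, and a row is reported when it is an outlier in more than n of the listed features, which is one reading of the "multiple outlier cutoff".

```python
from collections import Counter
from statistics import quantiles


def tukey_outlier_rows(columns, n, features):
    """Return row positions that are outliers in more than n features."""
    hits = Counter()
    for name in features:
        values = columns[name]
        # 'inclusive' quartiles use linear interpolation between data points.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        step = 1.5 * (q3 - q1)
        for i, value in enumerate(values):
            if value < q1 - step or value > q3 + step:
                hits[i] += 1
    return sorted(i for i, count in hits.items() if count > n)


data = {"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, -50]}
print(tukey_outlier_rows(data, 1, ["a", "b"]))
```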

fsds_100719.jmi.jmi.dict_dropdown(dict_to_display, title='Dictionary Contents')[source]

Display the model_params dictionary as a dropdown menu.

fsds_100719.jmi.jmi.disp_df_head_tail(df, n_head=3, n_tail=3, head_capt='df.head', tail_capt='df.tail')[source]

Displays the df.head(n_head) and df.tail(n_tail) and sets captions using df.style




Displays all input dataframes side by side. Also accepts captioned Styler objects (df_in = df.style.set_caption('caption')). Modified from source: https://stackoverflow.com/questions/38783027/jupyter-notebook-display-two-pandas-tables-side-by-side

fsds_100719.jmi.jmi.drop_cols(df, list_of_strings_or_regexp, verbose=0)[source]

EDA: Takes a df and a list of strings or regular expressions, and recursively removes all columns whose names contain those strings or match those expressions.

Example: if the df_in columns are ['price', 'sqft', 'sqft_living', 'sqft15', 'sqft_living15', 'floors', 'bedrooms']:
    df_out = drop_cols(df_in, ['sqft', 'bedroom'])
    df_out.columns  # will output: ['price', 'floors']

Parameters:
DF – Input dataframe to remove columns from.
regex_list – List of string patterns or regexp to remove.
Returns:
df_dropped – Input df without the dropped columns.
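
The column-matching step can be reproduced on a plain list of names with re.search. This is only a sketch of the selection logic (columns_after_drop is a hypothetical name); the documented function operates on a dataframe.

```python
import re


def columns_after_drop(column_names, list_of_strings_or_regexp):
    """Return the column names that match none of the given patterns."""
    kept = list(column_names)
    for pattern in list_of_strings_or_regexp:
        # re.search matches the pattern anywhere inside the column name.
        kept = [name for name in kept if not re.search(pattern, name)]
    return kept


cols = ["price", "sqft", "sqft_living", "sqft15", "sqft_living15", "floors", "bedrooms"]
print(columns_after_drop(cols, ["sqft", "bedroom"]))
```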

Takes a series and replaces any empty lists with an empty string instead.

fsds_100719.jmi.jmi.evaluate_classification_model(model, X_train, X_test, y_train, y_test, history=None, binary_classes=True, conf_matrix_classes=['Decrease', 'Increase'], normalize_conf_matrix=True, conf_matrix_figsize=(8, 4), save_history=False, history_filename='results/keras_history.png', save_conf_matrix_png=False, conf_mat_filename='results/confusion_matrix.png', save_summary=False, summary_filename='results/model_summary.txt', auto_unique_filenames=True)[source]

Evaluates a Keras model's performance, plots the model's history, displays a classification report, and plots a confusion matrix. conf_matrix_classes are the labels for the matrix: [negative, positive]. Returns a df of the classification report and the fig object for the confusion matrix plot.

fsds_100719.jmi.jmi.evaluate_regression(y_true, y_pred, metrics=None, show_results=False, display_thiels_u_info=False)[source]

Calculates and displays any of the following evaluation metrics (passed as strings in the metrics param): r2, MAE, MSE, RMSE, U. if metrics=None:

fsds_100719.jmi.jmi.find_null_idx(df, column=None)[source]

Returns the indices of null values found in the series/column. If df is a dataframe and column is None, it returns a dictionary with the column names as keys and the null_idx for each column as the values.
Example Usage:
1) >> null_idx = find_null_idx(series)
   >> series_null_removed = series[null_idx]
2) >> null_dict = find_null_idx(df)

fsds_100719.jmi.jmi.get_attributes(obj, private=False)[source]

Retrieves a list of all non-private attributes (default) from inside of obj.
- If private==False: only returns attributes whose names do NOT start with a '_'

Parameters:
obj (object): Object to retrieve attributes from.
private (bool): Whether to retrieve private attributes or public.
Returns:
list: the names of all of the retrieved attributes.
fsds_100719.jmi.jmi.get_methods(obj, private=False)[source]

Retrieves a list of all non-private methods (default) from inside of obj.
- If private==False: only returns methods whose names do NOT start with a '_'

Parameters:
obj (object): Object to retrieve methods from.
private (bool): Whether to retrieve private methods or public.
Returns:
list: the names of all of the retrieved methods.
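
The public/private filtering that get_attributes and get_methods describe can be sketched with dir() and callable(). The helper list_members and its methods flag are hypothetical, and treating private=True as "only names starting with '_'" is an assumption based on the parameter description.

```python
def list_members(obj, private=False, methods=True):
    """List method names (methods=True) or attribute names (methods=False)."""
    names = []
    for name in dir(obj):
        # Keep private names only when private=True, public names otherwise.
        if name.startswith("_") != private:
            continue
        if callable(getattr(obj, name)) == methods:
            names.append(name)
    return names


class Demo:
    value = 3

    def run(self):
        return self.value

    def _hidden(self):
        return None


print(list_members(Demo()), list_members(Demo(), methods=False))
```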
fsds_100719.jmi.jmi.get_methods_attributes_df(obj, include_private=False)[source]

Retrieves all attributes and methods (with docstrings) and returns them in a DataFrame. By default only retrieves non-private methods, unless include_private==True.

Args:
obj (object): Object to retrieve methods/attributes from.
include_private (bool): Whether to include private methods/attributes.
Returns:
DataFrame: DataFrame with results.
fsds_100719.jmi.jmi.get_time(timeformat='%m-%d-%y_%T%p', raw=False, filename_friendly=False, replacement_seperator='-')[source]

Gets the current time in the local time zone. If raw is True, the raw datetime object is returned without formatting. If filename_friendly is True, ':' is replaced with replacement_seperator.

fsds_100719.jmi.jmi.highlight(df, hover_color='gold')[source]

DataFrame Styler: Highlights a row when hovering. Accepts any valid CSS color name as hover_color.


DataFrame Styler: Called by highlight to highlight row below cursor. Changes html background color.



fsds_100719.jmi.jmi.html_on(CSS=None, verbose=False)[source]

Applies HTML/CSS styling to all dataframes. The 'CSS' variable is created by make_CSS() if not supplied. verbose=True will display the default CSS code used. Any valid CSS key: value pair can be passed.


Ignores all deprecation warnings (future and pending categories too).

fsds_100719.jmi.jmi.inverse_transform_series(series, scaler)[source]

Takes a series or df column and a fitted scaler. Intended for use with make_scaler_library's dictionary.
Example Usage:
    scaler_lib, df_scaled = make_scaler_library(df, transform=True)
    series_inverse_transformed = inverse_transform_series(df['price_data'], scaler_lib['price'])


Makes default CSS for html_on function.

fsds_100719.jmi.jmi.make_X_y_timeseries_data(data, x_window=35, verbose=2, as_array=True)[source]

Creates an X and y time sequence training set from a pandas Series.
- X_train is an array with x_window # of samples for each row in X_train
- y_train is one value per X_train window: the next time point after the x_window.
verbose determines the details printed about the contents and shapes of the data.

# Example Usage:
    X_train, y_train = make_X_y_timeseries_data(df['price'], x_window=35)
    print(X_train[0])  # returns: arr[X1,X2…X35]
    print(y_train[0])  # returns X36
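
The windowing itself is easy to check on plain lists. A minimal sketch, assuming each X row is x_window consecutive values and y is the value immediately after the window; make_windows is a hypothetical name, and it returns lists rather than arrays.

```python
def make_windows(series, x_window=35):
    """Slice a sequence into (window, next value) training pairs."""
    values = list(series)
    X, y = [], []
    for start in range(len(values) - x_window):
        X.append(values[start:start + x_window])  # x_window consecutive points
        y.append(values[start + x_window])        # the point right after them
    return X, y


X_train, y_train = make_windows(range(1, 41), x_window=35)
print(len(X_train), X_train[0][:3], y_train[0])
```

With 40 points and a window of 35, only 5 complete (window, target) pairs exist, so short series shrink quickly under this scheme.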

fsds_100719.jmi.jmi.make_date_range_slider(start_date, end_date, freq='D')[source]
fsds_100719.jmi.jmi.make_scaler_library(df, transform=False, columns=[])[source]

Takes a df and fits a MinMax scaler to the columns specified (default is to use all columns). Returns a dictionary (scaler_library) with keys = columns and values = the corresponding fitted MinMaxScaler.

Example Usage:
    scale_lib, df_scaled = make_scaler_library(df, transform=True)

    # To get the inverse_transform of a column with a different name,
    # use inverse_transform_series
    scaler = scale_lib['price']  # get scaler fit to original column of interest
    price_column = inverse_transform_series(df['price_labels'], scaler)  # get the inverse_transformed series back
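
The reason for keeping one scaler per column is that the inverse transform needs the original column's min and max. The sketch below stores (min, max) tuples in place of fitted MinMaxScaler objects; all three function names are hypothetical, and the data is a dict of column name to list.

```python
def fit_minmax_library(table, columns=None):
    """Store (min, max) per column, standing in for a fitted scaler."""
    return {col: (min(table[col]), max(table[col])) for col in (columns or list(table))}


def scale(values, bounds):
    lo, hi = bounds
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
    return [(v - lo) / span for v in values]


def inverse_scale(values, bounds):
    lo, hi = bounds
    span = (hi - lo) or 1.0
    return [v * span + lo for v in values]


table = {"price": [10.0, 20.0, 30.0]}
library = fit_minmax_library(table)
scaled = scale(table["price"], library["price"])
# Predictions made on the scaled data can be mapped back with the same bounds.
print(scaled, inverse_scale([0.25], library["price"]))
```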

fsds_100719.jmi.jmi.make_stopwords_list(incl_punc=True, incl_nums=True, add_custom=['http', 'https', '...', '…', '``', 'co', '“', '’', '‘', '”', "n't", "''", 'u', 's', "'s", '|', '\\|', 'amp', "i'm"])[source]
fsds_100719.jmi.jmi.multiplot(df, annot=True, fig_size=None)[source]

EDA: Plots results from df.corr() in a correlation heat map for multicollinearity. Returns fig, ax objects

fsds_100719.jmi.jmi.plot_auc_roc_curve(y_test, y_test_pred)[source]

Takes y_test and y_test_pred from a ML model and uses sklearn roc_curve to plot the AUC-ROC curve.

fsds_100719.jmi.jmi.plot_confusion_matrix(cm, classes=None, normalize=False, cmap=None, title='Confusion Matrix', title_font={'size': 14}, annot_kws={'size': 10, 'weight': 50}, axislabel_font={'size': 14, 'weight': 70}, tick_font={'size': 12, 'weight': 50}, x_rot=45, y_rot=0, fig_kws={'figsize': (5, 5)})[source]

Plots a confusion matrix of either a pre-calculated cm or a tuple of (y_true,y_pred) as cm.

Parameters:
cm (array or tuple): Either a confusion matrix from sklearn or a (y_true, y_pred) tuple.
classes (list, optional): Names of classes to use. Defaults to integers 0 to len(cm).
normalize (bool, optional): Annotate class-percentages instead of counts. Defaults to False.
cmap (cmap, optional): Colormap to use. Defaults to plt.get_cmap("Blues").
title (str, optional): Plot title. Defaults to 'Confusion Matrix'.
title_font (dict, optional): fontdict for set_title. Defaults to {'size': 14}.
annot_kws (dict, optional): kws for ax.Text annotations. Defaults to {'size': 10, 'weight': 50}.
axislabel_font (dict, optional): fontdict for ylabel, xlabel. Defaults to {'size': 14, 'weight': 70}.
tick_font (dict, optional): kws for plt.xticks/yticks. Defaults to {'size': 12, 'weight': 50}.
x_rot (int, optional): Rotation of x-axis tick labels. Defaults to 45.
y_rot (int, optional): Rotation of y-axis tick labels. Defaults to 0.
fig_kws (dict, optional): kws for plt.subplots. Defaults to {'figsize': (5, 5)}.
Returns:
fig, ax: matplotlib Figure & Axes
fsds_100719.jmi.jmi.plot_decomposition(TS, decomposition, figsize=(12, 8), window_used=None)[source]

Plot the original data and output decomposed components

fsds_100719.jmi.jmi.plot_hist_scat(df, target=None, figsize=(12, 9), fig_style='dark_background', font_dict=None, plot_kwds=None)[source]

EDA: Great summary plots of all columns of a df vs. the target column. Shows distplots and regplots for columns in the dataframe vs. target.

Parameters:
df (DataFrame): DataFrame.describe() columns will be plotted.
target (string): Name of column containing the target variable; assumes the first column.
figsize (tuple): Tuple for figsize. Default=(12,9).
fig_style: Figure style to use (in this context, will not change others in notebook). Default is 'dark_background'.
font_dict: A keyword dictionary containing values for font properties under the following keys: "fontTitle" (font dictionary for titles), fontAxis, fontTicks.
plot_kwds: A keyword dictionary containing any of the following keys for dictionaries containing any valid matplotlib key:value pairs for plotting: "hist_kws, kde_kws, line_kws, scatter_kws". Accepts any valid matplotlib key:value pairs passed by seaborn to matplotlib. Subplot 1: hist_kws, kde_kws. Subplot 2: line_kws, scatter_kws.

Returns:
Figure object.
Subplot axes with format ax[row,col]. Subplot 1 = ax[0,0]; Subplot 2 = ax[0,1]
fsds_100719.jmi.jmi.print_array_info(X, name='Array')[source]

Test function for verifying shapes and data ranges of input arrays

fsds_100719.jmi.jmi.print_docstring_template(style='google', object_type='function', show_url=False, to_clipboard=False)[source]

Prints out a docstring template that is copy/paste ready. May choose 'google' or 'numpy' style docstrings, and templates are available for different types ('class', 'function', 'module_function').

Parameters:
style (str, optional): Which docstring style to return. Options are 'google' and 'numpy'. Defaults to 'google'.
object_type (str, optional): Which type of template to return. Options are 'class', 'function', 'module_function'. Defaults to 'function'.
show_url (bool, optional): Whether to display link to reference page for style-type. Defaults to False.

Resets all pandas options back to default state.


Restore the default warnings settings

fsds_100719.jmi.jmi.save_ihelp_menu_to_file(function_list, filename, save_help=False, save_code=True, folder='readme_resources/ihelp_outputs/', as_md=True, as_txt=False, verbose=1)[source]

Accepts a list of functions and uses save_ihelp_to_file with mode=’a’ to combine all outputs. Note: this function REQUIRES a filename

fsds_100719.jmi.jmi.seasonal_decompose_and_plot(ive_df, col='BidClose', freq='H', fill_method='ffill', window=144, model='multiplicative', two_sided=False, plot_components=True)[source]

Perform seasonal_decompose from statsmodels.tsa.seasonal. Plot Output Decomposed Components

fsds_100719.jmi.jmi.thiels_U(ys_true=None, ys_pred=None, display_equation=True, display_table=True)[source]

Calculates Theil's U metric for forecasting accuracy. Accepts true values and predicted values. Returns Theil's U.
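
For orientation, one common formulation of this statistic (often called Theil's U2) compares the forecast's relative errors with those of a naive "no change" forecast. Whether thiels_U uses exactly this formulation is an assumption; the sketch below (theils_u2 is a hypothetical name) only illustrates the idea that U = 1 matches the naive forecast and U < 1 beats it.

```python
import math


def theils_u2(ys_true, ys_pred):
    """Theil's U2: forecast error relative to a naive no-change forecast.

    Assumes no true value is zero and the true series is not constant,
    since either case would cause a division by zero.
    """
    num = 0.0
    den = 0.0
    for t in range(len(ys_true) - 1):
        num += ((ys_pred[t + 1] - ys_true[t + 1]) / ys_true[t]) ** 2
        den += ((ys_true[t + 1] - ys_true[t]) / ys_true[t]) ** 2
    return math.sqrt(num / den)


true_values = [10.0, 12.0, 11.0, 15.0]
naive = [10.0, 10.0, 12.0, 11.0]  # each prediction repeats the previous true value
print(theils_u2(true_values, naive), theils_u2(true_values, true_values))
```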

fsds_100719.jmi.jmi.train_test_val_split(X, y, test_size=0.2, val_size=0.1)[source]

Performs 2 successive train_test_splits to produce a training, testing, and validation dataset
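
The two-step splitting can be sketched with the random module. The helper names, the seed parameter, and the interpretation of val_size as a fraction of the full dataset (rather than of what remains after the first split) are all assumptions made for illustration.

```python
import random


def split_off(items, n_holdout, rng):
    """Shuffle a copy of items and split off n_holdout of them."""
    items = list(items)
    rng.shuffle(items)
    return items[n_holdout:], items[:n_holdout]


def three_way_split(rows, test_size=0.2, val_size=0.1, seed=0):
    """Two successive splits: first the test set, then the validation set."""
    rng = random.Random(seed)
    n = len(rows)
    remaining, test = split_off(rows, round(n * test_size), rng)
    train, val = split_off(remaining, round(n * val_size), rng)
    return train, test, val


train, test, val = three_way_split(list(range(100)))
print(len(train), len(test), len(val))
```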

fsds_100719.jmi.jmi.transform_cols_from_library(df, scaler_library, inverse=False, columns=[])[source]

Accepts a df and a scaler_library that was transformed using make_scaler_library. Inverse transforms the listed columns (if columns=[], then all columns). Returns a dataframe with all columns of the original df.


Will convert any pixel value of 0 (white) to 255 for wordcloud mask.

fsds_100719.jmi.jmi.undersample_df_to_match_classes(df, class_column='delta_price_class', class_values_to_keep=None, verbose=1)[source]

Resamples (undersamples) the input df so that the classes in class_column have an equal number of occurrences. If class_values_to_keep is None, uses all classes.
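
Undersampling to the smallest class can be shown on a list of dict rows. This is a hedged sketch: undersample_rows and the seed parameter are hypothetical, and random sampling without replacement is an assumption about how rows are chosen.

```python
import random
from collections import defaultdict


def undersample_rows(rows, class_key, class_values_to_keep=None, seed=0):
    """Randomly keep the same number of rows from every class."""
    by_class = defaultdict(list)
    for row in rows:
        label = row[class_key]
        if class_values_to_keep is None or label in class_values_to_keep:
            by_class[label].append(row)
    # Every class is cut down to the size of the smallest one.
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced


rows = [{"cls": "up"}] * 5 + [{"cls": "down"}] * 2
print(len(undersample_rows(rows, "cls")))
```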

fsds_100719.jmi.jmi_WIP module

A collection of functions not yet ready for the jmi modules.

fsds_100719.jmi.jmi_WIP.flat_dict(D, result=None, print_results=True)[source]

Function from Recursive Functions Section of Learn.co v2

Parameters:
D (dict or scalar): The item/list to be tested and unpacked.
result (dict, optional): The list to add the contents of L to. Defaults to an empty list.
print_results (bool, optional): Controls displaying of output. Defaults to True.
Returns:
result : flattened list L
fsds_100719.jmi.jmi_WIP.flat_list(L, result=None, print_results=True)[source]

Function from Recursive Functions Section of Learn.co v2

Parameters:
L (list or scalar): The item/list to be tested and unpacked.
result (list, optional): The list to add the contents of L to. Defaults to an empty list.
print_results (bool, optional): Controls displaying of output. Defaults to True.
Returns:
result : flattened list L
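
The recursive unpacking that flat_list describes can be sketched as follows. The name flatten is hypothetical and the print_results option is left out; a scalar input is treated as a one-item result.

```python
def flatten(L, result=None):
    """Recursively unpack nested lists into one flat list."""
    if result is None:
        result = []  # fresh list per top-level call (avoids a mutable default)
    if isinstance(L, list):
        for item in L:
            flatten(item, result)  # recurse into each element
    else:
        result.append(L)           # base case: a scalar
    return result


print(flatten([1, [2, [3, 4]], 5]))
```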