fsds_100719.jmi package

Submodules

fsds_100719.jmi.jmi module

My Template Module
Name: James M. Irving
Email: james.irving.phd@gmail.com
GitHub Profile: https://github.com/jirvingphd

class fsds_100719.jmi.jmi.BlockTimeSeriesSplit(n_splits=5, train_size=None, test_size=None, step_size=None, method='sliding')

Bases: sklearn.model_selection._split._BaseKFold

A variant of sklearn.model_selection.TimeSeriesSplit that keeps train_size and test_size constant across folds. Requires n_splits, train_size, and test_size. train_size and test_size can be integer indices or float ratios.

split(X, y=None, groups=None)

Generate train/test indices for each fold.

Args:
X (array-like): Data to split.
y (array-like, optional): Target values. Defaults to None.
groups (array-like, optional): Group labels. Defaults to None.
Yields:
tuple: (train_indices, test_indices) for each fold.
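The constant-size folds described above can be sketched in plain Python as follows. This is not the package's implementation: the helper names, treating float sizes as fractions of n_samples, the default step of one test block, and starting the first fold at index 0 are all assumptions, and only the 'sliding' method is shown.

```python
def _to_count(size, n_samples):
    """Interpret a float as a fraction of n_samples and an int as a count (assumption)."""
    if isinstance(size, float):
        return int(size * n_samples)
    return int(size)


def sliding_block_splits(n_samples, n_splits, train_size, test_size, step_size=None):
    """Yield (train_indices, test_indices) pairs whose sizes stay constant across folds.

    Hypothetical helper: each fold's test block immediately follows its train
    block, and successive folds slide forward by step_size (default: test_size).
    """
    train_n = _to_count(train_size, n_samples)
    test_n = _to_count(test_size, n_samples)
    step = test_n if step_size is None else _to_count(step_size, n_samples)
    if train_n <= 0 or test_n <= 0 or step <= 0:
        raise ValueError("train_size, test_size and step_size must be positive")
    # The last fold must still fit inside the data
    last_end = (n_splits - 1) * step + train_n + test_n
    if last_end > n_samples:
        raise ValueError(f"{n_splits} folds need {last_end} samples, got {n_samples}")
    for fold in range(n_splits):
        start = fold * step
        train_idx = list(range(start, start + train_n))
        test_idx = list(range(start + train_n, start + train_n + test_n))
        yield train_idx, test_idx
```

Because the function is a generator, the size check runs when iteration starts, e.g. inside `for train_idx, test_idx in sliding_block_splits(20, 3, 10, 2): ...`.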
class fsds_100719.jmi.jmi.Clock(display_final_time_as_minutes=True, verbose=2)

Bases: object

A clock meant to be used as a timer for functions, using local time. Clock.tic() starts the timer, .lap() adds the current lap's time to clock._list_lap_times, and .toc() stops the timer. If the user initializes with verbose=0, only the start and final end times are displayed.

If verbose=1, each lap's info is printed at the end of that lap. If verbose=2 (default), an instruction line is displayed and a dataframe of results is returned.
class datetime(year, month, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]])

Bases: datetime.date

The year, month and day arguments are required. tzinfo may be None, or an instance of a tzinfo subclass. The remaining arguments may be ints.

astimezone()

tz -> convert to local time in new timezone tz

combine()

date, time -> datetime with same date and time fields

ctime()

Return ctime() style string.

date()

Return date object with same year, month and day.

dst()

Return self.tzinfo.dst(self).

fold
fromisoformat()

string -> datetime from datetime.isoformat() output

fromtimestamp()

timestamp[, tz] -> tz’s local time from POSIX timestamp.

hour
isoformat()

[sep] -> string in ISO 8601 format, YYYY-MM-DDT[HH[:MM[:SS[.mmm[uuu]]]]][+HH:MM]. sep is used to separate the year from the time, and defaults to ‘T’. timespec specifies what components of the time to include (allowed values are ‘auto’, ‘hours’, ‘minutes’, ‘seconds’, ‘milliseconds’, and ‘microseconds’).

max = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
microsecond
min = datetime.datetime(1, 1, 1, 0, 0)
minute
now()

Returns new datetime object representing current time local to tz.

tz
Timezone object.

If no tz is specified, uses local timezone.

replace()

Return datetime with new specified fields.

resolution = datetime.timedelta(microseconds=1)
second
strptime()

string, format -> new datetime parsed from a string (like time.strptime()).

time()

Return time object with same time but with tzinfo=None.

timestamp()

Return POSIX timestamp as float.

timetuple()

Return time tuple, compatible with time.localtime().

timetz()

Return time object with same time and tzinfo.

tzinfo
tzname()

Return self.tzinfo.tzname(self).

utcfromtimestamp()

Construct a naive UTC datetime from a POSIX timestamp.

utcnow()

Return a new datetime representing UTC day and time.

utcoffset()

Return self.tzinfo.utcoffset(self).

utctimetuple()

Return UTC time tuple, compatible with time.localtime().

get_localzone()

Get the computer's configured local timezone, if any.

get_time(local=True)

Returns current time, in local time zone by default (local=True).

lap(label=None)

Records the time, duration, and label for the current lap. Output display varies with the clock's verbose level. Calls .mark_lap_list() to document results in clock._list_lap_times.

mark_lap_list(label=None)

Used internally; appends the current lap's information when called by .lap(). self._lap_times_list_ = [['Lap #', 'Start Time', 'Stop Time', 'Stop Label', 'Duration']]

summary()

Display dataframe summary table of Clock laps

tic(label=None)

Starts the timer, displays the current time, and appends label to _list_lap_times.

timezone()

Return a datetime.tzinfo implementation for the given timezone

>>> from datetime import datetime, timedelta
>>> from pytz import timezone, UnknownTimeZoneError
>>> utc = timezone('UTC')
>>> eastern = timezone('US/Eastern')
>>> eastern.zone
'US/Eastern'
>>> timezone('US/Eastern') is eastern
True
>>> utc_dt = datetime(2002, 10, 27, 6, 0, 0, tzinfo=utc)
>>> loc_dt = utc_dt.astimezone(eastern)
>>> fmt = '%Y-%m-%d %H:%M:%S %Z (%z)'
>>> loc_dt.strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
>>> (loc_dt - timedelta(minutes=10)).strftime(fmt)
'2002-10-27 00:50:00 EST (-0500)'
>>> eastern.normalize(loc_dt - timedelta(minutes=10)).strftime(fmt)
'2002-10-27 01:50:00 EDT (-0400)'
>>> (loc_dt + timedelta(minutes=10)).strftime(fmt)
'2002-10-27 01:10:00 EST (-0500)'

Raises UnknownTimeZoneError if passed an unknown zone.

>>> try:
...     timezone('Asia/Shangri-La')
... except UnknownTimeZoneError:
...     print('Unknown')
Unknown
>>> try:
...     timezone('\N{TRADE MARK SIGN}')
... except UnknownTimeZoneError:
...     print('Unknown')
Unknown

toc(label=None, summary=True)

Stops the timer, displays the results, and appends label to the final _list_lap_times entry.
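The tic/lap/toc pattern described for Clock can be sketched with the standard library alone. This is a simplified stand-in, not the package's class: LapTimer is a hypothetical name, it measures durations with time.perf_counter rather than the local-time datetime stamps Clock records, and it has no verbosity levels or summary table.

```python
import time


class LapTimer:
    """Hypothetical minimal tic/lap/toc timer; not the fsds Clock class."""

    def __init__(self):
        self._start = None
        self._last = None
        self.laps = []  # rows of [lap_number, label, duration_seconds]

    def tic(self):
        """Start (or restart) the timer and clear any recorded laps."""
        self._start = self._last = time.perf_counter()
        self.laps = []

    def lap(self, label=None):
        """Record the time elapsed since the previous lap (or since tic)."""
        if self._start is None:
            raise RuntimeError("call tic() before lap()")
        now = time.perf_counter()
        self.laps.append([len(self.laps) + 1, label, now - self._last])
        self._last = now

    def toc(self, label=None):
        """Close the final lap, stop the timer, and return the total elapsed seconds."""
        self.lap(label)
        total = self._last - self._start
        self._start = None
        return total
```

The lap durations always add up to the value toc() returns, which mirrors how a per-lap table and a final total relate.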

class fsds_100719.jmi.jmi.LabelLibrary

Bases: object

A multi-column version of sklearn's LabelEncoder, which fits a LabelEncoder to each column of a df and stores it in the index dictionary, where .index[colname] returns the fitted encoder object for that column.

Example:
lib = LabelLibrary()

# By default, lib will fit all columns.
lib.fit(df)
# Can also specify columns
lib.fit(df, columns=['A', 'B'])

# Can then transform
df_coded = lib.transform(df, ['A', 'B'])
# Can also use fit_transform
df_coded = lib.fit_transform(df, columns=['A', 'B'])

# lib.index contains each column's encoder by column name:
col_a_classes = lib.index['A'].classes_

fit(df, columns=None)

Creates an encoder object and fits it to each column. Each fitted encoder is saved in the index dictionary with key=column_name.

fit_transform(df, columns=None)
inverse_transform(df, columns=None)
transform(df, columns=None)
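The per-column encoder dictionary can be sketched without pandas or sklearn by representing the table as a dict of column lists. SimpleLabelEncoder and ColumnLabelLibrary are hypothetical names; SimpleLabelEncoder is a stand-in that, like sklearn's LabelEncoder, stores the sorted unique values in classes_ and codes them 0..k-1.

```python
class SimpleLabelEncoder:
    """Map sorted unique values to integers (a stand-in for sklearn's LabelEncoder)."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        self._codes = {value: code for code, value in enumerate(self.classes_)}
        return self

    def transform(self, values):
        return [self._codes[value] for value in values]

    def inverse_transform(self, codes):
        return [self.classes_[code] for code in codes]


class ColumnLabelLibrary:
    """Hypothetical multi-column wrapper: one fitted encoder per column, keyed in .index."""

    def __init__(self):
        self.index = {}

    def fit(self, table, columns=None):
        columns = list(table) if columns is None else columns
        for col in columns:
            self.index[col] = SimpleLabelEncoder().fit(table[col])
        return self

    def transform(self, table, columns=None):
        columns = list(self.index) if columns is None else columns
        out = dict(table)  # shallow copy; columns not listed pass through unchanged
        for col in columns:
            out[col] = self.index[col].transform(table[col])
        return out

    def fit_transform(self, table, columns=None):
        return self.fit(table, columns).transform(table, columns)

    def inverse_transform(self, table, columns=None):
        columns = list(self.index) if columns is None else columns
        out = dict(table)
        for col in columns:
            out[col] = self.index[col].inverse_transform(table[col])
        return out
```

Keeping the encoders in a plain dictionary is what makes a lookup such as lib.index['A'].classes_ possible after fitting.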
class fsds_100719.jmi.jmi.W2vVectorizer(w2v, glove)

Bases: object

From the Learn.co Text Classification with Word Embeddings Lab. An sklearn-compatible class containing the vectors for the fitted Word2Vec model.

fit(X, y)
transform(X)
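A common way for a vectorizer like this to turn a tokenized document into one feature vector is to average the embeddings of its in-vocabulary words. The sketch below assumes that averaging strategy and uses a plain dict of word to vector in place of a trained Word2Vec/GloVe model; MeanEmbeddingVectorizer is a hypothetical name, and the lab's code may differ.

```python
class MeanEmbeddingVectorizer:
    """Average the vectors of known tokens; return zeros when no token is known."""

    def __init__(self, vectors):
        self.vectors = vectors  # dict: word -> list of floats (all the same length)
        self.dim = len(next(iter(vectors.values())))

    def fit(self, X, y=None):
        # Nothing to learn: the embeddings are pre-trained (sklearn-style no-op fit)
        return self

    def transform(self, X):
        features = []
        for tokens in X:
            known = [self.vectors[t] for t in tokens if t in self.vectors]
            if known:
                # Element-wise mean across the known word vectors
                features.append([sum(col) / len(known) for col in zip(*known)])
            else:
                features.append([0.0] * self.dim)
        return features
```

Returning self from fit() is what lets such a class sit inside an sklearn Pipeline ahead of a classifier.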
fsds_100719.jmi.jmi.add_filtered_col_to_df(df_source, df_to_add_to, list_of_exps, return_filtered_col_names=False)

Takes a source dataframe with columns to copy, selected using df.filter(regex=...), where list_of_exps is a list of text expressions to find inside column names, and adds the matching columns to df_to_add_to.

fsds_100719.jmi.jmi.adf_test(series, title='')

Pass in a time series and an optional title; returns an ADF (Augmented Dickey-Fuller) test report. From a Udemy course, as an alternative to the stationarity check.

fsds_100719.jmi.jmi.apply_stopwords(stopwords_list, text, tokenize=True, return_tokens=False, pattern="([a-zA-Z]+(?:'[a-z]+)?)")

Example: df['text_stopped'] = df['content'].apply(lambda x: apply_stopwords(stopwords_list, x))
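The tokenize-then-filter step can be sketched with the default pattern from the signature and the standard re module. remove_stopwords is a hypothetical name; using re.findall instead of whatever tokenizer the package calls, and comparing tokens to the stopword list case-insensitively, are assumptions.

```python
import re

TOKEN_PATTERN = r"([a-zA-Z]+(?:'[a-z]+)?)"  # default pattern from the signature above


def remove_stopwords(stopwords_list, text, return_tokens=False):
    """Tokenize text with TOKEN_PATTERN and drop tokens found in stopwords_list."""
    tokens = re.findall(TOKEN_PATTERN, text)
    stopwords = {word.lower() for word in stopwords_list}
    kept = [tok for tok in tokens if tok.lower() not in stopwords]
    return kept if return_tokens else " ".join(kept)
```

The optional (?:'[a-z]+)? group keeps contractions such as "isn't" as single tokens, while digits and punctuation are never matched.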

fsds_100719.jmi.jmi.auto_filename_time(prefix='', sep=' ', suffix='', ext='', fname_friendly=True, timeformat='%m-%d-%Y %T')

Generates a filename from a base string, sep, and the current datetime formatted with timeformat: filename = f"{prefix}{sep}{suffix}{sep}{timesuffix}{ext}"
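The filename formula can be sketched with datetime.strftime. timestamped_filename is a hypothetical name; the sketch spells %T out as %H:%M:%S, takes an optional now argument so the result is reproducible, and assumes fname_friendly means replacing ':' with '-'.

```python
from datetime import datetime


def timestamped_filename(prefix="", sep=" ", suffix="", ext="", fname_friendly=True,
                         timeformat="%m-%d-%Y %H:%M:%S", now=None):
    """Build f"{prefix}{sep}{suffix}{sep}{timesuffix}{ext}" from the current time."""
    now = datetime.now() if now is None else now
    timesuffix = now.strftime(timeformat)
    filename = f"{prefix}{sep}{suffix}{sep}{timesuffix}{ext}"
    if fname_friendly:
        # ':' is not allowed in Windows filenames
        filename = filename.replace(":", "-")
    return filename
```

For example, timestamped_filename("model", "_", "v1", ".h5", now=datetime(2020, 1, 2, 3, 4, 5)) gives "model_v1_01-02-2020 03-04-05.h5".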

fsds_100719.jmi.jmi.big_pandas(user_options=None, verbose=0)

Changes the default pandas display settings to show all columns and all rows. The user may replace these settings with a keyword dictionary matching the available options.

Args:
user_options (dict): Pandas display options passed to pd.set_option. Defaults to:
{
'display': {
'max_columns': None,
'expand_frame_repr': False,
'max_rows': None,
'max_info_columns': 500,
'precision': 4,
}
}

fsds_100719.jmi.jmi.check_class_balance(df, col='delta_price_class_int', note='', as_percent=True, as_raw=True)
fsds_100719.jmi.jmi.color_scale_columns(df, matplotlib_cmap='Greens', subset=None)

DataFrame Styler: Takes a df, any valid matplotlib colormap (matplotlib.org/tutorials/colors/colormaps.html), and column names, and returns a dataframe with a gradient colormap applied to the column values.

Example: df_styled = color_scale_columns(df, matplotlib_cmap='YlGn', subset=['Columns', 'to', 'color'])

df:
DataFrame containing columns to style.
subset:
Names of columns to color-code.
matplotlib_cmap:
Any matplotlib colormap. https://matplotlib.org/tutorials/colors/colormaps.html
Returns:
df_style: styled dataframe.
fsds_100719.jmi.jmi.color_true_green(val)

DataFrame Styler: Changes the text color to green if the value is True.

Example:
style_df = df.style.applymap(color_true_green)
style_df  # to display
fsds_100719.jmi.jmi.compare_word_cloud(text1, label1, text2, label2)

Compares the wordclouds from 2 sets of texts

fsds_100719.jmi.jmi.create_required_folders(full_filenamepath, folder_delim='/', verbose=1)

Accepts a full file path, including folders, with '/' as the default delimiter. Recursively checks for all sub-folders in the path and creates any that are missing.
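The standard library can do the recursive check-and-create in one call: os.makedirs creates every missing intermediate folder, and exist_ok=True skips folders that already exist. ensure_parent_folders is a hypothetical name, and it splits the path with os.path rather than a folder_delim parameter.

```python
import os


def ensure_parent_folders(full_filenamepath):
    """Create any missing folders in the directory part of full_filenamepath."""
    folder = os.path.dirname(full_filenamepath)
    if folder:
        # exist_ok=True makes this a no-op for folders that already exist
        os.makedirs(folder, exist_ok=True)
    return folder
```

Only the folders are created; the file itself is left for the caller to write.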

fsds_100719.jmi.jmi.detect_outliers(df, n, features)

Uses Tukey's method (values outside the interquartile-range fences) to return the indices of outliers in a dataframe.

Parameters:
df (DataFrame): DataFrame containing columns of features.
n: Multiple-outlier cutoff; default is 0.
features (list): Names of the columns to check.

Returns: Index of outliers for .loc

Example:
Outliers_to_drop = detect_outliers(data, 2, ['col1', 'col2'])
data.loc[Outliers_to_drop]  # Show the outlier rows
data = data.drop(Outliers_to_drop, axis=0).reset_index(drop=True)
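Tukey's fences flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. The sketch below assumes the common reading of n as "return rows flagged in more than n of the features" and uses a dict of column lists instead of a DataFrame. Quartiles come from statistics.quantiles with method="inclusive", which interpolates the same way as NumPy's default percentile. tukey_outlier_rows is a hypothetical name.

```python
from collections import Counter
from statistics import quantiles


def tukey_outlier_rows(table, n, features):
    """Return row positions that are Tukey outliers in more than n of the features."""
    counts = Counter()
    for col in features:
        values = table[col]
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        step = 1.5 * (q3 - q1)
        for i, value in enumerate(values):
            if value < q1 - step or value > q3 + step:
                counts[i] += 1  # this row is an outlier for one more feature
    return [i for i, c in counts.items() if c > n]
```

With n=0 any row that is an outlier in at least one feature is returned; raising n keeps only rows that are extreme in several features at once.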

fsds_100719.jmi.jmi.dict_dropdown(dict_to_display, title='Dictionary Contents')

Displays dict_to_display (e.g. the model_params dictionary) as a dropdown menu.

fsds_100719.jmi.jmi.disp_df_head_tail(df, n_head=3, n_tail=3, head_capt='df.head', tail_capt='df.tail')

Displays df.head(n_head) and df.tail(n_tail), with captions set using df.style.

fsds_100719.jmi.jmi.display_dict_dropdown(dict_to_display)

Displays dict_to_display (e.g. the model_params dictionary) as a dropdown menu.

fsds_100719.jmi.jmi.display_side_by_side(*args)

Displays all input dataframes side by side. Also accepts captioned Styler objects (df_in = df.style.set_caption('caption')). Modified from source: https://stackoverflow.com/questions/38783027/jupyter-notebook-display-two-pandas-tables-side-by-side

fsds_100719.jmi.jmi.drop_cols(df, list_of_strings_or_regexp, verbose=0)

EDA: Takes a df and a list of strings or regular expressions, and recursively removes all columns whose names contain those strings or match those expressions.

Example: if the df_in columns are ['price', 'sqft', 'sqft_living', 'sqft15', 'sqft_living15', 'floors', 'bedrooms']:
df_out = drop_cols(df_in, ['sqft', 'bedroom'])
df_out.columns  # will output: ['price', 'floors']

Parameters:
df –
Input dataframe to remove columns from.
list_of_strings_or_regexp –
List of string patterns or regexps to remove.
Returns:
df_dropped – input df without the dropped columns.
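The matching rule in the drop_cols example (drop any column whose name contains one of the strings, treating each entry as a regular expression) can be sketched on a plain list of column names. drop_matching_columns is a hypothetical name, and re.search semantics are an assumption; with pandas, the kept names could then be selected with df_in[kept].

```python
import re


def drop_matching_columns(columns, list_of_strings_or_regexp):
    """Return the column names that match none of the given strings/regex patterns."""
    patterns = [re.compile(p) for p in list_of_strings_or_regexp]
    return [col for col in columns if not any(p.search(col) for p in patterns)]
```

Because re.search matches anywhere in the name, the plain string 'sqft' also removes 'sqft_living15', which is what the example output above shows.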
fsds_100719.jmi.jmi.empty_lists_to_strings(x)

Takes a series and replaces any empty lists with an empty string instead.

fsds_100719.jmi.jmi.evaluate_classification_model(model, X_train, X_test, y_train, y_test, history=None, binary_classes=True, conf_matrix_classes=['Decrease', 'Increase'], normalize_conf_matrix=True, conf_matrix_figsize=(8, 4), save_history=False, history_filename='results/keras_history.png', save_conf_matrix_png=False, conf_mat_filename='results/confusion_matrix.png', save_summary=False, summary_filename='results/model_summary.txt', auto_unique_filenames=True)

Evaluates a Keras model's performance, plots the model's history, displays a classification report, and plots a confusion matrix. conf_matrix_classes are the labels for the matrix, ordered [negative, positive]. Returns a df of the classification report and the fig object for the confusion matrix's plot.

fsds_100719.jmi.jmi.evaluate_regression(y_true, y_pred, metrics=None, show_results=False, display_thiels_u_info=False)

Calculates and displays any of the following evaluation metrics, passed as strings in the metrics param: r2, MAE, MSE, RMSE, U. If metrics=None:

metrics=['r2', 'RMSE', 'U']
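The r2, MAE, MSE and RMSE options follow standard formulas, sketched here without dependencies. regression_metrics is a hypothetical name, and U is left out of this sketch.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return r2, MAE, MSE and RMSE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("r2 is undefined when y_true is constant")
    return {
        "r2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }
```

RMSE is reported in the units of the target, while MSE is in squared units, which is why both are offered.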
fsds_100719.jmi.jmi.find_null_idx(df, column=None)

Returns the indices of null values found in the series/column. If df is a dataframe and column is None, returns a dictionary with the column names as keys and each column's null indices as values.

Example usage:
1) null_idx = find_null_idx(series)
series_null_removed = series.drop(null_idx)
2) null_dict = find_null_idx(df)

fsds_100719.jmi.jmi.get_attributes(obj, private=False)

Retrieves a list of all non-private attributes (default) from inside of obj. If private==False, only returns attributes whose names do NOT start with a '_'.

Args:
obj (object): Object to retrieve attributes from.
private (bool): Whether to retrieve private attributes or public.
Returns:
list: the names of all of the retrieved attributes.
fsds_100719.jmi.jmi.get_methods(obj, private=False)

Retrieves a list of all non-private methods (default) from inside of obj. If private==False, only returns methods whose names do NOT start with a '_'.

Args:
obj (object): Object to retrieve methods from.
private (bool): Whether to retrieve private methods or public.
Returns:
list: the names of all of the retrieved methods.
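Both helpers can be approximated with dir(), getattr() and callable(): methods are the callable members, attributes are the rest, and names with a leading underscore are skipped unless private=True. list_members is a hypothetical name; the package's helpers may classify members differently (for example, a nested class is callable and would count as a method here).

```python
def list_members(obj, private=False, methods=True):
    """Return sorted member names of obj: callables if methods=True, else non-callables."""
    names = []
    for name in dir(obj):  # dir() returns names in sorted order
        if not private and name.startswith("_"):
            continue
        if callable(getattr(obj, name)) == methods:
            names.append(name)
    return names
```

list_members(obj) plays the role of get_methods(obj), and list_members(obj, methods=False) the role of get_attributes(obj).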
fsds_100719.jmi.jmi.get_methods_attributes_df(obj, include_private=False)

Retrieves all attributes and methods (with docstrings) and returns them in a DataFrame. By default only retrieves non-private methods, unless include_private==True.

Args:
obj (object): Object to retrieve methods/attributes from.
include_private (bool): Whether to include private methods/attributes.
Returns:
DataFrame: DataFrame with results.
fsds_100719.jmi.jmi.get_time(timeformat='%m-%d-%y_%T%p', raw=False, filename_friendly=False, replacement_seperator='-')

Gets the current time in the local time zone. If raw=True, the raw datetime object is returned without formatting. If filename_friendly=True, ':' is replaced with replacement_seperator.

fsds_100719.jmi.jmi.highlight(df, hover_color='gold')

DataFrame Styler: Highlights the row under the cursor when hovering. Accepts any valid CSS color name as hover_color.

fsds_100719.jmi.jmi.hover(hover_color='gold')

DataFrame Styler: Called by highlight to highlight the row below the cursor. Changes the HTML background color.

Parameters:

hover_color: any valid CSS color name.

fsds_100719.jmi.jmi.html_off()
fsds_100719.jmi.jmi.html_on(CSS=None, verbose=False)

Applies HTML/CSS styling to all dataframes. The 'CSS' variable is created by make_CSS() if not supplied. verbose=True will display the default CSS code used. Any valid CSS key: value pair can be passed.

fsds_100719.jmi.jmi.ignore_warnings()

Ignores all deprecation warnings (including the future and pending-deprecation categories).
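The standard warnings module covers this: one "ignore" filter per deprecation-related category. This is a sketch of the approach, not the package's code, and ignore_deprecation_warnings is a hypothetical name. Wrapping the call in warnings.catch_warnings() scopes the filters to a block; note that warnings.resetwarnings() clears every filter rather than restoring the interpreter's defaults.

```python
import warnings

DEPRECATION_CATEGORIES = (DeprecationWarning, FutureWarning, PendingDeprecationWarning)


def ignore_deprecation_warnings():
    """Add 'ignore' filters for the three deprecation-related warning categories."""
    for category in DEPRECATION_CATEGORIES:
        # filterwarnings inserts at the front, so these take precedence over older filters
        warnings.filterwarnings("ignore", category=category)
```

Other categories, such as UserWarning, are still reported after the call.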

fsds_100719.jmi.jmi.inverse_transform_series(series, scaler)

Takes a series (df column) and a fitted scaler. Intended for use with make_scaler_library's dictionary.

Example usage:
scaler_lib, df_scaled = make_scaler_library(df, transform=True)
series_inverse_transformed = inverse_transform_series(df['price_data'], scaler_lib['price'])

fsds_100719.jmi.jmi.is_var(name)
fsds_100719.jmi.jmi.make_CSS(show=False)

Makes default CSS for html_on function.

fsds_100719.jmi.jmi.make_X_y_timeseries_data(data, x_window=35, verbose=2, as_array=True)

Creates an X and y time-sequence training set from a pandas Series.
- X_train is an array with x_window samples in each row.
- y_train is one value per X_train window: the next time point after the window.
verbose determines the details printed about the contents and shapes of the data.

# Example usage:
X_train, y_train = make_X_y_timeseries_data(df['price'], x_window=35)
print(X_train[0])  # returns: arr[X1, X2 ... X35]
print(y_train[0])  # returns X36
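The windowing described above can be sketched on any sequence. make_windows is a hypothetical name; it returns plain lists rather than arrays and skips the verbose printing.

```python
def make_windows(series, x_window=35):
    """Split a sequence into overlapping windows X and next-step targets y."""
    values = list(series)
    X, y = [], []
    for start in range(len(values) - x_window):
        X.append(values[start:start + x_window])  # x_window consecutive points
        y.append(values[start + x_window])        # the point right after the window
    return X, y
```

A series of length N yields N - x_window training pairs, so the series must be longer than x_window to produce any rows.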

fsds_100719.jmi.jmi.make_date_range_slider(start_date, end_date, freq='D')
fsds_100719.jmi.jmi.make_scaler_library(df, transform=False, columns=[])

Takes a df and fits a MinMax scaler to the columns specified (default is to use all columns). Returns a dictionary (scaler_library) with keys = columns and values = each column's fitted MinMaxScaler.

Example usage:
scale_lib, df_scaled = make_scaler_library(df, transform=True)

# To get the inverse_transform of a column with a different name, use inverse_transform_series:
scaler = scale_lib['price']  # get the scaler fit to the original column of interest
price_column = inverse_transform_series(df['price_labels'], scaler)  # get the inverse-transformed series back
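The scaler-library idea (one fitted min-max scaler per column, reusable on a differently named column such as model predictions) can be sketched without sklearn. MinMaxColumnScaler and build_scaler_library are hypothetical names standing in for sklearn.preprocessing.MinMaxScaler and make_scaler_library; treating a constant column's zero range as 1 is an assumption.

```python
class MinMaxColumnScaler:
    """Scale values to [0, 1] using the min and max seen during fit."""

    def fit(self, values):
        self.min_ = min(values)
        span = max(values) - self.min_
        self.span_ = span if span != 0 else 1.0  # avoid dividing by zero for constant columns
        return self

    def transform(self, values):
        return [(v - self.min_) / self.span_ for v in values]

    def inverse_transform(self, scaled):
        return [s * self.span_ + self.min_ for s in scaled]


def build_scaler_library(table, columns=None):
    """Fit one scaler per column of a dict-of-lists table; keys are column names."""
    columns = list(table) if not columns else columns
    return {col: MinMaxColumnScaler().fit(table[col]) for col in columns}
```

Usage: fit on the 'price' column, then call lib['price'].inverse_transform(scaled_predictions) to bring predictions made in scaled space back to price units.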

fsds_100719.jmi.jmi.make_stopwords_list(incl_punc=True, incl_nums=True, add_custom=['http', 'https', '...', '…', '``', 'co', '“', '’', '‘', '”', "n't", "''", 'u', 's', "'s", '|', '\\|', 'amp', "i'm"])
fsds_100719.jmi.jmi.multiplot(df, annot=True, fig_size=None)

EDA: Plots results from df.corr() in a correlation heat map for multicollinearity. Returns fig, ax objects

fsds_100719.jmi.jmi.open_image_mask(filename)
fsds_100719.jmi.jmi.plot_auc_roc_curve(y_test, y_test_pred)

Takes y_test and y_test_pred from a ML model and uses sklearn roc_curve to plot the AUC-ROC curve.

fsds_100719.jmi.jmi.plot_confusion_matrix(cm, classes=None, normalize=False, cmap=None, title='Confusion Matrix', title_font={'size': 14}, annot_kws={'size': 10, 'weight': 50}, axislabel_font={'size': 14, 'weight': 70}, tick_font={'size': 12, 'weight': 50}, x_rot=45, y_rot=0, fig_kws={'figsize': (5, 5)})

Plots a confusion matrix from either a pre-calculated cm or a (y_true, y_pred) tuple passed as cm.

Args:
cm (array or tuple): Either a confusion matrix from sklearn or a (y_true, y_pred) tuple.
classes (list, optional): Names of classes to use. Defaults to integers 0 to len(cm).
normalize (bool, optional): Annotate class percentages instead of counts. Defaults to False.
cmap (cmap, optional): Colormap to use. Defaults to plt.get_cmap("Blues").
title (str, optional): Plot title. Defaults to 'Confusion Matrix'.
title_font (dict, optional): fontdict for set_title. Defaults to {'size': 14}.
annot_kws (dict, optional): kws for ax.Text annotations. Defaults to {'size': 10, 'weight': 50}.
axislabel_font (dict, optional): fontdict for ylabel, xlabel. Defaults to {'size': 14, 'weight': 70}.
tick_font (dict, optional): kws for plt.xticks/yticks. Defaults to {'size': 12, 'weight': 50}.
x_rot (int, optional): Rotation of x-axis tick labels. Defaults to 45.
y_rot (int, optional): Rotation of y-axis tick labels. Defaults to 0.
fig_kws (dict, optional): kws for plt.subplots. Defaults to {'figsize': (5, 5)}.
Returns:
fig, ax: matplotlib Figure & Axes
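When cm is passed as a (y_true, y_pred) tuple, the counts have to be tabulated first. The sketch below builds the matrix with sklearn's orientation (rows are true classes, columns are predicted classes) and assumes normalize=True means dividing each row by its total; confusion_counts and normalize_rows are hypothetical names.

```python
def confusion_counts(y_true, y_pred, classes=None):
    """Rows are true classes, columns are predicted classes (sklearn's convention)."""
    classes = sorted(set(y_true) | set(y_pred)) if classes is None else list(classes)
    position = {label: i for i, label in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[position[t]][position[p]] += 1
    return cm


def normalize_rows(cm):
    """Convert counts to per-true-class fractions (each non-empty row sums to 1)."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized
```

Row normalization turns the diagonal into per-class recall, which is usually what "class percentages" annotations are meant to show.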
fsds_100719.jmi.jmi.plot_decomposition(TS, decomposition, figsize=(12, 8), window_used=None)

Plot the original data and output decomposed components

fsds_100719.jmi.jmi.plot_hist_scat(df, target=None, figsize=(12, 9), fig_style='dark_background', font_dict=None, plot_kwds=None)

EDA: Summary plots of all columns of a df vs. the target column. Shows distplots and regplots for the columns in the dataframe vs. the target. Parameters:

df (DataFrame):
DataFrame.describe() columns will be plotted.
target (string):
Name of the column containing the target variable. If None, assumes the first column.
figsize (tuple):
Tuple for figsize. Default=(12,9).
fig_style:
Figure style to use (in this context only; will not change other figures in the notebook). Default is 'dark_background'.
font_dict:
A keyword dictionary containing values for font properties under the following keys: "fontTitle" (font dictionary for titles), "fontAxis", "fontTicks".
**plot_kwds:

A keyword dictionary containing any of the following keys, each mapping to a dictionary of valid matplotlib key:value pairs for plotting:

"hist_kws, kde_kws, line_kws, scatter_kws"

Accepts any valid matplotlib key:value pairs passed by seaborn to matplotlib. Subplot 1: hist_kws, kde_kws. Subplot 2: line_kws, scatter_kws.

Returns:
fig:
Figure object.
ax:
Subplot axes with format ax[row,col]. Subplot 1 = ax[0,0]; Subplot 2 = ax[0,1]
fsds_100719.jmi.jmi.print_array_info(X, name='Array')

Test function for verifying shapes and data ranges of input arrays

fsds_100719.jmi.jmi.print_docstring_template(style='google', object_type='function', show_url=False, to_clipboard=False)

Prints a copy/paste-ready docstring template. You may choose 'google' or 'numpy' style docstrings, and templates are available for different object types ('class', 'function', 'module_function').

Args:
style (str, optional): Which docstring style to return. Options are 'google' and 'numpy'. Defaults to 'google'.
object_type (str, optional): Which type of template to return. Options are 'class', 'function', 'module_function'. Defaults to 'function'.
show_url (bool, optional): Whether to display a link to the reference page for the style type. Defaults to False.
Returns:
[type]: [description]
fsds_100719.jmi.jmi.reset_pandas()

Resets all pandas options back to default state.

fsds_100719.jmi.jmi.reset_warnings()

Restore the default warnings settings

fsds_100719.jmi.jmi.save_ihelp_menu_to_file(function_list, filename, save_help=False, save_code=True, folder='readme_resources/ihelp_outputs/', as_md=True, as_txt=False, verbose=1)

Accepts a list of functions and uses save_ihelp_to_file with mode='a' to combine all outputs. Note: this function REQUIRES a filename.

fsds_100719.jmi.jmi.seasonal_decompose_and_plot(ive_df, col='BidClose', freq='H', fill_method='ffill', window=144, model='multiplicative', two_sided=False, plot_components=True)

Perform seasonal_decompose from statsmodels.tsa.seasonal. Plot Output Decomposed Components

fsds_100719.jmi.jmi.thiels_U(ys_true=None, ys_pred=None, display_equation=True, display_table=True)

Calculates Theil's U metric for forecasting accuracy. Accepts true values and predicted values. Returns Theil's U.
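Theil's U has more than one definition, and which one thiels_U uses is not stated here. The sketch below assumes the common forecasting variant U2, which compares the forecast's relative errors with those of a naive "no change" forecast: U2 = sqrt( sum over t of ((yhat[t+1] - y[t+1]) / y[t])^2 / sum over t of ((y[t+1] - y[t]) / y[t])^2 ), for t = 1 .. n-1. theils_u2 is a hypothetical name.

```python
import math


def theils_u2(ys_true, ys_pred):
    """Theil's U2: forecast error relative to a naive 'no change' forecast.

    U2 < 1 beats the naive forecast, U2 = 1 matches it, U2 > 1 is worse.
    """
    if len(ys_true) != len(ys_pred) or len(ys_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    num = den = 0.0
    for t in range(len(ys_true) - 1):
        # Both terms are relative changes measured against the current actual value
        num += ((ys_pred[t + 1] - ys_true[t + 1]) / ys_true[t]) ** 2
        den += ((ys_true[t + 1] - ys_true[t]) / ys_true[t]) ** 2
    return math.sqrt(num / den)
```

Predicting each step as the previous actual value gives exactly U2 = 1, which makes the naive forecast the natural baseline for reading the score.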

fsds_100719.jmi.jmi.train_test_val_split(X, y, test_size=0.2, val_size=0.1)

Performs 2 successive train_test_splits to produce a training, testing, and validation dataset
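The two successive splits can be sketched on row indices with a seeded shuffle. train_test_val_indices is a hypothetical name; returning indices rather than X/y subsets, rounding to the nearest count, and reading val_size as a fraction of the remaining (non-test) rows are all assumptions, so sizes may differ by one row from sklearn's own rounding.

```python
import random


def train_test_val_indices(n_samples, test_size=0.2, val_size=0.1, seed=None):
    """Two successive shuffled splits: first split off the test set, then val from the rest."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(test_size * n_samples))
    test_idx, rest = indices[:n_test], indices[n_test:]
    # Assumption: val_size is a fraction of the remaining (non-test) samples
    n_val = int(round(val_size * len(rest)))
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return train_idx, test_idx, val_idx
```

With 100 rows and the defaults, this gives 20 test rows, 8 validation rows (10% of the remaining 80), and 72 training rows.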

fsds_100719.jmi.jmi.transform_cols_from_library(df, scaler_library, inverse=False, columns=[])

Accepts a df and a scaler_library created with make_scaler_library. Inverse-transforms the listed columns (if columns=[], then all columns). Returns a dataframe with all columns of the original df.

fsds_100719.jmi.jmi.transform_image_mask_white(val)

Converts any pixel value of 0 (black) to 255 (white) for use as a wordcloud mask.

fsds_100719.jmi.jmi.undersample_df_to_match_classes(df, class_column='delta_price_class', class_values_to_keep=None, verbose=1)

Resamples (undersamples) the input df so that the classes in class_column have an equal number of occurrences. If class_values_to_keep is None, all classes are used.
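Undersampling to equal class counts can be sketched on a list of class labels: keep every selected class, then randomly sample each one down to the size of the smallest. undersample_indices is a hypothetical name; it returns row positions, which with pandas could be applied via df.iloc[...].

```python
import random
from collections import defaultdict


def undersample_indices(labels, classes_to_keep=None, seed=None):
    """Return row indices giving every kept class the size of the smallest kept class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        if classes_to_keep is None or label in classes_to_keep:
            by_class[label].append(i)
    n_min = min(len(rows) for rows in by_class.values())
    rng = random.Random(seed)
    keep = []
    for rows in by_class.values():
        keep.extend(rng.sample(rows, n_min))  # sample without replacement
    return sorted(keep)
```

Sorting the result preserves the original row order, which matters for time-ordered data like price classes.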

fsds_100719.jmi.jmi_WIP module

A collection of functions not yet ready for the jmi modules.

fsds_100719.jmi.jmi_WIP.flat_dict(D, result=None, print_results=True)

Function from Recursive Functions Section of Learn.co v2

Args:
D (dict or scalar): The item/dict to be tested and unpacked.
result (dict, optional): The dict to add the contents of D to. Defaults to an empty dict.
print_results (bool, optional): Controls displaying of output. Defaults to True.
Returns:
result: flattened dict D
fsds_100719.jmi.jmi_WIP.flat_list(L, result=None, print_results=True)

Function from Recursive Functions Section of Learn.co v2

Args:
L (list or scalar): The item/list to be tested and unpacked.
result (list, optional): The list to add the contents of L to. Defaults to an empty list.
print_results (bool, optional): Controls displaying of output. Defaults to True.
Returns:
result: flattened list L
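The recursive unpacking can be sketched as below. flatten is a hypothetical name; it omits print_results, and only lists (not tuples or other iterables) are treated as nested containers, which is an assumption.

```python
def flatten(L, result=None):
    """Recursively unpack nested lists into one flat list (scalars become [scalar])."""
    if result is None:
        result = []  # fresh list per call; avoids a shared mutable default
    if isinstance(L, list):
        for item in L:
            flatten(item, result)  # recurse into each element, appending to the same result
    else:
        result.append(L)
    return result
```

Defaulting result to None and creating the list inside the function avoids the classic bug where a mutable default argument accumulates values across separate calls.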