Welcome to fsds_100719’s documentation!

Installation

To install fsds_100719, run this command in your terminal:

# In terminal:
$ pip install -U fsds_100719

# In Jupyter Notebook / Learn.co
!pip install -U fsds_100719

This is the preferred method to install fsds_100719, as it will always install the most recent stable release.

If you don't have pip installed, this Python installation guide can walk you through the process.

Usage

To use fsds_100719 in a project:

import fsds_100719 as fs

To import common modules under their usual handles (e.g., pandas as pd, numpy as np):

from fsds_100719.imports import *

Functions worth importing by name:

# To easily inspect help and source code
from fsds_100719 import ihelp

# If you're importing functions from a local file:
from fsds_100719 import reload

You can load just your cohort or your own module as fs:

import fsds_100719.ft.jirvingphd as fs
# or
import fsds_100719.ft as fs

fsds_100719.ds package

A shared collection of tools for general use.

fsds_100719.ds.add_dir_to_path(abs_path=None, rel_path=None, verbose=True)[source]

Adds the provided path (or current directory if None provided) to sys.path.

Args:
    abs_path (str): Folder to add to the path (may need to be absolute).
    rel_path (str): Relative folder path to be converted to absolute and added.
    verbose (bool): Controls display of success/failure messages. Default=True.
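
A minimal usage sketch (the relative folder name shown is hypothetical; point it at any folder containing your local modules):
>> from fsds_100719 import ds
>> ds.add_dir_to_path(rel_path='../my_modules', verbose=True)
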
fsds_100719.ds.arr2series(array, series_index=None, series_name='array')[source]

Converts an array into a named series.

Args:
    array (numpy array): Array to transform.
    series_index (list, optional): List of values to be used as index. Defaults to None (numerical index).
    series_name (str, optional): Name for series. Defaults to 'array'.

Returns:
converted_array: Pandas Series with the name and index specified.
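
For example, a minimal sketch converting a small NumPy array (the values and series name are illustrative):
>> import numpy as np
>> from fsds_100719 import ds
>> prices = ds.arr2series(np.array([250000, 310000, 189000]), series_name='price')
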
fsds_100719.ds.capture_text(txt)[source]

Uses StringIO and sys.stdout to capture print statements.

Args:
    txt (str): string (or command that displays a string) to capture.
Returns:
txt_out (str): captured print statement
fsds_100719.ds.check_column(panda_obj, columns=None, nlargest='all')[source]

Prints each column's name, dtype, # and % of null values, and unique values for the nlargest # of rows (by value count). If passed a list of columns, it will only print results for those columns.

Params:
    panda_obj: pandas DataFrame or Series
    columns: list containing names of columns (strings)
    nlargest: # of top value counts to display per column (default='all')

Returns: None
prints values only
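
A minimal sketch (the example DataFrame and its columns are illustrative, not part of the package):
>> import pandas as pd
>> from fsds_100719 import ds
>> df = pd.DataFrame({'zipcode': ['98101', '98102', '98101'], 'price': [250000, None, 189000]})
>> ds.check_column(df, columns=['zipcode'], nlargest=5)
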
fsds_100719.ds.check_df_for_columns(df, columns=None)[source]

Checks df for presence of columns.

Args:
    df: pd.DataFrame to find columns in.
    columns: str or list of str; column names to check for.

fsds_100719.ds.check_null(df, columns=None, show_df=False)[source]

Iterates through columns, checks for null values, and displays the # and % of nulls for each column.

Params:
    df: pandas DataFrame
    columns: list of columns to check

Returns: displayed dataframe

fsds_100719.ds.check_numeric(df, columns=None, unique_check=False, return_list=False, show_df=False)[source]

Iterates through columns and checks for possible numeric features labeled as objects.

Params:
    df: pandas DataFrame
    unique_check: bool (default=False). If True, displays an interactive interface for checking unique values in columns.
    return_list: bool (default=False). If True, returns a list of column names with possible numeric types.

Returns: dataframe displayed (always); list of column names if return_list=True
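
For example, a minimal sketch with an illustrative DataFrame whose numeric values were read in as strings:
>> import pandas as pd
>> from fsds_100719 import ds
>> df = pd.DataFrame({'sqft': ['1200', '950', '1700'], 'grade': [7, 8, 7]})
>> maybe_numeric = ds.check_numeric(df, return_list=True)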

fsds_100719.ds.check_unique(df, columns=None)[source]

Prints unique values for all columns in the dataframe. If passed a list of columns, it will only print results for those columns.

Params:
    df: pandas DataFrame or pd.Series
    columns: list containing names of columns (strings)

Returns: None
prints values only
fsds_100719.ds.column_report(df, index_col=None, sort_column='iloc', ascending=True, interactive=False, return_df=False)[source]

Displays a DataFrame summary of each column: name, iloc, dtype, null value count & %, # of 0's, min, max, med, mean, etc.

Args:
    df (DataFrame): df to report.
    index_col (str, optional): Column to set as index. Defaults to None.
    sort_column (str, optional): Report column to sort by. Defaults to 'iloc'.
    ascending (bool, optional): Sort order. Defaults to True.
    interactive (bool, optional): Defaults to False.
    return_df (bool, optional): If True, return the report DataFrame. Defaults to False.
Returns:
column_report (df): Non-styled version of displayed df report
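
For example, a minimal sketch with an illustrative DataFrame (the column names are made up):
>> import pandas as pd
>> from fsds_100719 import ds
>> df = pd.DataFrame({'bedrooms': [3, 2, 4], 'waterfront': [0, None, 1]})
>> report = ds.column_report(df, return_df=True)
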
fsds_100719.ds.column_report_qgrid(df, index_col=None, sort_column='iloc', ascending=True, format_dict=None, as_df=False, as_interactive_df=False, show_and_return=True, as_qgrid=True, qgrid_options=None, qgrid_column_options=None, qgrid_col_defs=None, qgrid_callback=None)[source]

Returns a dataframe summary of the columns (column name and dtype) along with a decision_map dictionary of datatypes. [!] Please note: if qgrid does not display properly, enter the following into your terminal and then restart it.

# Required for qgrid:
jupyter nbextension enable --py --sys-prefix qgrid
# Only required if you have not enabled the ipywidgets nbextension yet:
jupyter nbextension enable --py --sys-prefix widgetsnbextension
Default qgrid options:
    default_grid_options = {
        # SlickGrid options
        'fullWidthRows': True,
        'syncColumnCellResize': True,
        'forceFitColumns': True,
        'defaultColumnWidth': 50,
        'rowHeight': 25,
        'enableColumnReorder': True,
        'enableTextSelectionOnCells': True,
        'editable': True,
        'autoEdit': False,
        'explicitInitialization': True,
        # Qgrid options
        'maxVisibleRows': 30,
        'minVisibleRows': 8,
        'sortable': True,
        'filterable': True,
        'highlightSelectedCell': True,
        'highlightSelectedRow': True
    }

fsds_100719.ds.compare_duplicates(df1, df2, to_drop=True, verbose=True, return_names_list=False)[source]

Compares two dfs for duplicate columns and drops them if to_drop=True. Useful before concatenating when dtypes differ between matching column names and df.drop_duplicates is not an option.

Params:
    df1, df2: pandas DataFrames suspected of having matching columns.
    to_drop: bool (default=True). If True, gives the option of dropping columns one at a time from either dataframe.
    verbose: bool (default=True). If True, prints column names and types. Set to False with return_names_list=True if you only want a list of column names and no interactive interface.
    return_names_list: bool (default=False). If True, returns a list of all duplicate column names.

Returns: list of column names if return_names_list=True, else nothing.
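
A minimal sketch with two small illustrative DataFrames that share a column name (with different dtypes):
>> import pandas as pd
>> from fsds_100719 import ds
>> df1 = pd.DataFrame({'id': [1, 2], 'price': [250000, 310000]})
>> df2 = pd.DataFrame({'id': ['1', '2'], 'sqft': [1200, 950]})
>> dupe_cols = ds.compare_duplicates(df1, df2, to_drop=False, verbose=False, return_names_list=True)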

fsds_100719.ds.display_side_by_side(*args)[source]

Displays all input dataframes side by side. Also accepts captioned Styler objects (df_in = df.style.set_caption('caption')). Modified from: https://stackoverflow.com/questions/38783027/jupyter-notebook-display-two-pandas-tables-side-by-side
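
For example (the DataFrames and captions are illustrative; plain DataFrames work too):
>> import pandas as pd
>> from fsds_100719 import ds
>> df1 = pd.DataFrame({'a': [1, 2]})
>> df2 = pd.DataFrame({'b': [3, 4]})
>> ds.display_side_by_side(df1.style.set_caption('First'), df2.style.set_caption('Second'))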

fsds_100719.ds.find_outliers_zscore(col)[source]

Uses scipy to calculate absolute Z-scores and returns a boolean Series where True indicates an outlier.

Args:
    col (Series): a series/column from your DataFrame
Returns:
    idx_outliers (Series): Series of True/False for each row in col

Ex:
>> idx_outs = find_outliers_zscore(df['bedrooms'])
>> df_clean = df.loc[idx_outs == False]

fsds_100719.ds.get_source_code_markdown(function)[source]

Retrieves the source code as a string and appends the Markdown Python-syntax notation.
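
For example, a sketch using another function from this module as the target:
>> from fsds_100719 import ds
>> md = ds.get_source_code_markdown(ds.check_null)
>> print(md)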

fsds_100719.ds.ihelp(function_or_mod, show_help=True, show_code=True, return_code=False, markdown=True, file_location=False)[source]

Call on any module or function to display the object's help printout AND/OR source code displayed as Markdown with Python syntax highlighting.
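
For example, a sketch (any imported function or module can be passed):
>> from fsds_100719 import ihelp, ds
>> ihelp(ds.check_null, show_code=True)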

fsds_100719.ds.ihelp_menu(function_list, box_style='warning', to_embed=False)[source]

Creates a widget menu of the source code and help documentation of the functions in function_list.

Args:
    function_list (list): list of function objects or string names of loaded functions.
    box_style (str, optional): Defaults to 'warning'.
    to_embed (bool, optional): Returns interface (layout, output) if True. Defaults to False.
Returns:
    full_layout (ipywidgets GridBox): Layout of interface.
    output ()
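
For example, a sketch (intended for a Jupyter notebook, since it builds an ipywidgets interface):
>> from fsds_100719 import ds
>> ds.ihelp_menu([ds.check_null, ds.column_report])
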
fsds_100719.ds.inspect_df(df, n_rows=3, verbose=True)[source]

EDA: Shows all pandas inspection tables. Displays df.head(), df.info(), and df.describe(). By default (verbose=True), also runs check_null and check_numeric to inspect columns for null values and to check string columns for numeric values.

Parameters:
    df (dataframe): dataframe to inspect.
    n_rows: number of header rows to show (default=3).
    verbose: if True (default), also runs check_null and check_numeric.

Ex: inspect_df(df, n_rows=4)

fsds_100719.ds.inspect_variables(local_vars=None, sort_col='size', exclude_funcs_mods=True, top_n=10, return_df=False, always_display=True, show_how_to_delete=False, print_names=False)[source]

Displays a dataframe of all variables and their size in memory, with the largest variables at the top.

Args:
    local_vars: Must pass locals() as the first argument.
    sort_col (str, optional): Column to sort by. Defaults to 'size'.
    top_n (int, optional): How many vars to show. Defaults to 10.
    return_df (bool, optional): If True, return the df instead of just showing it. Defaults to False.
    always_display (bool, optional): Display the df even if it is returned. Defaults to True.
    show_how_to_delete (bool, optional): Prints out code to copy-paste into a cell to del vars. Defaults to False.
    print_names (bool, optional): Defaults to False.
Raises:
Exception: if locals() not passed as first arg

Example Usage:
# Must pass in local variables
>> inspect_variables(locals())
# To see command to delete list of vars
>> inspect_variables(locals(), show_how_to_delete=True)

fsds_100719.ds.is_var(name)[source]
fsds_100719.ds.list2df(list, index_col=None, caption=None, return_df=True, df_kwds={})[source]

Quickly turn an appended list with a header (row[0]) into a pretty dataframe.

Args:
    list (list of lists): list to convert; row[0] is used as the header.
    index_col (string): name of column to set as index; None (default) gives an integer index.
    caption (string): caption for the displayed styled dataframe.
    return_df (bool): if True (default), return the dataframe.

EXAMPLE USE:
>> list_results = [["Test", "N", "p-val"]]
# ... run test and append list of result values ...
>> list_results.append([test_Name, len(data), p])
## Displays styled dataframe if caption:
>> df = list2df(list_results, index_col="Test", caption="Stat Test for Significance")
fsds_100719.ds.reload(mod)[source]
Reloads the module from file without restarting the kernel.
Args:
    mod (loaded module or list of module objects): name or handle of package (e.g., [pd, fs, np]).
Returns:
    Reloads each module.

Example:
# You pass in whatever name you imported as.
import my_functions_from_file as mf
# After editing the source file:
# mf.reload(mf)

fsds_100719.ds.save_ihelp_to_file(function, save_help=False, save_code=True, as_md=False, as_txt=True, folder='readme_resources/ihelp_outputs/', filename=None, file_mode='w')[source]

Saves the string representation of the ihelp source code as markdown. Filename should NOT have an extension. .txt or .md will be added based on as_md/as_txt. If filename is None, function name is used.
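
For example, a sketch (the filename is illustrative; output is written to the default folder from the signature):
>> from fsds_100719 import ds
>> ds.save_ihelp_to_file(ds.check_null, as_md=True, as_txt=False, filename='check_null_source')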

fsds_100719.ds.show_del_me_code(called_by_inspect_vars=False)[source]

Prints code to copy and paste into a cell to delete vars using a list of their names. The companion function inspect_variables(locals(), print_names=True) will provide the var names to copy/paste.

fsds_100719.ds.show_off_vs_code()[source]

fsds_100719.ft package

A collection of submodules by online-ds-ft-100719. Maintained by James Irving (GitHub: jirvingphd) james.irving@flatironschool.com

fsds_100719.pt package

A collection of submodules by online-ds-pt-100719. Maintained by James Irving (GitHub: jirvingphd) james.irving@flatironschool.com

fsds_100719.pt.placeholder2()[source]

fsds_100719.jmi package
