Welcome to fsds_100719’s documentation!
Installation
To install fsds_100719, run this command in your terminal:
# In terminal:
$ pip install -U fsds_100719
# In Jupyter Notebook / Learn.co
!pip install -U fsds_100719
This is the preferred method to install fsds_100719, as it will always install the most recent stable release.
If you don’t have pip installed, the Python installation guide can walk you through the process.
Usage
To use fsds_100719 in a project:
import fsds_100719 as fs
To import common modules under their usual handles (e.g., pandas as pd, numpy as np, etc.):
from fsds_100719.imports import *
Functions worth importing by name:
# To easily inspect help and source code
from fsds_100719 import ihelp
# If you're importing functions from a local file.
from fsds_100719 import reload
You can load just your cohort or your own module as fs:
import fsds_100719.ft.jirvingphd as fs
# or
import fsds_100719.ft as fs
fsds_100719.ds package
A shared collection of tools for general use.
-
fsds_100719.ds.add_dir_to_path(abs_path=None, rel_path=None, verbose=True)
Adds the provided path (or the current directory if none is provided) to sys.path.
- Args:
- abs_path (str): folder to add to path (may need to be absolute).
- rel_path (str): relative folder path to be converted to absolute and added.
- verbose (bool): Controls display of success/failure messages. Default=True.
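The behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the name add_dir_to_path_sketch, the return value, and the messages are assumptions made for the example.

```python
import os
import sys


def add_dir_to_path_sketch(abs_path=None, rel_path=None, verbose=True):
    """Hypothetical stand-in: resolve a folder and append it to sys.path."""
    if abs_path is None and rel_path is None:
        # No path provided: fall back to the current working directory.
        target = os.getcwd()
    elif abs_path is not None:
        target = abs_path
    else:
        # Convert the relative path to an absolute one before adding it.
        target = os.path.abspath(rel_path)

    if target not in sys.path:
        sys.path.append(target)
        if verbose:
            print(f"[i] Added {target} to sys.path.")
    elif verbose:
        print(f"[i] {target} is already in sys.path.")
    # Returning the resolved folder is an assumption made for this sketch.
    return target


added = add_dir_to_path_sketch(rel_path=".", verbose=False)
```

Once a folder is on sys.path, modules saved in that folder can be imported by name from the notebook.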
-
fsds_100719.ds.arr2series(array, series_index=None, series_name='array')
Converts an array into a named series.
- Args:
- array (numpy array): Array to transform.
- series_index (list, optional): List of values to be used as index. Defaults to None, a numerical index.
- series_name (str, optional): Name for series. Defaults to ‘array’.
- Returns:
- converted_array: Pandas Series with the name and index specified.
-
fsds_100719.ds.capture_text(txt)
Uses StringIO and sys.stdout to capture print statements.
- Args:
- txt (str): pass string or command to display a string to capture
- Returns:
- txt_out (str): captured print statement
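The StringIO and sys.stdout technique named above can be illustrated as follows. This is a sketch of the general pattern under stated assumptions, not the source of capture_text: the helper capture_print and its callable-based signature are hypothetical.

```python
import sys
from io import StringIO


def capture_print(func, *args, **kwargs):
    """Hypothetical helper: run func and return whatever it printed."""
    original_stdout = sys.stdout
    buffer = StringIO()
    sys.stdout = buffer  # Redirect print output into the in-memory buffer.
    try:
        func(*args, **kwargs)
    finally:
        sys.stdout = original_stdout  # Always restore the real stdout.
    return buffer.getvalue()


captured = capture_print(print, "hello", "world")
```

The try/finally block ensures stdout is restored even if the called function raises, so later print statements still reach the notebook.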
-
fsds_100719.ds.check_column(panda_obj, columns=None, nlargest='all')
Prints column name, datatype, number and % of null values, and unique values for the nlargest number of rows (by value count). If passed a list of columns, it will only print results for those columns.
- Params:
- panda_obj: pandas DataFrame or Series
- columns: list containing names of columns (strings)
- Returns: None
- prints values only
-
fsds_100719.ds.check_df_for_columns(df, columns=None)
Checks df for presence of columns.
- df: pd.DataFrame to find columns in
- columns: str or list of str; column names
-
fsds_100719.ds.check_null(df, columns=None, show_df=False)
Iterates through columns, checks for null values, and displays the number and % of null values in each column.
- Params:
- df: pandas DataFrame
- columns: list of columns to check
- Returns: displayed dataframe
-
fsds_100719.ds.check_numeric(df, columns=None, unique_check=False, return_list=False, show_df=False)
Iterates through columns and checks for possible numeric features labeled as objects.
- Params:
- df: pandas DataFrame
- unique_check: bool (default=False)
- If True, displays an interactive interface for checking unique values in columns.
- return_list: bool (default=False)
- If True, returns a list of column names with possible numeric types.
- Returns: dataframe displayed (always), list of column names if return_list=True
-
fsds_100719.ds.check_unique(df, columns=None)
Prints unique values for all columns in the dataframe. If passed a list of columns, it will only print results for those columns.
- Params:
- df: pandas DataFrame or pd.Series
- columns: list containing names of columns (strings)
- Returns: None
- prints values only
-
fsds_100719.ds.column_report(df, index_col=None, sort_column='iloc', ascending=True, interactive=False, return_df=False)
Displays a DataFrame summary of each column’s name, iloc, dtype, null value count and %, number of 0’s, min, max, median, mean, etc.
- Args:
- df (DataFrame): df to report
- index_col (str): column to set as index. Defaults to None.
- sort_column (str, optional): [description]. Defaults to ‘iloc’.
- ascending (bool, optional): [description]. Defaults to True.
- as_df (bool, optional): [description]. Defaults to False.
- interactive (bool, optional): [description]. Defaults to False.
- return_df (bool, optional): [description]. Defaults to False.
- Returns:
- column_report (df): Non-styled version of displayed df report
-
fsds_100719.ds.column_report_qgrid(df, index_col=None, sort_column='iloc', ascending=True, format_dict=None, as_df=False, as_interactive_df=False, show_and_return=True, as_qgrid=True, qgrid_options=None, qgrid_column_options=None, qgrid_col_defs=None, qgrid_callback=None)
Returns a dataframe summary of the columns, their dtype, a summary dataframe with the column name, column dtypes, and a decision_map dictionary of datatype.
[!] Please note: if qgrid does not display properly, enter these commands into your terminal and then restart your terminal.
jupyter nbextension enable --py --sys-prefix qgrid  # required for qgrid
jupyter nbextension enable --py --sys-prefix widgetsnbextension  # only required if you have not enabled the ipywidgets nbextension yet
- Default qgrid options:
default_grid_options={
    # SlickGrid options
    'fullWidthRows': True,
    'syncColumnCellResize': True,
    'forceFitColumns': True,
    'defaultColumnWidth': 50,
    'rowHeight': 25,
    'enableColumnReorder': True,
    'enableTextSelectionOnCells': True,
    'editable': True,
    'autoEdit': False,
    'explicitInitialization': True,
    # Qgrid options
    'maxVisibleRows': 30,
    'minVisibleRows': 8,
    'sortable': True,
    'filterable': True,
    'highlightSelectedCell': True,
    'highlightSelectedRow': True
}
-
fsds_100719.ds.compare_duplicates(df1, df2, to_drop=True, verbose=True, return_names_list=False)
Compares two dfs for duplicate columns and drops them if to_drop=True. Useful before concatenating when dtypes differ between matching column names and df.drop_duplicates is not an option.
- Params:
- df1, df2: pandas dataframes suspected of having matching columns
- to_drop: bool (default=True)
- If True, will give the option of dropping columns one at a time from either dataframe.
- verbose: bool (default=True)
- If True, prints column names and types. Set to False with return_names_list=True if you only want a list of column names and no interactive interface.
- return_names_list: bool (default=False)
- If True, will return a list of all duplicate column names.
- Returns: List of column names if return_names_list=True, else nothing.
-
fsds_100719.ds.display_side_by_side(*args)
Displays all input dataframes side by side. Also accepts captioned styler df objects (df_in = df.style.set_caption('caption')). Modified from source: https://stackoverflow.com/questions/38783027/jupyter-notebook-display-two-pandas-tables-side-by-side
-
fsds_100719.ds.find_outliers_zscore(col)
Uses scipy to calculate absolute Z-scores and returns a boolean series where True indicates an outlier.
- Args:
- col (Series): a series/column from your DataFrame
- Returns:
- idx_outliers (Series): series of True/False for each row in col
Ex:
>> idx_outs = find_outliers_zscore(df['bedrooms'])
>> df_clean = df.loc[idx_outs==False]
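The Z-score rule behind this kind of check can be sketched without scipy or pandas. The version below is an illustration only: it uses the standard library's statistics module in place of scipy, works on a plain list rather than a Series, and assumes a cutoff of 3, which the documentation above does not state. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_outlier_flags(values, threshold=3.0):
    """Return a list of booleans, True where |z-score| exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # Population standard deviation (ddof=0).
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return [False] * len(values)
    return [abs((v - mu) / sigma) > threshold for v in values]


# Made-up bedroom counts with one implausible entry at the end.
bedrooms = [3, 3, 4, 2, 3, 4, 3, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 3, 33]
flags = zscore_outlier_flags(bedrooms)
cleaned = [v for v, is_outlier in zip(bedrooms, flags) if not is_outlier]
```

Keeping the flags separate from the filtering step mirrors the example above, where the boolean result is used to index the DataFrame afterwards.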
-
fsds_100719.ds.get_source_code_markdown(function)
Retrieves the source code as a string and appends the markdown python syntax notation.
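One way to produce this kind of output is to combine inspect.getsource with a Markdown code fence. The sketch below is an assumption about the general approach, not the package's code, and source_as_markdown is a hypothetical name. A standard-library function is used as the input because inspect.getsource needs a function whose source file is available.

```python
import inspect
import textwrap


def source_as_markdown(function):
    """Hypothetical stand-in: wrap a function's source in a Python fence."""
    fence = "`" * 3  # Built at runtime to avoid nesting literal fences here.
    source = inspect.getsource(function)
    return f"{fence}python\n{source}\n{fence}"


markdown_text = source_as_markdown(textwrap.dedent)
```

In a notebook, a string like this could be rendered with IPython's Markdown display to get syntax highlighting.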
-
fsds_100719.ds.ihelp(function_or_mod, show_help=True, show_code=True, return_code=False, markdown=True, file_location=False)
Call on any module or function to display the object’s help command printout and/or source code, displayed as Markdown using Python syntax.
Creates a widget menu of the source code and help documentation of the functions in function_list.
- Args:
- function_list (list): list of function objects or string names of loaded functions.
- to_embed (bool, optional): Returns interface (layout, output) if True. Defaults to False.
- to_file (bool, optional): Save . Defaults to False.
- json_file (str, optional): [description]. Defaults to ‘ihelp_output.txt’.
- Returns:
- full_layout (ipywidgets GridBox): Layout of interface.
- output ()
-
fsds_100719.ds.inspect_df(df, n_rows=3, verbose=True)
EDA: Show all pandas inspection tables. Displays df.head(), df.info(), df.describe(). By default (if verbose==True), also runs check_null and check_numeric to inspect columns for null values and to check string columns for numeric values.
- Parameters:
- df (dataframe):
- dataframe to inspect
- n_rows:
- number of header rows to show (Default=3).
- verbose:
- If verbose==True (default), runs check_null and check_numeric.
Ex: inspect_df(df, n_rows=4)
-
fsds_100719.ds.inspect_variables(local_vars=None, sort_col='size', exclude_funcs_mods=True, top_n=10, return_df=False, always_display=True, show_how_to_delete=False, print_names=False)
Displays a dataframe of all variables and their size in memory, with the largest variables at the top.
- Args:
- local_vars (locals()): Must call locals() as first argument.
- sort_col (str, optional): column to sort by. Defaults to ‘size’.
- top_n (int, optional): how many vars to show. Defaults to 10.
- return_df (bool, optional): If True, return df instead of just showing df. Defaults to False.
- always_display (bool, optional): Display df even if returned. Defaults to True.
- show_how_to_delete (bool, optional): Prints out code to copy-paste into cell to del vars. Defaults to False.
- print_names (bool, optional): [description]. Defaults to False.
- Raises:
- Exception: if locals() not passed as first arg
Example Usage:
# Must pass in local variables
>> inspect_variables(locals())
# To see command to delete list of vars
>> inspect_variables(locals(), show_how_to_delete=True)
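The idea of ranking variables by memory footprint can be sketched with sys.getsizeof. This is an illustration under assumptions, not the package's implementation: variable_sizes is a hypothetical name, it returns a plain list of tuples instead of a dataframe, and sys.getsizeof reports only the shallow size of each object.

```python
import sys


def variable_sizes(local_vars, top_n=10):
    """Hypothetical stand-in: list (name, size in bytes), largest first."""
    if not isinstance(local_vars, dict):
        raise Exception("Must pass locals() as the first argument.")
    sizes = [
        (name, sys.getsizeof(value))
        for name, value in local_vars.items()
        if not name.startswith("_")  # Skip private and dunder names.
    ]
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top_n]


def demo():
    # Two local variables of very different sizes.
    big_list = list(range(10_000))
    small_int = 1
    return variable_sizes(locals())


report = demo()
```

Passing locals() explicitly matters because a helper function cannot otherwise see the variables defined in the caller's scope.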
-
fsds_100719.ds.list2df(list, index_col=None, caption=None, return_df=True, df_kwds={})
Quickly turns an appended list with a header (row[0]) into a pretty dataframe.
- Args:
- list (list of lists):
- index_col (string): name of column to set as index; None (default) has integer index.
- set_caption (string):
- show_and_return (bool):
EXAMPLE USE:
>> list_results = [["Test","N","p-val"]]
# … run test and append list of result values …
>> list_results.append([test_Name, len(data), p])
## Displays styled dataframe if caption:
>> df = list2df(list_results, index_col="Test", caption="Stat Test for Significance")
-
fsds_100719.ds.reload(mod)
Reloads the module from file without restarting the kernel.
- Args:
- mod (loaded mod or list of mod objects): name or handle of package (e.g., [pd, fs, np])
- Returns:
- reloads each module.
Example:
# You pass in whatever name you imported as.
import my_functions_from_file as mf
# after editing the source file:
# mf.reload(mf)
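Reloading without a kernel restart is typically done with importlib.reload from the standard library. The sketch below shows that pattern for a single module or a list of modules; it is an assumption about the approach rather than the package's source, and reload_modules is a hypothetical name. The small standard-library module colorsys stands in for a locally edited file.

```python
import colorsys
import importlib
import types


def reload_modules(mod):
    """Hypothetical stand-in: reload one module or a list of modules."""
    mods = mod if isinstance(mod, (list, tuple)) else [mod]
    reloaded = []
    for m in mods:
        if not isinstance(m, types.ModuleType):
            raise TypeError(f"Expected a module object, got {type(m).__name__}")
        # importlib.reload re-executes the module's code and returns the module.
        reloaded.append(importlib.reload(m))
    return reloaded


result = reload_modules([colorsys])
```

Names imported with "from module import name" before the reload keep pointing at the old objects, so accessing functions through the module handle is the safer habit.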
-
fsds_100719.ds.save_ihelp_to_file(function, save_help=False, save_code=True, as_md=False, as_txt=True, folder='readme_resources/ihelp_outputs/', filename=None, file_mode='w')
Saves the string representation of the ihelp source code as markdown. Filename should NOT have an extension; .txt or .md will be added based on as_md/as_txt. If filename is None, the function name is used.
fsds_100719.ft package
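The filename rules described for save_ihelp_to_file can be made concrete with a small sketch. The helper build_output_paths is hypothetical and only computes the paths those rules imply; it writes nothing to disk, and raising an error for a filename that already has an extension is an assumption, not documented behavior.

```python
import os


def build_output_paths(function, folder="readme_resources/ihelp_outputs/",
                       filename=None, as_md=False, as_txt=True):
    """Hypothetical helper: compute the file paths the naming rules imply."""
    if filename is None:
        filename = function.__name__  # Fall back to the function's name.
    if os.path.splitext(filename)[1]:
        # Assumption: reject names that already carry an extension.
        raise ValueError("filename should not include an extension")
    paths = []
    if as_md:
        paths.append(os.path.join(folder, filename + ".md"))
    if as_txt:
        paths.append(os.path.join(folder, filename + ".txt"))
    return paths


paths = build_output_paths(len, as_md=True)
```

With both flags enabled, the same base name yields one .md path and one .txt path in the target folder.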
A collection of submodules by online-ds-ft-100719. Maintained by James Irving (GitHub: jirvingphd) james.irving@flatironschool.com
fsds_100719.pt package
A collection of submodules by online-ds-pt-100719. Maintained by James Irving (GitHub: jirvingphd) james.irving@flatironschool.com