Pandas
A collection of all my resources, snippets, and tricks for pandas.
Pandas Pipe
The pandas pipe functionality lets you write clean data preparation steps. Instead of having variables like df1, df2, ... flying around, pipe chains a series of function calls on one DataFrame. The mental model goes like this:
df -> apply function -> apply function -> ...
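As a minimal sketch of that mental model, the snippet below compares nested function calls with the equivalent `pipe` chain. The toy DataFrame and the two step functions (`add_total`, `keep_large`) are made up for illustration and are not part of the Wikipedia example.

```python
import pandas as pd


def add_total(df):
    # Hypothetical step: return a new frame with a row-wise sum column
    return df.assign(total=df["a"] + df["b"])


def keep_large(df, threshold):
    # Hypothetical step: keep rows whose total exceeds the threshold
    return df[df["total"] > threshold]


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Nested calls have to be read inside-out
nested = keep_large(add_total(df), threshold=20)

# pipe reads top to bottom, in execution order;
# extra arguments are forwarded to the step function
piped = (df
         .pipe(add_total)
         .pipe(keep_large, threshold=20)
         )
```

Both variants produce the same result. Because `assign` returns a new DataFrame, the original `df` is left untouched here.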
Separating out each step as a function has the advantage that you can save the functions in a separate Python file, where you can test them with unit tests. Below is a small example that illustrates the functionality. Note that a pipeline step might change the original DataFrame, which is why the first step in the pipeline just returns a copy (there is probably a better way to do it). It should also be possible to use the logging module to get better insight into what the pipeline steps do.
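import pandas as pd

# read_html returns a list with one DataFrame per table found on the page
list_df = pd.read_html("https://de.wikipedia.org/wiki/Liste_der_L%C3%A4nder_nach_Bruttoinlandsprodukt?oldformat=true")

def deal_first_col(df_pipe):
    # Rename the columns ('Land' = country, 'BIP' = GDP, 'veränderung' = change),
    # then drop the unneeded first column
    df_pipe.columns = ['drop', 'Land', 'BIP in MIO US $ 2018', 'veränderung']
    return df_pipe.iloc[:, 1:]

def make_copy(df_pipe):
    return df_pipe.copy()

def set_dtypes(df_pipe, dtype_dict):
    # regex=False replaces the characters literally; as a regex, "." would match every character
    # Decimal comma -> decimal point
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace(",", ".", regex=False)
    # Strip non-breaking spaces and the percent sign
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace("\xa0", "", regex=False)
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace("%", "", regex=False)
    # Unicode minus sign -> ASCII hyphen-minus, so float() can parse it
    df_pipe['veränderung'] = df_pipe['veränderung'].str.replace("−", "-", regex=False)
    # Remove the thousands separator
    df_pipe['BIP in MIO US $ 2018'] = df_pipe['BIP in MIO US $ 2018'].str.replace(".", "", regex=False)
    return df_pipe.astype(dtype_dict)

df = list_df[0]
(df
 .pipe(make_copy)
 .pipe(deal_first_col)
 .dropna()
 .pipe(set_dtypes, {'BIP in MIO US $ 2018': int,
                    'veränderung': float})
)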
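The note about the logging module could be realized with a small decorator that logs the shape of the DataFrame before and after each step. This is only a sketch of one possible approach: the decorator name `log_step`, the logger name `pipeline`, the `drop_missing` step, and the toy data are assumptions for illustration, not part of the original pipeline.

```python
import functools
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")  # logger name is an arbitrary choice


def log_step(func):
    """Log the DataFrame shape before and after a pipeline step."""
    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        shape_before = df.shape
        result = func(df, *args, **kwargs)
        logger.info("%s: %s -> %s", func.__name__, shape_before, result.shape)
        return result
    return wrapper


@log_step
def make_copy(df_pipe):
    return df_pipe.copy()


@log_step
def drop_missing(df_pipe):
    # Hypothetical step standing in for the plain .dropna() call
    return df_pipe.dropna()


# Toy data instead of the Wikipedia table, so the sketch runs offline
toy = pd.DataFrame({"Land": ["A", "B", None], "value": [1.0, None, 3.0]})

clean = (toy
         .pipe(make_copy)
         .pipe(drop_missing)
         )
```

Each decorated step emits one log line with its name and the row and column counts, which makes it easy to spot the step where rows unexpectedly disappear. `functools.wraps` keeps the original function name, so the log output and any unit tests still refer to `make_copy` rather than `wrapper`.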
Sources
- [x] https://calmcode.io/pandas-pipe/end.html
- [ ] https://www.dataschool.io/python-pandas-tips-and-tricks/
- [ ] https://www.kaggle.com/python10pm/pandas-100-tricks
- [ ] https://github.com/BrendaHali/python_cheat_sheets/blob/master/pandas-cheat-sheet.ipynb
- [ ] siuba
- [ ] ibis
- [ ] https://twitter.com/data_cheeves/status/1183464943149965312
- [ ] https://github.com/jmcarpenter2/swifter/blob/master/examples/swifter_apply_examples.ipynb
- [ ] https://twitter.com/jschwabish/status/1290323581881266177
- [ ] https://twitter.com/TedPetrou/status/1282378990561439746
- [ ] https://www.allthesnippets.com/search/
- [ ] https://github.com/yhat/pandasql/