Filtering objects from a workdir

Learning Python: how to filter

Introduction

If you work in data science, you usually create many objects, such as lists, DataFrames, and matrices. You also have to read and copy files. For these and other tasks, you sometimes need to select a subset of the objects in your workdir.

The goal of this post is to show some options for filtering objects from your workdir.

Listing and Filtering your objects

There are several options to list the objects from a workdir. We will use the packages os and glob.

First of all, load the packages:

import os
import glob

You can check your workdir:

# check workdir
os.getcwd()

or set your workdir:

# set your workdir
os.chdir('/home/rafa//1710_SR_EVAPO/1807_G/')

If you want to see all files in your workdir:

>>> os.listdir()
['CORR_TEST_fm940glt.png', 'ET_Fev_Maio_Agosto_OK.xlsm', 'T_sup_Fev_Maio_Agosto_OK.xlsm', 'H_final_Fev_Maio_Agosto_OK.xlsm', 'variveisdissertao.zip', '.#DATA_TIDY.py', 'Saldo_24h_Fev_Maio_Agosto_OK.xlsm', 'CORR_TEST_tmg43ws.png', 'Saldo_inst_Fev_Maio_Agosto_ok.xlsm', 'Fluxo_calor_solo_Fev_Maio_Agosto_OK.xlsm', 'Calor_latente_Fev_Maio_Agosto_ok.xlsm', 'DATA_TIDY.py', 'NDVI_Fev_Maio_Agosto_OK.xlsm', 'Albedo_Fev_Maio_Agosto_OK.xlsm', 'DATA_ANALYSIS.py', 'IAF_Fev_Maio_Agosto_OK.xlsm']

If you want all xlsm files in your workdir, one option is:

>>> glob.glob('*.xlsm')
['ET_Fev_Maio_Agosto_OK.xlsm', 'T_sup_Fev_Maio_Agosto_OK.xlsm', 'H_final_Fev_Maio_Agosto_OK.xlsm', 'Saldo_24h_Fev_Maio_Agosto_OK.xlsm', 'Saldo_inst_Fev_Maio_Agosto_ok.xlsm', 'Fluxo_calor_solo_Fev_Maio_Agosto_OK.xlsm', 'Calor_latente_Fev_Maio_Agosto_ok.xlsm', 'NDVI_Fev_Maio_Agosto_OK.xlsm', 'Albedo_Fev_Maio_Agosto_OK.xlsm', 'IAF_Fev_Maio_Agosto_OK.xlsm']
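glob reads the names from disk. If you already have a list of names (for example, the result of os.listdir()), the standard-library fnmatch module applies the same kind of pattern to that list. A minimal sketch, where the names list is just a few entries copied by hand from the listing above:

```python
import fnmatch

# A few file names copied from the os.listdir() output above (illustration only).
names = [
    'CORR_TEST_fm940glt.png',
    'ET_Fev_Maio_Agosto_OK.xlsm',
    'DATA_TIDY.py',
    'NDVI_Fev_Maio_Agosto_OK.xlsm',
]

# fnmatch.filter keeps the names that match a shell-style pattern.
xlsm_names = fnmatch.filter(names, '*.xlsm')
print(xlsm_names)
```

This should keep only the two .xlsm names, without touching the file system.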

Let's suppose that you need to keep only the .xlsm files that have T_ in the name:

>>> lst_xlsm = glob.glob('*.xlsm')
>>> res = list(filter(lambda k: 'T_' in k, lst_xlsm))
>>> print(res)
['ET_Fev_Maio_Agosto_OK.xlsm', 'T_sup_Fev_Maio_Agosto_OK.xlsm']
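Note that the result includes ET_Fev_Maio_Agosto_OK.xlsm, because 'T_' in k looks for T_ anywhere in the name. If you only want names that begin with T_, str.startswith is one option. A small sketch, using a hand-copied subset of the list above:

```python
# A subset of the .xlsm names from above (illustration only).
lst_xlsm = [
    'ET_Fev_Maio_Agosto_OK.xlsm',
    'T_sup_Fev_Maio_Agosto_OK.xlsm',
    'H_final_Fev_Maio_Agosto_OK.xlsm',
]

# 'T_' anywhere in the name: also matches 'ET_...'.
contains_t = [k for k in lst_xlsm if 'T_' in k]

# 'T_' only at the start of the name.
starts_with_t = [k for k in lst_xlsm if k.startswith('T_')]

print(contains_t)
print(starts_with_t)
```

The list comprehension does the same job as list(filter(lambda ...)) and is often easier to read.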

If, instead, you want all .xlsm files that do NOT have T_ in the name (note the not in):

>>> lst_xlsm = glob.glob('*.xlsm')
>>> res = list(filter(lambda k: 'T_' not in k, lst_xlsm))
>>> print(res)
['H_final_Fev_Maio_Agosto_OK.xlsm', 'Saldo_24h_Fev_Maio_Agosto_OK.xlsm', 'Saldo_inst_Fev_Maio_Agosto_ok.xlsm', 'Fluxo_calor_solo_Fev_Maio_Agosto_OK.xlsm', 'Calor_latente_Fev_Maio_Agosto_ok.xlsm', 'NDVI_Fev_Maio_Agosto_OK.xlsm', 'Albedo_Fev_Maio_Agosto_OK.xlsm', 'IAF_Fev_Maio_Agosto_OK.xlsm']
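If you need both groups, you can reuse one test function instead of writing the condition twice. The sketch below uses itertools.filterfalse from the standard library, which keeps the items for which the test is false. The helper name has_t and the shortened list are only for illustration:

```python
from itertools import filterfalse

# A subset of the .xlsm names from above (illustration only).
lst_xlsm = [
    'ET_Fev_Maio_Agosto_OK.xlsm',
    'T_sup_Fev_Maio_Agosto_OK.xlsm',
    'NDVI_Fev_Maio_Agosto_OK.xlsm',
    'IAF_Fev_Maio_Agosto_OK.xlsm',
]

def has_t(name):
    """Return True when 'T_' appears anywhere in the file name."""
    return 'T_' in name

with_t = list(filter(has_t, lst_xlsm))          # names where has_t is True
without_t = list(filterfalse(has_t, lst_xlsm))  # names where has_t is False

print(with_t)
print(without_t)
```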

Another common case in data science is filtering a list of objects. If you call dir(), you get the names of everything in your environment:

>>> dir()
['__PYTHON_EL_native_completion_setup', '__annotations__', '__builtins__', '__code', '__doc__', '__loader__', '__name__', '__package__', '__pyfile', '__spec__', 'codecs', 'df_ET', 'df_ETaug', 'df_ETfev', 'df_ETmay', 'df_H', 'df_HEATFLOW', 'df_HEATFLOWaug', 'df_HEATFLOWfev', 'df_HEATFLOWmay', 'df_Haug', 'df_Hfev', 'df_Hmay', 'df_IAF', 'df_IAFaug', 'df_IAFfev', 'df_IAFmay', 'df_LE', 'df_LEaug', 'df_LEfev', 'df_LEmay', 'df_RN24', 'df_RN24aug', 'df_RN24fev', 'df_RN24may', 'df_RNins', 'df_RNinsaug', 'df_RNinsfev', 'df_RNinsmay', 'df_TsupK', 'df_TsupKaug', 'df_TsupKfev', 'df_TsupKmay', 'df_albaug', 'df_albedo', 'df_albfev', 'df_albmay', 'df_fulldata1807G', 'df_fullfm940glt', 'df_fullfm940glt_pivot', 'df_fulltmg43ws', 'df_fulltmg43ws_pivot', 'glob', 'lst_xlsm', 'np', 'os', 'pd', 'res']

If you need all objects whose names begin with df_:

>>> lst_ob = dir()
>>> res = [k for k in lst_ob if k.startswith('df_')]
>>> print(res)
['df_ET', 'df_ETaug', 'df_ETfev', 'df_ETmay', 'df_H', 'df_HEATFLOW', 'df_HEATFLOWaug', 'df_HEATFLOWfev', 'df_HEATFLOWmay', 'df_Haug', 'df_Hfev', 'df_Hmay', 'df_IAF', 'df_IAFaug', 'df_IAFfev', 'df_IAFmay', 'df_LE', 'df_LEaug', 'df_LEfev', 'df_LEmay', 'df_RN24', 'df_RN24aug', 'df_RN24fev', 'df_RN24may', 'df_RNins', 'df_RNinsaug', 'df_RNinsfev', 'df_RNinsmay', 'df_TsupK', 'df_TsupKaug', 'df_TsupKfev', 'df_TsupKmay', 'df_albaug', 'df_albedo', 'df_albfev', 'df_albmay', 'df_fulldata1807G', 'df_fullfm940glt', 'df_fullfm940glt_pivot', 'df_fulltmg43ws', 'df_fulltmg43ws_pivot']
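The result above is a list of names (strings), not the objects themselves. One way to get the objects is to filter a name-to-object mapping instead. The sketch below is only an illustration: select_by_prefix is a hypothetical helper, and example_namespace is a stand-in dict with plain lists in place of real DataFrames.

```python
def select_by_prefix(namespace, prefix):
    """Return a dict with the entries of `namespace` whose name starts with `prefix`."""
    # list(...) takes a snapshot, so the mapping can safely change while we build the result.
    return {
        name: obj
        for name, obj in list(namespace.items())
        if name.startswith(prefix)
    }

# Stand-in for an interactive environment (hypothetical values).
example_namespace = {
    'df_ET': [1.0, 2.0],
    'df_H': [3.0],
    'np': None,
    'res': [],
}

selected = select_by_prefix(example_namespace, 'df_')
print(sorted(selected))
```

In an interactive session you could pass globals() as the namespace, which maps each name returned by dir() to its object.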

Conclusion

If you keep your file names and object names organized, you can filter them easily. We only showed some of the options for filtering lists, and there are other ways to achieve the same result. Let me know if you know another one.

Best regards

