Lets Run through a scenario I have a csv where specific columns have nulls or Empty Values.
using python pandas lets read the csv
import pandas as pd t1 = pd.read_csv('E:/Blog/table3.csv',delimiter=',', encoding='iso-8859-1',low_memory=False)
empid address phoneno weight comment name age 0 1 namkeknkfg hfkfkf,ijfkdf 1234.0 64.50 NaN sohail 27.0 1 2 knfkngfkn, dnfd 1111.0 95.40 NaN adam 27.0 2 3 knfkngfkn, fgfsga 4878.0 96.40 NaN gfg 28.0 3 4 sds, hfsk5 7471.0 97.40 NaN ble1 29.0 4 5 ghgh, eryw4 NaN 98.40 NaN Blah II 30.0 5 6 knfkngfkn, jdrhw 4565.0 99.40 hey cdjd NaN 6 7 dsdsd, sgegs 4545.0 100.40 NaN NaN 32.0 7 8 kykt, zvae 4999.0 95.99 NaN jshd 33.0 8 9 dyhsyrd, gsd 1000.0 95.78 you ksnkd 34.0 9 10 ioyud, zdd 2000.0 95.12 NaN ksdm 35.0
Let’s assume we want to replace a float with 0 and a string with None, we will have to replace a specific column we want with whatever we want to replace, if we had 20 columns that could be null that means we will have to define 20 columns with what value we want to replace with.
First let’s identify columns that have null in them:
tnull = t1.isna().any().to_dict()
Output: {’empid’: False, ‘address’: False, ‘phoneno’: True, ‘weight’: False, ‘comment’: True, ‘name’: True, ‘age’: True}
Next let’s define our null replacing values based on the datatype:
nullcols = { 'float64': 10000, 'object': 'None' }
Indicating if we have a float64 lets replace it with 10000, if we have object lets replace it with None.
Now let’s define an empty dictionary where we will define columns and values to replace with
getnullcols = {}
Now with the below code we can achieve it.
import pandas as pd t1 = pd.read_csv('E:/Blog/table3.csv',delimiter=',', encoding='iso-8859-1',low_memory=False) print(t1) tnull = t1.isna().any().to_dict() t1colsdt = t1.dtypes t1cldt = t1colsdt.to_dict() print(tnull) print(t1cldt) columnkeys = {} nullcols = { 'float64': 10000, 'object': 'None' } getnullcols = {} for key,value in tnull.items(): if str(value) == 'True': columnkeys.update({key: t1cldt[key]}) print(columnkeys) for key,value in columnkeys.items(): for k,v in nullcols.items(): if k == value: getnullcols.update({key: v}) print(getnullcols) values = getnullcols print(t1.fillna(value=values))
Output
empid address phoneno weight comment name age 0 1 namkeknkfg hfkfkf,ijfkdf 1234.0 64.50 None sohail 27.0 1 2 knfkngfkn, dnfd 1111.0 95.40 None adam 27.0 2 3 knfkngfkn, fgfsga 4878.0 96.40 None gfg 28.0 3 4 sds, hfsk5 7471.0 97.40 None ble1 29.0 4 5 ghgh, eryw4 10000.0 98.40 None Blah II 30.0 5 6 knfkngfkn, jdrhw 4565.0 99.40 hey cdjd 10000.0 6 7 dsdsd, sgegs 4545.0 100.40 None None 32.0 7 8 kykt, zvae 4999.0 95.99 None jshd 33.0 8 9 dyhsyrd, gsd 1000.0 95.78 you ksnkd 34.0 9 10 ioyud, zdd 2000.0 95.12 None ksdm 35.0
In case if you have any questions or if there exists a better way please do comment below
Thanks
Sohail Izebhijie
Helical IT Solutions Pvt Ltd
Best Open Source Business Intelligence Software Helical Insight Here