Filling NA of n Columns with Nulls Based on DataType using Pandas

Posted by Sohail, in Uncategorized

Let's run through a scenario: I have a CSV where specific columns contain nulls or empty values. Using Python pandas, let's read the CSV:

import pandas as pd
t1 = pd.read_csv('E:/Blog/table3.csv',delimiter=',', encoding='iso-8859-1',low_memory=False)
   empid                   address  phoneno  weight comment     name   age
0      1  namkeknkfg hfkfkf,ijfkdf   1234.0   64.50     NaN   sohail  27.0
1      2           knfkngfkn, dnfd   1111.0   95.40     NaN     adam  27.0
2      3         knfkngfkn, fgfsga   4878.0   96.40     NaN      gfg  28.0
3      4                sds, hfsk5   7471.0   97.40     NaN     ble1  29.0
4      5               ghgh, eryw4      NaN   98.40     NaN  Blah II  30.0
5      6          knfkngfkn, jdrhw   4565.0   99.40     hey     cdjd   NaN
6      7              dsdsd, sgegs   4545.0  100.40     NaN      NaN  32.0
7      8                kykt, zvae   4999.0   95.99     NaN     jshd  33.0
8      9              dyhsyrd, gsd   1000.0   95.78     you    ksnkd  34.0
9     10                ioyud, zdd   2000.0   95.12     NaN     ksdm  35.0

Let's assume we want to replace a missing float with 0 and a missing string with None. With plain fillna we would have to specify, for each column, the value to replace its nulls with; if 20 columns could be null, that means hand-writing replacements for all 20 columns.
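To see why that gets tedious, here is a minimal sketch of the manual per-column approach, using a small inline DataFrame (with a few of the columns from above) instead of the CSV:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the CSV above
df = pd.DataFrame({
    'phoneno': [1234.0, np.nan],
    'comment': [None, 'hey'],
    'age': [27.0, np.nan],
})

# Manual mapping: every nullable column must be listed by hand
filled = df.fillna({'phoneno': 0, 'age': 0, 'comment': 'None'})
print(filled)
```

With many nullable columns, this dict grows linearly and has to be maintained by hand, which is exactly what the dtype-based approach below avoids.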

First let’s identify columns that have null in them:

tnull = t1.isna().any().to_dict()

Output: {'empid': False, 'address': False, 'phoneno': True, 'weight': False, 'comment': True, 'name': True, 'age': True}
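As a self-contained sketch of what isna().any().to_dict() does (using a small inline DataFrame instead of the CSV):

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame with one fully populated column and two nullable ones
df = pd.DataFrame({
    'empid': [1, 2, 3],
    'phoneno': [1234.0, np.nan, 4545.0],
    'name': ['sohail', 'adam', None],
})

# isna() marks each cell, any() collapses per column:
# True for every column that contains at least one null
tnull = df.isna().any().to_dict()
print(tnull)  # {'empid': False, 'phoneno': True, 'name': True}
```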

Next let’s define our null replacing values based on the datatype:

nullcols = {
    'float64': 10000,
    'object': 'None'
}

This says: if a column's dtype is float64, replace its nulls with 10000; if it is object, replace them with the string 'None' (note this is the literal string "None", not Python's None).
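The lookup works because each column's dtype, converted to a string, matches a key in nullcols. A minimal sketch with an inline DataFrame (assumed columns, not the CSV):

```python
import numpy as np
import pandas as pd

# Toy frame: one float64 column, one object column
df = pd.DataFrame({
    'phoneno': [1234.0, np.nan],
    'comment': [None, 'hey'],
})

nullcols = {'float64': 10000, 'object': 'None'}

# df.dtypes gives one entry per column; str() turns the dtype into a lookup key
fills = {col: nullcols.get(str(dtype)) for col, dtype in df.dtypes.items()}
print(fills)  # {'phoneno': 10000, 'comment': 'None'}
```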

Now let's define an empty dictionary that will map each nullable column to its replacement value:

getnullcols = {}

Putting it all together:

import pandas as pd

t1 = pd.read_csv('E:/Blog/table3.csv', delimiter=',', encoding='iso-8859-1', low_memory=False)
print(t1)

# Which columns contain at least one null?
tnull = t1.isna().any().to_dict()
# Column name -> dtype
t1cldt = t1.dtypes.to_dict()
print(tnull)
print(t1cldt)

# Fill value per dtype
nullcols = {
    'float64': 10000,
    'object': 'None'
}

# Keep only the columns that actually contain nulls, with their dtype
columnkeys = {}
for key, value in tnull.items():
    if value:
        columnkeys.update({key: t1cldt[key]})
print(columnkeys)

# Resolve each nullable column to its fill value via the dtype
getnullcols = {}
for key, value in columnkeys.items():
    for k, v in nullcols.items():
        if k == value:
            getnullcols.update({key: v})
print(getnullcols)

print(t1.fillna(value=getnullcols))

Output

    empid                   address  phoneno  weight comment     name      age
0      1  namkeknkfg hfkfkf,ijfkdf   1234.0   64.50    None   sohail     27.0
1      2           knfkngfkn, dnfd   1111.0   95.40    None     adam     27.0
2      3         knfkngfkn, fgfsga   4878.0   96.40    None      gfg     28.0
3      4                sds, hfsk5   7471.0   97.40    None     ble1     29.0
4      5               ghgh, eryw4  10000.0   98.40    None  Blah II     30.0
5      6          knfkngfkn, jdrhw   4565.0   99.40     hey     cdjd  10000.0
6      7              dsdsd, sgegs   4545.0  100.40    None     None     32.0
7      8                kykt, zvae   4999.0   95.99    None     jshd     33.0
8      9              dyhsyrd, gsd   1000.0   95.78     you    ksnkd     34.0
9     10                ioyud, zdd   2000.0   95.12    None     ksdm     35.0
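The two loops above can also be collapsed into a single dict comprehension. A shorter equivalent sketch, again using an inline DataFrame in place of the CSV:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the CSV above
df = pd.DataFrame({
    'phoneno': [1234.0, np.nan],
    'weight': [64.5, 95.4],
    'comment': [None, 'hey'],
})

nullcols = {'float64': 10000, 'object': 'None'}

# Build the per-column fill map in one pass: only columns that both
# contain a null and have a dtype listed in nullcols get an entry
values = {col: nullcols[str(dt)]
          for col, dt in df.dtypes.items()
          if df[col].isna().any() and str(dt) in nullcols}
filled = df.fillna(value=values)
print(values)  # {'phoneno': 10000, 'comment': 'None'}
print(filled)
```

The behavior is the same as the loop version; fillna simply ignores columns that have no entry in the dict.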

If you have any questions, or if there is a better way, please do comment below.

Thanks
Sohail Izebhijie
Helical IT Solutions Pvt Ltd

Best Open Source Business Intelligence Software Helical Insight Here

A Business Intelligence Framework
