Filling NA of n Columns with Nulls Based on DataType using Pandas

Posted by Sohail, in Uncategorized

Let's run through a scenario: I have a CSV where specific columns contain nulls or empty values. Using Python pandas, let's read the CSV:

import pandas as pd
t1 = pd.read_csv('E:/Blog/table3.csv',delimiter=',', encoding='iso-8859-1',low_memory=False)
   empid                   address  phoneno  weight comment     name   age
0      1  namkeknkfg hfkfkf,ijfkdf   1234.0   64.50     NaN   sohail  27.0
1      2           knfkngfkn, dnfd   1111.0   95.40     NaN     adam  27.0
2      3         knfkngfkn, fgfsga   4878.0   96.40     NaN      gfg  28.0
3      4                sds, hfsk5   7471.0   97.40     NaN     ble1  29.0
4      5               ghgh, eryw4      NaN   98.40     NaN  Blah II  30.0
5      6          knfkngfkn, jdrhw   4565.0   99.40     hey     cdjd   NaN
6      7              dsdsd, sgegs   4545.0  100.40     NaN      NaN  32.0
7      8                kykt, zvae   4999.0   95.99     NaN     jshd  33.0
8      9              dyhsyrd, gsd   1000.0   95.78     you    ksnkd  34.0
9     10                ioyud, zdd   2000.0   95.12     NaN     ksdm  35.0

Let's assume we want to replace a missing float with 0 and a missing string with None. With plain fillna we would have to specify, for each column, the value to replace its nulls with; if 20 columns could be null, that means hand-writing replacements for all 20 columns.
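To see why that gets tedious, here is a minimal sketch of the manual per-column approach, using a small inline DataFrame (with a few of the columns from above) instead of the CSV:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the CSV above
df = pd.DataFrame({
    'phoneno': [1234.0, np.nan],
    'comment': [None, 'hey'],
    'age': [27.0, np.nan],
})

# Manual mapping: every nullable column must be listed by hand
filled = df.fillna({'phoneno': 0, 'age': 0, 'comment': 'None'})
print(filled)
```

With many nullable columns, this dict grows linearly and has to be maintained by hand, which is exactly what the dtype-based approach below avoids.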

First let’s identify columns that have null in them:

tnull = t1.isna().any().to_dict()

Output: {'empid': False, 'address': False, 'phoneno': True, 'weight': False, 'comment': True, 'name': True, 'age': True}
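As a self-contained sketch of what isna().any().to_dict() does (using a small inline DataFrame instead of the CSV):

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame with one fully populated column and two nullable ones
df = pd.DataFrame({
    'empid': [1, 2, 3],
    'phoneno': [1234.0, np.nan, 4545.0],
    'name': ['sohail', 'adam', None],
})

# isna() marks each cell, any() collapses per column:
# True for every column that contains at least one null
tnull = df.isna().any().to_dict()
print(tnull)  # {'empid': False, 'phoneno': True, 'name': True}
```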

Next let’s define our null replacing values based on the datatype:

nullcols = {
    'float64': 10000,
    'object': 'None'
}

This says: if a column's dtype is float64, replace its nulls with 10000; if it is object, replace them with the string 'None' (note this is the literal string "None", not Python's None).
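The lookup works because each column's dtype, converted to a string, matches a key in nullcols. A minimal sketch with an inline DataFrame (assumed columns, not the CSV):

```python
import numpy as np
import pandas as pd

# Toy frame: one float64 column, one object column
df = pd.DataFrame({
    'phoneno': [1234.0, np.nan],
    'comment': [None, 'hey'],
})

nullcols = {'float64': 10000, 'object': 'None'}

# df.dtypes gives one entry per column; str() turns the dtype into a lookup key
fills = {col: nullcols.get(str(dtype)) for col, dtype in df.dtypes.items()}
print(fills)  # {'phoneno': 10000, 'comment': 'None'}
```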

Now let's define an empty dictionary that will map each nullable column to its replacement value:

getnullcols = {}

Putting it all together:

import pandas as pd

t1 = pd.read_csv('E:/Blog/table3.csv', delimiter=',', encoding='iso-8859-1', low_memory=False)
print(t1)

# Which columns contain at least one null?
tnull = t1.isna().any().to_dict()
# Column name -> dtype
t1cldt = t1.dtypes.to_dict()
print(tnull)
print(t1cldt)

# Fill value per dtype
nullcols = {
    'float64': 10000,
    'object': 'None'
}

# Keep only the columns that actually contain nulls, with their dtype
columnkeys = {}
for key, value in tnull.items():
    if value:
        columnkeys.update({key: t1cldt[key]})
print(columnkeys)

# Resolve each nullable column to its fill value via the dtype
getnullcols = {}
for key, value in columnkeys.items():
    for k, v in nullcols.items():
        if k == value:
            getnullcols.update({key: v})
print(getnullcols)

print(t1.fillna(value=getnullcols))

Output

    empid                   address  phoneno  weight comment     name      age
0      1  namkeknkfg hfkfkf,ijfkdf   1234.0   64.50    None   sohail     27.0
1      2           knfkngfkn, dnfd   1111.0   95.40    None     adam     27.0
2      3         knfkngfkn, fgfsga   4878.0   96.40    None      gfg     28.0
3      4                sds, hfsk5   7471.0   97.40    None     ble1     29.0
4      5               ghgh, eryw4  10000.0   98.40    None  Blah II     30.0
5      6          knfkngfkn, jdrhw   4565.0   99.40     hey     cdjd  10000.0
6      7              dsdsd, sgegs   4545.0  100.40    None     None     32.0
7      8                kykt, zvae   4999.0   95.99    None     jshd     33.0
8      9              dyhsyrd, gsd   1000.0   95.78     you    ksnkd     34.0
9     10                ioyud, zdd   2000.0   95.12    None     ksdm     35.0
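The two loops above can also be collapsed into a single dict comprehension. A shorter equivalent sketch, again using an inline DataFrame in place of the CSV:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the CSV above
df = pd.DataFrame({
    'phoneno': [1234.0, np.nan],
    'weight': [64.5, 95.4],
    'comment': [None, 'hey'],
})

nullcols = {'float64': 10000, 'object': 'None'}

# Build the per-column fill map in one pass: only columns that both
# contain a null and have a dtype listed in nullcols get an entry
values = {col: nullcols[str(dt)]
          for col, dt in df.dtypes.items()
          if df[col].isna().any() and str(dt) in nullcols}
filled = df.fillna(value=values)
print(values)  # {'phoneno': 10000, 'comment': 'None'}
print(filled)
```

The behavior is the same as the loop version; fillna simply ignores columns that have no entry in the dict.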

If you have any questions, or if there is a better way, please do comment below.

Thanks
Sohail Izebhijie
Helical IT Solutions Pvt Ltd

Best Open Source Business Intelligence Software Helical Insight Here

A Business Intelligence Framework
