Machine learning: feeding categorical data to a classifier
Suppose I have a dataset in the following format:
col1  col2  col3  col4       col5 (to be predicted)
12    13    4     primary    12
1     15    2     secondary  13
5     7     8     primary    18
14    12    44    college    6
col5 needs to be predicted for the test data using col1, col2, col3, and col4.
During training, col1, col2, and col3 can be fed as they are in an array to the classifier, but how do I feed col4? I am aware that it is categorical and needs to be converted to a numeric type, but after assigning a number to each category it will still be a nominal type.
So if primary=1, secondary=2, and college=3, the numbers 1, 2, and 3 can't be compared by magnitude, because they are still labels with no numerical significance.
So how should I proceed after this step? Should the values be normalized, or should something further be done?
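You should use one-hot encoding in such cases. Every possible categorical value creates a new binary feature, which is 1 when the row has that value and 0 otherwise, so no ordering or magnitude is implied between categories.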
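As a rough illustration, here is a minimal sketch in plain Python that applies one-hot encoding to the four rows from the question. The names `rows`, `categories`, `one_hot`, `X`, and `y` are made up for this example, and the alphabetical ordering of the categories is an arbitrary choice, not something the question specifies.

```python
# Rows from the question: col1, col2, col3, col4 (categorical), col5 (target).
rows = [
    (12, 13, 4, "primary", 12),
    (1, 15, 2, "secondary", 13),
    (5, 7, 8, "primary", 18),
    (14, 12, 44, "college", 6),
]

# Fix the category order once, from the training data, and reuse it for test data.
# Sorting alphabetically is an arbitrary but reproducible choice.
categories = sorted({row[3] for row in rows})


def one_hot(value, categories):
    """Return a list with a 1 in the slot matching `value` and 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]


# Numeric columns stay as they are; col4 is replaced by one binary column per category.
X = [[c1, c2, c3, *one_hot(c4, categories)] for c1, c2, c3, c4, _ in rows]
y = [target for *_, target in rows]

for features, target in zip(X, y):
    print(features, "->", target)
```

Each row now has six numeric features: the three original numbers plus one binary column each for college, primary, and secondary. In this sketch a category that was not seen during training encodes as all zeros. In practice, pandas `get_dummies` or scikit-learn's `OneHotEncoder` can do the same transformation.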