python - How to split a single column of a pandas DataFrame into multiple columns by group?
I am new to Python pandas. I have a DataFrame like the one below:
```python
import pandas as pd

df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj', 'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['25', '22', '21', '32', '37', '26', '24', '30']})
print(df)
```

```
       name age
0  football  25
1    ramesh  22
2    suresh  21
3    pankaj  32
4   cricket  37
5    rakesh  26
6     mohit  24
7    mahesh  30
```
The "name" column contains both sports names and sport person names. I want to split it into two different columns, like this:
Expected output:

```
sports_name  sport_person_name  age
football     ramesh             22
             suresh             21
             pankaj             32
cricket      rakesh             26
             mohit              24
             mahesh             30
```
If I use groupby on the "name" column, I do not get the expected output; I just get the data back unchanged, because there are no duplicates in the "name" column. What do I need to use to get the expected output?
EDIT: a solution that does not hardcode the sports names:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj', 'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['', '22', '21', '32', '', '26', '24', '30']})
# treat empty strings as missing values (exact match, no regex needed)
df = df.replace('', np.nan)
# rows with a NaN in any column are the sports rows
nan_rows = df[df.isnull().any(axis=1)]
sports = nan_rows['name'].tolist()
df['sports_name'] = df['name'].where(df['name'].isin(sports)).ffill()
d = {'name': 'sport_person_name'}
df = df[df['sports_name'] != df['name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name', 'sport_person_name', 'age']]
print(df)
```
I checked that, apart from the "name" column, the rows holding sports names contain NaN values in all the remaining columns. So I built the list of sports names from those rows and then used the solutions below to create the sports_name and sport_person_name columns.
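The check described above can be written as a single boolean mask: a row is treated as a sports row when every column other than name is missing. A minimal sketch, assuming (as in the edited DataFrame) that empty strings mark the missing values and that person rows always have at least one non-empty column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj',
                            'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['', '22', '21', '32', '', '26', '24', '30']})

# Treat empty strings as missing values
df = df.replace('', np.nan)

# A sports row has NaN in every column except 'name'
is_sport = df.drop(columns='name').isna().all(axis=1)
sports = df.loc[is_sport, 'name'].tolist()
print(sports)  # ['football', 'cricket']
```

Using all(axis=1) over the non-name columns is stricter than any(axis=1) over every column, so a person row with just one missing value is not mistaken for a sport.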
You can use:

```python
import pandas as pd

df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj', 'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['25', '22', '21', '32', '37', '26', '24', '30']})

# define the list of sports
sports = ['football', 'cricket']
# NaN where name is not a sport, then forward-fill the NaNs
df['sports_name'] = df['name'].where(df['name'].isin(sports)).ffill()
# remove rows where sports_name equals name (the sport rows), rename the column
d = {'name': 'sport_person_name'}
df = df[df['sports_name'] != df['name']].reset_index(drop=True).rename(columns=d)
# change the order of the columns
df = df[['sports_name', 'sport_person_name', 'age']]
print(df)
```

```
  sports_name sport_person_name age
0    football            ramesh  22
1    football            suresh  21
2    football            pankaj  32
3     cricket            rakesh  26
4     cricket             mohit  24
5     cricket            mahesh  30
```
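To see what the where/ffill step does, it helps to look at the intermediate Series: where keeps the value only on the sport rows (NaN elsewhere), and ffill then carries each sport name down to the following rows until the next sport appears. A small sketch of just that step on the sample names:

```python
import pandas as pd

names = pd.Series(['football', 'ramesh', 'suresh', 'pankaj',
                   'cricket', 'rakesh', 'mohit', 'mahesh'])
sports = ['football', 'cricket']

# Step 1: keep only the sport names, everything else becomes NaN
only_sports = names.where(names.isin(sports))
# Step 2: forward-fill, so each person row gets the most recent sport above it
filled = only_sports.ffill()

print(pd.DataFrame({'name': names, 'where': only_sports, 'ffill': filled}))
```

The sport rows are exactly the rows where the filled value equals the original name, which is why the answer then filters on sports_name != name.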
A similar solution uses DataFrame.insert, so reordering the columns is not necessary:
```python
import pandas as pd

df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj', 'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['25', '22', '21', '32', '37', '26', '24', '30']})

# define the list of sports
sports = ['football', 'cricket']
# rename the column first
d = {'name': 'sport_person_name'}
df = df.rename(columns=d)
# NaN where the name is not a sport, forward-fill, and insert as the first column
df.insert(0, 'sports_name', df['sport_person_name'].where(df['sport_person_name'].isin(sports)).ffill())
# remove rows where sports_name equals sport_person_name (the sport rows)
df = df[df['sports_name'] != df['sport_person_name']].reset_index(drop=True)
print(df)
```

```
  sports_name sport_person_name age
0    football            ramesh  22
1    football            suresh  21
2    football            pankaj  32
3     cricket            rakesh  26
4     cricket             mohit  24
5     cricket            mahesh  30
```
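The groupby attempt from the question can also be made to work once there is a group key. One option, not part of the answer above, is a cumulative sum over an "is this a sport?" mask: each sport row starts a new group, and transform('first') broadcasts the sport name to every row of its group. A minimal sketch, assuming the sports list is known and the data starts with a sport row:

```python
import pandas as pd

df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj',
                            'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['25', '22', '21', '32', '37', '26', '24', '30']})
sports = ['football', 'cricket']  # assumed known up front

# True on sport rows; cumsum turns each True into the start of a new group id
is_sport = df['name'].isin(sports)
group_id = is_sport.cumsum().rename('group_id')  # 1,1,1,1,2,2,2,2

# The first row of each group is the sport; copy it to the whole group
df['sports_name'] = df.groupby(group_id)['name'].transform('first')

# Keep only the person rows and arrange the columns
out = (df[~is_sport]
       .rename(columns={'name': 'sport_person_name'})
       .loc[:, ['sports_name', 'sport_person_name', 'age']]
       .reset_index(drop=True))
print(out)
```

If some rows appeared before the first sport, they would fall into group 0 and get their own name as sports_name, so that case would need separate handling.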
If you want the sport name to appear only once per group, add limit=1 to ffill and replace the remaining NaNs with empty strings:
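```python
import pandas as pd

df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj', 'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['25', '22', '21', '32', '37', '26', '24', '30']})

sports = ['football', 'cricket']
df['sports_name'] = df['name'].where(df['name'].isin(sports)).ffill(limit=1).fillna('')
d = {'name': 'sport_person_name'}
df = df[df['sports_name'] != df['name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name', 'sport_person_name', 'age']]
print(df)
```

```
  sports_name sport_person_name age
0    football            ramesh  22
1                        suresh  21
2                        pankaj  32
3     cricket            rakesh  26
4                         mohit  24
5                        mahesh  30
```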
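The effect of limit=1 is easiest to see on the intermediate column: ffill(limit=1) fills at most one consecutive NaN after each sport, and fillna('') blanks the rest. Because the sport row itself is filtered out afterwards, the surviving filled value is the one on the first person row of each block. A small sketch of that step on the sample names:

```python
import pandas as pd

names = pd.Series(['football', 'ramesh', 'suresh', 'pankaj',
                   'cricket', 'rakesh', 'mohit', 'mahesh'])
sports = ['football', 'cricket']

# Fill at most one NaN after each sport, then blank the remaining gaps
labels = names.where(names.isin(sports)).ffill(limit=1).fillna('')
print(labels.tolist())
# ['football', 'football', '', '', 'cricket', 'cricket', '', '']
```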