python - How to split a single column of a pandas DataFrame into multiple columns with group?


I am new to Python pandas. I have a DataFrame like the one below:

    import pandas as pd

    df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj', 'cricket', 'rakesh', 'mohit', 'mahesh'],
                       'age': ['25', '22', '21', '32', '37', '26', '24', '30']})
    print(df)

           name age
    0  football  25
    1    ramesh  22
    2    suresh  21
    3    pankaj  32
    4   cricket  37
    5    rakesh  26
    6     mohit  24
    7    mahesh  30

The "name" column contains both the sports name and the sport person name. I want to split it into two different columns, as below:

Expected output:

    sports_name  sport_person_name  age
    football     ramesh             22
                 suresh             21
                 pankaj             32
    cricket      rakesh             26
                 mohit              24
                 mahesh             30

If I do a groupby on the "name" column, I do not get the expected output. I just get the straightforward result, because there are no duplicates in the "name" column. What do I need to use to get the expected output?
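A minimal sketch of why groupby on "name" alone does not help, using the same data as above. The variable name group_sizes is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['football', 'ramesh', 'suresh', 'pankaj',
             'cricket', 'rakesh', 'mohit', 'mahesh'],
    'age': ['25', '22', '21', '32', '37', '26', '24', '30'],
})

# Every value in "name" is unique, so each group holds exactly one row
# and nothing ties a person to the sport listed above them.
group_sizes = df.groupby('name').size()
print(group_sizes)
```

A separate column holding the sport for each row is needed before any grouping becomes meaningful.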

Edit: if you don't want to hardcode the sports names:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'name': ['football', 'ramesh', 'suresh', 'pankaj', 'cricket', 'rakesh', 'mohit', 'mahesh'],
                       'age': ['', '22', '21', '32', '', '26', '24', '30']})

    # treat empty strings as missing values
    df = df.replace('', np.nan)

    # rows with a missing value are the sport rows
    nan_rows = df[df.isnull().any(axis=1)]
    sports = nan_rows['name'].tolist()

    df['sports_name'] = df['name'].where(df['name'].isin(sports)).ffill()
    d = {'name': 'sport_person_name'}
    df = df[df['sports_name'] != df['name']].reset_index(drop=True).rename(columns=d)
    df = df[['sports_name', 'sport_person_name', 'age']]
    print(df)

I checked that the rows containing NaN values in every column except "name" are the sports names. So I created a list of sports names from those rows and used the solutions below to create the sports_name and sport_person_name columns.
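The same idea can be sketched without building the intermediate list of names, by using the NaN mask directly. This is a variant under stated assumptions, not the method from the answers: the helper name split_header_rows and its value_cols parameter are hypothetical, and it assumes a sport row is any row where every column in value_cols is missing.

```python
import numpy as np
import pandas as pd


def split_header_rows(df, value_cols):
    """Hypothetical helper: rows whose value_cols are all NaN are sport rows."""
    # True on the sport rows, False on the person rows
    is_header = df[value_cols].isna().all(axis=1)
    out = df.rename(columns={'name': 'sport_person_name'})
    # Keep the name only on sport rows, then carry it down to the rows below
    out.insert(0, 'sports_name', df['name'].where(is_header).ffill())
    # Drop the sport rows themselves
    return out[~is_header].reset_index(drop=True)


raw = pd.DataFrame({
    'name': ['football', 'ramesh', 'suresh', 'pankaj',
             'cricket', 'rakesh', 'mohit', 'mahesh'],
    'age': ['', '22', '21', '32', '', '26', '24', '30'],
}).replace('', np.nan)

result = split_header_rows(raw, ['age'])
print(result)
```

Filtering on the mask rather than comparing names also keeps a person row whose name happens to match a sport name.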

You can use:

    # define list of sports
    sports = ['football', 'cricket']
    # create NaNs where name is not a sport, then forward fill the NaNs
    df['sports_name'] = df['name'].where(df['name'].isin(sports)).ffill()
    # remove rows with the same value in sports_name and name, rename column
    d = {'name': 'sport_person_name'}
    df = df[df['sports_name'] != df['name']].reset_index(drop=True).rename(columns=d)
    # change order of columns
    df = df[['sports_name', 'sport_person_name', 'age']]
    print(df)

      sports_name sport_person_name age
    0    football            ramesh  22
    1    football            suresh  21
    2    football            pankaj  32
    3     cricket            rakesh  26
    4     cricket             mohit  24
    5     cricket            mahesh  30
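To see what the where / isin / ffill chain does, here is a minimal sketch that prints each intermediate step side by side. The names is_sport, labels, filled and steps are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['football', 'ramesh', 'suresh', 'pankaj',
             'cricket', 'rakesh', 'mohit', 'mahesh'],
    'age': ['25', '22', '21', '32', '37', '26', '24', '30'],
})
sports = ['football', 'cricket']

# Step 1: boolean mask, True only on rows whose name is a sport
is_sport = df['name'].isin(sports)
# Step 2: keep the name where the mask is True, NaN everywhere else
labels = df['name'].where(is_sport)
# Step 3: forward fill, so each NaN takes the nearest sport above it
filled = labels.ffill()

steps = pd.DataFrame({
    'name': df['name'],
    'is_sport': is_sport,
    'where': labels,
    'ffill': filled,
})
print(steps)
```

After the forward fill, the sport rows are the only ones where the filled column equals name, which is why the solution filters on sports_name != name.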

A similar solution uses DataFrame.insert, so reordering the columns is not necessary:

    # define list of sports
    sports = ['football', 'cricket']
    # rename column by dict
    d = {'name': 'sport_person_name'}
    df = df.rename(columns=d)
    # create NaNs where name is not a sport, then forward fill the NaNs
    df.insert(0, 'sports_name', df['sport_person_name'].where(df['sport_person_name'].isin(sports)).ffill())
    # remove rows with the same value in sports_name and sport_person_name
    df = df[df['sports_name'] != df['sport_person_name']].reset_index(drop=True)
    print(df)

      sports_name sport_person_name age
    0    football            ramesh  22
    1    football            suresh  21
    2    football            pankaj  32
    3     cricket            rakesh  26
    4     cricket             mohit  24
    5     cricket            mahesh  30

If you want the sport shown only once per group, add limit=1 to ffill and replace the remaining NaNs with an empty string:

    sports = ['football', 'cricket']
    df['sports_name'] = df['name'].where(df['name'].isin(sports)).ffill(limit=1).fillna('')
    d = {'name': 'sport_person_name'}
    df = df[df['sports_name'] != df['name']].reset_index(drop=True).rename(columns=d)
    df = df[['sports_name', 'sport_person_name', 'age']]
    print(df)

      sports_name sport_person_name age
    0    football            ramesh  22
    1                        suresh  21
    2                        pankaj  32
    3     cricket            rakesh  26
    4                         mohit  24
    5                        mahesh  30
