regex - How to efficiently replace partial strings in pandas?
Objective: reformat the contents of a pandas DataFrame based on what has been provided to me.
I am looking to change each column to the following style:
I am using the following code to produce the style I need, but it is not efficient:
lt = []
for i in patterns['components'][0]:
    for x in i.split('__'):
        lt.append(x)
lt[1].replace('(','').replace(', ',' < '+str(lt[0])+' ≤ ').replace(']','')
I have attempted pandas replace to no avail: it throws no errors, but it seems to ignore what I am aiming to do.
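A likely reason for the silent no-op: without regex=True, DataFrame.replace compares each cell's whole value against the search string, so a substring such as '(' never matches. The sketch below illustrates the difference on a one-row frame built here purely for illustration (it is not the asker's data).

```python
import pandas as pd

# Illustrative one-cell frame, not the asker's real data.
df = pd.DataFrame({"components": ["(quantity__(0.0, 16199.0])"]})

# Without regex=True, replace() only matches cells whose ENTIRE value is "(".
# No cell qualifies, so nothing changes and no error is raised.
unchanged = df.replace("(", "")

# With regex=True, the pattern is applied inside each string,
# so every "(" is removed.
stripped = df.replace(r"\(", "", regex=True)

print(unchanged.loc[0, "components"])
print(stripped.loc[0, "components"])
```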
Source DF:
In [37]: df
Out[37]:
                           components                             outcome
0          (quantity__(0.0, 16199.0])  (unitprice__(-1055.648, 3947.558])
1  (unitprice__(-1055.648, 3947.558])          (quantity__(0.0, 16199.0])
Solution:
In [38]: cols = ['components','outcome']
    ...: df[cols] = df[cols].replace(r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*',
    ...:                             r'\2 < \1 <= \3',
    ...:                             regex=True)
Result:
In [39]: df
Out[39]:
                          components                            outcome
0          0.0 < quantity <= 16199.0  -1055.648 < unitprice <= 3947.558
1  -1055.648 < unitprice <= 3947.558          0.0 < quantity <= 16199.0
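For anyone who wants to reproduce this outside an IPython session, here is a self-contained sketch. The frame is rebuilt by hand from the output shown above, and the name `result` is introduced only for this example; the comments describe what each capture group picks up.

```python
import pandas as pd

# Rebuild the source frame from the printed output above.
df = pd.DataFrame({
    "components": ["(quantity__(0.0, 16199.0])",
                   "(unitprice__(-1055.648, 3947.558])"],
    "outcome": ["(unitprice__(-1055.648, 3947.558])",
                "(quantity__(0.0, 16199.0])"],
})

cols = ["components", "outcome"]

# \(([^_]*)__    group 1: the name between "(" and "__"
# \(([^,\s]+),   group 2: the lower bound, up to the comma
# \s*([^\]]+)\]  group 3: the upper bound, up to "]"
# \).*           the closing ")" and anything after it
pat = r"\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*"

# Work on a copy so the source frame stays available for comparison.
result = df.copy()
result[cols] = result[cols].replace(pat, r"\2 < \1 <= \3", regex=True)

print(result)
```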
Update:
In [113]: df
Out[113]:
                           components                             outcome
0          (quantity__(0.0, 16199.0])  (unitprice__(-1055.648, 3947.558])
1  (unitprice__(-1055.648, 3947.558])          (quantity__(0.0, 16199.0])

In [114]: cols = ['components','outcome']

In [115]: pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*'

In [116]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

In [117]: df
Out[117]:
                          components                            outcome
0          0.0 < quantity <= 16199.0  -1055.648 < unitprice <= 3947.558
1  -1055.648 < unitprice <= 3947.558          0.0 < quantity <= 16199.0
Or without parentheses:
In [119]: df
Out[119]:
                         components                           outcome
0         quantity__(0.0, 16199.0])  unitprice__(-1055.648, 3947.558]
1  unitprice__(-1055.648, 3947.558]          quantity__(0.0, 16199.0]

In [120]: pat = r'([^_]*)__\(([^,\s]+),\s*([^\]]+)\]'

In [121]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

In [122]: df
Out[122]:
                          components                            outcome
0         0.0 < quantity <= 16199.0)  -1055.648 < unitprice <= 3947.558
1  -1055.648 < unitprice <= 3947.558          0.0 < quantity <= 16199.0
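If the data mixes both shapes, one option is a single pattern that treats the outer parentheses and any surrounding whitespace as optional. This is only a sketch under that assumption: `TOLERANT_PAT` and `cleaned` are names made up here, and the pattern is a variation on the ones above rather than part of the original answer. It also swallows a stray trailing ")" like the one in row 0 above.

```python
import pandas as pd

# Hypothetical variation: "\(?" and "\)?" make the outer parentheses optional,
# and "\s*" on both ends absorbs surrounding whitespace.
TOLERANT_PAT = r"\s*\(?([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)?\s*"

# Illustrative values covering each shape seen in this thread.
df = pd.DataFrame({
    "components": [
        "(quantity__(0.0, 16199.0])",        # fully parenthesised
        "unitprice__(-1055.648, 3947.558]",  # no outer parentheses
        "quantity__(0.0, 16199.0])",         # stray trailing ")"
        " (quantity__(0.0, 16199.0]) ",      # padded with spaces
    ]
})

cleaned = df.replace(TOLERANT_PAT, r"\2 < \1 <= \3", regex=True)

print(cleaned)
```

The trade-off is that a more permissive pattern may also rewrite strings you did not intend to touch, so it is worth checking against a sample of the real data first.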