python - How to convert a string Series into integers
One of the columns of my pandas DataFrame contains values such as 0, 'a', 'b'. The column is parsed as a string. I want to convert it to the integers 0, 1, 2. How can I do this?
Here's the initial data:
    import pandas as pd

    df = pd.DataFrame({'col': [0, 'a', 'b', 'a']})
    >>> df
      col
    0   0
    1   a
    2   b
    3   a
You can create a dictionary of the items you'd like to replace:
d = {'a': 1, 'b': 2}
Then apply get to the column, returning the original value if it is not in the dictionary:
    df['col'] = df.col.apply(lambda x: d.get(x, x))
    >>> df
       col
    0    0
    1    1
    2    2
    3    1
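For reference, the steps so far can be put together as one self-contained sketch. The final astype(int) call is an addition that is not part of the answer above: it makes the integer dtype explicit and raises a ValueError if a string without a dictionary entry is still present, which may or may not be what you want.

```python
import pandas as pd

df = pd.DataFrame({'col': [0, 'a', 'b', 'a']})
d = {'a': 1, 'b': 2}

# Replace values that are dictionary keys; leave everything else unchanged.
df['col'] = df['col'].apply(lambda x: d.get(x, x))

# Assumption: every remaining value is integer-like. The cast fails loudly
# if an unmapped string such as 'c' is still in the column.
df['col'] = df['col'].astype(int)

print(df['col'].tolist())
```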
@EdChum: If all of the unique items contained in the series are dictionary keys, .map(d) is more than 5 times faster. However, any value missing from the dictionary appears as NaN. Using a lambda function with get on the dictionary inside map appears to have virtually identical performance to the apply version.
    %%timeit
    df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
    df['col'] = df.col.map(d)
    10 loops, best of 3: 33.3 ms per loop

    >>> df.head()
       col
    0  NaN
    1    1
    2    2
    3    1
    4  NaN

    %%timeit
    df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
    df['col'] = df.col.apply(lambda x: d.get(x, x))
    10 loops, best of 3: 188 ms per loop

    %%timeit
    df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
    df['col'] = df.col.map(lambda x: d.get(x, x))
    10 loops, best of 3: 188 ms per loop

    In [64]: %timeit df['col'] = df.col.map(d)
    10 loops, best of 3: 36.1 ms per loop
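If you want the dictionary-based map but still need to keep values that have no dictionary entry, one possible approach (a sketch, not something benchmarked in this answer) is to fill the NaN gaps from the original column afterwards. The intermediate name mapped is only for illustration, and the closing astype(int) assumes every remaining value is integer-like.

```python
import pandas as pd

df = pd.DataFrame({'col': [0, 'a', 'b', 'a']})
d = {'a': 1, 'b': 2}

# Values that are not dictionary keys (here the 0) come back as NaN.
mapped = df['col'].map(d)

# Fill those gaps with the original values, aligned on the index.
df['col'] = mapped.fillna(df['col'])

# Assumption: all values are now integer-like, so the cast is safe.
df['col'] = df['col'].astype(int)

print(df['col'].tolist())
```

Whether this keeps the speed advantage of .map(d) on large data was not measured here.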
And here's the crazy part. I tested the same few lines of code earlier and got different results:
    %%timeit
    df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
    df['col'] = df.col.map(d)
    10 loops, best of 3: 33.4 ms per loop

    >>> df.head()
       col
    0    0
    1    1
    2    2
    3    1
    4    0

    >>> pd.__version__
    '0.16.2'