python - How to convert string series into integer


One of the columns of a pandas data frame contains values such as 0, 'a', 'b'. The column is parsed as a string. I want to convert it to the integers 0, 1, 2. How can I do this?

Here's the initial data:

df = pd.DataFrame({'col': [0, 'a', 'b', 'a']})
>>> df
  col
0   0
1   a
2   b
3   a

You can create a dictionary of the items you'd like to replace:

d = {'a': 1, 'b': 2} 

Then apply get to the column, returning the original value if it is not in the dictionary:

df['col'] = df.col.apply(lambda x: d.get(x, x))
>>> df
   col
0    0
1    1
2    2
3    1
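Put together as a standalone script, the steps above might look like the following. This is a minimal sketch using the same data and dictionary; the final astype(int) call is an addition not in the original answer, on the assumption that an integer dtype is what the question is after.

```python
import pandas as pd

# Sample data from the question: one integer and several string labels.
df = pd.DataFrame({'col': [0, 'a', 'b', 'a']})

# Replacement dictionary from the answer.
d = {'a': 1, 'b': 2}

# Replace known labels; anything not in d (here, 0) is kept unchanged.
replaced = df['col'].apply(lambda x: d.get(x, x))

# Added step (an assumption, not part of the original answer): make the
# integer dtype explicit. This raises if any non-numeric label remains.
df['col'] = replaced.astype(int)

print(df['col'].tolist())
print(df['col'].dtype)
```

If a label is missing from the dictionary, astype(int) fails loudly instead of leaving a stray string in the column, which may or may not be what you want.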

@EdChum If all of the unique items contained in the series are in the dictionary keys, then .map(d) is more than 5 times as fast. However, any missing value appears as NaN. Using a lambda function with get on the dictionary appears to have virtually identical performance whether it is passed to apply or to map.

%%timeit
df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
df['col'] = df.col.map(d)

10 loops, best of 3: 33.3 ms per loop

>>> df.head()
   col
0  NaN
1    1
2    2
3    1
4  NaN

%%timeit
df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
df['col'] = df.col.apply(lambda x: d.get(x, x))

10 loops, best of 3: 188 ms per loop

%%timeit
df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
df['col'] = df.col.map(lambda x: d.get(x, x))

10 loops, best of 3: 188 ms per loop

In [64]: %timeit df['col'] = df.col.map(d)
10 loops, best of 3: 36.1 ms per loop
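If the speed of .map(d) matters but the NaN for unmapped values is a problem, one possible workaround is to fill the gaps from the original series afterwards. The sketch below is not from the original answer; it assumes every unmapped value is already integer-like, and the names mapped and restored are just illustrative.

```python
import pandas as pd

s = pd.Series([0, 'a', 'b', 'a'])
d = {'a': 1, 'b': 2}

# Passing the dict directly: values without a key (here, 0) become NaN.
mapped = s.map(d)

# Fill the gaps from the original series, then cast to an integer dtype.
# The cast only succeeds if every remaining value is integer-like.
restored = mapped.fillna(s).astype(int)

print(mapped.isna().tolist())
print(restored.tolist())
```

Whether this stays faster than the lambda version on a large frame is not something the timings above cover, so it is worth measuring on your own data.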

And here's the crazy part. I tested a few lines of code earlier and got different results:

%%timeit
df = pd.DataFrame({'col': [0, 'a', 'b', 'a'] * 100000})
df['col'] = df.col.map(d)

10 loops, best of 3: 33.4 ms per loop

>>> df.head()
   col
0    0
1    1
2    2
3    1
4    0

>>> pd.__version__
'0.16.2'
