python - Find the difference between successive timestamps separately for each day using pandas


I have a DataFrame with a datetime index:

>>> d.head()
Out[29]:
                           value
time
2017-04-02 21:11:00.221  1114.73
2017-04-03 00:01:00.221  1114.73
2017-04-03 00:01:01.345  1114.73
2017-04-03 00:01:02.701  1114.10

I want the successive differences between index times, computed separately for each day. What I am doing now is incomplete:

d['datetime'] = d.index
d['datetime_diff'] = d['datetime'].diff()

This gives me the difference between successive index timestamps, but it doesn't start afresh each day. I can separate the date from the datetime, group by date, and calculate the time differences for each date. There is no set first and last time each day.

After getting these time differences, I intend to compute stats such as mean, median, count, etc.

Is there a better way to do this? I guess it reduces to a different problem: marking the first value on each day. I can get the first value on each day using group-by, but that doesn't solve the issue. Instead of retrieving the first value, I need an easy way to label it.

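Use pd.TimeGrouper and group with a frequency of 1D (pd.TimeGrouper was later deprecated and removed; on newer pandas releases, use pd.Grouper(freq='1D') instead):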

diff = df.groupby(pd.TimeGrouper(freq='1D')).diff()
diff
                         value
time
2017-04-02 21:11:00.221    NaN
2017-04-03 00:01:00.221    NaN
2017-04-03 00:01:01.345   0.00
2017-04-03 00:01:02.701  -0.63
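For readers on a recent pandas release, here is a minimal sketch of the same per-day diff using pd.Grouper in place of pd.TimeGrouper. The sample data is copied from the question; the variable name value_diff is only illustrative, and this is one way to write it rather than the answer's exact code.

```python
import pandas as pd

# Sample data copied from the question; `df` follows the naming used in the answer.
idx = pd.DatetimeIndex(
    [
        "2017-04-02 21:11:00.221",
        "2017-04-03 00:01:00.221",
        "2017-04-03 00:01:01.345",
        "2017-04-03 00:01:02.701",
    ],
    name="time",
)
df = pd.DataFrame({"value": [1114.73, 1114.73, 1114.73, 1114.10]}, index=idx)

# pd.Grouper(freq="1D") bins the rows by calendar day of the DatetimeIndex,
# so diff() restarts at the first row of every day (that row becomes NaN).
value_diff = df.groupby(pd.Grouper(freq="1D")).diff()
print(value_diff)
```

Note that this differences the value column, not the timestamps themselves.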

If the time index is not of datetime type, you'll need to convert it:

df.index = pd.to_datetime(df.index) 

To get the difference of the index only, there's a simpler way: first call reset_index, then groupby, and call .diff on the column. You can use pd.Grouper with key='time' for this.

diff = df.reset_index().groupby(pd.Grouper(key='time', freq='1D')).time.diff()
diff

0               NaT
1               NaT
2   00:00:01.124000
3   00:00:01.356000
Name: time, dtype: timedelta64[ns]
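The question also asks for an easy way to label the first value of each day. A possible sketch, assuming the timestamps are held in a plain Series (the names stamps, gaps, and is_first_of_day are hypothetical): group the timestamps by their calendar day, take the diff, and treat the NaT entries as the first-of-day markers.

```python
import pandas as pd

# Timestamps copied from the question, held in a plain Series.
stamps = pd.Series(
    pd.to_datetime(
        [
            "2017-04-02 21:11:00.221",
            "2017-04-03 00:01:00.221",
            "2017-04-03 00:01:01.345",
            "2017-04-03 00:01:02.701",
        ]
    )
)

# normalize() floors each timestamp to midnight, giving one key per calendar day.
day = stamps.dt.normalize()

# Differences between successive timestamps, restarting on every day.
gaps = stamps.groupby(day).diff()

# The first row of each day has no predecessor within its group, so its gap is NaT.
is_first_of_day = gaps.isna()

print(pd.DataFrame({"time": stamps, "gap": gaps, "first_of_day": is_first_of_day}))
```

This relies on the timestamps already being sorted in time order, as they are in the question's data.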

As an aside, if you're interested in day-wise stats, you can groupby and call .describe:

g = df.groupby(pd.Grouper(level=0, freq='1D'))
g.describe()

             value
             count     mean       std      min       25%      50%      75%      max
time
2017-04-02     1.0  1114.73       NaN  1114.73  1114.730  1114.73  1114.73  1114.73
2017-04-03     3.0  1114.52  0.363731  1114.10  1114.415  1114.73  1114.73  1114.73
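The describe call above summarizes the value column. For the stats on the time differences that the question mentions (mean, median, count), a sketch along the following lines may help. Converting the gaps to seconds is an assumption made here to keep the aggregation on plain floats, and the names gap_seconds and gap_stats are hypothetical.

```python
import pandas as pd

# Timestamps copied from the question.
stamps = pd.Series(
    pd.to_datetime(
        [
            "2017-04-02 21:11:00.221",
            "2017-04-03 00:01:00.221",
            "2017-04-03 00:01:01.345",
            "2017-04-03 00:01:02.701",
        ]
    )
)

# One grouping key per calendar day.
day = stamps.dt.normalize()

# Per-day gaps between successive timestamps, expressed in seconds.
# The first row of each day has no predecessor, so its gap is NaN.
gap_seconds = stamps.groupby(day).diff().dt.total_seconds()

# Day-wise summary of the gaps; count ignores the NaN first-of-day rows.
gap_stats = gap_seconds.groupby(day).agg(["count", "mean", "median"])
print(gap_stats)
```

A day with a single timestamp, such as 2017-04-02 here, has no gaps, so its count is 0 and its mean and median are NaN.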
