sql - Calculate weighted average for each day and id based on time intervals in PostgreSQL
I have a table in a PostgreSQL database that looks like this:
     stid |  e5  | e10  | diesel |          date
    ------+------+------+--------+------------------------
     e850 | 1300 | 1400 |   1500 | 2016-05-02 05:30:01+02
     e850 | 1400 | 1500 |   1700 | 2016-05-02 08:30:01+02
     e850 | 1300 | 1400 |   1500 | 2016-05-02 21:00:01+02
     e850 | 1200 | 1300 |   1350 | 2016-05-03 10:30:01+02
     e850 | 1300 | 1400 |   1500 | 2016-05-03 21:00:01+02
     954d | 1200 | 1100 |   1300 | 2016-05-02 03:30:01+02
     954d | 1300 | 1100 |   1300 | 2016-05-02 15:00:01+02
     954d | 1400 | 1800 |   1400 | 2016-05-02 22:30:01+02
     954d | 1700 | 1900 |   1400 | 2016-05-03 09:30:01+02
     954d | 1500 | 1900 |   1200 | 2016-05-03 23:30:01+02
So we have unique IDs (stid), prices (e5, e10, diesel), and a timestamp (date) that indicates when the price was introduced. I want to calculate the average price per day and stid, weighted by the duration for which each price was charged. I only want to take the period between 8 am and 8 pm into account.
To calculate the weighted average price of e5 for stid e850 and date 2016-05-02 between 8 am and 8 pm, I would do the following:
    (1300 * 1800 + 1400 * 41400) / 43200 = 1395.83333

1300 is the price set at 5:30 am, and 1800 is the duration in seconds between 8 am and 8:30 am. 1400 is the price set at 8:30 am, and 41400 is the duration in seconds between 8:30 am and 8 pm.
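In general terms, the calculation above is a time-weighted mean. The notation is introduced here for illustration and is not part of the original post: p_i is a price in effect during the window, and Δt_i is the number of seconds it was in effect between 8 am and 8 pm.

```latex
\bar{p} = \frac{\sum_i p_i \,\Delta t_i}{\sum_i \Delta t_i},
\qquad
\bar{p}_{\mathrm{e5}} = \frac{1300 \cdot 1800 + 1400 \cdot 41400}{1800 + 41400} \approx 1395.83
```

The denominator equals 43200 seconds (12 hours) whenever a price is known for the whole window.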
In the end I would like to have a table that looks like this:
     stid |    date    | average_e5 | average_e10 | average_diesel
    ------+------------+------------+-------------+----------------
     e850 | 2016-05-02 | 1395.83333 |  1495.83333 |     1691.66667
     e850 | 2016-05-03 | 1220.83333 |  1320.83333 |        1381.25
     954d | 2016-05-02 | 1241.66667 |        1100 |           1300
     954d | 2016-05-03 |     1662.5 |      1887.5 |           1400
What I have been able to achieve so far is to calculate the duration for which each price was charged, using this query:
    select stid, e5, e10, diesel, date, nextdate,
           extract(epoch from (nextdate - date))
    from (
        select stid, e5, e10, diesel, date,
               (select min(date)
                from mytable t2
                where t2.stid = t1.stid
                  and t2.date > t1.date) nextdate
        from mytable t1
    ) t;
The table and query can be found here: rextester
But I'm having trouble figuring out a way to calculate the weighted average with the time restriction. I thought about adding dummy prices at 8 am and 8 pm before calculating the durations, but I don't know how to do this.
Since the dataset is pretty big, I'd like to do this in a time-efficient way.
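As a side note on the duration query in the question: the same next-timestamp lookup can be sketched with the `lead()` window function instead of a correlated subquery. This assumes the table is named `mytable` as in the question; the column alias `duration` is a name chosen here for illustration. It may be cheaper on a large table, but that depends on the available indexes and is worth checking with `EXPLAIN ANALYZE`.

```sql
-- Sketch: next timestamp and duration per station via a window function.
-- The last row of each stid has no successor, so nextdate and duration are NULL.
select stid, e5, e10, diesel, date,
       lead(date) over (partition by stid order by date) as nextdate,
       extract(epoch from lead(date) over (partition by stid order by date) - date) as duration
from mytable;
```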
I did not need a CTE here, but I used one to make it more readable:
    t=# with a as (
      select *
      , case
          when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
          when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
          else date
        end d
      , date_trunc('day', date) dt
      from mytable
    )
    , b as (
      select stid, e5, e10, diesel, date, d, dt
      , extract(epoch from lead(d) over (partition by stid, dt order by stid, d) - d) diff
      from a
    )
    select stid, e5, date, d, diff
    , sum(e5*diff*1.0) over (partition by stid, dt) / sum(diff) over (partition by stid, dt) e5_weight_avg
    from b
    order by stid desc, date;
     stid |   e5    |        date         |          d          | diff  |  e5_weight_avg
    ------+---------+---------------------+---------------------+-------+------------------
     e850 | 1300.00 | 2016-05-02 05:30:01 | 2016-05-02 08:00:00 |  1801 | 1395.83101851852
     e850 | 1400.00 | 2016-05-02 08:30:01 | 2016-05-02 08:30:01 | 41399 | 1395.83101851852
     e850 | 1300.00 | 2016-05-02 21:00:01 | 2016-05-02 20:00:00 |       | 1395.83101851852
     e850 | 1200.00 | 2016-05-03 10:30:01 | 2016-05-03 10:30:01 | 34199 |             1200
     e850 | 1300.00 | 2016-05-03 21:00:01 | 2016-05-03 20:00:00 |       |             1200
     954d | 1200.00 | 2016-05-02 03:30:01 | 2016-05-02 08:00:00 | 25201 | 1241.66435185185
     954d | 1300.00 | 2016-05-02 15:00:01 | 2016-05-02 15:00:01 | 17999 | 1241.66435185185
     954d | 1400.00 | 2016-05-02 22:30:01 | 2016-05-02 20:00:00 |       | 1241.66435185185
     954d | 1700.00 | 2016-05-03 09:30:01 | 2016-05-03 09:30:01 | 37799 |             1700
     954d | 1500.00 | 2016-05-03 23:30:01 | 2016-05-03 20:00:00 |       |             1700
    (10 rows)
And thus, skipping the middle steps:
    t=# with a as (
      select *
      , case
          when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
          when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
          else date
        end d
      , date_trunc('day', date) dt
      from mytable
    )
    , b as (
      select stid, e5, e10, diesel, date, d, dt
      , extract(epoch from lead(d) over (partition by stid, dt order by stid, d) - d) diff
      from a
    )
    select distinct stid, dt
    , sum(e5*diff*1.0) over (partition by stid, dt) / sum(diff) over (partition by stid, dt) e5_weight_avg
    from b
    order by stid desc, dt;
     stid |         dt          |  e5_weight_avg
    ------+---------------------+------------------
     e850 | 2016-05-02 00:00:00 | 1395.83101851852
     e850 | 2016-05-03 00:00:00 |             1200
     954d | 2016-05-02 00:00:00 | 1241.66435185185
     954d | 2016-05-03 00:00:00 |             1700
    (4 rows)
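Note that the result above gives 1200 and 1700 for 2016-05-03, while the expected table in the question has 1220.83333 and 1662.5. The difference arises because the price still in effect at 8 am was set on the previous day, and partitioning by day drops it. One possible way to carry it over, offered as a sketch rather than as the answer author's method, is to build a validity interval for each price and clip it to the 08:00 to 20:00 window of every day it touches. The CTE names `intervals` and `clipped`, the column names `valid_from`, `valid_to`, `seg_start`, `seg_end` and `price_date`, and the use of `generate_series` with `LATERAL` (PostgreSQL 9.3 or later) are assumptions made here; the table name `mytable` comes from the question.

```sql
-- Sketch: time-weighted daily averages restricted to 08:00-20:00,
-- carrying a price over midnight until the next price replaces it.
with intervals as (
    -- one row per price with the period during which it was in effect
    select stid, e5, e10, diesel,
           date as valid_from,
           lead(date) over (partition by stid order by date) as valid_to
    from mytable
),
clipped as (
    -- one row per (price, day it touches), cut down to that day's window
    select i.stid, i.e5, i.e10, i.diesel,
           gs.day_start::date as price_date,
           greatest(i.valid_from, gs.day_start + interval '8 hours')  as seg_start,
           least(i.valid_to,      gs.day_start + interval '20 hours') as seg_end
    from intervals i
    cross join lateral generate_series(
             date_trunc('day', i.valid_from),
             date_trunc('day', i.valid_to),
             interval '1 day') as gs(day_start)
    where i.valid_to is not null   -- the latest price has no known end; it is skipped
)
select stid, price_date,
       sum(e5     * extract(epoch from seg_end - seg_start))
         / sum(extract(epoch from seg_end - seg_start)) as average_e5,
       sum(e10    * extract(epoch from seg_end - seg_start))
         / sum(extract(epoch from seg_end - seg_start)) as average_e10,
       sum(diesel * extract(epoch from seg_end - seg_start))
         / sum(extract(epoch from seg_end - seg_start)) as average_diesel
from clipped
where seg_end > seg_start          -- drop segments that fall outside the window
group by stid, price_date
order by stid desc, price_date;
```

Because the sample timestamps end in :01, the results should be close to the expected table in the question but not identical to its rounded hand calculation. Days are truncated in the session time zone, so the window boundaries depend on that setting.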