Calculating a cumulative total in PostgreSQL
I am using count and group by to get the number of users who signed up each day:
SELECT created_at, COUNT(email)
FROM subscriptions
GROUP BY created_at;
Result:
created_at    count
-----------------
04-04-2011    100
05-04-2011    50
06-04-2011    50
07-04-2011    300
I would like to get the cumulative total of users for each day instead. How can I do that?
created_at    count
-----------------
04-04-2011    100
05-04-2011    150
06-04-2011    200
07-04-2011    500
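For larger data sets, window functions are usually the most efficient way to run this kind of query: the table is scanned only once, rather than once per date as a self-join would do. It also looks a lot simpler. 🙂 PostgreSQL 8.4 and later support window functions.
This is what it looks like: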
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;
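Here OVER creates the window; ORDER BY created_at means that the per-day counts are summed in created_at order, so each row receives the total of its own count and all earlier ones.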
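To make the effect of the window easier to see, the daily count and the running total can be selected side by side. This is a minimal sketch of the same query; the column aliases daily_count and running_total are illustrative names, not part of the original answer.

```sql
-- Same table as in the question: subscriptions(created_at, email).
-- count(email) is the per-day aggregate produced by GROUP BY;
-- sum(...) OVER (ORDER BY created_at) then accumulates those per-day
-- counts from the earliest date up to the current row.
SELECT created_at,
       count(email)                                 AS daily_count,   -- illustrative alias
       sum(count(email)) OVER (ORDER BY created_at) AS running_total  -- illustrative alias
FROM subscriptions
GROUP BY created_at
ORDER BY created_at;
```

The final ORDER BY only fixes the order in which rows are returned; the ORDER BY inside OVER is what defines the accumulation.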
Edit: if you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately, this won't remove duplicates that span different dates.
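Written out in full, the per-day deduplication might look like the sketch below. The alias running_total is an assumption added for readability.

```sql
-- DISTINCT applies inside each GROUP BY group (one day), so an email that
-- appears twice on the same day is counted once for that day. An email
-- that appears on two different days is still counted on both days.
SELECT created_at,
       sum(count(DISTINCT email)) OVER (ORDER BY created_at) AS running_total  -- illustrative alias
FROM subscriptions
GROUP BY created_at
ORDER BY created_at;
```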
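If you want to remove all duplicates, I think the easiest way is to use a subquery and DISTINCT ON. This attributes each email to its earliest date (because I sort created_at in ascending order, it selects the earliest one):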
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions
    ORDER BY email, created_at
) AS subq
GROUP BY created_at;
If you create an index on (email, created_at), this query shouldn't be too slow either.
(If you want to test it, this is how I created the sample data set.)
create table subscriptions as
select date '2000-04-04' + (i/10000)::int as created_at,
       'foofoobar@foobar.com' || (i%700000)::text as email
from generate_series(1,1000000) i;

create index on subscriptions (email, created_at);
Use:
SELECT a.created_at,
       (SELECT COUNT(b.email)
        FROM SUBSCRIPTIONS b
        WHERE b.created_at <= a.created_at) AS count
FROM SUBSCRIPTIONS a
GROUP BY a.created_at  -- one row per day rather than one per subscription
SELECT s1.created_at, COUNT(s2.email) AS cumul_count
FROM subscriptions s1
INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
GROUP BY s1.created_at
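I assume you want only one row per day, and that you also want to show days without any subscriptions (suppose nobody subscribes on a certain date: do you want that date to appear with the previous day's balance?). If that is the case, you can use the WITH clause: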
with recursive serialdates(adate) as (
    select cast('2011-04-04' as date)
    union all
    select adate + 1
    from serialdates
    where adate < cast('2011-04-07' as date)
)
select D.adate,
       (
           -- distinct emails from the start of adate's month up to adate
           select count(distinct email)
           from subscriptions
           where created_at between date_trunc('month', D.adate) and D.adate
       )
from serialdates D
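As an alternative to the recursive query, the date series can also be produced with generate_series and combined with the window function from the earlier answer. This is a sketch of a different technique, not the method used above: it assumes created_at is of type date, hard-codes the same date range, and counts every row rather than distinct emails within the month. The names d, day and running_total are illustrative.

```sql
-- generate_series yields one timestamp per day in the range; the LEFT JOIN
-- keeps days that have no subscriptions, and count(s.email) is 0 for them,
-- so the running total simply carries over from the previous day.
SELECT d.day::date AS created_at,
       sum(count(s.email)) OVER (ORDER BY d.day) AS running_total
FROM generate_series(timestamp '2011-04-04',
                     timestamp '2011-04-07',
                     interval '1 day') AS d(day)
LEFT JOIN subscriptions s ON s.created_at = d.day::date
GROUP BY d.day
ORDER BY d.day;
```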
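The best way is to have a calendar table: calendar (date date, month int, quarter int, half int, week int, year int)
Then you can join to this table to summarize by whichever field you need.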
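For example, a running total per month could be sketched as follows. This assumes the calendar table described above already exists and is populated with one row per day, and that subscriptions.created_at is of type date; the aliases monthly_count and running_total are illustrative.

```sql
-- Join each calendar day to its subscriptions, then summarize by month.
-- The LEFT JOIN keeps months without any subscriptions (count is 0),
-- and the window function accumulates the monthly counts in order.
SELECT c.year,
       c.month,
       count(s.email)                                      AS monthly_count,
       sum(count(s.email)) OVER (ORDER BY c.year, c.month) AS running_total
FROM calendar c
LEFT JOIN subscriptions s ON s.created_at = c.date
GROUP BY c.year, c.month
ORDER BY c.year, c.month;
```

Swapping c.month for c.quarter or c.week in the select list, GROUP BY and both ORDER BY clauses would summarize by that field instead.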