Pivot on multiple columns using tablefunc
Has anyone used tablefunc to pivot on multiple variables, as opposed to only using the row name? The documentation notes:
The "extra" columns are expected to be the same for all rows with the same row_name value.
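I'm not sure how to do this without combining the columns that I want to pivot on (which I highly doubt will give me the speed I need). One possible way would be to make entity numeric and add it to localt as milliseconds, but that seems like a shaky way to proceed.
I've edited the data used in a response to this question: PostgreSQL Crosstab Query.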
    CREATE TEMP TABLE t4 (
      timeof timestamp
     ,entity character
     ,status integer
     ,ct     integer);

    INSERT INTO t4 VALUES
      ('2012-01-01', 'a', 1, 1)
     ,('2012-01-01', 'a', 0, 2)
     ,('2012-01-02', 'b', 1, 3)
     ,('2012-01-02', 'c', 0, 4);

    SELECT * FROM crosstab(
        'SELECT timeof, entity, status, ct
         FROM   t4
         ORDER  BY 1,2,3'
       ,$$VALUES (1::text), (0::text)$$)
    AS ct ("Section" timestamp, "Attribute" character, "1" int, "0" int);
Returns:

           Section       | Attribute | 1 | 0
    ---------------------+-----------+---+---
     2012-01-01 00:00:00 | a         | 1 | 2
     2012-01-02 00:00:00 | b         | 3 | 4
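So, as the documentation states, the extra column (here 'Attribute') is assumed to be the same for each row name (here 'Section'). Thus, it reports b for the second row even though 'entity' also has the value 'c' for that 'timeof' value.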
Desired output:

           Section       | Attribute | 1 | 0
    ---------------------+-----------+---+---
     2012-01-01 00:00:00 | a         | 1 | 2
     2012-01-02 00:00:00 | b         | 3 |
     2012-01-02 00:00:00 | c         |   | 4

Any thoughts or references?
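A little more background: I potentially need to do this for billions of rows, and I'm testing storing this data in long and wide formats to see whether tablefunc can go from long to wide format more efficiently than regular aggregate functions.
I'll have about 100 measurements made every minute for around 300 entities. Often we will need to compare the different measurements made at a given second for a given entity, so we will need to go to wide format very often. Also, the set of measurements made on a particular entity is highly variable.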
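A rough back-of-the-envelope check of that volume, assuming the figures mean about 100 measurements per entity per minute and that this rate holds around the clock (both are assumptions, not measured numbers):

```sql
-- Rough volume estimate; assumes ~100 measurements per minute for each of
-- ~300 entities, sustained 24 hours a day (an assumption).
-- The bigint cast keeps the yearly product from overflowing a 4-byte integer.
SELECT 100 * 300                      AS rows_per_minute,  -- 30000
       100 * 300 * 1440               AS rows_per_day,     -- 43200000
       100::bigint * 300 * 1440 * 365 AS rows_per_year;    -- 15768000000
```

Under those assumptions a single year is already in the tens of billions of rows, which is why the cost of the long-to-wide step matters here.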
EDIT: I found this resource: http://www.postgresonline.com/journal/categories/24-tablefunc.
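The problem with your query is that b and c share the same timestamp 2012-01-02 00:00:00, and the timestamp column timeof comes first in your query. So, even though you added bold emphasis, b and c are just extra columns that fall in the same group 2012-01-02 00:00:00. Only the first (b) is returned since (quoting the manual):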
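The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as "extra". The "extra" columns are expected to be the same for all rows with the same row_name value.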
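(The emphasis on the last sentence is mine.)
Just swap the order of the first two columns to make entity the row name, and it works as desired: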
    SELECT * FROM crosstab(
        'SELECT entity, timeof, status, ct
         FROM   t4
         ORDER  BY 1'
       ,'VALUES (1), (0)')
    AS ct (
        "Attribute" character
       ,"Section" timestamp
       ,"status_1" int
       ,"status_0" int);
entity must be unique, of course.
To reiterate:
- row_name first
- (optional) extra columns next
- category (as defined by the second parameter) and value last
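Extra columns are filled from the first row of each row_name partition. Values from other rows are ignored; there is only one value per row_name to fill for each extra column. Typically those values would be the same for every row of one row_name, but that's up to you.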
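One way to see which "extra" value crosstab will keep is to emulate the first-row rule with DISTINCT ON. This is only a sketch against the t4 table from the question (timeof as row_name, same ORDER BY 1,2,3), not what crosstab runs internally; the second query flags row_name groups whose extra column is not uniform.

```sql
-- Emulates the "first row per row_name wins" rule for the extra column.
SELECT DISTINCT ON (timeof) timeof, entity
FROM   t4
ORDER  BY timeof, entity, status;

-- Lists row_name values whose extra column differs within the group,
-- i.e. groups where crosstab would silently drop information.
SELECT timeof, count(DISTINCT entity) AS distinct_entities
FROM   t4
GROUP  BY timeof
HAVING count(DISTINCT entity) > 1;
```

With the question's data, the first query returns a for 2012-01-01 and b for 2012-01-02, and the second query flags 2012-01-02 as the group that mixes b and c.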
In my original question I should have used this as my sample data:
    CREATE TEMP TABLE t4 (
      timeof date
     ,entity integer
     ,status integer
     ,ct     integer);

    INSERT INTO t4 VALUES
      ('2012-01-01', 1, 1, 1)
     ,('2012-01-01', 1, 0, 2)
     ,('2012-01-01', 3, 0, 3)
     ,('2012-01-02', 2, 1, 4)
     ,('2012-01-02', 3, 1, 5)
     ,('2012-01-02', 3, 0, 6);
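With this data I have to pivot on both timeof and entity. Since tablefunc uses only one column as the row name, you need to find a way to stuff both dimensions into that column (http://www.postgresonline.com/journal/categories/24-tablefunc). I went with an array, just like the example at that link.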
    SELECT (timestamp 'epoch' + row_name[1] * INTERVAL '1 second')::date as localt,
           row_name[2] As entity, status1, status0
    FROM crosstab('SELECT ARRAY[extract(epoch from timeof), entity] as row_name,
                          status, ct
                   FROM t4
                   ORDER BY timeof, entity, status'
         ,$$VALUES (1::text), (0::text)$$)
      as ct (row_name integer[], status1 int, status0 int)
FWIW, I also tried using a character array, and so far that looks faster on my setup (PostgreSQL 9.2.3).
This is the result, which is also the desired output:
       localt   | entity | status1 | status0
    ------------+--------+---------+---------
     2012-01-01 |      1 |       1 |       2
     2012-01-01 |      3 |         |       3
     2012-01-02 |      2 |       4 |
     2012-01-02 |      3 |       5 |       6
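An alternative sketch that avoids packing both dimensions into an array: generate a surrogate row name with dense_rank() and carry timeof and entity as "extra" columns. This is not the approach used above; the row_name alias is a hypothetical helper column, and the query assumes the t4 sample table with timeof as date and entity as integer.

```sql
-- Sketch: a surrogate row_name built from (timeof, entity); the real key
-- columns travel along as "extra" columns and are uniform within each group.
SELECT timeof, entity, status1, status0
FROM   crosstab(
         $$SELECT dense_rank() OVER (ORDER BY timeof, entity)::int AS row_name,
                  timeof, entity, status, ct
           FROM   t4
           ORDER  BY 1, 4$$,
         $$VALUES (1), (0)$$
       ) AS ct (row_name int, timeof date, entity int, status1 int, status0 int);
```

Because every row in a dense_rank() group shares the same timeof and entity, the "extra columns must be the same" rule holds by construction, and no epoch arithmetic is needed to recover the date. It should yield the same four rows as the array version.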
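I'm curious how this performs on a much larger data set and will report back at a later date.
OK, so I ran this on a table closer to my use case. Either I'm doing it wrong, or crosstab is not suitable for my use.
First I made some similar data: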
    CREATE TABLE public.test (
        id serial primary key,
        msrmnt integer,
        entity integer,
        localt timestamp,
        val double precision
    );
    CREATE INDEX ix_test_msrmnt ON public.test (msrmnt);
    CREATE INDEX ix_public_test_201201_entity ON public.test (entity);
    CREATE INDEX ix_public_test_201201_localt ON public.test (localt);

    insert into public.test (msrmnt, entity, localt, val)
    select * from(
    SELECT msrmnt, entity, localt, random() as val
    FROM generate_series('2012-01-01'::timestamp, '2012-01-01 23:59:00'::timestamp, interval '1 minutes') as localt
    join (select * FROM generate_series(1, 50, 1) as msrmnt) as msrmnt on 1=1
    join (select * FROM generate_series(1, 200, 1) as entity) as entity on 1=1) as data;
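As a sanity check on the generated data (a sketch, not part of the original test), the three generate_series calls should yield 1440 minutes, 50 measurements, and 200 entities, so 1440 * 50 * 200 = 14,400,000 rows in total:

```sql
-- Verifies the shape of the generated test data.
SELECT count(*)               AS total_rows,    -- 1440 * 50 * 200
       count(DISTINCT localt) AS minutes,       -- one day at 1-minute steps
       count(DISTINCT msrmnt) AS measurements,
       count(DISTINCT entity) AS entities
FROM   public.test;
```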
Then I ran the crosstab code a couple of times:
    explain analyze
    SELECT (timestamp 'epoch' + row_name[1] * INTERVAL '1 second')::date As localt, row_name[2] as entity
        ,msrmnt01,msrmnt02,msrmnt03,msrmnt04,msrmnt05,msrmnt06,msrmnt07,msrmnt08,msrmnt09,msrmnt10
        ,msrmnt11,msrmnt12,msrmnt13,msrmnt14,msrmnt15,msrmnt16,msrmnt17,msrmnt18,msrmnt19,msrmnt20
        ,msrmnt21,msrmnt22,msrmnt23,msrmnt24,msrmnt25,msrmnt26,msrmnt27,msrmnt28,msrmnt29,msrmnt30
        ,msrmnt31,msrmnt32,msrmnt33,msrmnt34,msrmnt35,msrmnt36,msrmnt37,msrmnt38,msrmnt39,msrmnt40
        ,msrmnt41,msrmnt42,msrmnt43,msrmnt44,msrmnt45,msrmnt46,msrmnt47,msrmnt48,msrmnt49,msrmnt50
    FROM crosstab('SELECT ARRAY[extract(epoch from localt), entity] as row_name, msrmnt, val
                   FROM public.test
                   ORDER BY localt, entity, msrmnt'
        ,$$VALUES
            ( 1::text),( 2::text),( 3::text),( 4::text),( 5::text),( 6::text),( 7::text),( 8::text),( 9::text),(10::text)
           ,(11::text),(12::text),(13::text),(14::text),(15::text),(16::text),(17::text),(18::text),(19::text),(20::text)
           ,(21::text),(22::text),(23::text),(24::text),(25::text),(26::text),(27::text),(28::text),(29::text),(30::text)
           ,(31::text),(32::text),(33::text),(34::text),(35::text),(36::text),(37::text),(38::text),(39::text),(40::text)
           ,(41::text),(42::text),(43::text),(44::text),(45::text),(46::text),(47::text),(48::text),(49::text),(50::text)$$)
    as ct (row_name integer[]
        ,msrmnt01 double precision, msrmnt02 double precision, msrmnt03 double precision, msrmnt04 double precision, msrmnt05 double precision
        ,msrmnt06 double precision, msrmnt07 double precision, msrmnt08 double precision, msrmnt09 double precision, msrmnt10 double precision
        ,msrmnt11 double precision, msrmnt12 double precision, msrmnt13 double precision, msrmnt14 double precision, msrmnt15 double precision
        ,msrmnt16 double precision, msrmnt17 double precision, msrmnt18 double precision, msrmnt19 double precision, msrmnt20 double precision
        ,msrmnt21 double precision, msrmnt22 double precision, msrmnt23 double precision, msrmnt24 double precision, msrmnt25 double precision
        ,msrmnt26 double precision, msrmnt27 double precision, msrmnt28 double precision, msrmnt29 double precision, msrmnt30 double precision
        ,msrmnt31 double precision, msrmnt32 double precision, msrmnt33 double precision, msrmnt34 double precision, msrmnt35 double precision
        ,msrmnt36 double precision, msrmnt37 double precision, msrmnt38 double precision, msrmnt39 double precision, msrmnt40 double precision
        ,msrmnt41 double precision, msrmnt42 double precision, msrmnt43 double precision, msrmnt44 double precision, msrmnt45 double precision
        ,msrmnt46 double precision, msrmnt47 double precision, msrmnt48 double precision, msrmnt49 double precision, msrmnt50 double precision)
    limit 1000
This is what I got on the third try:
    QUERY PLAN
    Limit  (cost=0.00..20.00 rows=1000 width=432) (actual time=110236.673..110237.667 rows=1000 loops=1)
      ->  Function Scan on crosstab ct  (cost=0.00..20.00 rows=1000 width=432) (actual time=110236.672..110237.598 rows=1000 loops=1)
    Total runtime: 110699.598 ms
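The plan shows the first row arriving only after about 110 seconds, which suggests that crosstab works through its whole input before the outer LIMIT is applied. A sketch of one way around that, assuming only a slice of time and a few measurements are actually needed, is to push the filter into the source query. The five-minute window and the three measurements below are arbitrary choices for illustration, and the ::int cast on the epoch is an addition so that the array matches integer[].

```sql
-- Sketch: restrict the rows crosstab has to process instead of relying on
-- an outer LIMIT. Window and measurement list are illustrative assumptions.
SELECT (timestamp 'epoch' + row_name[1] * interval '1 second') AS localt,
       row_name[2] AS entity,
       msrmnt01, msrmnt02, msrmnt03
FROM   crosstab(
         $q$SELECT ARRAY[extract(epoch FROM localt)::int, entity] AS row_name,
                   msrmnt, val
            FROM   public.test
            WHERE  localt >= '2012-01-01 00:00'
            AND    localt <  '2012-01-01 00:05'
            AND    msrmnt IN (1, 2, 3)
            ORDER  BY localt, entity, msrmnt$q$,
         $q$VALUES (1), (2), (3)$q$
       ) AS ct (row_name integer[],
                msrmnt01 double precision,
                msrmnt02 double precision,
                msrmnt03 double precision);
```

Whether this closes the gap to the aggregate version would have to be measured; the sketch only removes work that the LIMIT evidently does not.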
Then I ran the standard solution a couple of times:
    explain analyze
    select localt, entity,
        max(case when msrmnt =  1 then val else null end) as msrmnt01
        ,max(case when msrmnt =  2 then val else null end) as msrmnt02
        ,max(case when msrmnt =  3 then val else null end) as msrmnt03
        ,max(case when msrmnt =  4 then val else null end) as msrmnt04
        ,max(case when msrmnt =  5 then val else null end) as msrmnt05
        ,max(case when msrmnt =  6 then val else null end) as msrmnt06
        ,max(case when msrmnt =  7 then val else null end) as msrmnt07
        ,max(case when msrmnt =  8 then val else null end) as msrmnt08
        ,max(case when msrmnt =  9 then val else null end) as msrmnt09
        ,max(case when msrmnt = 10 then val else null end) as msrmnt10
        ,max(case when msrmnt = 11 then val else null end) as msrmnt11
        ,max(case when msrmnt = 12 then val else null end) as msrmnt12
        ,max(case when msrmnt = 13 then val else null end) as msrmnt13
        ,max(case when msrmnt = 14 then val else null end) as msrmnt14
        ,max(case when msrmnt = 15 then val else null end) as msrmnt15
        ,max(case when msrmnt = 16 then val else null end) as msrmnt16
        ,max(case when msrmnt = 17 then val else null end) as msrmnt17
        ,max(case when msrmnt = 18 then val else null end) as msrmnt18
        ,max(case when msrmnt = 19 then val else null end) as msrmnt19
        ,max(case when msrmnt = 20 then val else null end) as msrmnt20
        ,max(case when msrmnt = 21 then val else null end) as msrmnt21
        ,max(case when msrmnt = 22 then val else null end) as msrmnt22
        ,max(case when msrmnt = 23 then val else null end) as msrmnt23
        ,max(case when msrmnt = 24 then val else null end) as msrmnt24
        ,max(case when msrmnt = 25 then val else null end) as msrmnt25
        ,max(case when msrmnt = 26 then val else null end) as msrmnt26
        ,max(case when msrmnt = 27 then val else null end) as msrmnt27
        ,max(case when msrmnt = 28 then val else null end) as msrmnt28
        ,max(case when msrmnt = 29 then val else null end) as msrmnt29
        ,max(case when msrmnt = 30 then val else null end) as msrmnt30
        ,max(case when msrmnt = 31 then val else null end) as msrmnt31
        ,max(case when msrmnt = 32 then val else null end) as msrmnt32
        ,max(case when msrmnt = 33 then val else null end) as msrmnt33
        ,max(case when msrmnt = 34 then val else null end) as msrmnt34
        ,max(case when msrmnt = 35 then val else null end) as msrmnt35
        ,max(case when msrmnt = 36 then val else null end) as msrmnt36
        ,max(case when msrmnt = 37 then val else null end) as msrmnt37
        ,max(case when msrmnt = 38 then val else null end) as msrmnt38
        ,max(case when msrmnt = 39 then val else null end) as msrmnt39
        ,max(case when msrmnt = 40 then val else null end) as msrmnt40
        ,max(case when msrmnt = 41 then val else null end) as msrmnt41
        ,max(case when msrmnt = 42 then val else null end) as msrmnt42
        ,max(case when msrmnt = 43 then val else null end) as msrmnt43
        ,max(case when msrmnt = 44 then val else null end) as msrmnt44
        ,max(case when msrmnt = 45 then val else null end) as msrmnt45
        ,max(case when msrmnt = 46 then val else null end) as msrmnt46
        ,max(case when msrmnt = 47 then val else null end) as msrmnt47
        ,max(case when msrmnt = 48 then val else null end) as msrmnt48
        ,max(case when msrmnt = 49 then val else null end) as msrmnt49
        ,max(case when msrmnt = 50 then val else null end) as msrmnt50
    from sample
    group by localt, entity
    limit 1000
This is what I got on the third try:
    QUERY PLAN
    Limit  (cost=2257339.69..2270224.77 rows=1000 width=24) (actual time=19795.984..20090.626 rows=1000 loops=1)
      ->  GroupAggregate  (cost=2257339.69..5968242.35 rows=288000 width=24) (actual time=19795.983..20090.496 rows=1000 loops=1)
            ->  Sort  (cost=2257339.69..2293339.91 rows=14400088 width=24) (actual time=19795.626..19808.820 rows=50001 loops=1)
                  Sort Key: localt
                  Sort Method: external merge  Disk: 478568kB
                  ->  Seq Scan on sample  (cost=0.00..249883.88 rows=14400088 width=24) (actual time=0.013..2245.247 rows=14400000 loops=1)
    Total runtime: 20197.565 ms
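So, as far as my case is concerned, crosstab is not a solution so far: roughly 110.7 s against 20.2 s for the standard aggregate query in the runs above. And this is only one day's worth of data, when I will have several years. In fact, I will probably have to go with wide-format (non-normalized) tables, even though which measurements are made for each entity varies and new measurements get introduced, but I won't go into that here.
Here are some of my settings, using Postgres 9.2.3: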
    name                 | setting
    ---------------------+---------
    max_connections      | 100
    shared_buffers       | 2097152
    effective_cache_size | 6291456
    maintenance_work_mem | 1048576
    work_mem             | 262144