创build和填充数字表格的最佳方法是什么?
我见过很多不同的方式来创build和填充数字表。 但是,创build和填充它的最佳方式是什么? “最好的”被定义为从最重要的到最不重要的:
- 使用最佳索引创build表
- 行生成速度最快
- 用于创build和填充的简单代码
如果您不知道数字表是什么,请看这里: 为什么我应该考虑使用辅助数字表?
这里是一些从networking上取得的代码示例,以及对这个问题的答案。
对于每种方法,我已经修改了原始代码,以便每个代码使用相同的表和列:NumbersTest和Number,具有10,000行或尽可能接近。 另外,我提供了到原产地的链接。
这里的方法1是一个非常慢的循环方法
平均13.01秒
跑了3次最高,这里是几秒钟的时间:12.42,13.60
DROP TABLE NumbersTest DECLARE @RunDate datetime SET @RunDate=GETDATE() CREATE TABLE NumbersTest(Number INT IDENTITY(1,1)) SET NOCOUNT ON WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000 BEGIN INSERT dbo.NumbersTest DEFAULT VALUES END SET NOCOUNT OFF -- Add a primary key/clustered index to the numbers table ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number) PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds' SELECT COUNT(*) FROM NumbersTest
方法2在这里是一个更快的循环
平均1.1658秒
跑了11次最高,这里是几秒钟的时间:1.117,1.140,1.203,1.170,1.17.3,1.1516,1.203,1.153,1.17.3,1.170
DROP TABLE NumbersTest DECLARE @RunDate datetime SET @RunDate=GETDATE() CREATE TABLE NumbersTest (Number INT NOT NULL); DECLARE @i INT; SELECT @i = 1; SET NOCOUNT ON WHILE @i <= 10000 BEGIN INSERT INTO dbo.NumbersTest(Number) VALUES (@i); SELECT @i = @i + 1; END; SET NOCOUNT OFF ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number) PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds' SELECT COUNT(*) FROM NumbersTest
方法3这里是一个基于这里代码的单个INSERT
平均488.6毫秒
跑了11次最高,这里是毫秒数:686,673,623,686,343,343,376,360,343,453
DROP TABLE NumbersTest DECLARE @RunDate datetime SET @RunDate=GETDATE() CREATE TABLE NumbersTest (Number int not null) ;WITH Nums(Number) AS (SELECT 1 AS Number UNION ALL SELECT Number+1 FROM Nums where Number<10000 ) insert into NumbersTest(Number) select Number from Nums option(maxrecursion 10000) ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number) PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds' SELECT COUNT(*) FROM NumbersTest
这里的方法4是从这里平均348.3毫秒的“半循环”方法(由于代码中间的“GO”很难获得良好的时序,所以任何build议都将被赞赏)
跑了11次最高,这里是以毫秒为单位的时间:356,360,283,346,360,376,326,373,330,373
DROP TABLE NumbersTest DROP TABLE #RunDate CREATE TABLE #RunDate (RunDate datetime) INSERT INTO #RunDate VALUES(GETDATE()) CREATE TABLE NumbersTest (Number int NOT NULL); INSERT NumbersTest values (1); GO --required INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest GO 14 --will create 16384 total rows ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number) SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate SELECT COUNT(*) FROM NumbersTest
这里的方法5是Philip Kelley的答案中的一个INSERT
平均92.7毫秒
跑了11次最高,这里是以毫秒为单位的时间:80,96,96,93,110,110,80,76,93,93
DROP TABLE NumbersTest DECLARE @RunDate datetime SET @RunDate=GETDATE() CREATE TABLE NumbersTest (Number int not null) ;WITH Pass0 as (select 1 as C union all select 1), --2 rows Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows --I removed Pass5, since I'm only populating the Numbers table to 10,000 Tally as (select row_number() over(order by C) as Number from Pass4) INSERT NumbersTest (Number) SELECT Number FROM Tally WHERE Number <= 10000 ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number) PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds' SELECT COUNT(*) FROM NumbersTest
这里的方法6是来自Mladen Prajdic答案的单个INSERT
平均82.3毫秒
跑了11次最高,这里是毫秒数:80,80,93,76,93,63,93,76,93,76
DROP TABLE NumbersTest DECLARE @RunDate datetime SET @RunDate=GETDATE() CREATE TABLE NumbersTest (Number int not null) INSERT INTO NumbersTest(Number) SELECT TOP 10000 row_number() over(order by t1.number) as N FROM master..spt_values t1 CROSS JOIN master..spt_values t2 ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number); PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds' SELECT COUNT(*) FROM NumbersTest
这里的方法7是基于这里代码的单个INSERT
平均56.3毫秒
跑了11次最高,这里是毫秒的时间:63,50,63,46,60,63,63,46,63,46
DROP TABLE NumbersTest DECLARE @RunDate datetime SET @RunDate=GETDATE() SELECT TOP 10000 IDENTITY(int,1,1) AS Number INTO NumbersTest FROM sys.objects s1 --use sys.columns if you don't get enough rows returned to generate all the numbers you need CROSS JOIN sys.objects s2 --use sys.columns if you don't get enough rows returned to generate all the numbers you need ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number) PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds' SELECT COUNT(*) FROM NumbersTest
看完所有这些方法之后,我真的很喜欢Method 7,它是最快的,代码也相当简单。
我用这个速度太快了:
insert into Numbers(N) select top 1000000 row_number() over(order by t1.number) as N from master..spt_values t1 cross join master..spt_values t2
如果您只是在SQL Server Management Studio或sqlcmd中执行此操作,则可以使用批处理分隔符允许您重复批处理的事实:
CREATE TABLE Number (N INT IDENTITY(1,1) PRIMARY KEY NOT NULL); GO INSERT INTO Number DEFAULT VALUES; GO 100000
这将在Numbers
表中插入100000条logging。
这很慢。 它与KM KM答案中的方法1相比,这是最慢的例子。 但是,它就像代码灯一样。 通过在插入批处理之后添加主键约束,您可以稍稍加快速度。
我从下面的模板开始,这个模板来自于Itzik Ben-Gan的例行印刷:
;WITH Pass0 as (select 1 as C union all select 1), --2 rows Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows Tally as (select row_number() over(order by C) as Number from Pass5) select Number from Tally where Number <= 1000000
“WHERE N <= 1000000”子句将输出限制在1到100万之间,并且可以很容易地调整到所需的范围。
由于这是一个WITH子句,因此可以将其应用到INSERT … SELECT中,如下所示:
-- Sample use: create one million rows CREATE TABLE dbo.Example (ExampleId int not null) DECLARE @RowsToCreate int SET @RowsToCreate = 1000000 -- "Table of numbers" data generator, as per Itzik Ben-Gan (from multiple sources) ;WITH Pass0 as (select 1 as C union all select 1), --2 rows Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows Tally as (select row_number() over(order by C) as Number from Pass5) INSERT Example (ExampleId) select Number from Tally where Number <= @RowsToCreate
索引表build立后,将是索引它的最快方法。
哦,我把它称为“理货”表。 我认为这是一个常见的术语,你可以通过Googlesearchfind大量的技巧和例子。
我使用数字表格主要是在BIRT中进行报告,而不必dynamic创buildlogging集。
我的date也是一样,从过去的10年到今后的10年(以及更详细的报告)。 即使您的“真实”数据表没有数据,也可以获得所有date的值。
我有一个脚本,我用它来创build这些东西,像(这是从内存中):
drop table numbers; commit; create table numbers (n integer primary key); commit; insert into numbers values (0); commit; insert into numbers select n+1 from numbers; commit; insert into numbers select n+2 from numbers; commit; insert into numbers select n+4 from numbers; commit; insert into numbers select n+8 from numbers; commit; insert into numbers select n+16 from numbers; commit; insert into numbers select n+32 from numbers; commit; insert into numbers select n+64 from numbers; commit;
行数加倍,因此生成真正巨大的表格并不需要太多。
我不确定我是否同意你的观点,因为你只创build一次,所以创build速度很重要。 这个成本在所有访问中被分摊,使得这个时间非常微不足道。
一些build议的方法是基于系统对象(例如'sys.objects')。 他们假设这些系统对象包含足够的logging来生成我们的数字。
我不会基于任何不属于我的应用程序的东西,而我不能完全控制它。 例如:这些sys表的内容可能会改变,在SQL的新版本中这些表可能不再有效。
作为一个解决scheme,我们可以用logging创build我们自己的表格。 然后,我们使用那个而不是这些系统相关的对象(如果我们知道范围,否则我们可以去做交叉连接)。
基于CTE的解决scheme工作正常,但它有嵌套循环相关的限制。
对于任何正在寻找Azure解决scheme的人
SET NOCOUNT ON CREATE TABLE Numbers (n bigint PRIMARY KEY) GO DECLARE @numbers table(number int); WITH numbers(number) as ( SELECT 1 AS number UNION all SELECT number+1 FROM numbers WHERE number<10000 ) INSERT INTO @numbers(number) SELECT number FROM numbers OPTION(maxrecursion 10000) INSERT INTO Numbers(n) SELECT number FROM @numbers
来自sql azure团队博客http://azure.microsoft.com/blog/2010/09/16/create-a-numbers-table-in-sql-azure/
这里有几个额外的方法:
方法1
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL DROP TABLE dbo.Numbers GO CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY); GO DECLARE @i int = 1; INSERT INTO dbo.Numbers (Number) VALUES (1),(2); WHILE 2*@i < 1048576 BEGIN INSERT INTO dbo.Numbers (Number) SELECT Number + 2*@i FROM dbo.Numbers; SET @i = @@ROWCOUNT; END GO SELECT COUNT(*) FROM Numbers AS RowCownt --1048576 rows
方法2
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL DROP TABLE dbo.Numbers GO CREATE TABLE dbo.Numbers (Number int NOT NULL PRIMARY KEY); GO DECLARE @i INT = 0; INSERT INTO dbo.Numbers (Number) VALUES (1); WHILE @i <= 9 BEGIN INSERT INTO dbo.Numbers (Number) SELECT N.Number + POWER(4, @i) * D.Digit FROM dbo.Numbers AS N CROSS JOIN (VALUES(1),(2),(3)) AS D(Digit) ORDER BY D.Digit, N.Number SET @i = @i + 1; END GO SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows
方法3
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL DROP TABLE dbo.Numbers GO CREATE TABLE Numbers (Number int identity NOT NULL PRIMARY KEY, T bit NULL); WITH T1(T) AS (SELECT T FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS T(T)) --10 rows ,T2(T) AS (SELECT AT FROM T1 AS A CROSS JOIN T1 AS B CROSS JOIN T1 AS C) --1,000 rows ,T3(T) AS (SELECT AT FROM T2 AS A CROSS JOIN T2 AS B CROSS JOIN T2 AS C) --1,000,000,000 rows INSERT INTO dbo.Numbers(T) SELECT TOP (1048576) NULL FROM T3; ALTER TABLE Numbers DROP COLUMN T; GO SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows
方法4 ,取自Alex Kuznetsov的“ 防御性数据库编程”一书
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL DROP TABLE dbo.Numbers GO CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY); GO DECLARE @i INT = 1 ; INSERT INTO dbo.Numbers (Number) VALUES (1); WHILE @i < 524289 --1048576 BEGIN; INSERT INTO dbo.Numbers (Number) SELECT Number + @i FROM dbo.Numbers; SET @i = @i * 2 ; END GO SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows
方法5取自Erland Sommarskog 在SQL Server 2005和更高版本中的数组和列表
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL DROP TABLE dbo.Numbers GO CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY); GO WITH digits (d) AS ( SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 0) INSERT INTO Numbers (Number) SELECT Number FROM (SELECT id + ii.d * 10 + iii.d * 100 + iv.d * 1000 + vd * 10000 + vi.d * 100000 AS Number FROM digits i CROSS JOIN digits ii CROSS JOIN digits iii CROSS JOIN digits iv CROSS JOIN digits v CROSS JOIN digits vi) AS Numbers WHERE Number > 0 GO SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --999999 rows
概要:
在这5种方法中,方法3似乎是最快的。
以下是我使用SQL Server 2008中引入的表值构造函数创build的一个简短快速的内存中解决scheme:
--1,000,000 rows. Either add/remove CROSS JOINs, or use TOP clause to modify this ;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z)) SELECT N FROM (SELECT ROW_NUMBER() OVER (ORDER BY v1.z)-1 N FROM v v1 CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4 CROSS JOIN v v5 CROSS JOIN v v6) Nums
请注意,这可以快速计算,或者(甚至更好)存储在永久表中(只需在SELECT N
段之后添加INTO
子句),并使用N
字段上的主键来提高效率。