How can I fill a column with random numbers in SQL? I get the same value in every row
UPDATE CattleProds SET SheepTherapy=(ROUND((RAND()* 10000),0)) WHERE SheepTherapy IS NULL
If I then do a SELECT, I see that my random number is identical in every row. Any ideas how to generate unique random numbers?
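Instead of rand(), use newid(), which is recalculated for each row in the result. The usual way is to take the modulus of the checksum. Note that checksum(newid()) can produce -2,147,483,648, which would cause an integer overflow in abs(), so we need to apply the modulus to the checksum's return value first and only then convert it to an absolute value.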
UPDATE CattleProds SET SheepTherapy = abs(checksum(NewId()) % 10000) WHERE SheepTherapy IS NULL
This generates a random number between 0 and 9999.
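To make the two points above concrete, here is a small sketch. It assumes SQL Server 2008 or later (for the inline DECLARE initializers) and uses sys.all_objects only as a convenient row source; the variable names @worst, @lo and @hi are made up for illustration.

```sql
-- RAND() without a seed is evaluated once per query; NEWID() is evaluated per row.
SELECT TOP (3)
       RAND()                         AS SameOnEveryRow,
       ABS(CHECKSUM(NEWID()) % 10000) AS DiffersPerRow
FROM sys.all_objects;

-- The smallest INT has no positive counterpart, so ABS() on it overflows.
DECLARE @worst INT = -2147483648;
SELECT ABS(@worst % 10000) AS ModuloFirst;   -- modulo first: -3648 becomes 3648
-- SELECT ABS(@worst);                       -- would raise an arithmetic overflow error

-- Hypothetical bounds: an integer in [@lo, @hi], both ends included.
DECLARE @lo INT = 100, @hi INT = 200;
SELECT @lo + ABS(CHECKSUM(NEWID()) % (@hi - @lo + 1)) AS InRange;
```

The last statement is the same modulus-then-abs pattern, just shifted by a lower bound.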
If you are on SQL Server 2008 or later, you can also use
CRYPT_GEN_RANDOM(2) % 10000
which seems somewhat simpler (it is also evaluated once per row, as newid is, shown below).
DECLARE @foo TABLE (col1 FLOAT)
INSERT INTO @foo SELECT 1 UNION SELECT 2
UPDATE @foo SET col1 = CRYPT_GEN_RANDOM(2) % 10000
SELECT * FROM @foo
Returns (2 random, probably different numbers)
col1
----------------------
9693
8573
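Thinking this over, the only legitimate objection I can think of is that the random number generated is between 0 and 65535, a range of 65,536 values that is not evenly divisible by 10,000, so some results will be slightly over-represented (0 through 5535 can each be produced 7 ways, the rest only 6). A way around this is to wrap it in a scalar UDF that throws away any number of 60,000 or more and calls itself recursively to get a replacement number.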
CREATE FUNCTION dbo.RandomNumber()
RETURNS INT
AS
BEGIN
    DECLARE @Result INT
    SET @Result = CRYPT_GEN_RANDOM(2)   -- 0 .. 65535
    -- Accept values below 60000; otherwise retry, giving up at the nesting limit of 32.
    RETURN CASE
               WHEN @Result < 60000 OR @@NESTLEVEL = 32 THEN @Result % 10000
               ELSE dbo.RandomNumber()
           END
END
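The size of the bias being corrected can be checked directly. The sketch below is only an illustration: it builds the numbers 0 to 65535 from a cross join of sys.all_objects (an assumed, convenient row source; the CTE names are made up) and counts how many of them map to each result of % 10000.

```sql
-- Count how many of the 65,536 two-byte values land on each residue modulo 10000.
WITH n AS (
    SELECT TOP (65536)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS v   -- 0 .. 65535
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
),
per_residue AS (
    SELECT v % 10000 AS r, COUNT(*) AS Hits
    FROM n
    GROUP BY v % 10000
)
SELECT Hits, COUNT(*) AS Residues
FROM per_residue
GROUP BY Hits
ORDER BY Hits;
-- 4,464 residues are reachable 6 ways and 5,536 residues (0 .. 5535) are reachable 7 ways.

-- Sampling the UDF defined above:
SELECT dbo.RandomNumber() AS Sample;
```

Discarding everything from 60000 upward leaves exactly 6 sources per residue, which is why the UDF's results are evenly spread (apart from the rare case where the nesting limit forces it to accept a value).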
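While I do like using CHECKSUM, I feel a better way is to use NEWID() as the seed for RAND(), simply because you don't have to go through complicated math to generate simple numbers.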
ROUND( 1000 *RAND(convert(varbinary, newid())), 0)
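You can replace the 1000 with whatever upper limit you want, and you can add an offset to create a range. Say you want a random number between 100 and 200; you can do something like: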
100 + ROUND( 100 *RAND(convert(varbinary, newid())), 0)
Putting it together in your query:
UPDATE CattleProds SET SheepTherapy= ROUND( 1000 *RAND(convert(varbinary, newid())), 0) WHERE SheepTherapy IS NULL
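One detail worth knowing about the ROUND form: with RAND() returning a value in [0, 1), ROUND(100 * r, 0) produces 0 through 100, but the two end values each cover only half as wide an interval as the values in between, so they come up about half as often. If that matters, a FLOOR-based variant is one possible alternative. This is a sketch, not part of the answer above:

```sql
-- FLOOR over 101 equal-width slots gives each integer 100 .. 200 the same share.
SELECT 100 + FLOOR(101 * RAND(CONVERT(varbinary, NEWID()))) AS Between100And200;
```

The result is still a float, as with ROUND; cast it to INT if the target column needs one.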
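I tested 2 set-based randomization methods against RAND() by generating 100,000,000 rows with each. To level the playing field, the output is a float between 0 and 1 to mimic RAND(). Most of the code is testing infrastructure, so I summarize the algorithms here: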
-- Try #1 used
(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
-- Try #2 used
RAND(Checksum(NewId()))
-- and to have a baseline to compare output with I used
RAND() -- this required executing 100000000 separate insert statements
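Using CRYPT_GEN_RANDOM was clearly the most random: by a birthday-problem estimate (n^2 / 2N = 10^16 / (2 * 10^18)), there is only about a 0.5% chance of seeing even 1 duplicate when plucking 10^8 numbers from a set of 10^18 numbers. IOW we should not have seen any duplicates, and this had none! This set took 44 seconds to generate on my laptop.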
Cnt    Pct
-----  ----
 1     100.000000 --No duplicates
SQL Server Execution Times: CPU time = 134795 ms, elapsed time = 39274 ms.
IF OBJECT_ID('tempdb..#T0') IS NOT NULL DROP TABLE #T0;
GO
WITH L0 AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c)) -- 2^4
    ,L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B) -- 2^8
    ,L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B) -- 2^16
    ,L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B) -- 2^32
SELECT TOP 100000000 (CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
INTO #T0
FROM L3;

-- Pct = share of rows' worth of values seen Cnt times; *1.0 avoids integer division
WITH x AS (
    SELECT Val,COUNT(*) Cnt
    FROM #T0
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T0) Pct
FROM x
GROUP BY x.Cnt;
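As a sanity check on the duplicate odds for this first run, the standard birthday-problem approximation can be evaluated directly. The sketch assumes the 10^18 possible values are uniformly likely; the variable names @draws and @pool are made up for illustration.

```sql
-- Birthday-problem approximation: P(at least one duplicate) ~ 1 - exp(-draws^2 / (2 * pool))
DECLARE @draws FLOAT = 1e8;    -- rows generated
DECLARE @pool  FLOAT = 1e18;   -- number of distinct possible values (assumed uniform)
SELECT 1 - EXP(-(@draws * @draws) / (2 * @pool)) AS ApproxChanceOfAnyDuplicate;  -- roughly 0.005
```

So a run with zero duplicates is the expected outcome for a pool of this size, while a generator with a much smaller pool will show duplicates at this row count.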
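At the cost of being far less random, this method was faster, though by less than a factor of two going by the elapsed times reported for the two runs: it took only 23 seconds to generate 100 million numbers.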
Cnt  Pct
---- ----
 1   95.450254    -- only 95% unique is absolutely horrible
 2   02.222167    -- If this line were the only problem I'd say DON'T USE THIS!
 3   00.034582
 4   00.000409    -- 409 numbers appeared 4 times
 5   00.000006    -- 6 numbers actually appeared 5 times
SQL Server Execution Times: CPU time = 77156 ms, elapsed time = 24613 ms.
IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
GO
WITH L0 AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c)) -- 2^4
    ,L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B) -- 2^8
    ,L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B) -- 2^16
    ,L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B) -- 2^32
SELECT TOP 100000000 RAND(Checksum(NewId())) AS Val
INTO #T1
FROM L3;

WITH x AS (
    SELECT Val,COUNT(*) Cnt
    FROM #T1
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T1) Pct
FROM x
GROUP BY x.Cnt;
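RAND() alone is useless for set-based generation, so generating the baseline for comparing randomness took over 6 hours and had to be restarted several times to finally get the right number of output rows. It also seems that its randomness leaves a lot to be desired, although it is better than using checksum(newid()) to reseed each row.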
Cnt  Pct
---- ----
 1   99.768020
 2   00.115840
 3   00.000100    -- at least there were comparatively few values returned 3 times
Because of the restarts, the execution time could not be captured.
IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE #T2;
GO
CREATE TABLE #T2 (Val FLOAT);
GO
SET NOCOUNT ON;
GO
-- One RAND() call per batch; GO 100000000 repeats the batch that many times
INSERT INTO #T2(Val) VALUES(RAND());
GO 100000000
WITH x AS (
    SELECT Val,COUNT(*) Cnt
    FROM #T2
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T2) Pct
FROM x
GROUP BY x.Cnt;