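T-SQL select query to remove non-numeric characters
I have dirty data in a column with a variable-length alpha prefix. I just want to strip out anything that is not 0-9.
I do not want to run a function or a stored procedure. I have a similar script that just grabs the numeric value after the text; it looks like this: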
Update TableName
set ColumntoUpdate = cast(replace(Columnofdirtydata, 'Alpha #', '') as int)
where Columnofdirtydata like 'Alpha #%'
  And ColumntoUpdate is Null
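I thought it would work pretty well until I found that some of the data fields I assumed were all in the format Alpha # 12345789 are not...
Examples of data that needs to be stripped:
AB ABCDE # 123
ABCDE# 123
AB: ABC# 123
I just want the 123. All of the data fields do have the # sign before the number.
I tried SUBSTRING and PATINDEX, but I'm not getting the syntax right. Does anyone have advice on the best way to approach this?
Thanks!
See this blog post on extracting numbers from strings in SQL Server. Below is a sample using a string from your example: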
DECLARE @textval NVARCHAR(30)
SET @textval = 'AB ABCDE # 123'

SELECT LEFT(SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000),
            PATINDEX('%[^0-9.-]%', SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000) + 'X') - 1)
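To see how that expression behaves over several rows, here is a rough sketch that runs it against the three sample values from the question. The table variable @samples, its column Col and the fourth row are made up for the demo, and CROSS APPLY is only there to avoid repeating the SUBSTRING call.

```sql
-- Hypothetical demo data: the three samples from the question plus one edge case.
DECLARE @samples TABLE (Col NVARCHAR(30));
INSERT INTO @samples (Col)
VALUES ('AB ABCDE # 123'), ('ABCDE# 123'), ('AB: ABC# 123'), ('AB-CD # 123');

SELECT s.Col,
       -- keep everything up to the first character that is not a digit, '.' or '-'
       LEFT(t.Tail, PATINDEX('%[^0-9.-]%', t.Tail + 'X') - 1) AS Extracted
FROM @samples AS s
CROSS APPLY (
    -- everything from the first digit, '.' or '-' onwards
    SELECT SUBSTRING(s.Col, PATINDEX('%[0-9.-]%', s.Col), 8000) AS Tail
) AS t;
```

The first three rows should come back as 123. The last row returns just a hyphen, because the pattern [0-9.-] also treats a hyphen or a period as the start of a number; if the text before the # can contain those characters, a digits-only pattern such as [0-9] may be the safer choice.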
You can use STUFF and PATINDEX.
stuff(Col, 1, patindex('%[0-9]%', Col)-1, '')
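As a quick check of that expression, the sketch below applies it to the question's sample values. The table variable @samples and the extra 'AB#' row are assumptions added for illustration.

```sql
-- Hypothetical demo data: the question's samples plus a row with no digits.
DECLARE @samples TABLE (Col VARCHAR(100));
INSERT INTO @samples (Col)
VALUES ('AB ABCDE # 123'), ('ABCDE# 123'), ('AB: ABC# 123'), ('AB#');

SELECT Col,
       -- delete everything before the first digit
       STUFF(Col, 1, PATINDEX('%[0-9]%', Col) - 1, '') AS Digits
FROM @samples;
```

The three question samples should return 123. For 'AB#' there is no digit, so PATINDEX returns 0, the length passed to STUFF is negative and the result is NULL. Note that this only trims the prefix: anything that follows the first digit, including trailing text, is kept.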
If there may be other characters between the digits (for example, thousands separators), you can try the following:
declare @table table (DirtyCol varchar(100))
insert into @table values
    ('AB ABCDE # 123')
    ,('ABCDE# 123')
    ,('AB: ABC# 123')
    ,('AB#')
    ,('AB # 1 000 000')
    ,('AB # 1`234`567')
    ,('AB # (9)(876)(543)')

;with tally as (
    select top (100) N = row_number() over (order by @@spid)
    from sys.all_columns
),
data as (
    select DirtyCol, Col
    from @table
    cross apply (
        select (
            select C + ''
            from (
                select N, substring(DirtyCol, N, 1) C
                from tally
                where N <= datalength(DirtyCol)
            ) [1]
            where C between '0' and '9'
            order by N
            for xml path('')
        )
    ) p (Col)
    where p.Col is not NULL
)
select DirtyCol, cast(Col as int) IntCol
from data
The output is:
DirtyCol              IntCol
--------------------- -------
AB ABCDE # 123        123
ABCDE# 123            123
AB: ABC# 123          123
AB # 1 000 000        1000000
AB # 1`234`567        1234567
AB # (9)(876)(543)    9876543
To update, add ColToUpdate to the select list of the data CTE:
;with num as (...),
data as (
    select ColToUpdate, /*DirtyCol, */Col
    from ...
)
update data
set ColToUpdate = cast(Col as int)
This works well for me:
CREATE FUNCTION [dbo].[StripNonNumerics]
(
    @Temp varchar(255)
)
RETURNS varchar(255)
AS
Begin
    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^0-9]%'

    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End
Then call the function like this to see the original value next to the cleaned-up value:
SELECT Something, dbo.StripNonNumerics(Something) FROM TableA
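If your server supports the TRANSLATE function (on SQL Server it is available from SQL Server 2017 onwards, and also on SQL Azure), this is a good solution.
First, it replaces every non-numeric character with the @ character. Then it removes all the @ characters. You may need to add other characters that you know can occur to the second argument of the TRANSLATE call; when you do, add a matching @ to the third argument as well, because the two arguments must be the same length.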
select REPLACE(TRANSLATE([Col], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '')
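Here is a small sketch of the same idea against the question's sample values, assuming SQL Server 2017 or later. The names @samples, @strip and @fill are made up for the demo. It makes two choices of its own: REPLICATE builds the third argument so that its length always matches the second, and LOWER is applied to the input so that the lowercase-only list covers uppercase letters whatever the collation.

```sql
-- Hypothetical demo data taken from the question.
DECLARE @samples TABLE (Col VARCHAR(100));
INSERT INTO @samples (Col)
VALUES ('AB ABCDE # 123'), ('ABCDE# 123'), ('AB: ABC# 123');

-- Characters to remove; extend this list with anything else found in the data.
DECLARE @strip VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyz #:';
-- One '@' per character in @strip, so both TRANSLATE arguments have equal length.
DECLARE @fill VARCHAR(100) = REPLICATE('@', LEN(@strip));

SELECT Col,
       REPLACE(TRANSLATE(LOWER(Col), @strip, @fill), '@', '') AS Digits
FROM @samples;
```

Each of the three rows should return 123. A character that is missing from @strip is left in the result, so this approach suits data where the set of unwanted characters is known in advance.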
Declare @MainTable table (id int identity(1,1), TextField varchar(100))
INSERT INTO @MainTable (TextField) VALUES ('6B32E')

declare @i int = 1
Declare @originalWord varchar(100) = ''

WHile @i <= (Select count(*) from @MainTable)
BEGIN
    Select @originalWord = TextField from @MainTable where id = @i

    Declare @r varchar(max) = '', @len int, @c char(1), @x int = 0
    Select @len = len(@originalWord)
    declare @pn varchar(100) = @originalWord

    while @x <= @len
    begin
        Select @c = SUBSTRING(@pn, @x, 1)
        if (@c != '')
        BEGIN
            if ISNUMERIC(@c) = 0 and @c <> '-'
            BEGIN
                -- non-numeric character: append its ASCII code minus 64 (A -> 1, B -> 2, ...)
                Select @r = cast(@r as varchar) + cast(replace((SELECT ASCII(@c) - 64), '-', '') as varchar)
            end
            ELSE
            BEGIN
                Select @r = @r + @c
            END
        END
        Select @x = @x + 1
    END

    Select @r
    Set @i = @i + 1
END
To add on to Ken's answer, this handles commas, spaces and parentheses:
--Handles parentheses, commas, spaces, hyphens..
declare @table table (c varchar(256))
insert into @table values
    ('This is a test 111-222-3344'),
    ('Some Sample Text (111)-222-3344'),
    ('Hello there 111222 3344 / How are you?'),
    ('Hello there 111 222 3344 ? How are you?'),
    ('Hello there 111 222 3344. How are you?')

select replace(left(t.tail, patindex('%[^0-9.-]%', t.tail + 'X') - 1), '.', '')
from @table
--strip parentheses, hyphens, spaces and commas once, instead of repeating the nested replace
cross apply (
    select replace(replace(replace(replace(replace(c, '(', ''), ')', ''), '-', ''), ' ', ''), ',', '') as cleaned
) x
--everything from the first digit (or '.' / '-') onwards
cross apply (
    select substring(x.cleaned, patindex('%[0-9.-]%', x.cleaned), 8000) as tail
) t
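Here's a version which pulls all the digits out of a string; i.e. given "I'm 35 years old; I was born in 1982. The average family has 2.4 children." it returns 35198224. That makes it a good fit where you have numeric data that may have been formatted as a code (e.g. #123,456,789 / 123-00005), but not where you want to pull specific numbers (as opposed to just the digit characters) out of the text. It also handles digits only, so it won't return negative signs (-) or periods (.).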
declare @table table (id bigint not null identity (1,1), data nvarchar(max))
insert @table (data)
values ('hello 123 its 45613 then') --outputs: 12345613
,('1 some other string 98 example 4') --outputs: 1984
,('AB ABCDE # 123') --outputs: 123
,('ABCDE# 123') --outputs: 123
,('AB: ABC# 123') --outputs: 123

; with NonNumerics as (
    select id
    , data original
    --the below line replaces all digits with blanks
    , replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(data,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','') nonNumeric
    from @table
)
--each iteration of the below CTE removes another non-numeric character from the original string, putting the result into the numerics column
--charsRemaining uses datalength rather than len, because len ignores trailing spaces and would stop early when only spaces are left to remove
, Numerics as (
    select id
    , replace(original, substring(nonNumeric,1,1), '') numerics
    , replace(nonNumeric, substring(nonNumeric,1,1), '') charsToreplace
    , datalength(replace(nonNumeric, substring(nonNumeric,1,1), '')) charsRemaining
    from NonNumerics

    union all

    select id
    , replace(numerics, substring(charsToreplace,1,1), '') numerics
    , replace(charsToreplace, substring(charsToreplace,1,1), '') charsToreplace
    , datalength(replace(charsToreplace, substring(charsToreplace,1,1), '')) charsRemaining
    from Numerics
    where charsRemaining > 0
)
--we select only those strings with `charsRemaining=0`; ie the rows for which all non-numeric characters have been removed; there should be 1 row returned for every 1 row in the original data set.
select * from Numerics where charsRemaining = 0
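This code works by first removing all the digits (i.e. the characters we want) from the given string, replacing them with blanks. It then goes through the original string (which still includes the digits) and removes every character that was left over (i.e. the non-numeric characters), so that only the digits remain.
The reason for doing this in two steps, rather than just removing all non-numeric characters directly, is that there are only 10 digits but a huge number of possible other characters. Replacing the short list of digits is relatively fast, and it leaves us with a list of the non-numeric characters that actually occur in the string, so we only have to replace that small set.
The method uses recursive SQL, via common table expressions (CTEs).
I have created a function for this: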
Create FUNCTION RemoveCharacters (@text varchar(30))
RETURNS VARCHAR(30)
AS
BEGIN
    declare @index as int
    declare @newtexval as varchar(30)

    -- position of the first unwanted character, or 0 if there is none
    set @index = (select PATINDEX('%[A-Z.-/?]%', @text))
    if (@index = 0)
    begin
        return @text
    end
    else
    begin
        -- remove that character and recurse on the rest
        set @newtexval = (select STUFF(@text, @index, 1, ''))
        return dbo.RemoveCharacters(@newtexval)
    end
    return 0
END
GO
Here is the answer:
DECLARE @t TABLE (tVal VARCHAR(100))
INSERT INTO @t VALUES('123')
INSERT INTO @t VALUES('123S')
INSERT INTO @t VALUES('A123,123')
INSERT INTO @t VALUES('a123..A123')

;WITH cte (original, tVal, n) AS
(
    SELECT t.tVal AS original,
           LOWER(t.tVal) AS tVal,
           65 AS n
    FROM @t AS t

    UNION ALL

    -- n runs over the ASCII codes 65..90 (A..Z); each step removes one letter
    SELECT tVal AS original,
           CAST(REPLACE(LOWER(tVal), LOWER(CHAR(n)), '') AS VARCHAR(100)),
           n + 1
    FROM cte
    WHERE n <= 90
)
SELECT t1.tVal AS OldVal,
       t.tval AS NewVal
FROM (
    SELECT original,
           tVal,
           ROW_NUMBER() OVER(PARTITION BY tVal + original ORDER BY original) AS Sl
    FROM cte
    WHERE PATINDEX('%[a-z]%', tVal) = 0
) t
INNER JOIN @t t1
    ON t.original = t1.tVal
WHERE t.sl = 1
create function fn_GetNumbersOnly(@pn varchar(100))
Returns varchar(max)
AS
BEGIN
    Declare @r varchar(max) = '', @len int, @c char(1), @x int = 1
    Select @len = len(@pn)
    while @x <= @len
    begin
        Select @c = SUBSTRING(@pn, @x, 1)
        -- keep digits only; ISNUMERIC would also accept characters such as '.', '+' and '$'
        if @c like '[0-9]'
            Select @r = @r + @c
        Select @x = @x + 1
    end
    return @r
END