如何从SQL Server中的string中去除所有非字母字符?
你怎么能删除string中不是字母的所有字符?
那么非字母数字呢?
这是否必须是一个自定义的function,还是有更多一般化的解决scheme?
试试这个function:
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000)) Returns VarChar(1000) AS Begin Declare @KeepValues as varchar(50) Set @KeepValues = '%[^az]%' While PatIndex(@KeepValues, @Temp) > 0 Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '') Return @Temp End
像这样调用它:
Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')
一旦理解了代码,就会发现将其更改为删除其他字符也相对简单。 你甚至可以使这个dynamic足以通过你的search模式。
希望它有帮助。
G Mastros 真棒答案的参数化版本:
CREATE FUNCTION [dbo].[fn_StripCharacters] ( @String NVARCHAR(MAX), @MatchExpression VARCHAR(255) ) RETURNS NVARCHAR(MAX) AS BEGIN SET @MatchExpression = '%['+@MatchExpression+']%' WHILE PatIndex(@MatchExpression, @String) > 0 SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '') RETURN @String END
仅按字母顺序:
SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^a-z')
仅数字:
SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^0-9')
仅限字母数字:
SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^a-z0-9')
非字母数字:
SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', 'a-z0-9')
我知道SQL在string操作上很糟糕,但是我不认为这会很困难。 这是一个简单的function,可以去掉string中的所有数字。 会有更好的方法来做到这一点,但这是一个开始。
CREATE FUNCTION dbo.AlphaOnly ( @String varchar(100) ) RETURNS varchar(100) AS BEGIN RETURN ( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( @String, '9', ''), '8', ''), '7', ''), '6', ''), '5', ''), '4', ''), '3', ''), '2', ''), '1', ''), '0', '') ) END GO -- ================== DECLARE @t TABLE ( ColID int, ColString varchar(50) ) INSERT INTO @t VALUES (1, 'abc1234567890') SELECT ColID, ColString, dbo.AlphaOnly(ColString) FROM @t
产量
ColID ColString ----- ------------- --- 1 abc1234567890 abc
第2轮 – 数据驱动的黑名单
-- ============================================ -- Create a table of blacklist characters -- ============================================ IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist')) DROP TABLE dbo.CharacterBlacklist GO CREATE TABLE dbo.CharacterBlacklist ( CharID int IDENTITY, DisallowedCharacter nchar(1) NOT NULL ) GO INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8') INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9') GO -- ==================================== IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters')) DROP FUNCTION dbo.StripBlacklistCharacters GO CREATE FUNCTION dbo.StripBlacklistCharacters ( @String nvarchar(100) ) RETURNS varchar(100) AS BEGIN DECLARE @blacklistCt int DECLARE @ct int DECLARE @c nchar(1) SELECT @blacklistCt = COUNT(*) FROM dbo.CharacterBlacklist SET @ct = 0 WHILE @ct < @blacklistCt BEGIN SET @ct = @ct + 1 SELECT @String = REPLACE(@String, DisallowedCharacter, N'') FROM dbo.CharacterBlacklist WHERE CharID = @ct END RETURN (@String) END GO -- ==================================== DECLARE @s nvarchar(24) SET @s = N'abc1234def5678ghi90jkl' SELECT @s AS OriginalString, dbo.StripBlacklistCharacters(@s) AS ResultString
产量
OriginalString ResultString ------------------------ ------------ abc1234def5678ghi90jkl abcdefghijkl
我对读者的挑战:你能使这个更高效吗? 怎么使用recursion?
如果你像我一样,没有权限添加function到你的生产数据,但仍然想要执行这种过滤,这里是一个纯粹的SQL解决scheme,使用PIVOT表来重新组合滤波片段。
NB我把表格硬编码为40个字符,如果你有更长的string进行过滤,你将不得不增加更多。
SET CONCAT_NULL_YIELDS_NULL OFF; with ToBeScrubbed as ( select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub ), Scrubbed as ( select P.Number as ValueOrder, isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue, t.id from ToBeScrubbed t left join master..spt_values P on P.number between 1 and len(t.ColumnToScrub) and type ='P' where PatIndex('%[^az]%', substring(t.ColumnToScrub,P.number,1) ) = 0 ) SELECT id, [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10] + [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20] + [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30] + [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData FROM ( select * from Scrubbed ) src PIVOT ( MAX(ScrubbedValue) FOR ValueOrder IN ( [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40] ) ) pvt
这是一个非常笨重的方式,把所有你不想出来的angular色。 问题是你必须指定你不想要的字符。 如果一个新angular色进入你,它会通过,除非你把它添加到列表中。
好处是你不必创build一个特殊的function。 我没有写权限,所以这使我能够从一个简单的查询运行。
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( p.Name ,'®','') ,'©','') ,'ö','o') ,'ë','e') ,'ä','a') ,'ü','u') ,'ú','u') ,'í','i') ,'ï','i') ,'™','') ,'é','e') ,'²','2') ,'è','e') ,'—','-') ,'–','-') ,'ó','o') ,'•',' ') ,'…','.') ,'ô','o') ,'â','a') ,'á','a') ,'ê','e') ,'è','e') ,''',' ') ,'·',' ') ,'à','a') ,'å','a') ,'ã','a') ,'’',' ') ,'a€s','as') ,'ø','o') ,'ñ','n') ,'î','i') ,'ç','c') ,'Ç','C') ,'Ã','A') ,'”','"') ,'“','"') ,'Á','A') ,'¢','c') ,'Ã','A') ,'Å','A') ,'¶','S') ,'×','x') ,'†','') ,'š','') ,'¤','') ,'µ','') ,'õ','') ,'€','') ,''','') ,'Õ','') ,'ð','') ,'Ò','') ,'¨','') ,'º','') ,'°','') ,'ì','') ,'ƒ','') ,'ÿ','') ,'ß','') ,'«','') ,'»','') ,'Æ','') ,'¬','') ,'Ù','') ,'ý','') ,'û','') ,'|','') as Name
这个解决scheme受Allen先生的解决scheme的启发,需要一个整数的Numbers
表格(如果您想要以良好的性能进行严肃的查询操作,您应该手头有这个表格)。 它不需要CTE。 您可以更改NOT IN (...)
expression式以排除特定字符,或将其更改为IN (...)
或LIKE
expression式以仅保留某些字符。
SELECT ( SELECT SUBSTRING([YourString], N, 1) FROM dbo.Numbers WHERE N > 0 AND N <= CONVERT(INT, LEN([YourString])) AND SUBSTRING([YourString], N, 1) NOT IN ('(',')',',','.') FOR XML PATH('') ) AS [YourStringTransformed] FROM ...
查看了所有给定的解决scheme后,我认为必须有一个纯粹的SQL方法,它不需要函数或CTE / XML查询,也不需要维护嵌套的REPLACE语句。 这是我的解决scheme:
SELECT x ,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END -- Keep adding rows until you reach the column size AS stripped_column FROM (SELECT column_to_strip AS x ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a FROM my_table) a
这样做的好处是有效字符包含在子查询中的一个string中,以便于重新configuration不同的字符集。
缺点是您必须为每个字符添加一行SQL,直到列的大小为止。 为了使这个任务更简单,我只使用下面的Powershell脚本,如果是VARCHAR(64)这个例子:
1..64 | % { " + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_ } | clip.exe
这是另一种使用iTVF
删除非字母字符的iTVF
。 首先,你需要一个基于模式的string拆分器。 这里是从Dwain营的文章 :
-- PatternSplitCM will split a string based on a pattern of the form -- supported by LIKE and PATINDEX -- -- Created by: Chris Morris 12-Oct-2012 CREATE FUNCTION [dbo].[PatternSplitCM] ( @List VARCHAR(8000) = NULL ,@Pattern VARCHAR(50) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN WITH numbers AS ( SELECT TOP(ISNULL(DATALENGTH(@List), 0)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n) ) SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)), Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)), [Matched] FROM ( SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n) FROM numbers CROSS APPLY ( SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END ) y ) d GROUP BY [Matched], Grouper
既然你有一个基于模式的分离器,你需要分割匹配模式的string:
[az]
然后连接它们以获得所需的结果:
SELECT * FROM tbl t CROSS APPLY( SELECT Item + '' FROM dbo.PatternSplitCM(t.str, '[az]') WHERE Matched = 1 ORDER BY ItemNumber FOR XML PATH('') ) x (a)
样品
结果:
| Id | str | a | |----|------------------|----------------| | 1 | test“te d'abc | testtedabc | | 2 | anr¤a | anra | | 3 | gs-re-C“te d'ab | gsreCtedab | | 4 | M‚fe, DF | MfeDF | | 5 | R™temd | Rtemd | | 6 | ™jad”ji | jadji | | 7 | Cje y ret¢n | Cjeyretn | | 8 | J™kl™balu | Jklbalu | | 9 | le“ne-iokd | leneiokd | | 10 | liode-Pyr‚n‚ie | liodePyrnie | | 11 | V„s G”ta | VsGta | | 12 | Sƒo Paulo | SoPaulo | | 13 | vAstra gAtaland | vAstragAtaland | | 14 | ¥uble / Bio-Bio | ubleBioBio | | 15 | U“pl™n/ds VAsb-y | UplndsVAsby |
我把这个放在PatIndex被调用的两个地方。
PatIndex('%[^A-Za-z0-9]%', @Temp)
为RemoveNonAlphaCharacters上面的自定义函数并将其重命名为RemoveNonAlphaNumericCharacters
这是一个解决scheme,不需要创build一个函数或列出要replace的所有字符的实例。 它使用recursionWITH语句和PATINDEX来查找不需要的字符。 它将replace列中的所有不需要的字符 – 最多可包含任何给定string中的100个唯一的坏字符。 (EG“ABC123DEF234”将包含4个错误的字符1,2,3和4)100限制是WITH语句允许的最大recursion次数,但是这不会限制要处理的行数,只受限于可用内存。
如果您不想要DISTINCT结果,则可以从代码中删除这两个选项。
-- Create some test data: SELECT * INTO #testData FROM (VALUES ('ABC DEF,Kl(p)'),('123H,J,234'),('ABCD EFG')) as t(TXT) -- Actual query: -- Remove non-alpha chars: '%[^AZ]%' -- Remove non-alphanumeric chars: '%[^A-Z0-9]%' DECLARE @BadCharacterPattern VARCHAR(250) = '%[^AZ]%'; WITH recurMain as ( SELECT DISTINCT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex FROM #testData UNION ALL SELECT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex FROM ( SELECT CASE WHEN BadCharIndex > 0 THEN REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '') ELSE TXT END AS TXT FROM recurMain WHERE BadCharIndex > 0 ) badCharFinder ) SELECT DISTINCT TXT FROM recurMain WHERE BadCharIndex = 0;
– 首先创build一个function
CREATE FUNCTION [dbo].[GetNumericonly] (@strAlphaNumeric VARCHAR(256)) RETURNS VARCHAR(256) AS BEGIN DECLARE @intAlpha INT SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric) BEGIN WHILE @intAlpha > 0 BEGIN SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' ) SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric ) END END RETURN ISNULL(@strAlphaNumeric,0) END
现在调用这个函数
select [dbo].[GetNumericonly]('Abhi12shek23jaiswal')
其结果就像
1223
相信与否,在我的系统中,这个丑陋的function比G马斯特罗的优雅performance更好。
CREATE FUNCTION dbo.RemoveSpecialChar (@s VARCHAR(256)) RETURNS VARCHAR(256) WITH SCHEMABINDING BEGIN IF @s IS NULL RETURN NULL DECLARE @s2 VARCHAR(256) = '', @l INT = LEN(@s), @p INT = 1 WHILE @p <= @l BEGIN DECLARE @c INT SET @c = ASCII(SUBSTRING(@s, @p, 1)) IF @c BETWEEN 48 AND 57 OR @c BETWEEN 65 AND 90 OR @c BETWEEN 97 AND 122 SET @s2 = @s2 + CHAR(@c) SET @p = @p + 1 END IF LEN(@s2) = 0 RETURN NULL RETURN @s2
使用CTE生成的数字表来检查每个字符,然后FOR XML来连接到一个保持值的string,你可以…
CREATE FUNCTION [dbo].[PatRemove]( @pattern varchar(50), @expression varchar(8000) ) RETURNS varchar(8000) AS BEGIN WITH d(d) AS (SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) digits(d)), nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM d d1, d d2, d d3, d d4), chars(c) AS (SELECT SUBSTRING(@expression, n, 1) FROM nums WHERE n <= LEN(@expression)) SELECT @expression = (SELECT c AS [text()] FROM chars WHERE c NOT LIKE @pattern FOR XML PATH('')); RETURN @expression; END
DECLARE @vchVAlue NVARCHAR(255) = 'SWP, Lettering Position 1: 4 Ω, 2: 8 Ω, 3: 16 Ω, 4: , 5: , 6: , Voltage Selector, Solder, 6, Step switch, : w/o fuseholder ' WHILE PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))) > 0 BEGIN SELECT @vchVAlue = STUFF(@vchVAlue,PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))),1,' ') END SELECT @vchVAlue
这种方式不适合我,因为我试图保持阿拉伯文字母,我试图取代正则expression式,但也没有工作。 我写了另一种方法在ASCII级别工作,因为这是我唯一的select,它的工作。
Create function [dbo].[RemoveNonAlphaCharacters] (@s varchar(4000)) returns varchar(4000) with schemabinding begin if @s is null return null declare @s2 varchar(4000) set @s2 = '' declare @l int set @l = len(@s) declare @p int set @p = 1 while @p <= @l begin declare @c int set @c = ascii(substring(@s, @p, 1)) if @c between 48 and 57 or @c between 65 and 90 or @c between 97 and 122 or @c between 165 and 253 or @c between 32 and 33 set @s2 = @s2 + char(@c) set @p = @p + 1 end if len(@s2) = 0 return null return @s2 end
走
从性能的angular度来看,我会使用内联函数:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE FUNCTION [dbo].[udf_RemoveNumericCharsFromString] ( @List NVARCHAR(4000) ) RETURNS TABLE AS RETURN WITH GetNums AS ( SELECT TOP(ISNULL(DATALENGTH(@List), 0)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM (VALUES (0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n) ) SELECT StrOut = ''+ (SELECT Chr FROM GetNums CROSS APPLY (SELECT SUBSTRING(@List , n,1)) X(Chr) WHERE Chr LIKE '%[^0-9]%' ORDER BY N FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)') /*How to Use SELECT StrOut FROM dbo.udf_RemoveNumericCharsFromString ('vv45--9gut') Result: vv--gut */
我已经在标准部分和更新部分中使用这个函数创build了下面的函数和一个更新查询。 它已经清除了TAB和ENTER。
Update Table Set FieldName=dbo.fnClearString(FieldName) WHERE [dbo].[fnClearString](FieldName)<>FieldName CREATE FUNCTION [dbo].[fnClearString] ( @MyString nvarchar(255) ) RETURNS nvarchar(255) AS BEGIN Set @MyString=REPLACE(@MyString, SUBSTRING(@MyString, PATINDEX('%[^a-zA-Z0-9 .]%', @MyString), 1), '') Set @MyString=Replace(@MyString,CHAR(13),'') --Enter Set @MyString=Replace(@MyString,CHAR(9),'') --Tab RETURN @MyString END GO
虽然post有点老了,但我想说一下。 问题我以上的解决scheme是,它不会过滤出像ç,ë,ï等字符。我调整了一个函数如下(我只用了一个80 varcharstring来节省内存):
create FUNCTION dbo.udf_Cleanchars (@InputString varchar(80)) RETURNS varchar(80) AS BEGIN declare @return varchar(80) , @length int , @counter int , @cur_char char(1) SET @return = '' SET @length = 0 SET @counter = 1 SET @length = LEN(@InputString) IF @length > 0 BEGIN WHILE @counter <= @length BEGIN SET @cur_char = SUBSTRING(@InputString, @counter, 1) IF ((ascii(@cur_char) in (32,44,46)) or (ascii(@cur_char) between 48 and 57) or (ascii(@cur_char) between 65 and 90) or (ascii(@cur_char) between 97 and 122)) BEGIN SET @return = @return + @cur_char END SET @counter = @counter + 1 END END RETURN @return END
我只是发现这是内置于Oracle 10g,如果这是你正在使用的。 我不得不去掉所有的特殊字符进行电话号码比较。
regexp_replace(c.phone, '[^0-9]', '')