在Oracle中将string拆分为多行
我知道这已经在一定程度上用PHP和MYSQL来回答,但是我想知道是否有人能够教我在Oracle 10g(最好)和11g中将一个string(逗号分隔)分成多行的最简单的方法。
表格如下:
Name | Project | Error 108 test Err1, Err2, Err3 109 test2 Err1
我想创build以下内容:
Name | Project | Error 108 Test Err 1 108 Test Err 2 108 Test Err 3 109 Test2 Err1
我已经看到了一些有关堆栈的潜在解决scheme,但它们只占一列(以逗号分隔的string)。 任何帮助将不胜感激。
使用大型数据集时,接受的答案性能较差。
这可能是一个改进的方式(也与正则expression式和连接):
with temp as ( select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual union all select 109, 'test2', 'Err1' from dual ) select distinct t.name, t.project, trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) as error from temp t, table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels order by name
正则expression式是一个奇妙的事情:)
with temp as ( select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual union all select 109, 'test2', 'Err1' from dual ) SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str FROM (SELECT Name, Project, Error str FROM temp) t CONNECT BY instr(str, ',', 1, level - 1) > 0 order by Name
以下两点有很大的区别:
- 分割单个分隔string
- 在表中分割多行的分隔string。
如果不限制行,那么CONNECT BY子句将产生多行 ,不会给出所需的输出。
- 对于单个分隔string,请查看将单个逗号分隔的string拆分为多行
- 为了在表格中拆分分隔的string,请查看在表格中拆分逗号分隔的string
除了正则expression式之外 ,还有其他一些替代方法正在使用:
- XMLTable的
- MODEL子句
build立
SQL> CREATE TABLE t ( 2 ID NUMBER GENERATED ALWAYS AS IDENTITY, 3 text VARCHAR2(100) 4 ); Table created. SQL> SQL> INSERT INTO t (text) VALUES ('word1, word2, word3'); 1 row created. SQL> INSERT INTO t (text) VALUES ('word4, word5, word6'); 1 row created. SQL> INSERT INTO t (text) VALUES ('word7, word8, word9'); 1 row created. SQL> COMMIT; Commit complete. SQL> SQL> SELECT * FROM t; ID TEXT ---------- ---------------------------------------------- 1 word1, word2, word3 2 word4, word5, word6 3 word7, word8, word9 SQL>
使用XMLTABLE :
SQL> SELECT id, 2 trim(COLUMN_VALUE) text 3 FROM t, 4 xmltable(('"' 5 || REPLACE(text, ',', '","') 6 || '"')) 7 / ID TEXT ---------- ------------------------ 1 word1 1 word2 1 word3 2 word4 2 word5 2 word6 3 word7 3 word8 3 word9 9 rows selected. SQL>
使用MODEL子句:
SQL> WITH 2 model_param AS 3 ( 4 SELECT id, 5 text AS orig_str , 6 ',' 7 || text 8 || ',' AS mod_str , 9 1 AS start_pos , 10 Length(text) AS end_pos , 11 (Length(text) - Length(Replace(text, ','))) + 1 AS element_count , 12 0 AS element_no , 13 ROWNUM AS rn 14 FROM t ) 15 SELECT id, 16 trim(Substr(mod_str, start_pos, end_pos-start_pos)) text 17 FROM ( 18 SELECT * 19 FROM model_param MODEL PARTITION BY (id, rn, orig_str, mod_str) 20 DIMENSION BY (element_no) 21 MEASURES (start_pos, end_pos, element_count) 22 RULES ITERATE (2000) 23 UNTIL (ITERATION_NUMBER+1 = element_count[0]) 24 ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1, 25 end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) ) 26 ) 27 WHERE element_no != 0 28 ORDER BY mod_str , 29 element_no 30 / ID TEXT ---------- -------------------------------------------------- 1 word1 1 word2 1 word3 2 word4 2 word5 2 word6 3 word7 3 word8 3 word9 9 rows selected. SQL>
还有更多相同的例子:
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab FROM dual CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1 / SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab FROM dual CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1 /
此外,可以使用DBMS_UTILITY.comma_to_table&table_to_comma: http : //www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table
我想提出一个使用PIPELINED表函数的不同方法。 这有点类似于XMLTABLE的技术,除了你提供你自己的自定义函数来分割string:
-- Create a collection type to hold the results CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30); / -- Split the string according to the specified delimiter CREATE OR REPLACE FUNCTION str2tbl ( p_string VARCHAR2, p_delimiter CHAR DEFAULT ',' ) RETURN typ_str2tbl_nst PIPELINED AS l_tmp VARCHAR2(32000) := p_string || p_delimiter; l_pos NUMBER; BEGIN LOOP l_pos := INSTR( l_tmp, p_delimiter ); EXIT WHEN NVL( l_pos, 0 ) = 0; PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) ); l_tmp := SUBSTR( l_tmp, l_pos+1 ); END LOOP; END str2tbl; / -- The problem solution SELECT name, project, TRIM(COLUMN_VALUE) error FROM t, TABLE(str2tbl(error));
结果:
NAME PROJECT ERROR ---------- ---------- -------------------- 108 test Err1 108 test Err2 108 test Err3 109 test2 Err1
这种方法的问题是,优化器通常不会知道表函数的基数,所以不得不猜测。 这可能会对您的执行计划造成潜在的危害,因此可以扩展此解决scheme以提供优化程序的执行统计信息。
您可以通过在上面的查询上运行EXPLAIN PLAN来看到这个优化器的估计值:
Execution Plan ---------------------------------------------------------- Plan hash value: 2402555806 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 16336 | 366K| 59 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 16336 | 366K| 59 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 | | 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 8168 | 16336 | 28 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------
即使集合只有3个值,优化程序也会估计8168行(默认值)。 这看起来似乎无关紧要,但对于优化者来说,决定一个次优的计划可能已经足够了。
解决scheme是使用优化器扩展为集合提供统计信息:
-- Create the optimizer interface to the str2tbl function CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT ( dummy NUMBER, STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList ) RETURN NUMBER, STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo, p_stats OUT SYS.ODCITabFuncStats, p_args IN SYS.ODCIArgDescList, p_string IN VARCHAR2, p_delimiter IN CHAR DEFAULT ',' ) RETURN NUMBER ); / -- Optimizer interface implementation CREATE OR REPLACE TYPE BODY typ_str2tbl_stats AS STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList ) RETURN NUMBER AS BEGIN p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') ); RETURN ODCIConst.SUCCESS; END ODCIGetInterfaces; -- This function is responsible for returning the cardinality estimate STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo, p_stats OUT SYS.ODCITabFuncStats, p_args IN SYS.ODCIArgDescList, p_string IN VARCHAR2, p_delimiter IN CHAR DEFAULT ',' ) RETURN NUMBER AS BEGIN -- I'm using basically half the string lenght as an estimator for its cardinality p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) ); RETURN ODCIConst.SUCCESS; END ODCIStatsTableFunction; END; / -- Associate our optimizer extension with the PIPELINED function ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;
testing生成的执行计划:
Execution Plan ---------------------------------------------------------- Plan hash value: 2402555806 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 23 | 59 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 1 | 23 | 59 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 | | 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 1 | 2 | 28 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------
正如你所看到的,上面的计划的基数不再是8196的猜测值。 这仍然是不正确的,因为我们传递一个列而不是一个string文字到函数。
在这个特定的情况下,对function代码进行一些调整是很有必要的,但是我认为这里的整体概念已经被很好地解释了。
此答案中使用的str2tbl函数最初是由Tom Kyte开发的: https ://asktom.oracle.com/pls/asktom/f?p =100 :11:0::::P11_QUESTION_ID:110612348061
关联统计和对象types的概念可以通过阅读这篇文章进一步探讨: http : //www.oracle-developer.net/display.php?id=427
这里描述的技术在10g +中工作。
直到Oracle 11i之前,才添加REGEXP_COUNT。 这是从Art的解决scheme中采用的Oracle 10g解决scheme。
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab FROM dual CONNECT BY LEVEL <= LENGTH('Err1, Err2, Err3') - LENGTH(REPLACE('Err1, Err2, Err3', ',', '')) + 1;
我认为我通过和正则expression式函数连接的最佳方式
select regexp_substr(Error ,'[^,]+', 1, level) from your_table connect by regexp_substr(Error , '[^,]+', 1, level) is not null;
资源
不使用连接或正则expression式 :
with mytable as ( select 108 name, 'test' project, 'Err1,Err2,Err3' error from dual union all select 109, 'test2', 'Err1' from dual ) ,x as ( select name ,project ,','||error||',' error from mytable ) ,iter as (SELECT rownum AS pos FROM all_objects ) select x.name,x.project ,SUBSTR(x.error ,INSTR(x.error, ',', 1, iter.pos) + 1 ,INSTR(x.error, ',', 1, iter.pos + 1)-INSTR(x.error, ',', 1, iter.pos)-1 ) error from x, iter where iter.pos < = (LENGTH(x.error) - LENGTH(REPLACE(x.error, ','))) - 1;
我想添加另一种方法。 这一个使用recursion查询,我没有看到其他的答案。 自11gR2以来,它得到了Oracle的支持。
with cte0 as ( select phone_number x from hr.employees ), cte1(xstr,xrest,xremoved) as ( select x, x, null from cte0 union all select xstr, case when instr(xrest,'.') = 0 then null else substr(xrest,instr(xrest,'.')+1) end, case when instr(xrest,'.') = 0 then xrest else substr(xrest,1,instr(xrest,'.') - 1) end from cte1 where xrest is not null ) select xstr, xremoved from cte1 where xremoved is not null order by xstr
分裂的angular色非常灵活。 只需在INSTR
调用中更改它。
我已经使用DBMS_UTILITY.comma_to _table函数实际上它的工作代码如下
declare l_tablen BINARY_INTEGER; l_tab DBMS_UTILITY.uncl_array; cursor cur is select * from qwer; rec cur%rowtype; begin open cur; loop fetch cur into rec; exit when cur%notfound; DBMS_UTILITY.comma_to_table ( list => rec.val, tablen => l_tablen, tab => l_tab); FOR i IN 1 .. l_tablen LOOP DBMS_OUTPUT.put_line(i || ' : ' || l_tab(i)); END LOOP; end loop; close cur; end;
我用我自己的表和列的名字
CREATE FUNCTION dbo.BreakStringIntoRows (@CommadelimitedString varchar(1000)) RETURNS @Result TABLE (Column1 VARCHAR(100)) AS BEGIN DECLARE @IntLocation INT WHILE (CHARINDEX(',', @CommadelimitedString, 0) > 0) BEGIN SET @IntLocation = CHARINDEX(',', @CommadelimitedString, 0) INSERT INTO @Result (Column1) --LTRIM and RTRIM to ensure blank spaces are removed SELECT RTRIM(LTRIM(SUBSTRING(@CommadelimitedString, 0, @IntLocation))) SET @CommadelimitedString = STUFF(@CommadelimitedString, 1, @IntLocation, '') END INSERT INTO @Result (Column1) SELECT RTRIM(LTRIM(@CommadelimitedString))--LTRIM and RTRIM to ensure blank spaces are removed RETURN END GO --Using the UDF to convert comma separated values into rows SELECT * FROM dbo.BreakStringIntoRows('Apple,Banana,Orange') SELECT * FROM dbo.BreakStringIntoRows('Apple , Banana, Orange')