Regular expression to match outer parentheses

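I need a regular expression to select all the text between two outer parentheses.

Example: some text(text here(possible text)text(possible text(more text)))end text

Result: (text here(possible text)text(possible text(more text)))

I've been trying for hours, but mind you, my regular expression knowledge isn't what I'd like it to be :-) so any help will be gratefully received.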

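Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

But there is a simple algorithm to do this, which I described in this answer to a previous question.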
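The linked answer is not reproduced in this thread, so the following is only a minimal sketch of one such algorithm, assuming a depth counter is the idea being referred to. The function name `outer_groups` and its parameters are hypothetical.

```python
def outer_groups(text, open_ch="(", close_ch=")"):
    """Return every outermost balanced group in text, delimiters included."""
    groups = []
    depth = 0   # how many parentheses are currently open
    start = -1  # index where the current outermost group opened
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


sample = "some text(text here(possible text)text(possible text(more text)))end text"
print(outer_groups(sample))
```

Stray closing parentheses are ignored in this sketch, and an opening parenthesis that is never closed produces no group; other policies are equally defensible.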

You can use regex recursion:

 \(([^()]|(?R))*\)

I want to add this answer for quick reference. Feel free to update.

.NET Regex using balancing groups.

 \((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

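Here c is used as the depth counter.

Demo at Regexstorm.com

Stack Overflow: Using RegEx to balance match parenthesis
Wes' Puzzling Blog: Matching Balanced Constructs with .NET Regular Expressions
Greg Reinacker's Weblog: Nested Constructs in Regular Expressions

PCRE using a recursive pattern.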

 \((?>[^)(]+|(?R))*\)

Demo at regex101; or without alternation:

 \((?>[^)(]*(?R)?)*\)

Demo at regex101. The pattern is pasted at (?R), which represents (?0).

Perl, PHP, Notepad++, R: perl=TRUE, Python: regex package with (?V1) for Perl behaviour.

Ruby using subexpression calls.

With Ruby 2.0, \g<0> can be used to call the full pattern.

 \((?>[^)(]+|\g<0>)*\)

Demo at Rubular; Ruby 1.9 only supports capturing group recursion:

 (\((?>[^)(]+|\g<1>)*\))

Demo at Rubular (atomic grouping since Ruby 1.9.3)

JavaScript API :: XRegExp.matchRecursive

 XRegExp.matchRecursive(str, '\\(', '\\)', 'g');

JS, Java and other regex flavors without recursion, up to 2 levels of nesting:

 \((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

Demo at regex101. Deeper nesting needs to be added to the pattern.
To fail faster on unbalanced parentheses, drop the + quantifier.
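As a rough illustration of what "up to 2 levels of nesting" means in practice, the pattern above can be tried with Python's standard `re` module, which has no recursion support. The variable names are hypothetical; the pattern itself is copied unchanged.

```python
import re

# Pattern from the answer: one outer pair plus at most two nested levels.
TWO_LEVELS = re.compile(r"\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)")

sample = "some text(text here(possible text)text(possible text(more text)))end text"
match = TWO_LEVELS.search(sample)
print(match.group(0))

# Four pairs deep is one level too many, so the outermost pair cannot match
# and the search settles for the group that starts one level further in.
too_deep = TWO_LEVELS.search("(a(b(c(d))))")
print(too_deep.group(0))
```

The second search shows the failure mode to watch for: the pattern does not report an error on deeper input, it silently matches an inner group instead.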

Reference - What does this regex mean?

rexegg.com - Recursive Regular Expressions
Regular-Expressions.info - Regular Expression Recursion

 [^\(]*(\(.*\))[^\)]*

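[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring enclosed in brackets, and [^\)]* matches everything that isn't a closing bracket at the end of the string. Note that this expression does not attempt to match brackets; a simple parser (see dehmann's answer) would be more suitable for that.

It is actually possible to do this with .NET regular expressions, but it is not trivial, so read carefully.

You can read a nice article here. You may also need to read up on .NET regular expressions. You can start reading here.

Angle brackets <> are used because they do not require escaping.

The regular expression looks like this: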

 < [^<>]* ( ( (?<Open><) [^<>]* )+ ( (?<Close-Open>>) [^<>]* )+ )* (?(Open)(?!)) >

 (?<=\().*(?=\))

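If you want to select the text between two matching parentheses, you are out of luck with regular expressions. This is impossible(*).

This regex just returns the text between the first opening and the last closing parenthesis in your string.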
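A small sketch of that behaviour, using Python's standard `re` module (an assumption, since the answer names no language). The second input is a made-up example chosen to show that no balancing takes place.

```python
import re

# The lookaround pattern from this answer, unchanged.
FIRST_TO_LAST = re.compile(r"(?<=\().*(?=\))")

sample = "some text(text here(possible text)text(possible text(more text)))end text"
inner = FIRST_TO_LAST.search(sample).group(0)
print(inner)

# Two separate groups are returned as one span, because the match simply
# runs from the first "(" to the last ")".
merged = FIRST_TO_LAST.search("f(a) + g(b)").group(0)
print(merged)
```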

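(*) Unless your regex engine has features like balancing groups or recursion. The number of engines that support such features is slowly growing, but they are still not commonly available.

A regex using Ruby (version 1.9.3 or above):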

 /(?<match>\((?:\g<match>|[^()]++)*\))/

Demo on rubular

This is the definitive regex:

 \( (?<arguments> ( ([^\(\)']*) | (\([^\(\)']*\)) | '(.*?)' )* ) \)

Example:

 input: ( arg1, arg2, arg3, (arg4), '(pip' )
 output: arg1, arg2, arg3, (arg4), '(pip'

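Note that '(pip' is correctly handled as a string. (Tried in Regulator: http://sourceforge.net/projects/regulator/)

The answer depends on whether you need to match matching sets of brackets, or merely the first open to the last close in the input text.

If you need to match matching nested brackets, then you need something more than regular expressions (see @dehmann).

If it's just first open to last close, see @Zach.

Decide what you want to happen with: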

 abc ( 123 ( foobar ) def ) xyz ) ghij

You need to decide what your code should match in this case.
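To make the choice concrete, here is a sketch of the two readings applied to that input. The helper `first_balanced` is hypothetical and only one of several reasonable ways to treat the stray closing parenthesis.

```python
import re

text = "abc ( 123 ( foobar ) def ) xyz ) ghij"

# Reading 1: first opening to last closing parenthesis (greedy regex).
first_to_last = re.search(r"\(.*\)", text).group(0)


# Reading 2: stop as soon as the first opening parenthesis is balanced.
def first_balanced(s):
    start = s.find("(")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                return s[start:i + 1]
    return None  # the opening parenthesis is never closed


print(first_to_last)
print(first_balanced(text))
```

The first reading swallows the unmatched `)` after `xyz`; the second stops before it.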

I have written a little JavaScript library called balanced to help with this task. You can accomplish this by doing:

 balanced.matches({ source: source, open: '(', close: ')' });

You can even do replacements:

    balanced.replacements({
        source: source,
        open: '(',
        close: ')',
        replace: function (source, head, tail) {
            return head + source + tail;
        }
    });

Here's a more complex and interactive example on JSFiddle.

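So you need the first and the last parenthesis. Use something like this: str.indexOf('('); gives you the first occurrence, and str.lastIndexOf(')'); gives you the last one.

So you need the string in between: String searchedString = str.substring(str.indexOf('('), str.lastIndexOf(')') + 1); (the + 1 keeps the closing parenthesis in the result).

This is a customizable solution allowing single-character literal delimiters in Java: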

    // Requires: import java.util.ArrayList; import java.util.List;
    public static List<String> getBalancedSubstrings(String s, Character markStart,
                                                     Character markEnd, Boolean includeMarkers) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;               // current nesting depth
        int lastOpenDelimiter = -1;  // start index of the current outermost group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) {
                    lastOpenDelimiter = (includeMarkers ? i : i + 1);
                }
            } else if (c == markEnd) {
                if (level == 1) {
                    subTreeList.add(s.substring(lastOpenDelimiter, (includeMarkers ? i + 1 : i)));
                }
                if (level > 0) level--;  // ignore unmatched closing delimiters
            }
        }
        return subTreeList;
    }

Example usage:

    String s = "some text(text here(possible text)text(possible text(more text)))end text";
    List<String> balanced = getBalancedSubstrings(s, '(', ')', true);
    System.out.println("Balanced substrings:\n" + balanced);
    // => [(text here(possible text)text(possible text(more text)))]

    """
    Here is a simple python program showing how to use regular
    expressions to write a paren-matching recursive parser.

    This parser recognises items enclosed by parens, brackets,
    braces and <> symbols, but is adaptable to any set of
    open/close patterns.  This is where the re package greatly
    assists in parsing.
    """

    import re

    # The pattern below recognises a sequence consisting of:
    #    1. Any characters not in the set of open/close strings.
    #    2. One of the open/close strings.
    #    3. The remainder of the string.
    #
    # There is no reason the opening pattern can't be the
    # same as the closing pattern, so quoted strings can
    # be included.  However quotes are not ignored inside
    # quotes.  More logic is needed for that....

    pat = re.compile(r"""
        ( .*? )
        ( \( | \) | \[ | \] | \{ | \} | \< | \> |
                               \' | \" | BEGIN | END | $ )
        ( .* )
        """, re.X)

    # The keys to the dictionary below are the opening strings,
    # and the values are the corresponding closing strings.
    # For example "(" is an opening string and ")" is its
    # closing string.

    matching = { "(" : ")",
                 "[" : "]",
                 "{" : "}",
                 "<" : ">",
                 '"' : '"',
                 "'" : "'",
                 "BEGIN" : "END" }

    # The procedure below matches string s and returns a
    # recursive list matching the nesting of the open/close
    # patterns in s.

    def matchnested(s, term=""):
        lst = []
        while True:
            m = pat.match(s)

            if m.group(1) != "":
                lst.append(m.group(1))

            if m.group(2) == term:
                return lst, m.group(3)

            if m.group(2) in matching:
                item, s = matchnested(m.group(3), matching[m.group(2)])
                lst.append(m.group(2))
                lst.append(item)
                lst.append(matching[m.group(2)])
            else:
                raise ValueError("After <<%s %s>> expected %s not %s" %
                                 (lst, s, term, m.group(2)))

    # Unit test.

    if __name__ == "__main__":
        for s in ("simple string",
                  """ "double quote" """,
                  """ 'single quote' """,
                  "one'two'three'four'five'six'seven",
                  "one(two(three(four)five)six)seven",
                  "one(two(three)four)five(six(seven)eight)nine",
                  "one(two)three[four]five{six}seven<eight>nine",
                  "one(two[three{four<five>six}seven]eight)nine",
                  "oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
                  "ERROR testing ((( mismatched ))] parens"):
            print("\ninput", s)
            try:
                lst, s = matchnested(s)
                print("output", lst)
            except ValueError as e:
                print(str(e))
        print("done")

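Regular expressions cannot do this.

Regular expressions are based on a computing model known as Finite State Automata (FSA). As the name indicates, an FSA can remember only the current state; it has no information about the previous states.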

[Figure: FSA state diagram with two states, S1 and S2]

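In the above diagram, S1 and S2 are two states, where S1 is both the starting and the final state. So if we try the string 0110, the transitions go as follows:

       0     1     1     0
 -> S1 -> S2 -> S2 -> S2 -> S1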

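In the above steps, when we are at the second S2, i.e. after parsing 01 of 0110, the FSA has no information about the previous 0 in 01, because it can only remember the current state and the next input symbol.

In the above problem, we need to know the number of opening parentheses; this means it has to be stored somewhere. But since FSAs cannot do that, a regular expression cannot be written.

However, an algorithm can be written to do this task. Such algorithms generally fall under Pushdown Automata (PDA). A PDA is one level above an FSA: it has an additional stack to store extra information. PDAs can be used to solve the above problem, because we can "push" each opening parenthesis onto the stack and "pop" one whenever we encounter a closing parenthesis. If the stack is empty at the end, the opening and closing parentheses match. Otherwise they do not.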
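The push/pop procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a formal PDA; the function name `is_balanced` is hypothetical.

```python
def is_balanced(text):
    """Mimic the pushdown automaton: push on "(", pop on ")"."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False  # closing parenthesis with nothing to pop
            stack.pop()
    return not stack  # balanced only if every push was popped


print(is_balanced("some text(text here(possible text))end text"))
print(is_balanced("abc ( 123 ( foobar ) def ) xyz ) ghij"))
```

Because only one kind of bracket is tracked, the stack could be replaced by an integer counter; the stack form is kept here to mirror the description.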

A detailed discussion can be found here.
