How to input a regex in string.replace?

I need some help declaring a regex. My input looks like this:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>

The desired output is:

 this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. and there are many other lines in the txt files with such tags

I tried this:

    #!/usr/bin/python
    import os, sys, re, glob

    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        with open(infile) as reader:
            for line in reader:
                line2 = line.replace('<[1> ', '')
                line = line2.replace('</[1> ', '')
                line2 = line.replace('<[1>', '')
                line = line2.replace('</[1>', '')
                print(line)

I also tried this (but it seems I'm using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I don't want to hard-code the replacements from 1 to 99...

This tested snippet should do it:

    import re
    line = re.sub(r"</?\[\d+>", "", line)
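A quick check of that pattern against the sample input from the question. The only assumption is that a trailing space (left behind by `such tags </[3>`) should be stripped to match the desired output exactly.

```python
import re

text = ("this is a paragraph with<[1> in between</[1> and then there are cases ... "
        "where the<[99> number ranges from 1-100</[99>. and there are many other "
        "lines in the txt files with<[3> such tags </[3>")

# </? : optional slash, \[ : literal '[', \d+ : any number of digits
cleaned = re.sub(r"</?\[\d+>", "", text)

# "such tags </[3>" leaves a trailing space behind, so strip it
print(cleaned.strip())
```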

EDIT: Here's a commented version explaining how it works:

    line = re.sub(r"""(?x) # Use free-spacing mode.
        <    # Match a literal '<'
        /?   # Optionally match a '/'
        \[   # Match a literal '['
        \d+  # Match one or more digits
        >    # Match a literal '>'
        """, "", line)
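The same free-spacing pattern can also be written by passing `re.VERBOSE` as an argument instead of an inline `(?x)`. This sidesteps the rule that, in recent Python versions (3.11 and later, as far as I know), an inline global flag must be the very first thing in the pattern. `MARKER` is just an illustrative name.

```python
import re

# Whitespace and '#' comments inside the pattern are ignored because of re.VERBOSE.
MARKER = re.compile(r"""
    <      # a literal '<'
    /?     # an optional '/'
    \[     # a literal '['
    \d+    # one or more digits
    >      # a literal '>'
    """, re.VERBOSE)

line = "where the<[99> number ranges from 1-100</[99>."
print(MARKER.sub("", line))   # where the number ranges from 1-100.
```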

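Regular expressions are fun! But I strongly recommend spending an hour or two learning the basics. For starters, you need to know which characters are special: the "metacharacters" that must be escaped (i.e., preceded by a backslash; the rules differ inside and outside character classes). There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!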
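To illustrate the escaping point with the characters from this question: a small sketch showing that `[` is special outside a character class, that `re.escape()` adds the needed backslashes, and that inside a class characters like `.` and `*` lose their special meaning.

```python
import re

text = "with<[1> in between</[1>"

# Outside a character class '[' starts a set, so escape it to match it literally.
# re.escape() inserts the required backslashes for us.
literal = re.escape("<[1>")
print(re.findall(literal, text))   # ['<[1>']

# The wildcard attempt from the question is not a valid regex:
# '[' opens a set that is never closed.
try:
    re.compile("<[*>")
except re.error as exc:
    print("invalid pattern:", exc)

# Inside a character class, '.' and '*' match a literal dot or asterisk.
print(re.findall(r"[.*]", "a.b*c"))   # ['.', '*']
```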

str.replace() does fixed-string replacements. Use re.sub() instead.
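A side-by-side sketch of the difference: `str.replace()` searches for the exact characters `<[*>`, which never occur in the text, while `re.sub()` treats its first argument as a pattern.

```python
import re

line = "with<[1> in between</[1>"

# Literal search for '<[*>': not found, so the line comes back unchanged.
print(line.replace("<[*>", ""))          # with<[1> in between</[1>

# Regex search: any <[N> or </[N> marker is removed.
print(re.sub(r"</?\[\d+>", "", line))    # with in between
```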

I would go about it like this (the regex is explained in the comments):

    import re

    # If you need to use the regex more than once, it is suggested to compile it.
    pattern = re.compile(r"</{0,}\[\d+>")

    # <\/{0,}\[\d+>
    #
    # Match the character “<” literally «<»
    # Match the character “/” literally «\/{0,}»
    #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
    # Match the character “[” literally «\[»
    # Match a single digit 0..9 «\d+»
    #    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    # Match the character “>” literally «>»

    subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>"""

    result = pattern.sub("", subject)

    print(result)
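One detail worth noticing: `{0,}` is equivalent to `*`, so this pattern also accepts several slashes (e.g. `<//[2>`), whereas `/?` in the earlier answer allows at most one. The sketch below, with a made-up sample string, shows where the two differ.

```python
import re

loose = re.compile(r"</{0,}\[\d+>")   # '{0,}' == '*': any number of slashes
strict = re.compile(r"</?\[\d+>")     # '?': zero or one slash

sample = "a<[1>b</[1>c<//[2>d"
print(loose.sub("", sample))    # abcd
print(strict.sub("", sample))   # abc<//[2>d
```

For the input in the question both give the same result; the difference only matters if malformed markers like `<//[2>` can occur.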

If you want to learn more about regular expressions, I recommend reading Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.

Simplest way:

    import re

    txt = 'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>'
    out = re.sub("(<[^>]+>)", '', txt)
    print(out)
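Note that `<[^>]+>` (a `<`, then one or more characters that are not `>`, then `>`) removes every angle-bracket tag, not only the numbered `<[N>` markers; the surrounding parentheses are a capture group that `re.sub` does not need here. A sketch with a hypothetical line that also contains ordinary HTML-style tags:

```python
import re

txt = "keep <b>bold</b> but drop<[1> markers</[1>"

# Generic: strips anything between '<' and the next '>'.
print(re.sub(r"<[^>]+>", "", txt))      # keep bold but drop markers

# Specific: strips only <[N> and </[N> markers.
print(re.sub(r"</?\[\d+>", "", txt))    # keep <b>bold</b> but drop markers
```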

The replace method of string objects does not accept regular expressions, only fixed strings (see the documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

You have to use the re module:

    import re
    newline = re.sub(r"<\/?\[[0-9]+>", "", line)
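Plugging a regex like this into the file loop from the question gives a complete script. This is a sketch assuming Python 3 and that, as in the original attempt, the cleaned text should simply be printed; `MARKER`, `clean_line` and `main` are hypothetical names chosen for the example.

```python
import glob
import os
import re

# Compiled once and reused for every line of every file.
MARKER = re.compile(r"</?\[\d+>")


def clean_line(line):
    """Remove <[N> and </[N> markers from a single line."""
    return MARKER.sub("", line)


def main():
    # Same file selection as the question: every *.txt in the current directory.
    for infile in glob.glob(os.path.join(os.getcwd(), "*.txt")):
        with open(infile) as reader:
            for line in reader:
                # end="" because each line already carries its own newline
                print(clean_line(line), end="")


if __name__ == "__main__":
    main()
```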

No need to use a regex (for your example string):

    >>> s
    'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'
    >>> for w in s.split(">"):
    ...     if "<" in w:
    ...         print(w.split("<")[0])
    ...
    this is a paragraph with
     in between
     and then there are cases ... where the
     number ranges from 1-100
    .
    and there are many other lines in the txt files
    with
     such tags
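The loop above prints each fragment on its own line. To get a single cleaned string instead, the fragments can be joined; keeping every piece (including the one after the last tag) preserves trailing text. This is a sketch that, like the loop, assumes `<` and `>` never appear in the normal text.

```python
s = ("this is a paragraph with<[1> in between</[1> and then there are cases ... "
     "where the<[99> number ranges from 1-100</[99>.")

# Split on '>' and keep only the part before '<' in each piece;
# a piece with no '<' (text after the last tag) is kept whole.
result = "".join(piece.split("<")[0] for piece in s.split(">"))
print(result)
```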
