Python: replace multiple strings
I want to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but want to have something like
string.replace("condition1", "").replace("condition2", "text")
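although that does not feel like good syntax.
What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.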
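The grep/regex idea from the question does exist in Python: re.sub accepts backreferences such as \1 and \2 in the replacement string, referring to the groups the pattern captured. A minimal sketch with a made-up sample string; note that this rewrites captured fields, whereas replacing a fixed list of different strings is what the answers below handle with one alternation regex:

```python
import re

# Swap "key=value" into "value=key" using numbered groups,
# the Python counterpart of \1 and \2 in grep/sed.
text = "name=Alice age=30"
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", text)
print(swapped)  # Alice=name 30=age
```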
Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"}  # define desired replacements here
# use these three lines to do the replacement (text is the string to process)
rep = dict((re.escape(k), v) for k, v in rep.items())  # Python 3: items() instead of iteritems()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'
You could just make a nice little looping function.
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
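where text is the complete string and dic is a dictionary whose keys are the terms to find and whose values are the strings that replace them.
Note: in Python 3, iteritems() has been replaced by items(); older Python 2 versions of this snippet use dic.iteritems().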
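Careful: note that this answer requires that:
- the order of the replacements does not matter
- each replacement is allowed to change the results of previous replacements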
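This is because Python dictionaries had no reliable iteration order before Python 3.7; even now that they keep insertion order, the result still depends on which entry happens to come first.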
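For instance, if a dictionary has:
{"cat": "dog", "dog": "pig"}
and the string is:
"This is my cat and this is my dog."
we don't necessarily know which dictionary entry is used first, and so whether the result is:
"This is my pig and this is my pig."
or
"This is my dog and this is my pig."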
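A short sketch comparing the two behaviours on the cat/dog example above: a chained str.replace loop (where the second step rewrites the output of the first) against a single regex pass that replaces each match exactly once. The single-pass version follows the regex approach from the first answer:

```python
import re

rules = {"cat": "dog", "dog": "pig"}
text = "This is my cat and this is my dog."

# Chained str.replace: each step sees the output of the previous one.
chained = text
for old, new in rules.items():  # dicts keep insertion order since Python 3.7
    chained = chained.replace(old, new)

# Single pass: one regex matches any key, and each match is replaced once.
pattern = re.compile("|".join(map(re.escape, rules)))
single_pass = pattern.sub(lambda m: rules[m.group(0)], text)

print(chained)      # This is my pig and this is my pig.
print(single_pass)  # This is my dog and this is my pig.
```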
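Keep efficiency in mind: each pair makes a full pass over text, so the cost grows with both the length of the string and the number of pairs in the dictionary.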
Here is a variant of the first solution using reduce, in case you like being functional. 🙂
from functools import reduce  # needed on Python 3
repls = {'hello': 'goodbye', 'world': 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)
martineau's even better version:
from functools import reduce  # needed on Python 3
repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)
I built this upon F.J.'s excellent answer:
import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)
One-shot usage:
>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.
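Note that since the replacement is done in a single pass, "café" changes to "tea", but it does not change back to "café".
If you need to do the same replacement many times, you can easily create a replacement function: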
>>> my_escaper = multiple_replacer(('"', '\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
...                      u'Does this work?\tYes it does',
...                      u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
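Improvements:
- turned the code into a function
- added multiline support
- fixed a bug in escaping
- made it easy to create a function for a specific set of multiple replacements
Enjoy! 🙂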
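The escaping matters because the keys are joined into one regular expression. A small sketch (sample strings made up for illustration) of what goes wrong when keys containing metacharacters such as "." or "+" are not passed through re.escape:

```python
import re

replacements = {"3.14": "pi", "a+b": "sum"}
text = "3.14 vs 3x14, a+b vs aab"

# Without re.escape, "." matches any character and "+" means "one or more".
raw = re.compile("|".join(replacements))
# With re.escape, every key is matched literally.
escaped = re.compile("|".join(map(re.escape, replacements)))

print(raw.findall(text))      # ['3.14', '3x14', 'aab']
print(escaped.findall(text))  # ['3.14', 'a+b']
```

With the unescaped pattern, a lookup such as replacements[m.group(0)] would raise KeyError as soon as it hits "3x14", because that text is not one of the keys.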
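This is just a more concise recap of F.J.'s and MiniQuark's great answers. All you need to achieve multiple simultaneous string replacements is the following function: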
import re

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in rep_dict.keys()]), re.M)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)
Usage:
>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
If you wish, you can make your own dedicated replacement functions starting from this simpler one.
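As one example of such a dedicated function, here is a sketch of a case-insensitive variant. The name multiple_replace_ci is hypothetical, and it assumes no two keys differ only by case (they would collapse into one lowercase lookup entry). Longer keys are tried first, as in the multireplace answer further down:

```python
import re

def multiple_replace_ci(string, rep_dict):
    """Hypothetical case-insensitive variant: keys are matched ignoring
    case and looked up by their lowercase form."""
    lookup = {k.lower(): v for k, v in rep_dict.items()}
    # Longest keys first so a short key cannot shadow a longer one.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)), re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], string)

print(multiple_replace_ci("Do you LIKE Cafe?", {"cafe": "tea", "like": "prefer"}))
# Do you prefer tea?
```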
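I would like to propose using string templates. Just mark the parts to replace as $placeholders, put the values in a dictionary, and you are all set! Example from docs.python.org: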
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
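A sketch of passing all the values at once as one dictionary (the template text and values are made up). Placeholders can be written as $name or ${name}, and $$ produces a literal dollar sign. Keep in mind that Template only fills in marked placeholders; it does not replace arbitrary substrings, so it suits text whose format you control:

```python
from string import Template

# Placeholders are written as $name or ${name}; $$ is a literal "$".
template = Template("Dear ${title} $name, your order $$${amount} has shipped.")
values = {"title": "Dr.", "name": "Smith", "amount": "42"}

print(template.substitute(values))
# Dear Dr. Smith, your order $42 has shipped.
```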
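I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing runs of whitespace characters with a single space. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with: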
import re

def multiple_replace(string, reps, re_flags=0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = list(reps.items())  # list() so the pairs can be indexed on Python 3
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)), re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--", ... {"condition1": "", "condition2": "text"}) '() and --text--' >>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'}) 'goodbye, earth' >>> multiple_replace("Do you like cafe? No, I prefer tea.", ... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'}) 'Do you prefer tea? No, I prefer cafe.'
The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize whitespace:
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as plain strings, you can escape them before calling multiple_replace, for example with this function:
def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n Philip II of Spain"
The following function can help you find erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):
def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())
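Note that it does not chain the replacements but performs them simultaneously. This makes it more efficient without limiting what it can do. To mimic the effect of chaining, you may just need to add more replacement pairs and make sure they are in the intended order: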
>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'
In my case, I needed a simple replacement of unique keys with names, so I came up with this:
a = 'this is a test string'
b = {'i': 'Z', 's': 'Y'}
for x, y in b.items():
    a = a.replace(x, y)
>>> a
'thZY ZY a teYt YtrZng'
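When every key is a single character, as in this example, str.translate does the whole job in one pass. A sketch: str.maketrans accepts a dict whose keys are single characters (values may be strings or None to delete), and because the mapping is applied simultaneously, even a swap such as i and s works:

```python
a = 'this is a test string'
b = {'i': 'Z', 's': 'Y'}

# Build a translation table from the single-character mapping,
# then rewrite every character in one pass.
table = str.maketrans(b)
print(a.translate(table))  # thZY ZY a teYt YtrZng

# Simultaneous application: swapping two characters works.
print('this is'.translate(str.maketrans({'i': 's', 's': 'i'})))  # thsi si
```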
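Here's my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case where one string to replace is a substring of another string to replace (the longer string wins):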
import re

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.
    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str
    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)
    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
It is in this gist; feel free to modify it if you have any suggestions.
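Why the sort is needed: a regex alternation tries its alternatives from left to right and takes the first one that matches at a position, not the longest one. A sketch contrasting the unsorted and the longest-first pattern on the example from the comments (build is a hypothetical local helper):

```python
import re

replacements = {'ab': 'AB', 'abc': 'ABC'}
text = 'hey abc'

def build(keys):
    # Hypothetical helper: one escaped alternation over the given keys.
    return re.compile('|'.join(map(re.escape, keys)))

unsorted = build(replacements)                                       # 'ab|abc'
longest_first = build(sorted(replacements, key=len, reverse=True))  # 'abc|ab'

print(unsorted.sub(lambda m: replacements[m.group(0)], text))       # hey ABc
print(longest_first.sub(lambda m: replacements[m.group(0)], text))  # hey ABC
```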
You really shouldn't do it this way, but I just find it way too cool:
>>> replacements = {'cond1': 'text1', 'cond2': 'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r adds the quotes the generated code needs
...
>>> exec(cmd)
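Now, answer is the result of all the replacements in turn.
Again, this is very hacky and is not something you should be using regularly. But it's nice to know that you can do something like this if you ever need to.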
Here is a sample that is more efficient on long strings with many small replacements.
import re

source = "Here is foo, it does moo!"
replacements = {
    'is': 'was',  # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    # matches every string we want replaced
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys()))
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print(replace(source, replacements))
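The point is to avoid repeatedly concatenating long strings. We chop the source string into fragments, replacing some of the fragments as we build the list, and then join the whole thing back into a string.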
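The same chop-and-join idea can be written a little more compactly with re.finditer, which walks the non-overlapping matches for you. A sketch (replace_with_finditer is a hypothetical name); for most uses, pattern.sub with a function callback produces the same result in one call:

```python
import re

def replace_with_finditer(source, replacements):
    """Hypothetical helper: chop-and-join replacement using re.finditer."""
    finder = re.compile("|".join(map(re.escape, replacements)))
    pieces = []
    pos = 0
    for match in finder.finditer(source):
        pieces.append(source[pos:match.start()])     # text before the match
        pieces.append(replacements[match.group(0)])  # replacement for the match
        pos = match.end()
    pieces.append(source[pos:])  # tail after the last match
    return "".join(pieces)

source = "Here is foo, it does moo!"
replacements = {'is': 'was', 'does': 'did', '!': '?'}
print(replace_with_finditer(source, replacements))
# Here was foo, it did moo?
```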
Or, just as a quick hack:
for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)
Here is another way of doing it with a dictionary:
listA = "The cat jumped over the house".split()
modify = {word: word for number, word in enumerate(listA)}
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))
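Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the files in the current folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I'm a beginner, but I found this script very useful for doing multiple replacements in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me: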
import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {}  # creation of empty dictionary
with open(mapfile) as temprep:  # load the definitions into the dictionary; the separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask):  # iterate over all the files matching the prompted mask
    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()
    # start replacement
    # rep = dict((re.escape(k), v) for k, v in rep.items())  commented out to allow re reserved characters in the mapping
    pattern = re.compile("|".join(rep.keys()))
    text = pattern.sub(lambda m: rep[m.group(0)], text)
    # write the output file using the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)
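A sketch of the two core steps of such a script as testable functions. The names load_mapping and apply_mapping are hypothetical, an in-memory list of lines stands in for the map file, and unlike the script above this version escapes the keys, so they are treated as plain text rather than regular expressions. Splitting on the first separator only lets a value contain the separator:

```python
import re

def load_mapping(lines, sep):
    """Hypothetical helper: parse 'key<sep>value' lines into a dict.
    Splits on the first separator only and skips blank lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines instead of failing on them
        key, val = line.split(sep, 1)  # the value may itself contain sep
        mapping[key] = val
    return mapping

def apply_mapping(text, mapping):
    """Hypothetical helper: replace every key of mapping as plain text."""
    if not mapping:
        return text  # an empty pattern would match at every position
    keys = sorted(mapping, key=len, reverse=True)  # longer keys win
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# An in-memory list of lines stands in for the map file.
mapping = load_mapping(["cat|dog\n", "\n", "x|y|z\n"], "|")
print(mapping)                          # {'cat': 'dog', 'x': 'y|z'}
print(apply_mapping("cat x", mapping))  # dog y|z
```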
Here is my solution to the problem. I used it in a chatbot to replace different words at once.
def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if sk:  # test the matched key, so an empty replacement value still counts
            new_string += s
            old_string = old_string[len(sk):]
        else:
            new_string += old_string[0]
            old_string = old_string[1:]
    return new_string

print(mass_replace("The dog hunts the cat", {"dog": "cat", "cat": "dog"}))
This prints The cat hunts the dog
Another example: an input list
error_list = ['[br]', '[ex]', 'Something'] words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
The desired output would be
words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
Code:
[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]]
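The comprehension above keeps only the first replacement when a word contains several error tokens. A sketch of a simpler alternative that reuses the escaped-alternation approach from the earlier answers to strip every listed token in one pass:

```python
import re

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

# One escaped alternation removes every listed token, even if a word
# contains several of them.
pattern = re.compile("|".join(map(re.escape, error_list)))
cleaned = [pattern.sub("", w) for w in words]
print(cleaned)  # ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
```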
I don't know about speed, but this is my everyday quick fix:
from functools import reduce  # needed on Python 3
reduce(lambda a, b: a.replace(*b)
    , [('o', 'W'), ('t', 'X')]  # iterable of pairs: (oldval, newval)
    , 'tomato'  # the string in which to replace values
    )
…but I like the #1 regex answer above. Note: if one new value is a substring of another one, then the operation is not commutative.
Why not a solution like this?
s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)