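Finding multiple occurrences of a string within a string in Python

How do I find multiple occurrences of a string within a string in Python? Consider this:

    >>> text = "Allowed Hello Hollow"
    >>> text.find("ll")
    1
    >>>

So the first occurrence of ll is at index 1, as expected. How do I find the next occurrence?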

The same question applies to a list. Consider:

    >>> x = ['ll', 'ok', 'll']

How do I find all the ll items together with their indexes?

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:

    >>> import re
    >>> text = 'Allowed Hello Hollow'
    >>> for m in re.finditer('ll', text):
    ...     print('ll found', m.start(), m.end())
    ...
    ll found 1 3
    ll found 10 12
    ll found 16 18
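Because re.finditer continues after the end of each match, overlapping occurrences are skipped. If those are wanted too, one common workaround is a zero-width lookahead. The following is only a minimal sketch: the helper name find_overlapping is made up for illustration, and re.escape is used on the assumption that the substring should be matched literally.

```python
import re

def find_overlapping(sub, text):
    """Return start indexes of all occurrences of sub, overlapping ones included."""
    # (?=...) is a zero-width lookahead: it matches at a position without
    # consuming characters, so the scan moves forward one character at a time.
    pattern = re.compile('(?=' + re.escape(sub) + ')')
    return [m.start() for m in pattern.finditer(text)]

print(find_overlapping('ll', 'Allowed Hello Hollow'))  # [1, 10, 16]
print(find_overlapping('ll', 'Alllowed'))              # [1, 2]
```

The match objects are empty here, so only m.start() is meaningful; the end of each occurrence is m.start() + len(sub).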

Alternatively, if you don't want the overhead of regular expressions, you can call str.find repeatedly to get the next index:

    >>> text = 'Allowed Hello Hollow'
    >>> index = 0
    >>> while index < len(text):
    ...     index = text.find('ll', index)
    ...     if index == -1:
    ...         break
    ...     print('ll found at', index)
    ...     index += 2  # +2 because len('ll') == 2
    ...
    ll found at 1
    ll found at 10
    ll found at 16

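The same approach works for lists and other sequences. Lists have no find method, so use index there, which raises ValueError instead of returning -1.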
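For the list from the question, a rough equivalent of the loop above might look like this. It relies on list.index, which accepts an optional start position and raises ValueError when nothing is found; the function name find_all_in_list is a hypothetical helper, not something from the original answer.

```python
def find_all_in_list(seq, item):
    """Return every index at which item occurs in the list seq."""
    indexes = []
    start = 0
    while True:
        try:
            # list.index(value, start) searches from position start onwards
            start = seq.index(item, start)
        except ValueError:
            # no further occurrence: we are done
            return indexes
        indexes.append(start)
        start += 1

x = ['ll', 'ok', 'll']
print(find_all_in_list(x, 'll'))  # [0, 2]
```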

I think what you are looking for is str.count:

    >>> "Allowed Hello Hollow".count('ll')
    3

Hope this helps.
Note: this only counts non-overlapping occurrences.
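To make the non-overlapping caveat concrete, here is a small sketch that counts overlapping occurrences as well. The helper count_overlapping is an illustrative name; it tests every position with str.startswith, so it is simple rather than fast.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    if not sub:
        raise ValueError("sub must be non-empty")
    # str.startswith accepts a start offset, so no slices are created;
    # True counts as 1 when summed
    return sum(text.startswith(sub, i) for i in range(len(text)))

print("aaaa".count("aa"))                # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))   # 3 (overlapping)
```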

For the list example, use a comprehension:

    >>> l = ['ll', 'xx', 'll']
    >>> print([n for (n, e) in enumerate(l) if e == 'll'])
    [0, 2]

Similarly for strings:

    >>> text = "Allowed Hello Hollow"
    >>> print([n for n in range(len(text)) if text.find('ll', n) == n])
    [1, 10, 16]

This also reports every position inside an adjacent run of "ll" (overlapping matches), which may or may not be what you want:

    >>> text = 'Alllowed Hello Holllow'
    >>> print([n for n in range(len(text)) if text.find('ll', n) == n])
    [1, 2, 11, 17, 18]
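If you need to choose between the two behaviours, a generator with a switch is one option. This is a sketch under the assumption that "non-overlapping" means resuming the search right after the end of each match; iter_occurrences and its overlapping parameter are names invented for this example.

```python
def iter_occurrences(text, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in text."""
    if not sub:
        raise ValueError("sub must be non-empty")
    # step 1 keeps overlapping matches, len(sub) skips past each match
    step = 1 if overlapping else len(sub)
    index = text.find(sub)
    while index != -1:
        yield index
        index = text.find(sub, index + step)

text = 'Alllowed Hello Holllow'
print(list(iter_occurrences(text, 'll')))                    # [1, 11, 17]
print(list(iter_occurrences(text, 'll', overlapping=True)))  # [1, 2, 11, 17, 18]
```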

FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.

The first uses str.index and checks for ValueError:

    def findall(sub, string):
        """
        >>> text = "Allowed Hello Hollow"
        >>> tuple(findall('ll', text))
        (1, 10, 16)
        """
        index = 0 - len(sub)
        try:
            while True:
                index = string.index(sub, index + len(sub))
                yield index
        except ValueError:
            pass

The second uses str.find and checks for the sentinel value -1 by using the two-argument form of iter:

    def findall_iter(sub, string):
        """
        >>> text = "Allowed Hello Hollow"
        >>> tuple(findall_iter('ll', text))
        (1, 10, 16)
        """
        def next_index(length):
            index = 0 - length
            while True:
                index = string.find(sub, index + length)
                yield index
        # iter(callable, sentinel) stops as soon as the callable returns -1
        return iter(next_index(len(sub)).__next__, -1)

To apply either of these functions to a list, tuple or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments), like this one:

    def findall_each(findall, sub, strings):
        """
        >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
        >>> list(findall_each(findall, 'll', texts))
        [(), (2, 10), (2,), (2,), ()]
        >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
        >>> list(findall_each(findall_iter, 'll', texts))
        [(4, 7), (1, 6), (2, 7), (2, 6)]
        """
        return (tuple(findall(sub, string)) for string in strings)

For your list example:

    In [1]: x = ['ll', 'ok', 'll']

    In [2]: for idx, value in enumerate(x):
       ...:     if value == 'll':
       ...:         print(idx, value)
       ...:
    0 ll
    2 ll

If you want all the items in a list that contain 'll', you can do that as well:

    In [3]: x = ['Allowed', 'Hello', 'World', 'Hollow']

    In [4]: for idx, value in enumerate(x):
       ...:     if 'll' in value:
       ...:         print(idx, value)
       ...:
    0 Allowed
    1 Hello
    3 Hollow
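If the result is needed as data rather than printed output, the same filter can be written as a comprehension. A minimal sketch, where the variable name matches is chosen only for this example:

```python
x = ['Allowed', 'Hello', 'World', 'Hollow']

# (index, value) pairs for every item that contains the substring
matches = [(idx, value) for idx, value in enumerate(x) if 'll' in value]
print(matches)  # [(0, 'Allowed'), (1, 'Hello'), (3, 'Hollow')]
```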

    >>> for n, c in enumerate(text):
    ...     try:
    ...         if c + text[n+1] == "ll": print(n)
    ...     except IndexError: pass  # the last character has no successor
    ...
    1
    10
    16

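I'm brand new to programming in general and am working through an online tutorial. I was asked to do this as well, but only with the methods I have learned so far (basically strings and loops). I'm not sure this adds any value here, and I know it isn't how you would normally do it, but I got it to work:

    needle = input()
    haystack = input()
    counter = 0
    n = -1
    # compare every slice haystack[i:j] against the needle
    for i in range(n + 1, len(haystack) + 1):
        for j in range(n + 1, len(haystack) + 1):
            n = -1
            if needle != haystack[i:j]:
                n = n + 1
                continue
            if needle == haystack[i:j]:
                counter = counter + 1
    print(counter)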

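This version should be linear in the length of the string, and should be fine as long as the sequences aren't too repetitive. Every match adds one level of recursion, so for highly repetitive input you can replace the recursion with a while loop.

    def find_all(st, substr, start_pos=0, accum=None):
        if accum is None:  # avoid sharing a mutable default list between calls
            accum = []
        ix = st.find(substr, start_pos)
        if ix == -1:
            return accum
        return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])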
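The while-loop variant mentioned above could be sketched as follows. The name find_all_iterative is made up here; like the recursive version it advances by one position after each match, so overlapping occurrences are kept.

```python
def find_all_iterative(st, substr, start_pos=0):
    """Return the start index of every (possibly overlapping) occurrence."""
    if not substr:
        raise ValueError("substr must be non-empty")
    result = []
    ix = st.find(substr, start_pos)
    while ix != -1:
        result.append(ix)
        # advance by one, as the recursive version does, so overlaps are kept
        ix = st.find(substr, ix + 1)
    return result

# 5000 matches would exceed Python's default recursion limit in the recursive form
print(len(find_all_iterative('ab' * 5000, 'ab')))  # 5000
```

Appending to one list also avoids the repeated list copies made by accum + [ix].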

bstpierre's list comprehension is a good solution for short sequences, but it appears to have quadratic complexity and never finished on a long text I was using.

    findall_lc = lambda txt, substr: [n for n in range(len(txt)) if txt.find(substr, n) == n]

For a random string of non-trivial length, the two functions give the same result:

    >>> import random, string; random.seed(0)
    >>> s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])
    >>> find_all(s, 'th') == findall_lc(s, 'th')
    True
    >>> findall_lc(s, 'th')[:4]
    [564, 818, 1872, 2470]

But the quadratic version is about 300 times slower:

    %timeit find_all(s, 'th')
    1000 loops, best of 3: 282 µs per loop

    %timeit findall_lc(s, 'th')
    10 loops, best of 3: 92.3 ms per loop

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-

    main_string = input()
    sub_string = input()

    count = counter = 0

    for i in range(len(main_string)):
        if main_string[i] == sub_string[0]:
            k = i + 1
            # compare the remaining characters of sub_string one by one
            for j in range(1, len(sub_string)):
                if k != len(main_string) and main_string[k] == sub_string[j]:
                    count += 1
                    k += 1
            if count == (len(sub_string) - 1):
                counter += 1
            count = 0

    print(counter)

This program counts all occurrences of the substring, even overlapping ones, without using regular expressions. It is a naive implementation, though. For better worst-case behaviour, look at suffix trees, KMP and other string-matching data structures and algorithms.
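As an illustration of the KMP idea mentioned above, here is a compact sketch of Knuth-Morris-Pratt that returns every start index, overlapping ones included. The function name kmp_find_all is an assumption of this example, and the code is meant as a teaching sketch rather than a tuned implementation; in practice the built-in str.find is usually fast enough.

```python
def kmp_find_all(text, pattern):
    """Return start indexes of all (overlapping) occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")

    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # fall back so overlapping matches are found
    return matches

print(kmp_find_all('Allowed Hello Hollow', 'll'))  # [1, 10, 16]
print(kmp_find_all('aaaa', 'aa'))                  # [0, 1, 2]
```

The text is scanned once and never re-read, which is what gives the O(n + m) bound for a text of length n and a pattern of length m.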

Here is my function for finding multiple occurrences. Unlike the other solutions here, it supports the optional start and end parameters for slicing, just like str.index:

    def all_substring_indexes(string, substring, start=0, end=None):
        result = []
        new_start = start
        while True:
            try:
                index = string.index(substring, new_start, end)
            except ValueError:
                return result
            else:
                result.append(index)
                new_start = index + len(substring)

A simple iterative piece of code that returns a list of the indices where the substring occurs:

    def allindices(string, sub):
        l = []
        i = string.find(sub)
        while i >= 0:
            l.append(i)
            i = string.find(sub, i + 1)
        return l

You can split the text to get the relative positions, then sum the consecutive lengths in the list and add (key length * occurrence number) to get the wanted string indexes.

    >>> key = 'll'
    >>> text = "Allowed Hello Hollow"
    >>> x = [len(i) for i in text.split(key)[:-1]]
    >>> [sum(x[:i+1]) + i*len(key) for i in range(len(x))]
    [1, 10, 16]
    >>>

For a single-character search, this can be done in one line using a list comprehension:

    example = "a test am I"
    indices = [index for index, value in enumerate(example) if value == "a"]
    print(indices)
    # [0, 7]

A similar technique works for lists:

    example = ["a", "b", "c", "a", "d"]
    indices = [index for index, value in enumerate(example) if value == "a"]
    print(indices)
    # [0, 3]

Maybe not very Pythonic, but somewhat more self-explanatory. It returns the positions of the looked-up word in the original string.

    def retrieve_occurences(sequence, word, result, base_counter):
        indx = sequence.find(word)
        if indx == -1:
            return result
        result.append(indx + base_counter)
        base_counter += indx + len(word)
        return retrieve_occurences(sequence[indx + len(word):], word, result, base_counter)

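This link explains how to do the whole thing in O(n) and includes a solution in Python as well.

If you go further down that road to "suffix trees", you'll be able to do the same thing when you have one big string but want to search for thousands of patterns in it.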
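To give a feel for the "one big string, many patterns" idea, here is a sketch that uses a suffix array, a simpler relative of the suffix tree, rather than a suffix tree itself. The index is built naively by sorting all suffixes, which is fine for a demonstration but far too slow for really large texts; the function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order."""
    # Naive construction: compares whole suffixes, so it is only a demo.
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_with_suffix_array(text, suffixes, pattern):
    """Return sorted start indexes of pattern, using binary search on the index."""
    m = len(pattern)

    def prefix(pos):
        # the first m characters of the suffix stored at position pos
        return text[suffixes[pos]:suffixes[pos] + m]

    # lower bound: first suffix whose prefix is >= pattern
    lo, hi = 0, len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # upper bound: first suffix whose prefix is > pattern
    hi = len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffixes[first:lo])

text = 'Allowed Hello Hollow'
index = build_suffix_array(text)          # built once
for pattern in ('ll', 'Ho', 'xyz'):       # queried many times
    print(pattern, find_with_suffix_array(text, index, pattern))
# ll [1, 10, 16]
# Ho [14]
# xyz []
```

The index is built once and each query then needs only a logarithmic number of comparisons, which is the property that makes this family of structures attractive when there are many patterns.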

I don't think there is any need to test the length of the text; just keep finding until there is nothing left to find. Like this:

    >>> text = 'Allowed Hello Hollow'
    >>> place = 0
    >>> while text.find('ll', place) != -1:
    ...     print('ll found at', text.find('ll', place))
    ...     place = text.find('ll', place) + 2
    ...
    ll found at 1
    ll found at 10
    ll found at 16

You can also do it with a conditional list comprehension like this:

    string1 = "Allowed Hello Hollow"
    string2 = "ll"
    print([num for num in range(len(string1) - len(string2) + 1) if string1[num:num + len(string2)] == string2])
    # [1, 10, 16]
