Find the string between two substrings

How do I find the string between two substrings ('123STRINGabc' -> 'STRING')?

My current method looks like this:

    >>> start = 'asdf=5;'
    >>> end = '123jasd'
    >>> s = 'asdf=5;iwantthis123jasd'
    >>> print((s.split(start))[1].split(end)[0])
    iwantthis

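However, this seems very inefficient and un-Pythonic. What is a better way to do something like this?

Forgot to mention: the string might not start and end with start and end. There may be more characters before and after.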

    s = "123123STRINGabcabc"

    def find_between(s, first, last):
        # Search from the left: first occurrence of `first`, then the next `last`.
        try:
            start = s.index(first) + len(first)
            end = s.index(last, start)
            return s[start:end]
        except ValueError:
            return ""

    def find_between_r(s, first, last):
        # Search from the right: last occurrence of `first`, then the last `last`.
        try:
            start = s.rindex(first) + len(first)
            end = s.rindex(last, start)
            return s[start:end]
        except ValueError:
            return ""

    print(find_between(s, "123", "abc"))
    print(find_between_r(s, "123", "abc"))

gives:

    123STRING
    STRINGabc

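I think it should be noted: depending on what behavior you need, you can mix index and rindex calls, or go with one of the versions above (it's the equivalent of the regex (.*) and (.*?) groups).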
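
To make the "mix index and rindex" remark concrete, here is a minimal sketch of the two mixed variants. The names `find_between_outer` and `find_between_inner` are made up for illustration and are not from the answer above; the empty-string fallback simply mirrors `find_between`.

```python
def find_between_outer(s, first, last):
    # First occurrence of `first`, last occurrence of `last` after it
    # (the widest span, similar in spirit to a greedy match).
    try:
        start = s.index(first) + len(first)
        end = s.rindex(last, start)
        return s[start:end]
    except ValueError:
        return ""


def find_between_inner(s, first, last):
    # Last occurrence of `first`, then the first `last` that follows it.
    try:
        start = s.rindex(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


s = "123123STRINGabcabc"
print(find_between_outer(s, "123", "abc"))  # 123STRINGabc
print(find_between_inner(s, "123", "abc"))  # STRING
```

Which of the four combinations is right depends on whether the delimiters can repeat inside the text you want to keep.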

    import re

    s = 'asdf=5;iwantthis123jasd'
    result = re.search('asdf=5;(.*)123jasd', s)
    print(result.group(1))

    s[len(start):-len(end)]

String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be changed as needed.

    import re

    s = 'asdf=5;iwantthis123jasd'
    start = 'asdf=5;'
    end = '123jasd'
    result = re.search('%s(.*)%s' % (start, end), s).group(1)
    print(result)

    start = 'asdf=5;'
    end = '123jasd'
    s = 'asdf=5;iwantthis123jasd'
    print(s[s.find(start)+len(start):s.rfind(end)])

gives

    iwantthis

Here is one way to do it:

    _, _, rest = s.partition(start)
    result, _, _ = rest.partition(end)
    print(result)
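
One thing to be aware of: `str.partition` does not raise when the separator is missing, it returns the separator slot as an empty string. A small sketch that uses this to tell "nothing between the delimiters" apart from "delimiter not found"; the function name `partition_between` and the choice of returning `None` are assumptions for illustration.

```python
def partition_between(s, start, end):
    # partition returns (head, sep, tail); sep is '' when `start` is absent.
    _, found_start, rest = s.partition(start)
    if not found_start:
        return None
    result, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return result


print(partition_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
print(partition_between('no delimiters here', 'asdf=5;', '123jasd'))       # None
```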

Another way, using a regular expression:

    import re
    print(re.findall(re.escape(start) + "(.*)" + re.escape(end), s)[0])

or

    print(re.search(re.escape(start) + "(.*)" + re.escape(end), s).group(1))
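
Both regex variants above use a greedy `(.*)`, which runs to the last occurrence of the end marker. When the markers can repeat, a lazy `(.*?)` stops at the first one instead. A hedged sketch showing the difference; `regex_between` and its `greedy` flag are hypothetical names, and returning `None` on no match is my own choice rather than anything from the answers.

```python
import re


def regex_between(s, start, end, greedy=False):
    # re.escape keeps characters such as '*' or '|' in the markers literal.
    group = '(.*)' if greedy else '(.*?)'
    match = re.search(re.escape(start) + group + re.escape(end), s)
    return match.group(1) if match else None


s = 'asdf=5;first123jasd middle asdf=5;second123jasd'
print(regex_between(s, 'asdf=5;', '123jasd'))               # first
print(regex_between(s, 'asdf=5;', '123jasd', greedy=True))  # first123jasd middle asdf=5;second
```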

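    source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
    start_sep = '_'
    end_sep = '@df'
    result = []
    tmp = source.split(start_sep)
    for par in tmp:
        # Only chunks that also contain the end separator hold a token.
        if end_sep in par:
            result.append(par.split(end_sep)[0])

    print(result)

Should print: here0, here1, here2

A regular expression would be better, but it requires an additional library, and you may want to stick to plain Python.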
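
For comparison, a sketch of the same loop written with a regular expression. Note that `re` ships with Python's standard library, so nothing extra has to be installed. The function name `find_all_between` is made up for illustration.

```python
import re


def find_all_between(source, start_sep, end_sep):
    # Lazy group so each match stops at the nearest end separator.
    pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
    return re.findall(pattern, source)


source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
print(find_all_between(source, '_', '@df'))  # ['here0', 'here1', 'here2']
```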

Converting the OP's own solution into an answer:

    def find_between(s, start, end):
        return (s.split(start))[1].split(end)[0]

To extract STRING, try:

    myString = '123STRINGabc'
    startString = '123'
    endString = 'abc'
    mySubString = myString[myString.find(startString)+len(startString):myString.find(endString)]

My approach would be to do something like this:

    find index of start string in s => i
    find index of end string in s => j
    substring = substring(i+len(start) to j-1)
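
A possible Python rendering of that pseudocode, as a sketch only. `str.find` returns -1 when nothing is found, so the checks below are there to avoid slicing with a bogus index; the name `text_between` and the `None` return value are assumptions of mine.

```python
def text_between(s, start, end):
    i = s.find(start)
    if i == -1:
        return None          # start marker not present
    i += len(start)
    j = s.find(end, i)       # only look for `end` after the start marker
    if j == -1:
        return None          # end marker not present
    return s[i:j]            # slice end is exclusive, so no "j - 1" needed


print(text_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
```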

This is essentially cji's answer (Jul 30 at 5:58). I changed the try/except structure to make it clearer what is causing the exception.

    import sys

    def find_between(inputStr, firstSubstr, lastSubstr):
        '''
        find between firstSubstr and lastSubstr in inputStr STARTING FROM THE LEFT
        http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
        above also has a func that does this FROM THE RIGHT
        '''
        start, end = (-1, -1)
        try:
            start = inputStr.index(firstSubstr) + len(firstSubstr)
        except ValueError:
            # Report which substring was missing, then carry on.
            print(' ValueError: firstSubstr=%s - %s' % (firstSubstr, sys.exc_info()[1]))
        try:
            end = inputStr.index(lastSubstr, start)
        except ValueError:
            print(' ValueError: lastSubstr=%s - %s' % (lastSubstr, sys.exc_info()[1]))
        return inputStr[start:end]

These solutions assume that the start string and the end string are different. Here is the solution I use for a whole file when the start and end markers are the same, assuming the file is read with readlines():

    def extractstring(line, flag='$'):
        if flag in line:  # $ is the flag
            dex1 = line.index(flag)
            subline = line[dex1+1:-1]  # skip the first flag (+1); -1 drops the last character of the line
            dex2 = subline.index(flag)
            string = subline[0:dex2].strip()  # does not include last flag, strip whitespace
            return(string)

Example:

    lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
             'afafoaltat $I GOT BETTER!$ derpity derp derp']
    for line in lines:
        string = extractstring(line, flag='$')
        print(string)

gives:

    A NEWT?
    I GOT BETTER!
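
If a line can contain several flagged pieces, one possible variation is to split on the flag and keep every second chunk. This is only a sketch: `extract_all` is a hypothetical name, and dropping an unclosed trailing flag is my own assumption about the desired behavior.

```python
def extract_all(line, flag='$'):
    parts = line.split(flag)
    # An even number of parts means an odd number of flags, i.e. the last
    # flag was never closed; drop the dangling chunk after it.
    if len(parts) % 2 == 0:
        parts = parts[:-1]
    # Chunks at odd positions sit between an opening and a closing flag.
    return [chunk.strip() for chunk in parts[1::2]]


print(extract_all('at $A NEWT?$ and then $I GOT BETTER!$ derp'))
# ['A NEWT?', 'I GOT BETTER!']
```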

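You can simply use this code or copy the function below. All neatly in one line.

    def substring(whole, sub1, sub2):
        return whole[whole.index(sub1) : whole.index(sub2)]

If you run the function as follows:

    print(substring("5+(5*2)+2", "(", ")"))

you will be left with the output:

    (5*2

rather than

    5*2

If you want the substrings included at both ends of the output, the code must look like this:

    return whole[whole.index(sub1) : whole.index(sub2) + 1]

But if you don't want the substrings at the ends, the +1 must go on the first value:

    return whole[whole.index(sub1) + 1 : whole.index(sub2)]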
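
The `+ 1` adjustments above only work for one-character delimiters. A hedged generalization that uses `len()` instead, and also starts the search for `sub2` after `sub1`; the name `substring_between` and the `inclusive` flag are made up for this sketch.

```python
def substring_between(whole, sub1, sub2, inclusive=False):
    start = whole.index(sub1)
    # Look for sub2 only after sub1, so identical delimiters also work.
    end = whole.index(sub2, start + len(sub1))
    if inclusive:
        return whole[start:end + len(sub2)]
    return whole[start + len(sub1):end]


print(substring_between("5+(5*2)+2", "(", ")"))                  # 5*2
print(substring_between("5+(5*2)+2", "(", ")", inclusive=True))  # (5*2)
print(substring_between("a<<x>>b", "<<", ">>"))                  # x
```

Like the original, this raises ValueError when either delimiter is missing.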

    from timeit import timeit
    from re import search, DOTALL


    def partition_find(string, start, end):
        return string.partition(start)[2].rpartition(end)[0]


    def re_find(string, start, end):
        # applying re.escape to start and end would be safer
        return search(start + '(.*)' + end, string, DOTALL).group(1)


    def index_find(string, start, end):
        return string[string.find(start) + len(start):string.rfind(end)]


    # The wikitext of the "Alan Turing law" article from English Wikipedia
    # https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
    string = """..."""
    start = '==Proposals=='
    end = '==Rival bills=='

    assert index_find(string, start, end) \
        == partition_find(string, start, end) \
        == re_find(string, start, end)

    print('index_find', timeit(
        'index_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('partition_find', timeit(
        'partition_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('re_find', timeit(
        're_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

Results:

    index_find 0.35047444528454114
    partition_find 0.5327825636197754
    re_find 7.552149639286381

In this example, re_find was roughly 20 times slower than index_find.

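Parsing text with delimiters from different email platforms posed a larger version of this problem. They generally have a START and a STOP. The delimiter characters doubled as regex wildcards and kept choking the regex. The problem with split is mentioned here and elsewhere: oops, the delimiter is gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code:

    import logging

    # textIn is the input text to parse (defined elsewhere)
    nuke = '~~~'
    start = '|*'
    stop = '*|'
    # Wrap every START...STOP pair in the throwaway marker, then split on it.
    julien = (textIn.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke))
    keep = [chunk for chunk in julien if start in chunk and stop in chunk]
    logging.info('keep: %s', keep)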
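
For what it's worth, delimiters such as `|*` usually stop choking the regex once they are passed through `re.escape`. A minimal sketch under that assumption; `find_tags` and the sample text are invented for illustration, and like the replace/split version it keeps the delimiters in each chunk.

```python
import re


def find_tags(text, start='|*', stop='*|'):
    # re.escape turns '|' and '*' into literals instead of regex operators.
    pattern = re.escape(start) + '.*?' + re.escape(stop)
    return re.findall(pattern, text, re.DOTALL)


sample = 'Hi |*FNAME*|, your code is |*CODE*|.'
print(find_tags(sample))  # ['|*FNAME*|', '|*CODE*|']
```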

This is something I previously posted as a code snippet on Daniweb:

    # picking up piece of string between separators
    # function using partition, like partition, but drops the separators
    def between(left, right, s):
        before, _, a = s.partition(left)
        a, _, after = a.partition(right)
        return before, a, after

    s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
    print(between('<a>', '</a>', s))
    print(between('(', ')', s))
    print(between("'", "'", s))

    """ Output (Python 2; Python 3 shows 'ö' instead of '\xc3\xb6'):
    ('bla bla blaa ', 'data', " lsdjfasdj\xc3\xb6f (important notice) 'Daniweb forum' tcha tcha tchaa")
    ('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
    ('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
    """

This seems much more straightforward to me:

    import re

    s = 'asdf=5;iwantthis123jasd'
    x = re.search('iwantthis', s)
    print(s[x.start():x.end()])
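
That snippet searches for the wanted text itself, which presumes you already know it. If only the delimiters are known, the same `x.start()` / `x.end()` slicing can be kept by putting the delimiters into lookaround assertions. A sketch, with `between_lookaround` as a hypothetical name:

```python
import re


def between_lookaround(s, start, end):
    # (?<=...) and (?=...) require the delimiters to be present
    # without making them part of the match itself.
    pattern = '(?<=' + re.escape(start) + ').*?(?=' + re.escape(end) + ')'
    x = re.search(pattern, s)
    if x is None:
        return None
    return s[x.start():x.end()]


print(between_lookaround('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
```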
