How to filter pandas rows by regex

I would like to cleanly filter a DataFrame using a regex on one of its columns.

For a contrived example:

    In [210]: foo = pd.DataFrame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})

    In [211]: foo
    Out[211]:
       a    b
    0  1   hi
    1  2  foo
    2  3  fat
    3  4  cat

I want to use a regex to keep only the rows whose b value starts with f. First attempt:

    In [213]: foo.b.str.match('f.*')
    Out[213]:
    0    []
    1    ()
    2    ()
    3    []

That's not terribly useful. However, this gets me a boolean index:

    In [226]: foo.b.str.match('(f.*)').str.len() > 0
    Out[226]:
    0    False
    1     True
    2     True
    3    False
    Name: b

So I can then apply my restriction with:

    In [229]: foo[foo.b.str.match('(f.*)').str.len() > 0]
    Out[229]:
       a    b
    1  2  foo
    2  3  fat

That forces me to artificially put a capture group into the regex, which doesn't seem like the clean way to go. Is there a better way to do this?

Use contains instead:

    In [10]: df.b.str.contains('^f')
    Out[10]:
    0    False
    1     True
    2     True
    3    False
    Name: b, dtype: bool
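A minimal self-contained version of the contains approach, rebuilding the question's foo frame (the answer above calls it df). str.contains treats the pattern as a regular expression by default, and ^ anchors it to the start of each string, so the resulting boolean Series can index the frame directly without any capture group.

```python
import pandas as pd

# Rebuild the example frame from the question.
foo = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})

# str.contains interprets the pattern as a regex by default;
# '^f' anchors the match to the start of each string.
mask = foo.b.str.contains('^f')

# The boolean Series indexes the frame directly; no capture group is needed.
result = foo[mask]
print(result)
```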

Searching on multiple columns of a DataFrame:

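    # '*.' is not a valid regex ('.*' is); in a raw string, '\\' and '\.' match a literal backslash and dot.
    frame[frame.filename.str.match('.*' + MetaData + '.*') & frame.file_path.str.match(r'C:\\test\\test\.txt')]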
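A runnable sketch of the multi-column filter. The column names filename and file_path come from the snippet above, but the rows and the value of MetaData are made up for illustration. Wrapping MetaData in re.escape is an added assumption: it stops characters such as '.' in the search term from being read as regex metacharacters. Each column produces its own boolean mask, and the masks are combined with &.

```python
import re
import pandas as pd

# Hypothetical data: column names follow the snippet above,
# but the rows and the MetaData value are invented for illustration.
frame = pd.DataFrame({
    'filename': ['report.v1.csv', 'report.v2.csv', 'notes.v1.txt'],
    'file_path': [r'C:\test\test.txt', r'C:\other\test.txt', r'C:\test\test.txt'],
})
MetaData = 'v1'

# re.escape keeps any regex metacharacters in MetaData literal;
# '.*' on both sides lets the term appear anywhere in the name.
name_mask = frame.filename.str.match('.*' + re.escape(MetaData) + '.*')

# In the raw-string pattern, '\\' is a literal backslash and '\.' a literal dot.
path_mask = frame.file_path.str.match(r'C:\\test\\test\.txt')

# Combine the per-column masks with & (both sides are boolean Series).
result = frame[name_mask & path_mask]
print(result)
```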

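This may be a bit late, but pandas makes this easier now. You can call match with as_indexer=True to get boolean results. This is documented, along with the difference between match and contains. (In later pandas releases the as_indexer argument was deprecated and then removed; str.match now returns booleans by default.)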
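A short sketch of the difference between match and contains, assuming a recent pandas where str.match returns booleans without as_indexer. match anchors at the start of each string (like Python's re.match), while contains searches anywhere in it (like re.search), so the question's original str.match call now works as a mask on its own.

```python
import pandas as pd

foo = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})

# Assumes a recent pandas: str.match returns booleans and is anchored at
# the start of each string (like re.match), so 'f' means "starts with f".
starts_with_f = foo.b.str.match('f')

# str.contains searches anywhere in the string (like re.search),
# so 'at' hits both 'fat' and 'cat'.
contains_at = foo.b.str.contains('at')

print(foo[starts_with_f])
print(foo[contains_at])
```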

There is already a string-handling function for this: Series.str.startswith().

You could try foo[foo.b.str.startswith('f')].

Result:

       a    b
    1  2  foo
    2  3  fat

I think that's what you expect.
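If the column may contain missing values, str.startswith on an object column returns NaN for those rows, and a mask containing NaN cannot be used for boolean indexing. Passing na=False treats missing entries as non-matches. A sketch using the question's frame with one value replaced by a hypothetical None:

```python
import pandas as pd

# Same shape as the question's frame, with a hypothetical missing value.
foo = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', None, 'cat']})

# na=False makes the missing entry count as "does not start with 'f'",
# so the result is a clean boolean mask usable for indexing.
mask = foo.b.str.startswith('f', na=False)
print(foo[mask])
```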