Finding elements by attribute with lxml

I need to parse an XML file to extract some data. I only need the elements that have certain attributes. Here is a sample document:

    <root>
      <articles>
        <article type="news">
          <content>some text</content>
        </article>
        <article type="info">
          <content>some text</content>
        </article>
        <article type="news">
          <content>some text</content>
        </article>
      </articles>
    </root>

Here I would like to get only the articles whose type is "news". What is the most efficient and elegant way to do this with lxml?

I tried the find methods, but the result is not very nice:

    from lxml import etree

    f = etree.parse("myfile")
    root = f.getroot()
    articles = root.getchildren()[0]
    article_list = articles.findall('article')
    for article in article_list:
        if "type" in article.keys():
            if article.attrib['type'] == 'news':
                content = article.find('content')
                content = content.text

You can use XPath, for example root.xpath("//article[@type='news']")

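This XPath expression returns a list of all <article/> elements that have a "type" attribute with the value "news". You can then iterate over it to do whatever you want, or pass it wherever it is needed.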
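As a minimal sketch of iterating over that result: the helper name news_contents, the XPath variable $kind, and the sample texts below are made up for illustration and are not part of the question. lxml lets you pass XPath variables as keyword arguments to xpath(), which avoids building the expression by string concatenation.

```python
from lxml import etree

# Sample document; the text values are illustrative only.
XML = """
<root>
  <articles>
    <article type="news"><content>first news</content></article>
    <article type="info"><content>some info</content></article>
    <article type="news"><content>second news</content></article>
  </articles>
</root>
"""


def news_contents(root, article_type="news"):
    """Return the <content> text of every <article> with the given type."""
    # $kind is an XPath variable; its value comes from the keyword argument.
    matches = root.xpath("//article[@type=$kind]", kind=article_type)
    # xpath() returned a plain list of elements, so normal iteration works.
    return [article.findtext("content") for article in matches]


root = etree.fromstring(XML)
print(news_contents(root))
```

An attribute value that matches nothing simply gives an empty list, so no extra check for the presence of the attribute is needed.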

To get the text content, you can extend the XPath like this:

    from lxml import etree

    root = etree.fromstring("""
    <root>
      <articles>
        <article type="news">
          <content>some text</content>
        </article>
        <article type="info">
          <content>some text</content>
        </article>
        <article type="news">
          <content>some text</content>
        </article>
      </articles>
    </root>
    """)

    # text() selects the text nodes of each matching <content> element
    print(root.xpath("//article[@type='news']/content/text()"))

This outputs ['some text', 'some text']. Or, if you just want the content elements, the expression would be "//article[@type='news']/content", and so on.

Just for reference, you can get the same result with findall:

    from lxml import etree

    root = etree.fromstring("""
    <root>
      <articles>
        <article type="news">
          <content>some text</content>
        </article>
        <article type="info">
          <content>some text</content>
        </article>
        <article type="news">
          <content>some text</content>
        </article>
      </articles>
    </root>
    """)

    articles = root.find("articles")
    # findall accepts a limited path syntax that includes attribute predicates
    article_list = articles.findall("article[@type='news']/content")
    for a in article_list:
        print(a.text)
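The intermediate find("articles") step can probably be dropped by starting the path with ".//", which searches all descendants. The sketch below assumes that form. It also falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since findall uses the same path subset in both. The sample texts are illustrative.

```python
try:
    from lxml import etree
except ImportError:
    # The standard library implements the same find/findall path subset.
    import xml.etree.ElementTree as etree

# Sample document; the text values are illustrative only.
XML = """
<root>
  <articles>
    <article type="news"><content>first news</content></article>
    <article type="info"><content>some info</content></article>
    <article type="news"><content>second news</content></article>
  </articles>
</root>
"""

root = etree.fromstring(XML)

# ".//" searches all descendants of root, so no find("articles") is needed.
contents = root.findall(".//article[@type='news']/content")
news_texts = [c.text for c in contents]
print(news_texts)
```

Note that xpath() and text() are lxml features; only the find/findall subset carries over to the standard library.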
