Tag: lxml

Finding elements by attribute with lxml: I need to parse an XML file to extract some data, but I only need the elements that have certain attributes. Here is an example document:

    <root>
        <articles>
            <article type="news">
                <content>some text</content>
            </article>
            <article type="info">
                <content>some text</content>
            </article>
            <article type="news">
                <content>some text</content>
            </article>
        </articles>
    </root>

Here I only want the articles whose type is "news". What is the most efficient and elegant way to do this with lxml? I tried the find methods, but the result is not very nice:

    from lxml import etree
    f = etree.parse("myfile")
    root = f.getroot()
    articles = root.getchildren()[0]
    article_list = articles.findall('article')
    for article in article_list:
        if "type" in article.keys():
            if article.attrib['type'] == 'news':
                content = article.find('content')
                […]
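A minimal sketch of the attribute-predicate approach, parsing from an in-memory byte string instead of `myfile` (with a file, `etree.parse("myfile").getroot()` yields the same root). The `[@type='news']` predicate in `findall()` filters on the attribute value directly, which replaces the `keys()`/`attrib` checks. The import falls back to the standard library's `xml.etree.ElementTree`, which supports the same path syntax; `news_contents` and `XML` are illustrative names, not part of the question, and the code targets Python 3.

```python
# Prefer lxml (the library in the question); fall back to the stdlib
# ElementTree, which shares the findall()/ElementPath API used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

XML = b"""<root>
  <articles>
    <article type="news"><content>some text</content></article>
    <article type="info"><content>some text</content></article>
    <article type="news"><content>some text</content></article>
  </articles>
</root>"""


def news_contents(xml_bytes):
    """Return the <content> text of every <article type="news">."""
    root = etree.fromstring(xml_bytes)
    # The [@type='news'] predicate filters on the attribute value itself,
    # so no manual keys()/attrib checks are needed.
    return [article.findtext("content")
            for article in root.findall("articles/article[@type='news']")]


print(news_contents(XML))
```

With lxml specifically, `root.xpath("//article[@type='news']")` is the XPath equivalent and also finds matching articles at any depth.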

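Installing lxml with pip in a virtualenv on Ubuntu 12.10 fails with "error: command 'gcc' failed with exit status 4": I get the following error when running "pip install lxml" inside a virtualenv on Ubuntu 12.10 x64. I have Python 2.7. I have seen other related questions here and tried installing python-dev, libxml2-dev and libxslt1-dev. Please see the traceback, from the moment I type the command until the error occurs:

    Downloading/unpacking lxml
    Running setup.py egg_info for package lxml
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
    warnings.warn(msg)
    Building lxml version 3.1.2.
    Building without Cython.
    Using build configuration of libxslt 1.1.26
    Building against libxml2/libxslt in the following directory: /usr/lib
    warning: no files found matching '*.txt' under directory 'src/lxml/tests'
    Installing collected packages: lxml
    Running setup.py install for lxml
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
    warnings.warn(msg)
    Building lxml version 3.1.2.
    Building without Cython.
    Using build configuration of libxslt 1.1.26
    Building against libxml2/libxslt in the following directory: /usr/lib
    building 'lxml.etree' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g […]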

Python: how to pretty-print HTML into a file: I am using lxml.html to generate some HTML, and I want to pretty-print (indent) the final result into an HTML file. How do I do that? This is what I have tried so far (I am relatively new to Python and lxml):

    import lxml.html as lh
    from lxml.html import builder as E
    sliderRoot=lh.Element("div", E.CLASS("scroll"), style="overflow-x: hidden; overflow-y: hidden;")
    scrollContainer=lh.Element("div", E.CLASS("scrollContainer"), style="width: 4340px;")
    sliderRoot.append(scrollContainer)
    print lh.tostring(sliderRoot, pretty_print = True, method="html")

As you can see, I am passing the pretty_print=True argument. I expected it to produce indented markup, but it does not really help. This is the output:

    <div style="overflow-x: hidden; overflow-y: hidden;" class="scroll"><div style="width: 4340px;" class="scrollContainer"></div></div>
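One possible workaround, sketched here rather than offered as an explanation of why `pretty_print` had no effect: insert the indentation whitespace explicitly with `indent()` and then serialize with `method="html"`. This assumes lxml 4.5 or later (where `etree.indent` exists) or, for the standard-library fallback, Python 3.9 or later. `build_slider`, `pretty_html` and `write_pretty_html` are hypothetical helper names. The elements are built with `etree.Element` so the fallback also works; the question's `lxml.html` elements are lxml elements too, so `etree.indent` should accept them as well, but that is an untested assumption here.

```python
# Prefer lxml (4.5+ provides etree.indent); fall back to the stdlib
# ElementTree, whose indent() requires Python 3.9+.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def build_slider():
    """Rebuild the two nested divs from the question (hypothetical helper)."""
    root = etree.Element("div", {"class": "scroll",
                                 "style": "overflow-x: hidden; overflow-y: hidden;"})
    etree.SubElement(root, "div", {"class": "scrollContainer",
                                   "style": "width: 4340px;"})
    return root


def pretty_html(element, space="  "):
    """Add indentation whitespace explicitly, then serialize as HTML."""
    etree.indent(element, space=space)  # modifies the element in place
    return etree.tostring(element, method="html", encoding="unicode")


def write_pretty_html(element, path):
    """Write the indented markup to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(pretty_html(element))


if __name__ == "__main__":
    print(pretty_html(build_slider()))
```

Calling `write_pretty_html(build_slider(), "slider.html")` would write the indented markup to a file; the file name is only an example.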

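How do I install lxml on OS X Leopard without using MacPorts or Fink?: I have tried this in the past and run into a lot of problems. Does anyone have a recipe for installing lxml on OS X without MacPorts or Fink that definitely works? Ideally it would be a complete 1-2-3 set of steps for downloading and building each dependency.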

pip is not installing packages correctly: Permission denied error: I am trying to install lxml on my Mac (v 10.9.4) in order to install Scrapy:

    ╭─ishaantaylor@Ishaans-MacBook-Pro.local ~
    ╰─➤ pip install lxml
    Downloading/unpacking lxml
    Downloading lxml-3.4.0.tar.gz (3.5MB): 3.5MB downloaded
    Running setup.py (path:/private/var/folders/8l/t7tcq67d34v7qq_4hp3s1dm80000gn/T/pip_build_ishaantaylor/lxml/setup.py) egg_info for package lxml
    /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
    warnings.warn(msg)
    Building lxml version 3.4.0.
    Building without Cython.
    Using build configuration of libxslt 1.1.28
    warning: no previously-included files found matching '*.py'
    Installing collected packages: lxml
    Running setup.py […]

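bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?:

    …
        soup = BeautifulSoup(html, "lxml")
      File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
        % ",".join(features))
    bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The output above is from my terminal. I am on Mac OS 10.7.x with Python 2.7.1, and I followed this tutorial to get Beautiful Soup and lxml. Both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included the line `from pageCrawler import comparePages`, and in the pageCrawler file I have included the following two lines:

    from bs4 import BeautifulSoup
    from urllib2 import urlopen

Any help figuring out what the problem is and how to fix it would be greatly appreciated.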
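A small diagnostic sketch, assuming a common cause: the script is run by a different Python interpreter than the one Beautiful Soup and lxml were installed into (the question says both work from a separate test file). It prints which interpreter is actually running and picks `"lxml"` only when that interpreter can import it, otherwise the standard library's `"html.parser"`, a builder name Beautiful Soup also accepts. `pick_bs4_parser` is a hypothetical helper; the code targets Python 3 and does not itself need bs4 installed.

```python
import importlib.util
import sys


def pick_bs4_parser():
    """Return a BeautifulSoup feature name this interpreter can actually use."""
    # "lxml" only works if lxml is importable from *this* interpreter;
    # "html.parser" is the pure-Python parser that ships with the stdlib.
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("parser to use:", pick_bs4_parser())
```

The result can be passed straight through, as in `BeautifulSoup(html, pick_bs4_parser())`, but a silent fallback hides the underlying installation problem, so printing `sys.executable` is the more useful part while debugging.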

How to remove an element in lxml: I need to completely remove elements, based on the contents of an attribute, using Python's lxml. Example:

    import lxml.etree as et

    xml="""
    <groceries>
      <fruit state="rotten">apple</fruit>
      <fruit state="fresh">pear</fruit>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="rotten">mango</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>
    """

    tree=et.fromstring(xml)

    for bad in tree.xpath("//fruit[@state=\'rotten\']"):
      #remove this element from the tree

    print et.tostring(tree, pretty_print=True)

I would like this to print:

    <groceries>
      <fruit state="fresh">pear</fruit>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>

Is there a way to do this without storing a temporary variable and printing it manually, like:

    newxml="<groceries>\n"
    for elt in tree.xpath('//fruit[@state=\'fresh\']'):
      newxml+=et.tostring(elt)

    newxml+="</groceries>"
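A minimal sketch that removes the matching elements in place and then serializes the same tree, so no temporary string has to be assembled. It collects `(parent, child)` pairs first and only then calls `remove()`, which avoids modifying the tree while iterating over it and works with both `lxml.etree` and the standard-library fallback. `remove_matching` is a hypothetical helper name, and the code targets Python 3.

```python
# Works with lxml.etree or the stdlib ElementTree (same remove() API).
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

XML = """<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>"""


def remove_matching(root, tag, attr, value):
    """Delete every <tag attr="value"> element, wherever it sits in the tree."""
    # Collect (parent, child) pairs first so the tree is not modified
    # while it is being iterated.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.tag == tag and child.get(attr) == value]
    for parent, child in doomed:
        parent.remove(child)  # the child's tail text is removed along with it
    return root


root = remove_matching(etree.fromstring(XML), "fruit", "state", "rotten")
print(etree.tostring(root, encoding="unicode"))
```

In lxml alone, the loop from the question can be completed with `bad.getparent().remove(bad)`: `getparent()` is lxml-specific, and because `xpath()` returns a list rather than a live view, removing elements inside that loop is safe.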

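Parsing HTML in Python: lxml or BeautifulSoup? Which is better for which purposes?: From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I chose BeautifulSoup for a project I am working on, but for no particular reason other than finding its syntax a bit easier to learn and understand. However, I see a lot of people favouring lxml, and I have heard that lxml is faster. So I am wondering: what are the advantages of one over the other? When would I want to use lxml, and when would I be better off using BeautifulSoup? Are there any other libraries worth considering?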

Installing the lxml module in Python: When running a Python script, I got this error:

    from lxml import etree
    ImportError: No module named lxml

So I tried to install lxml:

    sudo easy_install lxml

But it gives me the following error:

    Building lxml version 2.3.beta1.
    NOTE: Trying to build without Cython, pre-generated 'src/lxml/lxml.etree.c' needs to be available.
    ERROR: /bin/sh: xslt-config: not found

    ** make sure the development packages of libxml2 and libxslt are installed **

    Using build configuration of libxslt
    src/lxml/lxml.etree.c:4: fatal error: Python.h: No […]
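The two failures in this log, a missing `xslt-config` and a missing `Python.h`, both point to absent development packages. The diagnostic sketch below (Python 3, standard library only) checks for those files before the build is retried; `xml2-config` is libxml2's counterpart script, and `lxml_build_prereqs` is a hypothetical helper name. On Debian/Ubuntu these files typically come from `python-dev` (or `python3-dev`), `libxml2-dev` and `libxslt1-dev`.

```python
import os
import shutil
import sysconfig


def lxml_build_prereqs():
    """Check for the files the build log complains about (diagnostic only)."""
    include_dir = sysconfig.get_paths()["include"]  # where Python.h should live
    return {
        # Missing -> the "fatal error: Python.h" compile failure in the log
        "Python.h": os.path.isfile(os.path.join(include_dir, "Python.h")),
        # Missing -> "/bin/sh: xslt-config: not found"
        "xslt-config": shutil.which("xslt-config") is not None,
        # libxml2's counterpart configuration script
        "xml2-config": shutil.which("xml2-config") is not None,
    }


if __name__ == "__main__":
    for name, found in lxml_build_prereqs().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Recent lxml releases also publish prebuilt wheels for common platforms, in which case `pip install lxml` may not need to compile anything at all.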

SyntaxError for non-ASCII characters: I want to parse XML that contains some non-ASCII characters. The code looks like this:

    from lxml import etree
    from lxml import objectify
    content = u'<?xml version="1.0" encoding="utf-8"?><div>Order date : 05/08/2013 12:24:28</div>'
    mail.replace('\xa0',' ')
    xml = etree.fromstring(mail)

But it reports an error on the line 'content = …', like this:

    SyntaxError: Non-ASCII character '\xc2' in file /home/projects/ztest/responce.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

It works in the terminal, but when I run it in the Eclipse IDE it gives me this error. I don't know how to get past it.
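A minimal sketch of two ways past the `SyntaxError` described in PEP 263 (linked in the message): declare the source encoding on the first line, which is what Python 2 requires, or keep the file ASCII-only by writing the character as an escape. The `'\xc2'` byte is presumably the first byte of a UTF-8-encoded no-break space (U+00A0), which the question's `replace('\xa0', ' ')` also suggests. The sketch also handles three smaller issues in the snippet: it uses one variable name throughout (the question assigns `content` but then uses `mail`), it assigns the result of `replace()`, which returns a new string, and it encodes the text to bytes before parsing, because lxml rejects a `str` that contains an XML encoding declaration. The code targets Python 3 and falls back to the standard library if lxml is missing.

```python
# -*- coding: utf-8 -*-
# The PEP 263 line above is what Python 2 requires before a source file may
# contain non-ASCII literals; Python 3 assumes UTF-8 source by default.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# "\u00a0" (no-break space) is written as an escape, an alternative that keeps
# this file pure ASCII so no encoding declaration would be needed at all.
content = (u'<?xml version="1.0" encoding="utf-8"?>'
           u'<div>Order date :\u00a005/08/2013 12:24:28</div>')

# str.replace() returns a new string; it does not modify content in place.
content = content.replace(u"\u00a0", u" ")

# lxml refuses str input that carries an encoding declaration, so pass bytes.
div = etree.fromstring(content.encode("utf-8"))
print(div.text)
```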
