Python: Is this the most efficient way to find tags?

I'm using Python and BeautifulSoup to parse many large XML files. I often run into the following task:

<Section1>
    <Report>
        <Matrix>...</Matrix>
        <Matrix>...</Matrix>
        <Matrix>...</Matrix>
        <Matrix>...</Matrix>
    </Report>
</Section1>
Why can't I use a selector like this:

matrices = soup.find("Section1 Matrix")
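Is there a faster way? Sometimes I access nodes nested further down in the XML, and I need to make sure they are descendants, but not necessarily direct children, of several other nodes. The example provided is a simplification. Any help would be greatly appreciated.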
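For comparison, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` instead of BeautifulSoup (a swapped-in tool, not something the question uses). Its limited XPath syntax shows the descendant vs. direct-child distinction directly: `.//` matches at any depth, while a bare tag name matches only direct children.

```python
import xml.etree.ElementTree as ET

XML = """<Section1>
    <Report>
        <Matrix>...</Matrix>
        <Matrix>...</Matrix>
        <Matrix>...</Matrix>
        <Matrix>...</Matrix>
    </Report>
</Section1>"""

root = ET.fromstring(XML)  # root is the <Section1> element itself

# ".//Matrix" matches Matrix elements at any depth below root (descendants).
descendants = root.findall(".//Matrix")

# A bare "Matrix" matches only direct children of root, so nothing here.
direct_children = root.findall("Matrix")

# "Report//Matrix": Matrix anywhere below a Report child of root.
under_report = root.findall("Report//Matrix")

print(len(descendants), len(direct_children), len(under_report))  # 4 0 4
```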

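BeautifulSoup: `find` matches a tag name, so "Section1 Matrix" is treated as one literal tag name and matches nothing. To use a CSS selector, you need to pass it to the .select method: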

In [1]: from bs4 import BeautifulSoup as BS

In [2]: soup = BS("""<Section1>
   ...:     <Report>
   ...:         <Matrix>...</Matrix>
   ...:         <Matrix>...</Matrix>
   ...:         <Matrix>...</Matrix>
   ...:         <Matrix>...</Matrix>
   ...:     </Report>
   ...: </Section1>""", "xml")

In [3]: soup.select("Section1 Matrix")
Out[3]: 
[<Matrix>...</Matrix>,
 <Matrix>...</Matrix>,
 <Matrix>...</Matrix>,
 <Matrix>...</Matrix>]
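The descendant semantics of a selector like "Section1 Matrix" can also be approximated with the standard library alone. The helper `select_descendants` below is hypothetical (it is not part of BeautifulSoup or ElementTree); this sketch walks `Element.iter()` once per tag in the chain and de-duplicates the results.

```python
import xml.etree.ElementTree as ET

def select_descendants(root, *tags):
    """Hypothetical helper: return elements named tags[-1] that have
    tags[0], tags[1], ... as ancestors in that order, mimicking the CSS
    descendant combinator ("Section1 Matrix", "A B C", ...)."""
    # iter() also yields root itself, so a root named tags[0] counts.
    current = list(root.iter(tags[0]))
    for tag in tags[1:]:
        seen = set()
        matches = []
        for ancestor in current:
            for el in ancestor.iter(tag):
                # Skip the ancestor itself and anything already collected
                # (nested ancestors would otherwise produce duplicates).
                if el is ancestor or id(el) in seen:
                    continue
                seen.add(id(el))
                matches.append(el)
        current = matches
    return current

XML = """<Doc>
  <Section1>
    <Report>
      <Matrix id="a"/>
      <Matrix id="b"/>
    </Report>
  </Section1>
  <Section2>
    <Matrix id="c"/>
  </Section2>
</Doc>"""

root = ET.fromstring(XML)
hits = select_descendants(root, "Section1", "Matrix")
print([m.get("id") for m in hits])  # ['a', 'b']
```

When ancestors are nested inside each other, results are de-duplicated but are not guaranteed to be in document order.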


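Note: lxml's CSS selector support (lxml.cssselect) relies on the separate cssselect package. If it is not already installed, install it with pip:
pip install cssselect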

Have you tried using lxml directly? It can improve performance considerably.
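In [1]: from lxml.etree import fromstring

In [2]: from lxml.cssselect import CSSSelector  # requires the cssselect package

In [3]: xml_doc = '''<Section1>
   ...:     <Report>
   ...:         <Matrix>...</Matrix>
   ...:         <Matrix>...</Matrix>
   ...:         <Matrix>...</Matrix>
   ...:         <Matrix>...</Matrix>
   ...:     </Report>
   ...: </Section1>'''

In [4]: tree = fromstring(xml_doc)

In [5]: sel = CSSSelector("Section1 Matrix")  # compiled once, reusable across documents

In [6]: matrix = sel(tree)  # calling the selector returns a list of matching elements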

In [7]: matrix
Out[7]: 
[<Element Matrix at 0x7f84b5b8f388>,
 <Element Matrix at 0x7f84b5b8fc48>,
 <Element Matrix at 0x7f84b5b8fd88>,
 <Element Matrix at 0x7f84b5b8fdc8>]
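Since the question mentions many large XML files, loading each whole document into memory may itself be a bottleneck. Here is a minimal streaming sketch with the standard library's `xml.etree.ElementTree.iterparse` (lxml offers `lxml.etree.iterparse` with a largely compatible interface). `iter_descendants` and its parameters are hypothetical names. The consumer must read each yielded element before requesting the next one, because finished subtrees are cleared to limit memory.

```python
import io
import xml.etree.ElementTree as ET

def iter_descendants(source, ancestor, target):
    """Hypothetical helper: stream `source` and yield every <target>
    element that has an <ancestor> element somewhere above it
    (CSS "ancestor target" semantics) without keeping the whole tree.

    Each yielded element is complete, but it may be cleared once the
    generator resumes, so read what you need before the next iteration.
    """
    inside_ancestor = 0  # number of currently open <ancestor> elements
    inside_target = 0    # number of currently open <target> elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == ancestor:
                inside_ancestor += 1
            if elem.tag == target:
                inside_target += 1
            continue
        # "end" event: elem and its whole subtree have been parsed.
        if elem.tag == target:
            inside_target -= 1
            if inside_ancestor:
                yield elem
        if elem.tag == ancestor:
            inside_ancestor -= 1
        # Free the finished subtree unless an enclosing <target> still needs it.
        if not inside_target:
            elem.clear()

XML = b"""<Doc>
  <Section1><Report><Matrix id="a"/><Matrix id="b"/></Report></Section1>
  <Section2><Matrix id="c"/></Section2>
</Doc>"""

# Extract the data (here just the id attribute) while iterating.
ids = [m.get("id") for m in iter_descendants(io.BytesIO(XML), "Section1", "Matrix")]
print(ids)  # ['a', 'b']
```

For real files, pass a filename or a file object opened in binary mode instead of the `io.BytesIO` buffer.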