Difference between attributes and style tags in Python lxml


After using BeautifulSoup, I am trying to learn lxml. However, I am not a very strong programmer in general.

I have the following code in some source HTML:

<p style="font-family:times;text-align:justify"><font size="2"><b><i> The reasons to eat pickles include:  </i></b></font></p>

The first line of my HTML snippet above is newHTM[19].
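To make the distinction in the title concrete, here is a minimal sketch that parses the snippet above on its own (an assumption, since the full source file is not shown). It illustrates that `style` and `size` are attributes stored on an element, while `<font>`, `<b>` and `<i>` are nested child elements. The variable names are illustrative only.

```python
from lxml import html

snippet = ('<p style="font-family:times;text-align:justify"><font size="2">'
           '<b><i> The reasons to eat pickles include:  </i></b></font></p>')

# For a fragment with a single top-level element, fromstring returns that element
p = html.fromstring(snippet)

# Attributes are read from the element itself with .get() (or the .attrib mapping)
style = p.get('style')
font_size = p[0].get('size')      # p[0] is the first child element, <font>

# The style attribute is just a string; splitting it is a simple assumption
# that holds for this snippet (no ':' or ';' inside the values)
style_rules = dict(rule.split(':') for rule in style.split(';'))

# Formatting tags are elements in the tree, visited here in document order
tags = [el.tag for el in p.iter()]
text = p.text_content().strip()

print(style_rules)
print(tags)
print(text)
```

So a test such as "is this paragraph justified?" is an attribute lookup on `<p>`, whereas "is this text bold?" is a question about which elements appear in the tree.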

Hmm, this seems to get me closer:

newHTM.cssselect('b')

I don't fully understand it yet, but here is the solution:

for each in newHTM:
    # cssselect returns a list; an empty list is falsy
    if each.cssselect('b'):
        print(each.text_content())

Using the CSS API is really not the right way to do this. If you want to find all b elements, do the following:

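from lxml import html
# Read the whole file at once; no need to split it into lines first
with open(r'c:\myfile.htm', 'r') as f:
    strHTM = f.read()
newHTM = html.fromstring(strHTM)  # note: fromstring, all lowercase
# './/b' searches all descendants; a bare 'b' matches direct children only
bElements = newHTM.findall('.//b')
for b in bElements:
    print(b.text_content())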

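This is where I started, and it did not work. As I understand it, that is because newHTM is a class, but now I am lost.

I don't know why I decided to operate on each object in newHTM, but that was the key.

I was wrong: newHTM and each object in newHTM are the same type of object, so that is not the explanation.

I would edit the answer, but I can't. As originally posted, it had fromString where it should be fromstring, and the list names did not match (bELements vs. bElements). Also, when I run that code with findall('b') on my HTM snippet, the length of bElements is 0.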
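The zero-length result reported above is consistent with how `findall` treats paths: a bare tag name such as `'b'` matches only direct children of the element it is called on, while `'.//b'` searches all descendants. The sketch below shows this on the snippet from the question, parsed on its own (an assumption, since the full file is not shown). The variable names are illustrative.

```python
from lxml import html

snippet = ('<p style="font-family:times;text-align:justify"><font size="2">'
           '<b><i> The reasons to eat pickles include:  </i></b></font></p>')

root = html.fromstring(snippet)          # root is the <p> element

# <b> is a grandchild of <p> (p > font > b), so a direct-child search finds nothing
direct = root.findall('b')

# Three ways to search at any depth below root
nested = root.findall('.//b')
via_iter = list(root.iter('b'))
via_xpath = root.xpath('.//b')

print(len(direct), len(nested))
for b in nested:
    print(b.text_content().strip())
```

`iter('b')` would also yield the root itself if the root were a `<b>` element, which `findall('.//b')` would not; for this snippet the results are the same.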
