Python: parsing HTML with lxml


I have the following HTML:

<a name="Audio-Encoders"></a>
<h1 class="chapter"><a href="ffmpeg.html#toc-Audio-Encoders">14. Audio Encoders</a></h1>

<p>A description of some of the currently available audio encoders
follows.
</p>
<a name="ac3-and-ac3_005ffixed"></a>
<h2 class="section"><a href="ffmpeg.html#toc-ac3-and-ac3_005ffixed">14.1 ac3 and     ac3_fixed</a></h2>

<p>AC-3 audio encoders.
</p>
<p>These encoders implement part of ATSC A/52:2010 and ETSI TS 102 366, as well as
the undocumented RealAudio 3 (a.k.a. dnet).
</p>
<p>The <var>ac3</var> encoder uses floating-point math, while the <var>ac3_fixed</var>
encoder only uses fixed-point integer math. This does not mean that one is
always faster, just that one or the other may be better suited to a
particular system. The floating-point encoder will generally produce better
quality audio for a given bitrate. The <var>ac3_fixed</var> encoder is not the
default codec for any of the output formats, so it must be specified explicitly
using the option <code>-acodec ac3_fixed</code> in order to use it.
</p>
<a name="AC_002d3-Metadata"></a>
<h3 class="subsection"><a href="ffmpeg.html#toc-AC_002d3-Metadata">14.1.1 AC-3     Metadata</a></h3>

<p>The AC-3 metadata options are used to set parameters that describe the audio,
but in most cases do not affect the audio encoding itself. Some of the options
do directly affect or influence the decoding and playback of the resulting
bitstream, while others are just for informational purposes. A few of the
options will add bits to the output stream that could otherwise be used for
audio data, and will thus affect the quality of the output. Those will be
indicated accordingly with a note in the option list below.
</p>
<p>These parameters are described in detail in several publicly-available
documents.
</p><ul>
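How can I extract the text from each <p> that follows an <h1>, <h2>, or <h3>?

For example, the content after the h1 tag is "A description of some of the currently available audio encoders follows."

Use: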

/*/h1/following-sibling::p
         [name(preceding-sibling::*[starts-with(name(),'h')][1])
         = 'h1'
         ]//text()
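
When this XPath expression is evaluated against the XML document obtained from the provided malformed fragment (by removing the trailing unclosed ul and wrapping the result in a single top element), it selects the wanted text node (just one in this case):

A description of some of the currently available audio encoders follows.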
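
The preparation step described above (drop the unclosed ul, wrap everything in one top element, then evaluate the expression) might look roughly like this with lxml. This is only a sketch: the fragment is shortened, and `FRAGMENT`, `H1_TEXT`, `wrap_fragment`, and the wrapper name `root` are illustrative names that do not come from the original answer.

```python
from lxml import etree

# Shortened copy of the question's fragment, including the trailing unclosed <ul>.
FRAGMENT = """<a name="Audio-Encoders"></a>
<h1 class="chapter"><a href="ffmpeg.html#toc-Audio-Encoders">14. Audio Encoders</a></h1>

<p>A description of some of the currently available audio encoders
follows.
</p>
<a name="ac3-and-ac3_005ffixed"></a>
<h2 class="section"><a href="ffmpeg.html#toc-ac3-and-ac3_005ffixed">14.1 ac3 and ac3_fixed</a></h2>

<p>AC-3 audio encoders.
</p><ul>"""

# The h1 expression from the answer, split over several string literals.
H1_TEXT = (
    "/*/h1/following-sibling::p"
    "[name(preceding-sibling::*[starts-with(name(),'h')][1]) = 'h1']"
    "//text()"
)


def wrap_fragment(fragment, top="root"):
    """Drop a trailing unclosed <ul> and wrap the rest in one top element."""
    body = fragment.rstrip()
    if body.endswith("<ul>"):
        body = body[: -len("<ul>")]
    # The wrapper name is arbitrary; the expression only relies on "/*".
    return etree.fromstring("<{0}>{1}</{0}>".format(top, body))


root = wrap_fragment(FRAGMENT)
text_nodes = root.xpath(H1_TEXT)

# Collapse the line breaks that the source HTML carries inside the paragraph.
h1_text = " ".join("".join(text_nodes).split())
print(h1_text)
```

Parsing as XML only works here because the rest of the fragment happens to be well-formed once the ul is removed; messier input would need an HTML parser instead.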

Similarly, this XPath expression:

/*/h2/following-sibling::p
         [name(preceding-sibling::*[starts-with(name(),'h')][1])
         = 'h2'
         ]//text()
selects these text nodes:

AC-3 audio encoders. These encoders implement part of ATSC A/52:2010 and ETSI TS 102 366, as well as the undocumented RealAudio 3 (a.k.a. dnet). The ac3 encoder uses floating-point math, while the ac3_fixed encoder only uses fixed-point integer math. This does not mean that one is always faster, just that one or the other may be better suited to a particular system. The floating-point encoder will generally produce better quality audio for a given bitrate. The ac3_fixed encoder is not the default codec for any of the output formats, so it must be specified explicitly using the option -acodec ac3_fixed in order to use it.

Finally, this XPath expression:

/*/h3/following-sibling::p
         [name(preceding-sibling::*[starts-with(name(),'h')][1])
         = 'h3'
         ]//text()
selects these text nodes:

The AC-3 metadata options are used to set parameters that describe the audio, but in most cases do not affect the audio encoding itself. Some of the options do directly affect or influence the decoding and playback of the resulting bitstream, while others are just for informational purposes. A few of the options will add bits to the output stream that could otherwise be used for audio data, and will thus affect the quality of the output. Those will be indicated accordingly with a note in the option list below. These parameters are described in detail in several publicly-available documents.
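
Since the three expressions differ only in the heading name, they can be folded into one parameterised expression. The sketch below assumes lxml's XPath variable support (`$tag`, passed as a keyword argument) and an abbreviated, already well-formed sample document; `DOC`, `SECTION_TEXT`, and `text_by_heading` are hypothetical names, not part of the original answer.

```python
from lxml import etree

# Abbreviated, already wrapped sample in the shape of the question's HTML.
DOC = """<root>
<h1>14. Audio Encoders</h1>
<p>A description of some of the currently available audio encoders
follows.
</p>
<h2>14.1 ac3 and ac3_fixed</h2>
<p>AC-3 audio encoders.
</p>
<p>The <var>ac3</var> encoder uses floating-point math.
</p>
<h3>14.1.1 AC-3 Metadata</h3>
<p>The AC-3 metadata options are used to set parameters that describe the audio.
</p>
</root>"""

# Same logic as the answer's expressions, with the heading name as $tag:
# keep each <p> whose nearest preceding sibling starting with "h" is that heading.
SECTION_TEXT = etree.XPath(
    "/*/*[name() = $tag]/following-sibling::p"
    "[name(preceding-sibling::*[starts-with(name(),'h')][1]) = $tag]"
    "//text()"
)


def text_by_heading(root, tags=("h1", "h2", "h3")):
    """Map each heading name to the whitespace-normalised text of its paragraphs."""
    result = {}
    for tag in tags:
        nodes = SECTION_TEXT(root, tag=tag)
        result[tag] = " ".join("".join(nodes).split())
    return result


sections = text_by_heading(etree.fromstring(DOC))
for tag, text in sections.items():
    print(tag, "->", text)
```

One caveat worth keeping in mind: `starts-with(name(),'h')` also matches elements such as `hr`, so a page that uses them between paragraphs would need a stricter test for heading names.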

That's invalid HTML you have there.

I changed the question to make it clearer. Thanks for the reply.

That's hardly any clearer; please make your question self-contained and not dependent on other websites (which may or may not change over time, invalidating your question).

I changed the question again to make it self-contained.

The FFmpeg documentation is available in Texinfo format, which is easier to parse than the generated HTML.

Thanks for the amazing answer, but I'm trying to do something similar with XPath.

@PerguntasEasy, then just remove the string //text() at the end of each expression.
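
As an illustration of that last suggestion, dropping `//text()` makes the expression return the `<p>` elements themselves. The sketch below additionally assumes `lxml.html.fragment_fromstring` with `create_parent` to tolerate the malformed fragment, which is a swapped-in alternative to the manual XML wrapping used in the answer; `FRAGMENT`, `H2_PARAGRAPHS`, and `h2_paragraphs` are made-up names, and the sample is abbreviated.

```python
import lxml.html

# Abbreviated sample that keeps the trailing unclosed <ul> from the question.
FRAGMENT = """<h2 class="section">14.1 ac3 and ac3_fixed</h2>
<p>AC-3 audio encoders.
</p>
<p>The <var>ac3</var> encoder uses floating-point math.
</p>
<h3 class="subsection">14.1.1 AC-3 Metadata</h3>
<p>The AC-3 metadata options are used to set parameters that describe the audio.
</p><ul>"""

# The h2 expression from the answer without the trailing //text().
H2_PARAGRAPHS = (
    "/*/h2/following-sibling::p"
    "[name(preceding-sibling::*[starts-with(name(),'h')][1]) = 'h2']"
)

# The HTML parser closes the dangling <ul>; create_parent supplies the
# single top element that the "/*" step expects.
root = lxml.html.fragment_fromstring(FRAGMENT, create_parent="div")
paragraphs = root.xpath(H2_PARAGRAPHS)

# text_content() gathers the text of a <p> including nested <var> elements.
h2_paragraphs = [" ".join(p.text_content().split()) for p in paragraphs]
for text in h2_paragraphs:
    print(text)
```

Working with elements rather than text nodes keeps paragraph boundaries, which is convenient when each paragraph should be stored separately.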