Python: using regex on an HTML file for specific words?

Tags: python, html, regex, email, beautifulsoup

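I have emails with HTML tables embedded in them. I use BeautifulSoup to extract the data tables, but some data sits outside the tables and I can't capture it with this method.

Here is an example of an email containing two data tables:

As I said, I use bs4 to capture the information from the table cells, and then I convert that data into a data frame. I also want to capture the package prices so that I can attach each one to its fish weight values. Simple commands like: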

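# scan the message line by line for the word "Package"
for line in f:
    if "Package" in line:
        print("line:", line)

...fail to print anything. When I inspect the HTML more closely, I see that it looks like this: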

<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>FW: NEFS 11 fish available</title>
<link rel="important stylesheet" href="">
<style>div.headerdisplayname {font-weight:bold;}</style></head>
<body>
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 11 fish available</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <claire@capecodfishermen.org></td></tr><tr><td><b>Date: </b>6/2/2016 5:55 PM</td></tr></table><br>
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; ">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
    {font-family:"Franklin Gothic Demi";
    panose-1:2 11 7 3 2 1 2 2 2 4;}
@font-face
    {font-family:"Franklin Gothic Book";
    panose-1:2 11 5 3 2 1 2 2 2 4;}
@font-face
    {font-family:Verdana;
    panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0in;
    margin-bottom:.0001pt;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
    {mso-style-priority:99;
    color:#0563C1;
    text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
    {mso-style-priority:99;
    color:#954F72;
    text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
    {mso-style-name:msonormal;
    mso-margin-top-alt:auto;
    margin-right:0in;
    mso-margin-bottom-alt:auto;
    margin-left:0in;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
span.EmailStyle18
    {mso-style-type:personal-reply;
    font-family:"Calibri",sans-serif;
    color:#1F497D;}
.MsoChpDefault
    {mso-style-type:export-only;
    font-family:"Calibri",sans-serif;}
@page WordSection1
    {size:8.5in 11.0in;
    margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
    {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Please see below quota listings.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#1F497D">Claire Fitz-Gerald<o:p></o:p></span></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></i></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Demi&quot;,sans-serif;color:#002776">Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#DE3500">~ Small Boats.&nbsp; Big Ideas. ~</span></b><b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#DE3500"><o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Demi&quot;,sans-serif;color:#002776">Celebrating 25 years. Navigating 25 more.</span></b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#002060">
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"> NEFS V [mailto:nefsector5@gmail.com]
<br>
<b>Sent:</b> Thursday, June 02, 2016 12:32 PM<br>
<b>To:</b> Ben Martens &lt;ben@mainecoastfishermen.org&gt;; Claire Fitz-Gerald &lt;claire@capecodfishermen.org&gt;; Dave Leveille 2 &amp; 6 &lt;nefs02@gmail.com&gt;; Hank SHS &lt;shsector@gmail.com&gt;; John Haran 10 &amp; 13 &lt;sector13@comcast.net&gt;; Linda MaCann 7 &amp; 8 &lt;nbsector07@comcast.net&gt;;
 mike walsh 6 &lt;fisherwoman2042003@yahoo.com&gt;; Patrick NCCS &lt;patrick@penobscoteast.org&gt;; paula lynch 12 &lt;paulasectorx@yahoo.com&gt;; Spice Montgomery 3 &lt;nefsiii@gmail.com&gt;; Stephanie Rafael-DeMello 9 &lt;nbsector9@gmail.com&gt;; tory bramante 6 &lt;torybra@aol.com&gt;; NEFS
 11 Charles Felch &lt;boat1151@aol.com&gt;; NEFS 11 David Goethel &lt;egoethel@comcast.net&gt;; NEFS 11 Fanel Dobre &lt;stormdancer4@yahoo.com&gt;; NEFS 11 Geordie King &lt;kinggeordie@comcast.net&gt;; NEFS 11 Jamie Hayward &lt;jamienjolyn@comcast.net&gt;; NEFS 11 Jayson Driscoll &lt;jaysondriscoll@yahoo.com&gt;;
 NEFS 11 Mike and Pat Anderson &lt;padi.anderson@gmail.com&gt;; NEFS 11 Neil Pike &lt;sandipike@hotmail.com&gt;; NEFS 11 Richard Anderson &lt;monkfishing@hotmail.com&gt;; NEFS 11 Tom Lyons &lt;tomrlyons@hotmail.com&gt;; Puggy &lt;charles.felch@yahoo.com&gt;<br>
<b>Subject:</b> NEFS 11 fish available<o:p></o:p></span></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">All,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">NEFS 11 has the following available:<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p>
</div>
<div>
<p class="MsoNormal"><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 1:&nbsp; $ 500.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p>
</div>
<div>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="396" style="width:297.0pt;border-collapse:collapse">
<tbody>
<tr style="height:15.0pt">
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb cod east</span><o:p></o:p></p>
</td>
<td width="55" style="width:41.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td>
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">1</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb cod west</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">5</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom cod</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">148</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb haddock east</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">1</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb haddock west</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">2</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom haddock</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">12</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">white hake</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">4</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">pollock</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">162</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">redfish</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">25</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
</div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">​</span><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 2: $ 5,225.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p>
</div>
<div>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="387" style="width:290.0pt;border-collapse:collapse">
<tbody>
<tr style="height:15.0pt">
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom cod</span><o:p></o:p></p>
</td>
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td>
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">916</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom winter fl</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">498</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif;display:none"><o:p>&nbsp;</o:p></span></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="387" style="width:290.0pt;border-collapse:collapse">
<tbody>
<tr style="height:15.0pt">
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom haddock</span><o:p></o:p></p>
</td>
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td>
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">284</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">white hake</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">505</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">dab</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">1,293</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">pollock</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">812</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">redfish</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">1,910</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">witch fl</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">352</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">cc/gom yellowtail</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">306</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">​<o:p></o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 3:&nbsp; $ 44,150.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p>
</div>
<div>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="449" style="width:337.0pt;border-collapse:collapse">
<tbody>
<tr style="height:15.0pt">
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb cod east</span><o:p></o:p></p>
</td>
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td>
<td width="63" style="width:47.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td>
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">5</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb cod west</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">17</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom cod</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">5,000</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom winter fl</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">2,900</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
</div>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="449" style="width:337.0pt;border-collapse:collapse">
<tbody>
<tr style="height:15.0pt">
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb haddock east</span><o:p></o:p></p>
</td>
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td>
<td width="63" style="width:47.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td>
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">836</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gb haddock west</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">2,118</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">gom haddock</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">18,000</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">white hake</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">8,842</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">dab</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">8,650</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">pollock</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">78,000</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">redfish</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">35,923</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">witch fl</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">3,250</span><o:p></o:p></p>
</td>
</tr>
<tr style="height:15.0pt">
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">cc/gom yellowtail</span><o:p></o:p></p>
</td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td>
<td style="padding:0in 0in 0in 0in;height:15.0pt">
<p class="MsoNormal"><span style="color:black">2,250</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 4:&nbsp; $ 43,135.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">GOM cod&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;6,900</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">dabs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3,800</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">witch fl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4,000</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">cc/gom yt&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5,100</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><b><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">GB West Cod&nbsp; - 3,251 lbs libe weight = $ 6,500.00</span></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><br clear="all">
<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><br>
-- <o:p></o:p></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">Daniel Salerno<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">NEFS 5 &amp; NEFS 11<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">401-932-0070<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">401-633-6539 (fax)<o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
</body>
</html>
Output:

Package 1:  $ 500.00
[u'Subject: FW: NEFS 11 fish available']
[u'From: Claire Fitz-Gerald']
[u'Date: 6/2/2016 5:55 PM']
[u'gb cod east', u'', u'1']
[u'gb cod west', u'', u'5']
[u'gom cod', u'', u'148']
[u'gb haddock east', u'', u'1']
[u'gb haddock west', u'', u'2']
[u'gom haddock', u'', u'12']
[u'white hake', u'', u'4']
[u'pollock', u'', u'162']
[u'redfish', u'', u'25']
[u'gom cod', u'', u'916']
[u'gom winter fl', u'', u'498']
[u'gom haddock', u'', u'284']
[u'white hake', u'', u'505']
[u'dab', u'', u'1,293']
[u'pollock', u'', u'812']
[u'redfish', u'', u'1,910']
[u'witch fl', u'', u'352']
[u'cc/gom yellowtail', u'', u'306']
[u'gb cod east', u'', u'', u'5']
[u'gb cod west', u'', u'', u'17']
[u'gom cod', u'', u'', u'5,000']
[u'gom winter fl', u'', u'', u'2,900']
[u'gb haddock east', u'', u'', u'836']
[u'gb haddock west', u'', u'', u'2,118']
[u'gom haddock', u'', u'', u'18,000']
[u'white hake', u'', u'', u'8,842']
[u'dab', u'', u'', u'8,650']
[u'pollock', u'', u'', u'78,000']
[u'redfish', u'', u'', u'35,923']
[u'witch fl', u'', u'', u'3,250']
[u'cc/gom yellowtail', u'', u'', u'2,250']
​Package 2: $ 5,225.00
[u'Subject: FW: NEFS 11 fish available']
[u'From: Claire Fitz-Gerald']
[u'Date: 6/2/2016 5:55 PM']
[u'gb cod east', u'', u'1']
[u'gb cod west', u'', u'5']
[u'gom cod', u'', u'148']
[u'gb haddock east', u'', u'1']
[u'gb haddock west', u'', u'2']
[u'gom haddock', u'', u'12']
[u'white hake', u'', u'4']
[u'pollock', u'', u'162']
[u'redfish', u'', u'25']
[u'gom cod', u'', u'916']
[u'gom winter fl', u'', u'498']
[u'gom haddock', u'', u'284']
[u'white hake', u'', u'505']
[u'dab', u'', u'1,293']
[u'pollock', u'', u'812']
[u'redfish', u'', u'1,910']
[u'witch fl', u'', u'352']
[u'cc/gom yellowtail', u'', u'306']
[u'gb cod east', u'', u'', u'5']
[u'gb cod west', u'', u'', u'17']
[u'gom cod', u'', u'', u'5,000']
[u'gom winter fl', u'', u'', u'2,900']
[u'gb haddock east', u'', u'', u'836']
[u'gb haddock west', u'', u'', u'2,118']
[u'gom haddock', u'', u'', u'18,000']
[u'white hake', u'', u'', u'8,842']
[u'dab', u'', u'', u'8,650']
[u'pollock', u'', u'', u'78,000']
[u'redfish', u'', u'', u'35,923']
[u'witch fl', u'', u'', u'3,250']
[u'cc/gom yellowtail', u'', u'', u'2,250']
Package 3:  $ 44,150.00
[u'Subject: FW: NEFS 11 fish available']
[u'From: Claire Fitz-Gerald']
[u'Date: 6/2/2016 5:55 PM']
[u'gb cod east', u'', u'1']
[u'gb cod west', u'', u'5']
[u'gom cod', u'', u'148']
[u'gb haddock east', u'', u'1']
[u'gb haddock west', u'', u'2']
[u'gom haddock', u'', u'12']
[u'white hake', u'', u'4']
[u'pollock', u'', u'162']
[u'redfish', u'', u'25']
[u'gom cod', u'', u'916']
[u'gom winter fl', u'', u'498']
[u'gom haddock', u'', u'284']
[u'white hake', u'', u'505']
[u'dab', u'', u'1,293']
[u'pollock', u'', u'812']
[u'redfish', u'', u'1,910']
[u'witch fl', u'', u'352']
[u'cc/gom yellowtail', u'', u'306']
[u'gb cod east', u'', u'', u'5']
[u'gb cod west', u'', u'', u'17']
[u'gom cod', u'', u'', u'5,000']
[u'gom winter fl', u'', u'', u'2,900']
[u'gb haddock east', u'', u'', u'836']
[u'gb haddock west', u'', u'', u'2,118']
[u'gom haddock', u'', u'', u'18,000']
[u'white hake', u'', u'', u'8,842']
[u'dab', u'', u'', u'8,650']
[u'pollock', u'', u'', u'78,000']
[u'redfish', u'', u'', u'35,923']
[u'witch fl', u'', u'', u'3,250']
[u'cc/gom yellowtail', u'', u'', u'2,250']
Package 4:  $ 43,135.00
[u'Subject: FW: NEFS 11 fish available']
[u'From: Claire Fitz-Gerald']
[u'Date: 6/2/2016 5:55 PM']
[u'gb cod east', u'', u'1']
[u'gb cod west', u'', u'5']
[u'gom cod', u'', u'148']
[u'gb haddock east', u'', u'1']
[u'gb haddock west', u'', u'2']
[u'gom haddock', u'', u'12']
[u'white hake', u'', u'4']
[u'pollock', u'', u'162']
[u'redfish', u'', u'25']
[u'gom cod', u'', u'916']
[u'gom winter fl', u'', u'498']
[u'gom haddock', u'', u'284']
[u'white hake', u'', u'505']
[u'dab', u'', u'1,293']
[u'pollock', u'', u'812']
[u'redfish', u'', u'1,910']
[u'witch fl', u'', u'352']
[u'cc/gom yellowtail', u'', u'306']
[u'gb cod east', u'', u'', u'5']
[u'gb cod west', u'', u'', u'17']
[u'gom cod', u'', u'', u'5,000']
[u'gom winter fl', u'', u'', u'2,900']
[u'gb haddock east', u'', u'', u'836']
[u'gb haddock west', u'', u'', u'2,118']
[u'gom haddock', u'', u'', u'18,000']
[u'white hake', u'', u'', u'8,842']
[u'dab', u'', u'', u'8,650']
[u'pollock', u'', u'', u'78,000']
[u'redfish', u'', u'', u'35,923']
[u'witch fl', u'', u'', u'3,250']
[u'cc/gom yellowtail', u'', u'', u'2,250']
Unfortunately, every iteration also carries some redundant, irrelevant repetition:

[u'Subject: FW: NEFS 11 fish available']
[u'From: Claire Fitz-Gerald']
[u'Date: 6/2/2016 5:55 PM']

which can be worked around if needed.
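One possible way to avoid the repeated header rows, sketched under assumptions: the three unwanted rows appear to come from the header table (class `header-part1` in the HTML above), while the fish tables all carry class `MsoNormalTable`, so collecting rows only from the latter leaves the header out. This is not the code that produced the output above. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without third-party packages, targets Python 3, and the names `DataTableRows` and `sample` are made up for illustration.

```python
from html.parser import HTMLParser


class DataTableRows(HTMLParser):
    """Collect cell text per row, but only inside tables of one CSS class."""

    def __init__(self, wanted_class="MsoNormalTable"):
        super().__init__(convert_charrefs=True)
        self.wanted_class = wanted_class
        self.table_stack = []   # one bool per open <table>: is it wanted?
        self.rows = []          # finished rows, each a list of cell strings
        self.row = None         # cells of the row being read, or None
        self.cell = None        # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            self.table_stack.append(self.wanted_class in classes)
        elif self.table_stack and self.table_stack[-1]:
            if tag == "tr":
                self.row = []
            elif tag == "td" and self.row is not None:
                self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_stack:
            self.table_stack.pop()
        elif tag == "td" and self.cell is not None:
            # collapse whitespace (including non-breaking spaces) in the cell
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.cell = None


# Shortened stand-in for the email above: one header table, one data table.
sample = """
<table class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 11 fish available</td></tr></table>
<table class="MsoNormalTable"><tbody>
<tr><td><p class="MsoNormal"><span>gb cod east</span><o:p></o:p></p></td><td></td><td><p><span>1</span></p></td></tr>
<tr><td><p><span>gom cod</span></p></td><td></td><td><p><span>148</span></p></td></tr>
</tbody></table>
"""

parser = DataTableRows()
parser.feed(sample)
parser.close()
for row in parser.rows:
    print(row)
```

If you stay with BeautifulSoup, the same filter can be expressed by selecting only the data tables, for example `soup.find_all('table', class_='MsoNormalTable')`, before looping over their rows.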

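I tried adding the line `chunks = soup.find_all('p', {'class': "MsoNormal"})` before the last `for` loop, and then changed the last `for` loop to: `for line in chunks: if 'Package' in line.text: print line.text for tables in tables:` ...with the rest of the for loop unchanged. I get each `Package #` line, followed by the fish weight values that belong to that package. Does this work for you too? I can post the full code if that's unclear.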
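Since the question title asks about regex, here is a rough regex-only sketch of pulling out the package prices. It is an alternative to the `find_all` approach in the comment above, not that commenter's code. It assumes each heading looks like `Package N: $ amount` with no tags splitting the phrase, as in the sample email; the names `PACKAGE_RE`, `package_prices` and `sample` are hypothetical, and the code targets Python 3.

```python
import html
import re

# Matches e.g. "Package 2: $ 5,225.00" once entities such as &nbsp; are decoded.
PACKAGE_RE = re.compile(r"Package\s+(\d+)\s*:\s*\$\s*([\d,]+(?:\.\d+)?)")


def package_prices(html_text):
    """Return {package number: price as float} found in a raw HTML string."""
    # &nbsp; becomes U+00A0, which \s matches in a Python 3 str pattern
    text = html.unescape(html_text)
    prices = {}
    for number, amount in PACKAGE_RE.findall(text):
        prices[int(number)] = float(amount.replace(",", ""))
    return prices


# Two heading paragraphs trimmed from the email above.
sample = (
    '<p class="MsoNormal"><b><u><span>Package 1:&nbsp; $ 500.00</span></u></b></p>'
    '<p class="MsoNormal"><b><u><span>Package 2: $ 5,225.00</span></u></b></p>'
)
print(package_prices(sample))
```

A regex like this gives you the prices but says nothing about which table follows which heading, so it is only useful together with something that walks the document in order.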
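Does `chunks` go after the `for table in tables`? And you know what, yes, it would be easier to understand if you could post it.

Yes, it doesn't really matter where `chunks` goes, but I put it after `tables` and before the `for line in chunks` loop.

Idk man, it's not even working for me. I'm clueless about HTML and not very familiar with BeautifulSoup. I'll keep working tomorrow on getting your code to run, because your printout looks like exactly what I want. In the end I convert the data into a data frame, so I just want to capture the package price and attach it to the corresponding data frame.

Really? Hmm, sorry. Your code has `html = body.decode()`; since that is a string, you should be able to swap the `html_doc` in my code for that `html` variable. I basically copied the `html` file you pasted into a string and ran my code on it.

OK, it was that `html_doc` piece that was messing up my code; as usual I made a silly mistake. Your code runs fine. So basically, for an HTML document you can't use a simple command like `if 'Package' in line:`; you have to do something like use `BeautifulSoup`, find the `p` tags with the `MsoNormal` class, and then use something more like `if 'Package' in line.text:`. Am I understanding that right? This produces each fish/weight pair as its own list, with the price sitting outside all of the corresponding lists, so when I convert these lists into a data frame the way I did before, it sort of fails. This may not be the right place to ask, and I may have to open a new SO question, but do you know how to turn all of these lists into one tidy data frame, with the package price attached to every corresponding fish?

Sorry I haven't answered the question in your comment yet; I noticed that my answer is quite wrong (as you saw, it repeats every fish/weight pair for each price instead of only the pairs that belong to that price). I've been trying to fix it. Sorry I didn't make clear that "html_doc" is just the "string" representation of your `html = body.decode()`. And (since html is just a string) you're right that you can do something like `if 'Package' in line:`. If the object is a BS `Tag`, we first need to get its string value with `Tag.text`, and then we can search that.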
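For the follow-up about attaching each package price only to its own fish rows, here is a minimal sketch of one way it could be done. The idea is to walk the document in order, remember the most recent `Package N: $ X` heading, and tag every data-table row with it. Assumptions: headings are `p` elements outside any table, fish rows live in `MsoNormalTable` tables, and the last non-empty cell of a row is an integer weight. It uses the standard library's `html.parser` (Python 3) rather than BeautifulSoup, and all names (`PackageRecords`, `clean`, `sample`) are invented for illustration.

```python
import re
from html.parser import HTMLParser

PACKAGE_RE = re.compile(r"Package\s+(\d+)\s*:\s*\$\s*([\d,]+(?:\.\d+)?)")


def clean(pieces):
    """Join text fragments and collapse all whitespace (including U+00A0)."""
    return " ".join("".join(pieces).split())


class PackageRecords(HTMLParser):
    """Tag every data-table row with the most recent 'Package N: $ X' heading."""

    def __init__(self, table_class="MsoNormalTable"):
        super().__init__(convert_charrefs=True)
        self.table_class = table_class
        self.tables = []      # one bool per open <table>: is it a data table?
        self.package = None   # (number, price) from the latest heading
        self.capture = None   # "p" or "td" while text is being collected
        self.pieces = []      # text fragments of the element being captured
        self.cells = None     # cells of the current data row, or None
        self.records = []     # tidy output: one dict per fish row

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            self.tables.append(self.table_class in classes)
        elif tag == "p" and not self.tables:
            self.capture, self.pieces = "p", []
        elif tag == "tr" and self.tables and self.tables[-1]:
            self.cells = []
        elif tag == "td" and self.cells is not None:
            self.capture, self.pieces = "td", []

    def handle_data(self, data):
        if self.capture:
            self.pieces.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.tables:
            self.tables.pop()
        elif tag == "p" and self.capture == "p":
            match = PACKAGE_RE.search(clean(self.pieces))
            if match:
                self.package = (int(match.group(1)),
                                float(match.group(2).replace(",", "")))
            self.capture = None
        elif tag == "td" and self.capture == "td":
            self.cells.append(clean(self.pieces))
            self.capture = None
        elif tag == "tr" and self.cells is not None:
            filled = [c for c in self.cells if c]   # drop the empty spacer cells
            if self.package and len(filled) >= 2:
                self.records.append({
                    "package": self.package[0],
                    "price": self.package[1],
                    "species": filled[0],
                    "weight": int(filled[-1].replace(",", "")),
                })
            self.cells = None
            self.capture = None


# Shortened stand-in for the email above.
sample = """
<p class="MsoNormal"><b><u><span>Package 1:&nbsp; $ 500.00</span></u></b></p>
<table class="MsoNormalTable"><tbody>
<tr><td><p><span>gb cod east</span></p></td><td></td><td><p><span>1</span></p></td></tr>
<tr><td><p><span>gom cod</span></p></td><td></td><td><p><span>148</span></p></td></tr>
</tbody></table>
<p class="MsoNormal"><b><u><span>Package 2: $ 5,225.00</span></u></b></p>
<table class="MsoNormalTable"><tbody>
<tr><td><p><span>dab</span></p></td><td></td><td><p><span>1,293</span></p></td></tr>
</tbody></table>
"""

parser = PackageRecords()
parser.feed(sample)
parser.close()
for record in parser.records:
    print(record)
```

A list of dicts like `parser.records` can be passed straight to `pandas.DataFrame(...)` to get one row per fish with `package` and `price` columns. Note that in the sample email Package 4 is laid out as plain paragraphs rather than a table, and the final "GB West Cod" line has its own format, so both would need separate handling that this sketch does not attempt.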