Python - Transposing a DataFrame


I would like to transform the following DataFrame so that I can export it to an Oracle table:

0    ID                                    Available Quota  \
1  1724       GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT   
2  1578  GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ...   
3   310  GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB   

0                                 Live Weight Pounds  \
1                        2328 445 3007 850 3101 1995   
2     538 5894 1755 243 490 153 3965 2727 9227 15060   
3  825 9033 1241 3120 65234 76610 1688 1195 2121 ...   

0                                              Price Date Posted  
1                                     Package $9,000        5/20  
2  $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2...        5/20  
3                                    Package $15,000        5/20
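Ideally, the data would be aligned as follows so that I can easily load it into an Oracle database:

The start of the second ID should look like this:

The original data table looks like this. By the way, my goal is to parse only the data for the most recent date: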

Using pd.transpose did not change anything, because my DataFrame is apparently (3,5) and would need to be (5,5) for that to work. Using pd.melt() results in:

                     0                                              value
0                   ID                                               1724
1                   ID                                               1578
2                   ID                                                310
3      Available Quota       GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT
4      Available Quota  GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ...
5      Available Quota  GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB
6   Live Weight Pounds                        2328 445 3007 850 3101 1995
7   Live Weight Pounds     538 5894 1755 243 490 153 3965 2727 9227 15060
8   Live Weight Pounds  825 9033 1241 3120 65234 76610 1688 1195 2121 ...
9                Price                                     Package $9,000
10               Price  $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2...
11               Price                                    Package $15,000
12         Date Posted                                               5/20
13         Date Posted                                               5/20
14         Date Posted                                               5/20
...which is not suitable for export either.
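
Neither transpose nor melt can help here, because each cell packs several values into one string; what is needed is to split each cell and give every value its own row. The following is a minimal sketch of that idea on a toy frame, not the real table. It assumes the `<br>` tags were replaced with a `;` delimiter instead of a space (so that names such as GOM COD stay intact) and a pandas version that supports multi-column `explode` (1.3 or later). The sample values are copied from the table above.

```python
import pandas as pd

# Toy frame: assumes '<br>' was replaced with ';' (not ' ') before parsing.
df = pd.DataFrame({
    "ID": ["1724", "347"],
    "Available Quota": ["GOM COD;GOM HADD;GOM BB", "SNE BB"],
    "Live Weight Pounds": ["2328;445;3007", "8,000"],
    "Date Posted": ["5/20", "5/7"],
})

multi_value_cols = ["Available Quota", "Live Weight Pounds"]

# Turn each packed string into a list of values.
for col in multi_value_cols:
    df[col] = df[col].str.split(";")

# One output row per list element; ID and Date Posted are repeated.
# Lists in the same row must have equal length for multi-column explode.
long_df = df.explode(multi_value_cols, ignore_index=True)
print(long_df)
```

Rows whose cells hold different numbers of items would need padding first, since multi-column `explode` raises an error on mismatched lengths.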

My relevant code:

import pandas as pd
from io import StringIO

def read_html_latest(filename, **kwargs):
    # Replace <br> with a space so each multi-line cell collapses into one string
    with open(filename, 'r') as f:
        text = f.read().replace('<br>', ' ')
    df = pd.read_html(StringIO(text), **kwargs)[0]
    df.columns = df.loc[0]  # the first row holds the column headers
    df = df.loc[1:]
    # Keep only the rows with the most recent posting date
    return (df.assign(d=pd.to_datetime(df['Date Posted'], format='%m/%d'))
              .query('d == d.max()')
              .drop(columns='d'))

df = read_html_latest(file_path, attrs={'class': 'MsoNormalTable'})
print(df)
Any help with this problem would be greatly appreciated. Thank you very much.

Source HTML code:

<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>FW: NEFS 2 Available Quota 5/21</title>
<link rel="important stylesheet" href="">
<style>div.headerdisplayname {font-weight:bold;}</style></head>
<body>
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 2 Available Quota 5/21</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <claire@capecodfishermen.org></td></tr><tr><td><b>Date: </b>5/21/2014 10:08 AM</td></tr></table><br>
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; "><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
    {font-family:Tahoma;
    panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
    {font-family:"Franklin Gothic Book";
    panose-1:2 11 5 3 2 1 2 2 2 4;}
@font-face
    {font-family:"Franklin Gothic Demi";
    panose-1:2 11 7 3 2 1 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0in;
    margin-bottom:.0001pt;
    font-size:11.0pt;
    font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
    {mso-style-priority:99;
    color:blue;
    text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
    {mso-style-priority:99;
    color:purple;
    text-decoration:underline;}
span.EmailStyle17
    {mso-style-type:personal;
    font-family:"Calibri","sans-serif";
    color:windowtext;}
span.title1
    {mso-style-name:title1;
    font-family:"Arial","sans-serif";
    color:#1F487E;
    font-weight:normal;}
span.EmailStyle19
    {mso-style-type:personal-reply;
    font-family:"Calibri","sans-serif";
    color:#1F497D;}
.MsoChpDefault
    {mso-style-type:export-only;
    font-size:10.0pt;}
@page WordSection1
    {size:8.5in 11.0in;
    margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
    {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Please see the below quota listings.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Claire Fitz-Gerald<o:p></o:p></span></p><p class=MsoNormal><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></i></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>~ Small Boats.&nbsp; Big Ideas. 
~</span></b><b><span style='color:#DE3500'><o:p></o:p></span></b></p></div><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> David Leveille [mailto:nefs02@gmail.com] <br><b>Sent:</b> Wednesday, May 21, 2014 8:50 AM<br><b>To:</b> David Leveille<br><b>Subject:</b> NEFS 2 Available Quota 5/21<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#1F487E'>AVAILABLE QUOTA FY 2014</span><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><o:p></o:p></span></p><table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width="71%" style='width:71.28%'><tr><td width=220 style='width:164.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><b><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:black'>ID <o:p></o:p></span></b></p></td><td width=161 style='width:120.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Available Quota <o:p></o:p></span></b></p></td><td width=189 style='width:141.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Live Weight Pounds <o:p></o:p></span></b></p></td><td width=126 style='width:94.55pt;border:none;border-bottom:solid windowtext 
1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Price <o:p></o:p></span></b></p></td><td width=168 style='width:125.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Date Posted <o:p></o:p></span></b></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1724<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>GOM BB<br>GREYSOLE<br>DABS<br>GOM YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>2328<br>445<br>3007<br>850<br>3101<br>1995<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$9,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1578<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB BB<br>GB YT<br>SNE BB<br>SNE YT<br>GOM BB<br>Whake<br>POLL<br>RED<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>538<br>5894<br>1755<br>243<br>490<br>153<br>3965<br>2727<br>9227<br>15060<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.00<br>$0.40<br>$0.20<br>$1.00<br>$0.45<br>$0.50<br>$0.15<br>$0.20<br>$0.01<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td 
width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>310<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>DABS<br>WHAKE<br>POLL<br>RED<br>SNE BB<br>GOM BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>825<br>9033<br>1241<br>3120<br>65234<br>76610<br>1688<br>1195<br>2121<br>7285<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$15,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr style='height:23.25pt'><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal 
style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>347<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>SNE BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>8,000<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$0.50<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/7<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878A<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>SNE BB<br>GOM BB<br>GB BB<br>GREYSOLE<br>GOM YT<br>SNE YT<br>POLL<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 
1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>6188<br>635<br>3916<br>7873<br>6762<br>3358<br>9776<br>271<br>186550<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.95<br>$1.35<br>$0.50<br>$0.50<br>$0.20<br>$1.40<br>$1.20<br>$0.50<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878B<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1113<br>12186<br>850<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<br>$10,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr></table><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal>David Leveille<o:p></o:p></p><p class=MsoNormal>II Northeast Fishery Sector Inc.<o:p></o:p></p><p class=MsoNormal>10 Witham Street<o:p></o:p></p><p class=MsoNormal>Gloucester, MA. 01930<o:p></o:p></p><p class=MsoNormal>Cell 978 375 3509<o:p></o:p></p><p class=MsoNormal>Fax 978 281 1555<o:p></o:p></p><p class=MsoNormal>Web <a href="http://nefs2.com/">http://nefs2.com/</a><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><div class=MsoNormal align=center style='text-align:center'><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'></body></html>
</body>
</html>

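This working code reads each cell, builds a list per column, and then turns the lists into a DataFrame. Note that it only works when all cells in a row contain the same number of items.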

from bs4 import BeautifulSoup, NavigableString
import pandas as pd
import numpy as np

def celltext(cell):
    """Return the text pieces inside the cell's first <span>, one per <br>-separated line."""
    span = cell.find('span')
    return [str(child) for child in span.children
            if isinstance(child, NavigableString)]

with open(r'path\to\html.html', 'r') as fh:
    html = fh.read()
soup = BeautifulSoup(html, 'lxml')  # parse the HTML string
table = soup.find_all('table')[1]   # grab the second table

frames = []  # one DataFrame per table row

for row in table.find_all('tr'):
    columns = row.find_all('td')
    if columns[0].get_text().strip() == 'ID':  # skip the header row
        continue

    Quota = celltext(columns[1])
    Weight = celltext(columns[2])
    price = celltext(columns[3])

    Nrows = max(len(Quota), len(Weight), len(price))  # rows needed for this ID

    # ID and date are single values, so repeat them on every row
    IDList = [columns[0].get_text()] * Nrows
    DateList = [columns[4].get_text()] * Nrows

    # a package price applies to the whole lot, so repeat it as well
    if price[0].strip() == 'Package':
        price = [columns[3].get_text()] * Nrows

    # if Quota has fewer items than Weight, pad it with NaN
    if len(Quota) < len(Weight):
        Quota.extend([np.nan] * (len(Weight) - len(Quota)))

    # older pandas versions sort these columns alphabetically
    frames.append(pd.DataFrame({
        'ID': IDList,
        'AvailableQuota': Quota,
        'LiveWeightPounds': Weight,
        'price': price,
        'DatePosted': DateList,
    }))

df_Quota = pd.concat(frames)
print(df_Quota)
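
The padding branch above only covers the case where the quota names are shorter than the weights. A more general sketch, assuming each cell has already been turned into a plain Python list, is to let `itertools.zip_longest` pad whichever list is shorter. The helper name `align_cells` is hypothetical, and the sample values are abridged from ID 310 in the table.

```python
from itertools import zip_longest

def align_cells(quota, weight, price):
    """Zip three per-cell lists into rows, padding the shorter lists with None."""
    return list(zip_longest(quota, weight, price, fillvalue=None))

# Lists of unequal length: two quota names but three weights.
rows = align_cells(
    ["GBE COD", "GBW COD"],
    ["825", "9033", "1241"],
    ["Package $15,000"] * 3,
)
for row in rows:
    print(row)
```

Padding with None keeps every row the same width, so the result can be passed to `pd.DataFrame` without a length mismatch. It does not tell you which quota name a leftover weight belongs to; that still has to come from the source.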

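How do you determine how many text values there are in "Available Quota"? I do see one or more text values. Also, what do you expect the price to be for the second row?
That makes more sense now, thanks :)
Is the source an Excel file? You might be able to use this. Also in the ...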
 AvailableQuota DatePosted     ID LiveWeightPounds            price
0        GOM COD       5/20   1724             2328   Package $9,000
1       GOM HADD       5/20   1724              445   Package $9,000
2         GOM BB       5/20   1724             3007   Package $9,000
3       GREYSOLE       5/20   1724              850   Package $9,000
4           DABS       5/20   1724             3101   Package $9,000
5         GOM YT       5/20   1724             1995   Package $9,000
0        GBE COD       5/20   1578              538            $1.00
1        GBW COD       5/20   1578             5894            $0.40
2          GB BB       5/20   1578             1755            $0.20
3          GB YT       5/20   1578              243            $1.00
4         SNE BB       5/20   1578              490            $0.45
5         SNE YT       5/20   1578              153            $0.50
6         GOM BB       5/20   1578             3965            $0.15
7          Whake       5/20   1578             2727            $0.20
8           POLL       5/20   1578             9227            $0.01
9            RED       5/20   1578            15060            $0.01
0        GBE COD       5/20    310              825  Package $15,000
1        GBW COD       5/20    310             9033  Package $15,000
2           DABS       5/20    310             1241  Package $15,000
3          WHAKE       5/20    310             3120  Package $15,000
4           POLL       5/20    310            65234  Package $15,000
5            RED       5/20    310            76610  Package $15,000
6         SNE BB       5/20    310             1688  Package $15,000
7         GOM BB       5/20    310             1195  Package $15,000
8            NaN       5/20    310             2121  Package $15,000
9            NaN       5/20    310             7285  Package $15,000
0         SNE BB        5/7    347            8,000            $0.50
0        GOM COD       5/12  1878A             6188            $1.95
1       GOM HADD       5/12  1878A              635            $1.35
2         SNE BB       5/12  1878A             3916            $0.50
3         GOM BB       5/12  1878A             7873            $0.50
4          GB BB       5/12  1878A             6762            $0.20
5       GREYSOLE       5/12  1878A             3358            $1.40
6         GOM YT       5/12  1878A             9776            $1.20
7         SNE YT       5/12  1878A              271            $0.50
8           POLL       5/12  1878A           186550            $0.01
0        GBE COD       5/12  1878B             1113   Package$10,000
1        GBW COD       5/12  1878B            12186   Package$10,000
2          GB YT       5/12  1878B              850   Package$10,000
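
The question asks for only the most recent posting date, which the output above does not yet filter. Here is a sketch of that last step. It assumes the long-format frame has a DatePosted column in month/day form and that all dates fall within the same year (the year is not part of the source data). The small frame below is a stand-in built from a few rows of the output.

```python
import pandas as pd

# Small stand-in for df_Quota, with values taken from the output above.
df_quota = pd.DataFrame({
    "ID": ["1724", "1724", "347", "1878B"],
    "AvailableQuota": ["GOM COD", "GOM HADD", "SNE BB", "GBE COD"],
    "DatePosted": ["5/20", "5/20", "5/7", "5/12"],
})

# Parse month/day; pandas fills in a default year, which is fine for
# comparison as long as every date belongs to the same year.
posted = pd.to_datetime(df_quota["DatePosted"], format="%m/%d")

# Keep only the rows posted on the latest date.
latest = df_quota[posted == posted.max()].reset_index(drop=True)
print(latest)
```

From there the frame could be written to a database table, for example with `DataFrame.to_sql` and a suitable connection, though the connection details depend on the environment.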