Python - Transposing a DataFrame
I want to transform the following DataFrame so that I can export it to an Oracle table:
0 ID Available Quota \
1 1724 GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT
2 1578 GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ...
3 310 GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB
0 Live Weight Pounds \
1 2328 445 3007 850 3101 1995
2 538 5894 1755 243 490 153 3965 2727 9227 15060
3 825 9033 1241 3120 65234 76610 1688 1195 2121 ...
0 Price Date Posted
1 Package $9,000 5/20
2 $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2... 5/20
3 Package $15,000 5/20
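Ideally, the data should be laid out as follows so that I can easily load it into an Oracle database:
The beginning of the second ID should look like this:
The original data table looks like this (incidentally, my goal is only to parse the data for the most recent date):
Using df.transpose() did not change anything, because my DataFrame is apparently (3,5) and it would need to be (5,5) to work. Using pd.melt() results in: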
0 value
0 ID 1724
1 ID 1578
2 ID 310
3 Available Quota GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT
4 Available Quota GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ...
5 Available Quota GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB
6 Live Weight Pounds 2328 445 3007 850 3101 1995
7 Live Weight Pounds 538 5894 1755 243 490 153 3965 2727 9227 15060
8 Live Weight Pounds 825 9033 1241 3120 65234 76610 1688 1195 2121 ...
9 Price Package $9,000
10 Price $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2...
11 Price Package $15,000
12 Date Posted 5/20
13 Date Posted 5/20
14 Date Posted 5/20
...which is not suitable for export either.
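For reference, here is a minimal sketch of why neither call helps. It uses a made-up toy frame with the same layout as above (one row per ID, several values packed into one cell); the values are shortened and purely illustrative. Transposing only swaps the axes, and melting only stacks the columns. Neither splits the contents of a cell into separate rows.

```python
import pandas as pd

# Toy frame (illustrative values): one row per ID, several values per cell.
df = pd.DataFrame({
    "ID": ["1724", "1578", "310"],
    "Available Quota": ["GOM COD GOM HADD", "GBE COD GBW COD", "DABS WHAKE"],
    "Live Weight Pounds": ["2328 445", "538 5894", "1241 3120"],
    "Price": ["Package $9,000", "$1.00 $0.40", "Package $15,000"],
    "Date Posted": ["5/20", "5/20", "5/20"],
})

transposed = df.T    # swaps rows and columns: (3, 5) becomes (5, 3)
melted = df.melt()   # stacks the 5 columns on top of each other: 15 rows

print(df.shape, transposed.shape, melted.shape)
```

In both results each cell still holds the whole packed string, so the cells have to be split before the table can be reshaped into one row per quota item.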
My relevant code:
import io
import pandas as pd

def read_html_latest(filename, **kwargs):
    # Read the file and turn <br> line breaks into spaces before parsing
    with open(filename) as f:
        text = f.read().replace('<br>', ' ')
    df = pd.read_html(io.StringIO(text), **kwargs)[0]
    column_headers = ['ID', 'Available Quota', 'Live Weight Pounds', 'Price', 'Date Posted']
    df.columns = df.loc[0]  # the first row holds the header
    df = df.loc[1:]
    # Keep only the rows with the most recent posting date
    return df.assign(d=pd.to_datetime(df['Date Posted'], format='%m/%d')) \
             .query('d == d.max()') \
             .drop(columns='d')

df = read_html_latest(file_path, attrs={'class': 'MsoNormalTable'})
print(df)
Any help with this would be greatly appreciated. Thank you very much!
Source HTML:
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>FW: NEFS 2 Available Quota 5/21</title>
<link rel="important stylesheet" href="">
<style>div.headerdisplayname {font-weight:bold;}</style></head>
<body>
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 2 Available Quota 5/21</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <claire@capecodfishermen.org></td></tr><tr><td><b>Date: </b>5/21/2014 10:08 AM</td></tr></table><br>
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; "><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"Franklin Gothic Book";
panose-1:2 11 5 3 2 1 2 2 2 4;}
@font-face
{font-family:"Franklin Gothic Demi";
panose-1:2 11 7 3 2 1 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.title1
{mso-style-name:title1;
font-family:"Arial","sans-serif";
color:#1F487E;
font-weight:normal;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Please see the below quota listings.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Claire Fitz-Gerald<o:p></o:p></span></p><p class=MsoNormal><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p> </o:p></span></i></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>~ Small Boats. Big Ideas. 
~</span></b><b><span style='color:#DE3500'><o:p></o:p></span></b></p></div><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> David Leveille [mailto:nefs02@gmail.com] <br><b>Sent:</b> Wednesday, May 21, 2014 8:50 AM<br><b>To:</b> David Leveille<br><b>Subject:</b> NEFS 2 Available Quota 5/21<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#1F487E'>AVAILABLE QUOTA FY 2014</span><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><o:p></o:p></span></p><table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width="71%" style='width:71.28%'><tr><td width=220 style='width:164.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><b><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:black'>ID <o:p></o:p></span></b></p></td><td width=161 style='width:120.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Available Quota <o:p></o:p></span></b></p></td><td width=189 style='width:141.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Live Weight Pounds <o:p></o:p></span></b></p></td><td width=126 style='width:94.55pt;border:none;border-bottom:solid windowtext 
1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Price <o:p></o:p></span></b></p></td><td width=168 style='width:125.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Date Posted <o:p></o:p></span></b></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1724<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>GOM BB<br>GREYSOLE<br>DABS<br>GOM YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>2328<br>445<br>3007<br>850<br>3101<br>1995<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p> </o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$9,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1578<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB BB<br>GB YT<br>SNE BB<br>SNE YT<br>GOM BB<br>Whake<br>POLL<br>RED<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>538<br>5894<br>1755<br>243<br>490<br>153<br>3965<br>2727<br>9227<br>15060<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.00<br>$0.40<br>$0.20<br>$1.00<br>$0.45<br>$0.50<br>$0.15<br>$0.20<br>$0.01<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td 
width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>310<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>DABS<br>WHAKE<br>POLL<br>RED<br>SNE BB<br>GOM BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>825<br>9033<br>1241<br>3120<br>65234<br>76610<br>1688<br>1195<br>2121<br>7285<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p> </o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$15,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr style='height:23.25pt'><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal 
style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>347<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>SNE BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>8,000<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$0.50<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/7<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878A<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>SNE BB<br>GOM BB<br>GB BB<br>GREYSOLE<br>GOM YT<br>SNE YT<br>POLL<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 
1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>6188<br>635<br>3916<br>7873<br>6762<br>3358<br>9776<br>271<br>186550<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.95<br>$1.35<br>$0.50<br>$0.50<br>$0.20<br>$1.40<br>$1.20<br>$0.50<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878B<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1113<br>12186<br>850<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<br>$10,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr></table><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>David Leveille<o:p></o:p></p><p class=MsoNormal>II Northeast Fishery Sector Inc.<o:p></o:p></p><p class=MsoNormal>10 Witham Street<o:p></o:p></p><p class=MsoNormal>Gloucester, MA. 01930<o:p></o:p></p><p class=MsoNormal>Cell 978 375 3509<o:p></o:p></p><p class=MsoNormal>Fax 978 281 1555<o:p></o:p></p><p class=MsoNormal>Web <a href="http://nefs2.com/">http://nefs2.com/</a><o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><div class=MsoNormal align=center style='text-align:center'><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'></body></html>
</body>
</html>
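This working code reads each cell, builds a list from it, and then turns the lists into a DataFrame. Note that it only works if every cell in a row holds the same number of items. The one mismatch it handles is an Available Quota cell with fewer items than the Live Weight Pounds cell, which is padded with NaN.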
from bs4 import BeautifulSoup, NavigableString
import pandas as pd
import numpy as np

def celltext(cell):
    """Return the text fragments of the cell's first <span>, one per <br>-separated line."""
    textlist = []
    y = cell.find('span')
    for a in y.children:
        if isinstance(a, NavigableString):
            textlist.append(str(a))
    return textlist

with open(r'path\to\html.html', 'r') as f:
    html = f.read()
soup = BeautifulSoup(html, 'lxml')  # Parse the HTML as a string
table = soup.find_all('table')[1]  # Grab the second table

frames = []
for row in table.find_all('tr'):
    columns = row.find_all('td')
    if columns[0].get_text().strip() != 'ID':  # skip header
        Quota = celltext(columns[1])
        Weight = celltext(columns[2])
        price = celltext(columns[3])
        Nrows = max(len(Quota), len(Weight), len(price))  # number of rows for this ID
        IDList = [columns[0].get_text()] * Nrows
        DateList = [columns[4].get_text()] * Nrows
        if price[0].strip() == 'Package':
            # A package price applies to every row of this ID
            price = [columns[3].get_text()] * Nrows
        if len(Quota) < len(Weight):  # if Quota has fewer items, pad it with NaN
            lstnans = [np.nan] * (len(Weight) - len(Quota))
            Quota.extend(lstnans)
        FinalDataframe = pd.DataFrame(
            {
                'ID': IDList,
                'AvailableQuota': Quota,
                'LiveWeightPounds': Weight,
                'price': price,
                'DatePosted': DateList
            })
        frames.append(FinalDataframe)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_Quota = pd.concat(frames)
print(df_Quota)
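The padding step above only covers the case where Available Quota is shorter than Live Weight Pounds. As a rough sketch, a small helper could pad whichever list is short. The name `pad_to` and the sample values are hypothetical and not part of the answer's code.

```python
import numpy as np

def pad_to(values, length, fill=np.nan):
    """Return a copy of `values` padded with `fill` up to `length` items."""
    return list(values) + [fill] * (length - len(values))

# Made-up cell contents of unequal length
quota = ["GBE COD", "GBW COD"]
weight = ["825", "9033", "1241"]
price = ["$1.00"]

n_rows = max(len(quota), len(weight), len(price))
quota, weight, price = (pad_to(col, n_rows) for col in (quota, weight, price))

print(quota, weight, price)
```

Padding keeps the DataFrame constructor from failing on unequal lengths, but it cannot tell which item is actually missing, so padded rows are worth checking by hand before loading them into a database.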
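How do you determine how many text values there are in "Available Quota"? I see one or more text values. Also, what price do you expect for the second row?
That makes more sense now, thanks :)
Is the source an Excel file? You may be able to use this. Also in the ...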
AvailableQuota DatePosted ID LiveWeightPounds price
0 GOM COD 5/20 1724 2328 Package $9,000
1 GOM HADD 5/20 1724 445 Package $9,000
2 GOM BB 5/20 1724 3007 Package $9,000
3 GREYSOLE 5/20 1724 850 Package $9,000
4 DABS 5/20 1724 3101 Package $9,000
5 GOM YT 5/20 1724 1995 Package $9,000
0 GBE COD 5/20 1578 538 $1.00
1 GBW COD 5/20 1578 5894 $0.40
2 GB BB 5/20 1578 1755 $0.20
3 GB YT 5/20 1578 243 $1.00
4 SNE BB 5/20 1578 490 $0.45
5 SNE YT 5/20 1578 153 $0.50
6 GOM BB 5/20 1578 3965 $0.15
7 Whake 5/20 1578 2727 $0.20
8 POLL 5/20 1578 9227 $0.01
9 RED 5/20 1578 15060 $0.01
0 GBE COD 5/20 310 825 Package $15,000
1 GBW COD 5/20 310 9033 Package $15,000
2 DABS 5/20 310 1241 Package $15,000
3 WHAKE 5/20 310 3120 Package $15,000
4 POLL 5/20 310 65234 Package $15,000
5 RED 5/20 310 76610 Package $15,000
6 SNE BB 5/20 310 1688 Package $15,000
7 GOM BB 5/20 310 1195 Package $15,000
8 NaN 5/20 310 2121 Package $15,000
9 NaN 5/20 310 7285 Package $15,000
0 SNE BB 5/7 347 8,000 $0.50
0 GOM COD 5/12 1878A 6188 $1.95
1 GOM HADD 5/12 1878A 635 $1.35
2 SNE BB 5/12 1878A 3916 $0.50
3 GOM BB 5/12 1878A 7873 $0.50
4 GB BB 5/12 1878A 6762 $0.20
5 GREYSOLE 5/12 1878A 3358 $1.40
6 GOM YT 5/12 1878A 9776 $1.20
7 SNE YT 5/12 1878A 271 $0.50
8 POLL 5/12 1878A 186550 $0.01
0 GBE COD 5/12 1878B 1113 Package$10,000
1 GBW COD 5/12 1878B 12186 Package$10,000
2 GB YT 5/12 1878B 850 Package$10,000
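A different route to the same long layout, sketched here as an alternative and not the method used above, is to stay in pandas: split each packed cell into a list and call DataFrame.explode. The sketch assumes pandas 1.3 or later (needed to explode several columns at once) and assumes the `<br>` tags were replaced with a `|` delimiter before parsing, not with a space. The sample frame is made up for illustration.

```python
import pandas as pd

# Illustrative input: one row per ID, items joined with "|" (an assumption)
raw = pd.DataFrame({
    "ID": ["1578", "347"],
    "AvailableQuota": ["GBE COD|GBW COD|GB BB", "SNE BB"],
    "LiveWeightPounds": ["538|5894|1755", "8,000"],
    "Price": ["$1.00|$0.40|$0.20", "$0.50"],
    "DatePosted": ["5/20", "5/7"],
})

multi = ["AvailableQuota", "LiveWeightPounds", "Price"]
split = raw.copy()
for col in multi:
    # A single-character pattern is treated as a literal delimiter
    split[col] = split[col].str.split("|")

# One output row per list element; ID and DatePosted are repeated
tidy = split.explode(multi, ignore_index=True)
print(tidy)
```

Exploding several columns together requires the lists in each row to have the same length, so rows such as ID 310 (8 quota names, 10 weights) or the "Package" prices would still need padding or special handling first.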