Python web scrape中的CSV格式问题_Python_Loops_Csv_Beautifulsoup

Python web scrape中的CSV格式问题

python loops csv

Python web scrape中的CSV格式问题,python,loops,csv,beautifulsoup,Python,Loops,Csv,Beautifulsoup,我编写了以下脚本以从download.com获取一些数据： from bs4 import BeautifulSoup import urllib2 import csv query = raw_input("Please enter the query value: ") limit = raw_input("Please enter the results limit (1-100): ") minreview = raw_input("Please enter the min user

我编写了以下脚本以从download.com获取一些数据：

from bs4 import BeautifulSoup
import urllib2
import csv

query = raw_input("Please enter the query value: ")
limit = raw_input("Please enter the results limit (1-100): ")
minreview = raw_input("Please enter the min user rating (1-5): ")
maxreview = raw_input("Please enter the max user rating (1-5): ")
csvname = raw_input("Enter a filename.csv for CSV output: ")

cnetFile = urllib2.urlopen("http://developer.api.cnet.com/rest/v1.0/softwareProductSearch?partKey=APIKEYGOESHERE&partTag=APIKEYGOESHERE&query=" + query + "$
cnetXml = cnetFile.read()
cnetFile.close()

soup = BeautifulSoup(cnetXml, features="xml")
#print soup.prettify()

f = csv.writer(open(csvname, "w"))
f.writerow(["Name", "Link", "Mfg", "Mfg Link", "Price", "Downloads", "User Rating Summary", "User Rating Product"])

data = soup.find_all(['Name', 'Price', 'TotalDownloads', 'LinkURL', 'Rating'])
#print data

for x in data:
        strip1 = x.contents
        print strip1
        f.writerow(strip1)

返回2个产品时，CSV的输出如下所示：（每个产品应返回8个字段，如代码中的标题中所示，但偶尔会缺少一个字段，如“2”下第一个产品中的第八个字段。）

以下是soup变量中的数据示例：

<?xml version="1.0" encoding="utf-8"?>
<CNETResponse realm="cnet" version="1.0" xmlns="http://developer.api.cnet.com/re
st/v1.0/ns" xmlns:xlink="http://www.w3.org/1999/xlink">
<SoftwareProducts numFound="898" numReturned="2" start="0">
<SoftwareProduct id="11889531" setId="10367545" xlink:href="http://developer.api
.cnet.com/rest/v1.0/softwareProduct?productSetId=10367545&amp;iod=userRatings&am
p;partKey=572kjgq8h2mqbsup36cubkys&amp;partTag=572kjgq8h2mqbsup36cubkys">
<Name>Firegraphic</Name>
<Version>11.0</Version>
<LinkURL>http://www.download.com/firegraphic/3000-2192_4-10367545.html?tag=api</
LinkURL>
<Publisher id="6268727">
<Name>Firegraphic</Name>
<LinkURL>http://www.firegraphic.com</LinkURL>
<UrsRegId/>
</Publisher>
<License>Free to try</License>
<BetaRelease>false</BetaRelease>
<Price currency="USD">$49.95</Price>
<Summary>Import, organize, view, edit, print, and share your digital images.</Su
mmary>
<Description>&lt;p&gt;Firegraphic is an image viewer for photography professiona
ls, Web, and graphic designers to import, organize, view, edit, print, and share
 their digital images. The new Firegraphic has improved its memory usage and con
sumes very low memory, which leaves more memory for you to edit your photos in t
he image editor. Firegraphic now supports the RAW file formats from digital came
ras. Firegraphic gives you the ability to open multiple photos in the Viewer and
 compare photos side-by-side to choose your best shot. You also can customize th
e tools in your toolbar and the Context menu in the Viewer. The Firegraphic user
 interface lets you change the skin color and edit photos with a third-party ima
ge editor.&lt;/p&gt;</Description>
<WhatsNew/>
<Requirements> </Requirements>
<Platform>Windows</Platform>
<OperatingSystems>
<OperatingSystem id="3">Windows</OperatingSystem>
<OperatingSystem id="17">Windows 2000</OperatingSystem>
<OperatingSystem id="25">Windows XP</OperatingSystem>
<OperatingSystem id="43">Windows 2003</OperatingSystem>
<OperatingSystem id="52">Windows Vista</OperatingSystem>
<OperatingSystem id="133">Windows 7</OperatingSystem>
</OperatingSystems>
<EditorsRating outOf="5">3.0</EditorsRating>
<EditorsNote/>
<PreferredNode id="2192"/>
<WeeklyDownloads>8</WeeklyDownloads>
<TotalDownloads>2546868</TotalDownloads>
<CreatedDate>2011-04-21 17:41:19.0</CreatedDate>
<ReleaseDate>2011-04-21 00:00:00.0</ReleaseDate>
<ReviewDate>2008-11-09 00:00:00.0</ReviewDate>
<Limitations>30-day trial</Limitations>
<BuyNowUrl type=""> </BuyNowUrl>
<TrialPayUrl/>
<CleverBridgeUrl/>
<UpsellUnit/>
<ButtonPartner/>
<CNETContentIds/>
<FileSize>8358576</FileSize>
<Category id="2192" xlink:href="http://developer.api.cnet.com/rest/v1.0/category
?categoryId=2192&amp;siteId=4&amp;partKey=572kjgq8h2mqbsup36cubkys&amp;partTag=5
72kjgq8h2mqbsup36cubkys"/>
<UserRatingSummary>
<Rating outOf="5">2.0</Rating>
<TotalVotes>7</TotalVotes>
</UserRatingSummary>
<UserRatingProduct>
<Rating outOf="5"/>
<TotalVotes>0</TotalVotes>
</UserRatingProduct>
<EditorsPick/>
<ListingType>STANDARD</ListingType>
</SoftwareProduct>
<SoftwareProduct id="10296367" setId="10065486" xlink:href="http://developer.api
.cnet.com/rest/v1.0/softwareProduct?productSetId=10065486&amp;iod=userRatings&am
p;partKey=572kjgq8h2mqbsup36cubkys&amp;partTag=572kjgq8h2mqbsup36cubkys">
<Name>MP3 CD Maker</Name>
<Version>2.0</Version>
<LinkURL>http://www.download.com/mp3-cd-maker/3000-2140_4-10065486.html?tag=api<
/LinkURL>
<Publisher id="83016">
<Name>ZY Computing</Name>
<LinkURL>http://www.dvdsanta.com</LinkURL>
<UrsRegId/>
</Publisher>
<License>Free to try</License>
<BetaRelease>false</BetaRelease>
<Price currency="USD">$24.95</Price>
<Summary>Create audio CDs from your MP3 collection.</Summary>
<Description>&lt;p&gt;MP3 CD Maker works with a CD recorder to create audio CDs
from collections of MP3 audio files. It directly converts MP3 files into the CD
audio format and can decode any MP3 file into WAV or raw audio. A normalization
feature lets you ensure that all MP3s in a set have the same volume level. &lt;/
p&gt;&lt;p&gt;Version 2.0 adds support for 200 more CD-R/RW drives.&lt;/p&gt;</D
escription>
<WhatsNew/>
<Requirements>Windows 95/98/Me/NT/2000/XP</Requirements>
<Platform>Windows</Platform>
<OperatingSystems>
<OperatingSystem id="3">Windows</OperatingSystem>
<OperatingSystem id="5">Windows 95</OperatingSystem>
<OperatingSystem id="8">Windows NT</OperatingSystem>
<OperatingSystem id="6">Windows 98</OperatingSystem>
<OperatingSystem id="7">Windows Me</OperatingSystem>
<OperatingSystem id="17">Windows 2000</OperatingSystem>
<OperatingSystem id="25">Windows XP</OperatingSystem>
</OperatingSystems>
<EditorsRating outOf="5">4.0</EditorsRating>
<EditorsNote/>
<PreferredNode id="2140"/>
<WeeklyDownloads>103</WeeklyDownloads>
<TotalDownloads>1653394</TotalDownloads>
<CreatedDate>2004-06-16 19:07:46.0</CreatedDate>
<ReleaseDate>2004-06-16 00:00:00.0</ReleaseDate>
<ReviewDate>2009-02-27 00:00:00.0</ReviewDate>
<Limitations>limited to 4 songs on a CD</Limitations>
<BuyNowUrl type="dl_buy_ond">http://send.onenetworkdirect.net/z/126524/CD103284/
</BuyNowUrl>
<TrialPayUrl/>
<CleverBridgeUrl/>
<UpsellUnit/>
<ButtonPartner/>
<CNETContentIds/>
<FileSize>1283187</FileSize>
<Category id="2140" xlink:href="http://developer.api.cnet.com/rest/v1.0/category
?categoryId=2140&amp;siteId=4&amp;partKey=572kjgq8h2mqbsup36cubkys&amp;partTag=5
72kjgq8h2mqbsup36cubkys"/>
<UserRatingSummary>
<Rating outOf="5">2.0</Rating>
<TotalVotes>3</TotalVotes>
</UserRatingSummary>
<UserRatingProduct>
<Rating outOf="5">2.0</Rating>
<TotalVotes>3</TotalVotes>
</UserRatingProduct>
<EditorsPick/>
<ListingType>STANDARD</ListingType>
</SoftwareProduct>
</SoftwareProducts>
</CNETResponse>

我留下的唯一问题是，有时“评级”的“全部查找”得到2个评级，有时只有1个评级。这让我每8行就开始一行，因为有时只返回7个项目。如果只返回一个评级，如何在第二个“评级”中打印“无”

对数据使用

writerow（）

，就像您已经对标题使用的一样。由于

contents

属性返回一个列表，您不需要转换任何内容：

for x in data:
    strip1 = x.contents
    f.writerow(strip1)

编辑：如果由于

内容

每次返回一个元素而导致上述解决方案不起作用，请尝试将其保存到数组中，并在末尾打印：

strip1 = []
for x in data:
    strip1.extend(x.contents)
f.writerow(strip1)

新建编辑：查看您的

xml

文件后，我的方法是循环遍历每个

元素，并从中提取所需的字段，如：

for product in soup.find_all('SoftwareProduct'):
    strip1 = []
    strip1.extend(product.Name.contents)
    strip1.extend(product.Price.contents)
    ...
    f.writerow(strip1)

对数据使用

writerow（）

，就像您已经对标题使用一样。由于

contents

属性返回一个列表，您不需要转换任何内容：

for x in data:
    strip1 = x.contents
    f.writerow(strip1)

编辑：如果由于

内容

每次返回一个元素而导致上述解决方案不起作用，请尝试将其保存到数组中，并在末尾打印：

strip1 = []
for x in data:
    strip1.extend(x.contents)
f.writerow(strip1)

新建编辑：查看您的

xml

文件后，我的方法是循环遍历每个

元素，并从中提取所需的字段，如：

for product in soup.find_all('SoftwareProduct'):
    strip1 = []
    strip1.extend(product.Name.contents)
    strip1.extend(product.Price.contents)
    ...
    f.writerow(strip1)

标题是我想要的，但问题是我的数据没有在列中填充。它正在向下填充到行中。@Christopher:writerow（）是否也在不同的行中添加了它的字段？是的，第一个带有标题信息的writerow将进入不同的列中。第二个写数据的writerow是直接下到行中，而不是跨到列中。您能提供您要取消的

url

是什么吗？现在一切都很好。我没有注意到软件产品标签把每一个都包起来，所以这真的很有帮助。再次感谢您在这方面的帮助，我非常感谢！标题是我想要的，但问题是我的数据没有在列中填充。它正在向下填充到行中。@Christopher:writerow（）是否也在不同的行中添加了它的字段？是的，第一个带有标题信息的writerow将进入不同的列中。第二个写数据的writerow是直接下到行中，而不是跨到列中。您能提供您要取消的

url

是什么吗？现在一切都很好。我没有注意到软件产品标签把每一个都包起来，所以这真的很有帮助。再次感谢您在这方面的帮助，我非常感谢！