Python BeautifulSoup: filling in missing information with "N/A" is not working
I am practicing my web scraping skills on the website used in the code below. This is the code I have so far. It looks like I am getting the correct count of companies, but my CSV file contains duplicate rows, which I think happens whenever a company is missing information. In several parts of my code I try to detect missing information and replace it with the text "N/A", but it is not working. I suspect the problem is related to the zip() function, but I am not sure how to fix it.
Any help is greatly appreciated.
"""
Grabs brewery name, contact person, phone number, website address, and email address
for each brewery listed on the website.
"""
import requests, csv
from bs4 import BeautifulSoup
url = "http://web.californiacraftbeer.com/Brewery-Member"
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
each_company = soup.find_all("div", {"class": "ListingResults_All_CONTAINER ListingResults_Level3_CONTAINER"})
error_msg = "N/A"
def scraper():
    """Grabs information and writes to CSV"""
    print("Running...")
    results = []
    count = 0
    for info in each_company:
        try:
            company_name = info.find_all("span", itemprop="name")
        except Exception as e:
            company_name = "N/A"
        try:
            contact_name = info.find_all("div", {"class": "ListingResults_Level3_MAINCONTACT"})
        except Exception as e:
            contact_name = "N/A"
        try:
            phone_number = info.find_all("div", {"class": "ListingResults_Level3_PHONE1"})
        except Exception as e:
            phone_number = "N/A"
        try:
            website = info.find_all("span", {"class": "ListingResults_Level3_VISITSITE"})
        except Exception as e:
            website = "N/A"
        for company, contact, phone, site in zip(company_name, contact_name, phone_number, website):
            count += 1
            print("Grabbing {0} ({1})...".format(company.text, count))
            newrow = []
            try:
                newrow.append(company.text)
            except Exception as e:
                newrow.append(error_msg)
            try:
                newrow.append(contact.text)
            except Exception as e:
                newrow.append(error_msg)
            try:
                newrow.append(phone.text)
            except Exception as e:
                newrow.append(error_msg)
            try:
                newrow.append(site.find('a')['href'])
            except Exception as e:
                newrow.append(error_msg)
            try:
                newrow.append("info@" + company.text.replace(" ", "").lower() + ".com")
            except Exception as e:
                newrow.append(error_msg)
        results.append(newrow)
    print("Done")
    outFile = open("brewery.csv", "w")
    out = csv.writer(outFile, delimiter=',',quoting=csv.QUOTE_ALL, lineterminator='\n')
    out.writerows(results)
    outFile.close()

def main():
    """Runs web scraper"""
    scraper()

if __name__ == '__main__':
    main()
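From the documentation:
"If find_all() can't find anything, it returns an empty list. If find() can't find anything, it returns None."
So, for example, when company_name = info.find_all("span", itemprop="name") runs and matches nothing, it does not raise an exception, and "N/A" is never set.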
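To make this concrete, here is a minimal sketch using a made-up HTML fragment (the markup and the "html.parser" backend are assumptions for illustration, not the real page). It suggests that neither call raises when nothing matches, so the except branch never runs, and that zip() then produces no rows at all for that listing:

```python
from bs4 import BeautifulSoup

# Hypothetical listing that has a name but no phone number.
html = '<div class="listing"><span itemprop="name">Example Brewing</span></div>'
listing = BeautifulSoup(html, "html.parser").find("div", class_="listing")

try:
    phones = listing.find_all("div", {"class": "ListingResults_Level3_PHONE1"})
    phone = listing.find("div", {"class": "ListingResults_Level3_PHONE1"})
    raised = False
except Exception:
    raised = True

print(raised)        # False: a missing element is not an error
print(phones == [])  # True: find_all() returns an empty list
print(phone is None) # True: find() returns None

# zip() stops at its shortest input, so one empty list means zero iterations.
names = listing.find_all("span", itemprop="name")
pairs = list(zip(names, phones))
print(len(pairs))    # 0
```

Because the loop body never executes for such a listing, any row variable left over from the previous company is what ends up being reused.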
Instead, you need to check whether company_name is an empty list:
if not company_name:
    company_name = "N/A"
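One possible way to sidestep zip() altogether is to look up each field once per listing with find() and fall back to "N/A" when it returns None, so that every company yields exactly one row. The sketch below is only an illustration under stated assumptions: the HTML fragment is made up (it reuses class names from the question), text_or_na and href_or_na are hypothetical helper names, and "html.parser" is used just to keep the example free of extra dependencies.

```python
import csv
import io
from bs4 import BeautifulSoup

NA = "N/A"

# Made-up markup: the second listing has no contact and no website link.
HTML = """
<div class="ListingResults_Level3_CONTAINER">
  <span itemprop="name">Alpha Brewing</span>
  <div class="ListingResults_Level3_MAINCONTACT">Ann Lee</div>
  <span class="ListingResults_Level3_VISITSITE"><a href="http://alpha.example">Visit</a></span>
</div>
<div class="ListingResults_Level3_CONTAINER">
  <span itemprop="name">Beta Brewing</span>
</div>
"""

def text_or_na(tag):
    """Return the tag's text, or "N/A" if find() returned None."""
    return tag.get_text(strip=True) if tag is not None else NA

def href_or_na(tag):
    """Return the href of the first link inside tag, or "N/A" if absent."""
    link = tag.find("a", href=True) if tag is not None else None
    return link["href"] if link is not None else NA

def extract_rows(soup):
    """Build exactly one row per listing, filling gaps with "N/A"."""
    rows = []
    for info in soup.find_all("div", class_="ListingResults_Level3_CONTAINER"):
        rows.append([
            text_or_na(info.find("span", itemprop="name")),
            text_or_na(info.find("div", class_="ListingResults_Level3_MAINCONTACT")),
            href_or_na(info.find("span", class_="ListingResults_Level3_VISITSITE")),
        ])
    return rows

rows = extract_rows(BeautifulSoup(HTML, "html.parser"))

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

Since each row is built fresh inside the loop, a listing with missing fields can no longer repeat the previous company's data.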