Python 2.7: AttributeError: 'list' object has no attribute 'get'
I have built a script that scrapes a list of UK courts, generates a list of links to each court's address page, and then scrapes the address from that page.

So far it works fine, but I am stuck on the "write to csv" bit. I think it has something to do with iteritems() missing, based on what I found: an iterator does not have the same methods as an iterable (and I use an iterator in my code), but that did not help me solve my specific problem.

Here is my code:
import csv
import time
import random
import requests
from bs4 import BeautifulSoup as bs
# lambda expression to request url and parse it through bs
soup = lambda url: bs((requests.get(url)).text, "html.parser")
def crawl_court_listings(base, buff, char):
    """ """
    # common URL segment + buffer URL segment + end character -> URL
    url = base + buff + str(chr(char))
    # soup lambda expression -> grab first unordered list
    links = (soup(url)).find('div', {'class', 'content inner cf'}).find('ul')
    # empty dictionary
    results = {}
    # loop through links, get link title and href
    for item in links.find_all('a', href=True):
        court_link = item['href']
        title = item.string
        # generate full court address page url from href
        full_court_link = base + court_link
        # save title and full URL to results
        results[title] = full_court_link
    # increment char var by 1
    char += 1
    # return results dict and incremented char value
    return results, char
def get_court_address(court_name, full_court_link):
    """ """
    # get horrible chunk of poorly formatted address(es)
    address_blob = (soup(full_court_link)).find('div', {'id': 'addresses'}).text
    # clean the blob
    clean_address = "\n".join(line.strip() for line in address_blob.split("\n"))
    # write to csv
    with open('court_addresses.csv', 'w') as csvfile:
        fieldnames = [court_name, full_court_link, clean_address]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writerow(fieldnames)
if __name__ == "__main__":
    base = 'https://courttribunalfinder.service.gov.uk/'
    buff = 'courts/'
    # 65 = "A". Starting from char "A", retrieve list of titles and links for court addresses. Return char + 1
    results, char = crawl_court_listings(base, buff, 65)
    # 90 = "Z". Until Z, pass title and link from results into get_court_address(), then wait a few seconds
    while char <= 90:
        for t, l in results.iteritems():
            get_court_address(t, l)
            time.sleep(random.randint(0, 5))
So my first thought was that I was trying to write multiline text into a single cell and that was causing the error, but I am not sure how to confirm that. I used print(type(address)) and it came back as unicode, not list, so I do not think that is the source of the problem. I do not understand where it is getting the offending list from, if that makes sense - unless it is iteritems().

Can someone explain the error and point me in the right direction?

For each row you are writing, you need to pass in a dictionary - you are passing in a list of headers.
The row should look like this:
{'court_name':X,'full_court_link':Y,'clean_address':Z}
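A minimal, self-contained sketch of the difference (file handling via io.StringIO and the sample values are invented for illustration). Note that under Python 3 the error message names 'keys' rather than 'get', but the cause is the same:

```python
import csv
import io

output = io.StringIO()
fieldnames = ['court_name', 'full_court_link', 'clean_address']
writer = csv.DictWriter(output, fieldnames=fieldnames)
writer.writeheader()

# correct: writerow() gets a dict keyed by the fieldnames
writer.writerow({'court_name': 'Example Court',
                 'full_court_link': 'https://example.com/courts/example',
                 'clean_address': '1 High Street'})

# wrong: passing a plain list raises the AttributeError from the question,
# because DictWriter tries to call dict methods on the row object
try:
    writer.writerow(['Example Court', 'https://example.com', '1 High Street'])
except AttributeError as e:
    print('AttributeError:', e)
```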
HTH.

Your problem is in this line:
writer.writerow(fieldnames)
fieldnames is a list of field names, but you need to pass a dict of key-value pairs. For reference, a cleaned address blob looks something like this:
Write to us:
1st Floor
Piccadilly Exchange
Piccadilly Plaza
Manchester
Greater Manchester
M1 4AH
So it should look more like this:

# write to csv
with open('court_addresses.csv', 'w') as csvfile:
    # note - these are strings, not variables
    fieldnames = ['court_name', 'full_court_link', 'clean_address']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({"court_name": court_name,
                     "full_court_link": full_court_link,
                     "clean_address": clean_address})
PSST: you have one more problem. You are re-opening the output file for every court you parse. You probably want to open the file once (under __main__) and pass the handle into get_court_address().

Why use a DictWriter when you have a list and not a dict?

Your problem is that you are using csv.DictWriter incorrectly - this line especially: writer.writerow(fieldnames)
The input to .writerow() must be a dict, not a list.

Thanks for the tip about opening the file once under main - that is really helpful.
With the DictWriter call fixed, the write block becomes:

with open('court_addresses.csv', 'w') as csvfile:
    fieldnames = ['court_name', 'full_court_link', 'clean_address']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({'court_name': court_name, 'full_court_link': full_court_link, 'clean_address': clean_address})
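One portability note, since the question targets Python 2.7: dict.iteritems() (used in the driving loop) does not exist in Python 3, where the equivalent is dict.items(). A minimal sketch that works on both, with invented sample data:

```python
results = {'Example Court': 'https://example.com/courts/example'}

# dict.iteritems() exists only in Python 2; fall back to items() on Python 3
try:
    pairs = results.iteritems()
except AttributeError:
    pairs = results.items()

for title, link in pairs:
    print(title, '->', link)
```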