Python script run from the command line does not create a CSV
I'm new to Python and am scraping a site to collect inventory information. The inventory items are spread across 6 pages of the site. The scraping went smoothly, and I was able to parse out all of the HTML elements I wanted to select.
Now I'm taking it to the next step and trying to export the results to a CSV file with the csv.writer included in Python 3. The script runs from my command line without any syntax errors, but the CSV file is not created. I'm wondering whether there is anything obviously wrong with my script, or whether I've missed something in trying to get the parsed HTML elements into a CSV.
Here is my code:
import requests
import csv
from bs4 import BeautifulSoup

main_used_page = 'https://www.necarconnection.com/used-vehicles/'
page = requests.get(main_used_page)
soup = BeautifulSoup(page.text,'html.parser')

def get_items(main_used_page,urls):
    main_site = 'https://www.necarconnection.com/'
    counter = 0
    for x in urls:
        site = requests.get(main_used_page + urls[counter])
        soup = BeautifulSoup(site.content,'html.parser')
        counter +=1
        for item in soup.find_all('li'):
            vehicle = item.find('div',class_='inventory-post')
            image = item.find('div',class_='vehicle-image')
            price = item.find('div',class_='price-top')
            vin = item.find_all('div',class_='vinstock')
            try:
                url = image.find('a')
                link = url.get('href')
                pic_link = url.img
                img_url = pic_link['src']
                if 'gif' in pic_link['src']:img_url = pic_link['data-src']
                landing = requests.get(main_site + link)
                souped = BeautifulSoup(landing_page.content,'html.parser')
                comment = ''
                for comments in souped.find_all('td',class_='results listview'):
                    com = comments.get_text()
                    comment += com
                with open('necc-december.csv','w',newline='') as csv_file:
                    fieldnames = ['CLASSIFICATION','TYPE','PRICE','VIN',
                                  'INDEX','LINK','IMG','DESCRIPTION']
                    writer = csv.DictWriter(csv_file,fieldnames=fieldnames)
                    writer.writeheader()
                    writer.writerow({
                        'CLASSIFICATION':vehicle['data-make'],
                        'TYPE':vehicle['data-type'],
                        'PRICE':price,
                        'VIN':vin,
                        'INDEX':vehicle['data-location'],
                        'LINK':link,
                        'IMG':img_url,
                        'DESCRIPTION':comment})
            except TypeError: None
            except AttributeError: None
            except UnboundLocalError: None

urls = ['']
counter = 0
prev = 0
for x in range(100):
    site = requests.get(main_used_page + urls[counter])
    soup = BeautifulSoup(site.content,'html.parser')
    for button in soup.find_all('a',class_='pages'):
        if button['class'] == ['prev']:
            prev +=1
        if button['class'] == ['next']:
            next_url = button.get('href')
    if next_url not in urls:
        urls.append(next_url)
        counter +=1
    if prev - 1 > counter:break
get_items(main_used_page,urls)
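Below is a screenshot taken after running the script from the command line:
The script takes a while to run, so I know it is being read and processed. I'm just not sure what goes wrong between that and actually producing the CSV file.
I hope this is helpful. Again, any tips or tricks on using the Python 3 csv.writer would be greatly appreciated, as I have tried several different variations.
I found that your code for writing the CSV works fine. Here it is in isolation: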
import csv

vehicle = {'data-make': 'Buick',
           'data-type': 'Sedan',
           'data-location': 'Bronx',
           }
price = '8000.00'
vin = '11040VDOD330C0D0D003'
link = 'https://www.necarconnection.com/someplace'
img_url = 'https://www.necarconnection.com/image/someimage'
comment = 'Fine Car'

with open('necc-december.csv','w',newline='') as csv_file:
    fieldnames = ['CLASSIFICATION','TYPE','PRICE','VIN',
                  'INDEX','LINK','IMG','DESCRIPTION']
    writer = csv.DictWriter(csv_file,fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({
        'CLASSIFICATION':vehicle['data-make'],
        'TYPE':vehicle['data-type'],
        'PRICE':price,
        'VIN':vin,
        'INDEX':vehicle['data-location'],
        'LINK':link,
        'IMG':img_url,
        'DESCRIPTION':comment})
It creates necc-december.csv just fine:
CLASSIFICATION,TYPE,PRICE,VIN,INDEX,LINK,IMG,DESCRIPTION
Buick,Sedan,8000.00,11040VDOD330C0D0D003,Bronx,https://www.necarconnection.com/someplace,https://www.necarconnection.com/image/someimage,Fine Car
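I think the problem is that the code cannot find any button with class='next'.
To run the code, I had to initialize next_url:
next_url = None
and then change your condition from
if next_url not in urls:
to
I added debugging output to the for loop:
for button in soup.find_all('a',class_='pages'):
    print ('button:', button)
and got this output: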
button: <a class="pages current" data-page="1" href="javascript:void(0);">1</a>
button: <a class="pages" data-page="2" href="javascript:void(0);">2</a>
button: <a class="pages" data-page="3" href="javascript:void(0);">3</a>
button: <a class="pages" data-page="4" href="javascript:void(0);">4</a>
button: <a class="pages" data-page="5" href="javascript:void(0);">5</a>
button: <a class="pages" data-page="6" href="javascript:void(0);">6</a>
button: <a class="pages current" data-page="1" href="javascript:void(0);">1</a>
button: <a class="pages" data-page="2" href="javascript:void(0);">2</a>
button: <a class="pages" data-page="3" href="javascript:void(0);">3</a>
button: <a class="pages" data-page="4" href="javascript:void(0);">4</a>
button: <a class="pages" data-page="5" href="javascript:void(0);">5</a>
button: <a class="pages" data-page="6" href="javascript:void(0);">6</a>
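Every anchor in this output has a `data-page` attribute and an `href` of `javascript:void(0);`. Beautiful Soup returns `class` as a list, and `find_all('a', class_='pages')` only returns tags whose class list contains `pages`, so `button['class']` can be `['pages']` or `['pages', 'current']` but never exactly `['prev']` or `['next']`. One possible way to discover the pages is to read the `data-page` values instead. The following is only a sketch under assumptions: `PageButtonParser` and `find_page_numbers` are hypothetical names, the sample markup is copied from the output above, and the standard-library `html.parser` is swapped in for Beautiful Soup so the example runs without extra packages or network access. How a page number maps to a URL on the real site is not shown here and would still need to be worked out.

```python
from html.parser import HTMLParser

# Sample markup copied from the debug output (a few buttons only,
# including a repeat, since the pagination bar appears twice).
SAMPLE = '''
<a class="pages current" data-page="1" href="javascript:void(0);">1</a>
<a class="pages" data-page="2" href="javascript:void(0);">2</a>
<a class="pages" data-page="3" href="javascript:void(0);">3</a>
<a class="pages current" data-page="1" href="javascript:void(0);">1</a>
'''


class PageButtonParser(HTMLParser):
    """Collect data-page values from <a> tags whose class list includes 'pages'."""

    def __init__(self):
        super().__init__()
        self.pages = set()  # a set removes the repeated pagination bar

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attr = dict(attrs)
        classes = (attr.get('class') or '').split()
        page = attr.get('data-page')
        if 'pages' in classes and page and page.isdigit():
            self.pages.add(int(page))


def find_page_numbers(html):
    """Return the sorted, de-duplicated page numbers found in the markup."""
    parser = PageButtonParser()
    parser.feed(html)
    return sorted(parser.pages)


page_numbers = find_page_numbers(SAMPLE)
print(page_numbers)
```

With Beautiful Soup, the equivalent check would be reading `button.get('data-page')` inside the existing loop over `soup.find_all('a', class_='pages')`.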
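So there is no button with class='next'.
You are writing the CSV inside the loop, so every pass overwrites the file. Try appending to the file instead of writing: open('necc-december.csv','a',newline='')
Thanks. Would you suggest placing that section somewhere else in the script? I tried unindenting it, but I got the syntax error "unexpected unindent".
That is the last change you made since the last time all your tests passed, so the error will be there. Are you doing test-driven development? If not, you will need to debug it. Debugging is the hardest thing to do in programming, perhaps the hardest thing of all. To avoid debugging, do test-driven development.
@Sabrina. I would store all the information in a dictionary and, once all the data has been collected, call a function that writes everything to the CSV file in one go.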
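A minimal sketch of that last suggestion, assuming one dict per vehicle collected in a list and a single write at the end. `FIELDNAMES`, `write_rows`, and the sample values are hypothetical and made up for illustration; the column names are the ones from the question. An in-memory buffer stands in for the file so the sketch does not touch the disk.

```python
import csv
import io

# Same column names as in the question's script.
FIELDNAMES = ['CLASSIFICATION', 'TYPE', 'PRICE', 'VIN',
              'INDEX', 'LINK', 'IMG', 'DESCRIPTION']


def write_rows(csv_file, rows):
    """Write the header once, then every collected row (hypothetical helper)."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Stand-in for the scraping loop: collect one dict per vehicle
# instead of reopening the file on every pass. The values are made up.
rows = []
for make, body, price in [('Buick', 'Sedan', '8000.00'),
                          ('Ford', 'Truck', '9500.00')]:
    rows.append({
        'CLASSIFICATION': make,
        'TYPE': body,
        'PRICE': price,
        'VIN': '',
        'INDEX': 'Bronx',
        'LINK': 'https://www.necarconnection.com/someplace',
        'IMG': 'https://www.necarconnection.com/image/someimage',
        'DESCRIPTION': 'Fine Car',
    })

# Write everything in one go, after the loop has finished.
buffer = io.StringIO()
write_rows(buffer, rows)
print(buffer.getvalue())
```

To produce a real file, the same helper could be called once after the loop inside `with open('necc-december.csv', 'w', newline='') as csv_file:`. Because the file is opened a single time, mode `'w'` no longer overwrites earlier rows and the header appears only once, which appending with `'a'` on every pass would not guarantee.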