Can't get my Python web-scraping script to work with multiprocessing


I read my URLs from a CSV and want to export the results to a new CSV at the end. I'm working with about 60 URLs, like this:

import csv
from bs4 import BeautifulSoup 
import requests 
from time import sleep
from multiprocessing import Pool

contents = []

with open('websupplies2.csv') as csvf:
 reader = csv.reader(csvf, delimiter=";")
 for row in reader:
    contents.append(row) # Add each url to list contents

 price_text='-'
 availability_text='-'

def parse(contents):
  info = []
  with open('output_websupplies.csv', mode='w') as f:
  f_writer = csv.writer(f, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL)
  f_writer.writerow(['SKU','Price','Availability'])

  for row in contents:  # Parse through each url in the list.
  sleep(3)
  page = requests.get(row[1]).content
  soup = BeautifulSoup(page, "html.parser")

  price = soup.find('div', attrs={'class':'product-price'})
  if price is not None:
   price_text = price.text.strip()
   print(price_text)
  else:
   price_text = "0,00"
   print(price_text)

  availability = soup.find('div', attrs={'class':'available-text'})
  if availability is not None:
   availability_text = availability.text.strip()
   print(availability_text)
  else:
   availability_text = "Μη Διαθέσιμο"
   print(availability_text)

  info.append(row[0])
  info.append(price_text)
  info.append(availability_text)

return ';'.join(info)     

if __name__ == "__main__":
 with Pool(10) as p:
 records = p.map(parse, contents)

if len(records) > 0:
 with open('output_websupplies.csv', 'a+') as f:
    f.write('\n'.join(records))

But I'm getting error messages like NameError: name 'records' is not defined. What should I change to make the script work?

First, double-check your indentation. What you've pasted here looks inconsistent, and if the line if len(records) > 0: really isn't indented, you will certainly get a NameError.

For a statement to sit inside a block, its indentation must match the other statements in that block and be greater than that of the line that opens the block. In other words, everything inside the if statement should line up. For example:

if __name__ == "__main__":
    with Pool(10) as p:
        records = p.map(parse, contents)

    if len(records) > 0:
        with open('output_websupplies.csv', 'a+') as f:
            f.write('\n'.join(records))
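
One more thing to check once the indentation is fixed: p.map(parse, contents) calls parse once per row, but parse as written loops over its argument as if it were the whole URL list, and every worker reopens output_websupplies.csv in write mode. Here is a minimal sketch (untested against the real site) of a per-row worker, assuming each CSV row looks like [sku, url] as in the original code:

import csv
from time import sleep
from multiprocessing import Pool

import requests
from bs4 import BeautifulSoup

def parse(row):
    # Pool.map passes ONE row per call, e.g. ['SKU123', 'http://...']
    sku, url = row[0], row[1]
    sleep(3)  # pause between requests, as in the original script
    page = requests.get(url).content
    soup = BeautifulSoup(page, "html.parser")

    price = soup.find('div', attrs={'class': 'product-price'})
    price_text = price.text.strip() if price is not None else "0,00"

    availability = soup.find('div', attrs={'class': 'available-text'})
    if availability is not None:
        availability_text = availability.text.strip()
    else:
        availability_text = "Μη Διαθέσιμο"

    return ';'.join([sku, price_text, availability_text])

if __name__ == "__main__":
    contents = []
    with open('websupplies2.csv') as csvf:
        reader = csv.reader(csvf, delimiter=";")
        for row in reader:
            contents.append(row)  # add each [sku, url] row to contents

    with Pool(10) as p:
        records = p.map(parse, contents)

    if len(records) > 0:
        # write the header and all results once, in the parent process
        with open('output_websupplies.csv', 'w') as f:
            f.write('SKU;Price;Availability\n')
            f.write('\n'.join(records))

Doing all the file writing once in the parent process also avoids ten workers clobbering the same output file.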

My first error comes from info.append(row[0]) info.append(price_text) info.append(availability_text).
@Evridiki - I added an example to my answer.
Is the logic of the last part correct? Am I forgetting anything?
@Evridiki - I'm not familiar with the Pool API, but apart from the indentation, the rest of that block looks fine to me.
I'm getting Invalid URL 'd': No schema supplied. Perhaps you meant http://d? at info.append(availability_text). Before adding the Pool code, the indentation looked normal in Visual Studio.
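
For what it's worth, the Invalid URL 'd' message is consistent with the per-row mix-up described above: once Pool.map hands parse a single row, the inner for row in contents loop iterates over that row's fields, so row ends up being a string and row[1] a single character. A tiny illustration with hypothetical field values:

row_from_pool = ['SKU123', 'http://example.com/item']  # what one worker receives (made-up values)
for row in row_from_pool:   # iterates over the row's FIELDS, i.e. strings
    print(row[1])           # indexes into each string: prints 'K', then 't'

So requests.get(row[1]) gets called with a lone character instead of a URL, which is exactly the "No schema supplied" error.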