Can't get my Python web scraping script to work with multiprocessing

python python-3.x web-scraping

I read my URLs from a CSV file and want to export the results to a new CSV file at the end. I'm working with about 60 URLs. The script is as follows:

import csv
from bs4 import BeautifulSoup 
import requests 
from time import sleep
from multiprocessing import Pool

contents = []

with open('websupplies2.csv') as csvf:
 reader = csv.reader(csvf, delimiter=";")
 for row in reader:
    contents.append(row) # Add each url to list contents

 price_text='-'
 availability_text='-'

def parse(contents):
  info = []
  with open('output_websupplies.csv', mode='w') as f:
  f_writer = csv.writer(f, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL)
  f_writer.writerow(['SKU','Price','Availability'])

  for row in contents:  # Parse through each url in the list.
  sleep(3)
  page = requests.get(row[1]).content
  soup = BeautifulSoup(page, "html.parser")

  price = soup.find('div', attrs={'class':'product-price'})
  if price is not None:
   price_text = price.text.strip()
   print(price_text)
  else:
   price_text = "0,00"
   print(price_text)

  availability = soup.find('div', attrs={'class':'available-text'})
  if availability is not None:
   availability_text = availability.text.strip()
   print(availability_text)
  else:
   availability_text = "Μη Διαθέσιμο"
   print(availability_text)

  info.append(row[0])
  info.append(price_text)
  info.append(availability_text)

return ';'.join(info)     

if __name__ == "__main__":
 with Pool(10) as p:
 records = p.map(parse, contents)

if len(records) > 0:
 with open('output_websupplies.csv', 'a+') as f:
    f.write('\n'.join(records))

But I get error messages such as NameError: name 'records' is not defined. What should I change to make the script work?

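First, double-check your indentation. What's pasted here looks inconsistent, and if the line if len(records) > 0: really is unindented, you will definitely get a NameError.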

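For a statement to be inside a block, its indentation must match that of the other statements in the block and be greater than that of the line that opens the block. In other words, everything that belongs to the if statement should line up. For example: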

if __name__ == "__main__":
    with Pool(10) as p:
        records = p.map(parse, contents)
    if len(records) > 0:
        with open('output_websupplies.csv', 'a+') as f:
            f.write('\n'.join(records))
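
A possible reason the indentation matters so much here: with the spawn start method (the default on Windows), multiprocessing starts each worker by re-importing the main module. Code outside the if __name__ == "__main__": guard then runs again in every worker, where records was never assigned. The sketch below shows the same structure using only the standard library. The names work and run are hypothetical stand-ins, not part of the original script.

```python
from multiprocessing import Pool


def work(item):
    # Hypothetical stand-in for parse(): one input item in, one output line out.
    return f"{item};done"


def run(items, processes=2):
    # The pool is created, used and closed inside this function,
    # so nothing pool-related happens at import time.
    with Pool(processes) as p:
        records = p.map(work, items)
    return records


if __name__ == "__main__":
    # Only the parent process enters this block; `records` exists only here,
    # so every statement that uses it must be indented under the guard.
    records = run(["a", "b", "c"])
    if len(records) > 0:
        print("\n".join(records))
```

Keeping every use of records under the guard means a re-imported worker never reaches a line that refers to it.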

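My first error comes from info.append(row[0]) info.append(price_text) info.append(availability_text)

@Evridiki - I added an example to my answer.

Is the logic of the last part correct? Have I forgotten anything?

@Evridiki - I'm not familiar with the Pool API, but apart from the indentation, the rest of that block looks fine.

I get Invalid URL 'd': No schema supplied. Perhaps you meant http://d? The error is at info.append(availability_text)

Before I added the Pool API, the indentation looked fine in Visual Studio.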
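
A note on the Invalid URL 'd' error, offered as a likely explanation rather than a confirmed diagnosis: Pool.map calls the worker once per element of the list, so parse receives a single row, not the whole list. Looping over that row yields strings, and row[1] is then the second character of a string (for a hypothetical SKU such as "Ad12", that is "d"). The sketch below assumes each row is [SKU, URL]. It has the worker handle one row and leaves all file writing to the parent process. The names parse_row and write_records are hypothetical, and the fetching and parsing step is replaced by placeholder values so the example runs offline.

```python
import csv
import io
from multiprocessing import Pool


def parse_row(row):
    # Pool.map passes ONE row per call, e.g. ["SKU1", "https://example.com/1"].
    sku, url = row[0], row[1]
    # The real script would download `url` and extract the fields here;
    # these placeholder values are an assumption to keep the sketch offline.
    price_text, availability_text = "0,00", "-"
    return [sku, price_text, availability_text]


def write_records(records, out):
    # Write the header and all rows once, from the parent process only.
    writer = csv.writer(out, delimiter=";", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["SKU", "Price", "Availability"])
    writer.writerows(records)


if __name__ == "__main__":
    contents = [["SKU1", "https://example.com/1"],
                ["SKU2", "https://example.com/2"]]
    with Pool(2) as p:
        records = p.map(parse_row, contents)
    buffer = io.StringIO()  # swap for open(..., "w", newline="") to write a file
    write_records(records, buffer)
    print(buffer.getvalue())
```

Writing from the parent also avoids every worker reopening output_websupplies.csv in 'w' mode, which would truncate the file on each call.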