
Can't read a CSV of URLs to web-scrape them in Python


I'm new to Python; I'm working in Visual Studio on Windows 7.

import csv
from bs4 import BeautifulSoup 
import requests 

contents = []
with open('websupplies.csv','r') as csvf: # Open file in read mode
   urls = csv.reader(csvf)

   for url in urls:
      contents.append(url) # Add each url to list contents


for url in contents:  # Parse through each url in the list.
   page = requests.get(url).content
   soup = BeautifulSoup(page, "html.parser")

   price = soup.find('span', attrs={'itemprop':'price'})
   availability = soup.find('div', attrs={'class':'product-availability'})

But I get: No connection adapters were found for.. '['a url']'

Why?

The CSV is structured like this:

https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400
https://www.websupplies.gr/epeksergastis-intel-celeron-g3930-2mb-2-90ghz-bx80677g3930
https://www.websupplies.gr/epeksergastis-amd-a6-9500-bristol-ridge-dual-core-3-5ghz-socket-am4-65w-ad9500agabbox

They don't have a semicolon at the end.
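
For reference, the error from the question can be reproduced in isolation: requests coerces a non-string argument into its string representation, and the resulting "['…']" string has no recognizable http/https scheme, hence "No connection adapters were found". A minimal sketch, using one example URL:

import requests

url_row = ['https://www.websupplies.gr/']  # what csv.reader yields for each line
requests.get(url_row)  # raises requests.exceptions.InvalidSchema:
                       # No connection adapters were found for "['https://www.websupplies.gr/']"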

There's a problem: it says requests needs an http scheme. Maybe that's the issue?
When you read the lines from the file you also have to strip the \n.
Your file is a simple list of URLs. It isn't really a CSV.

The CSV reader reads each row into a list of its own, so the loaded data structure will be:

[
  ["https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400"],
  ["https://www.websupplies.gr/epeksergastis-intel-celeron-g3930-2mb-2-90ghz-bx80677g3930"],
  ["https://www.websupplies.gr/epeksergastis-amd-a6-9500-bristol-ridge-dual-core-3-5ghz-socket-am4-65w-ad9500agabbox"],
]
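A quick way to see this for yourself (file name as in the question; the comment shows the kind of output to expect):

import csv

with open('websupplies.csv', 'r') as csvf:
    for row in csv.reader(csvf):
        print(type(row), row)
# <class 'list'> ['https://www.websupplies.gr/epeksergastis-intel-core-i5-8400-9mb-2-80ghz-bx80684i58400']
# ...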
One way to fix this is to pass url[0] as the argument to requests.get, but the truly correct fix is not to use the csv module at all. Since each line holds only one piece of data, you can read the lines directly and pass them to requests:

contents = []
with open('websupplies.csv','r') as csvf: # Open file in read mode
   for line in csvf:
      contents.append(line.strip('\n')) # Add each url to list contents
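
Putting it all together, a minimal sketch of the corrected script. It keeps the selectors from the question; the final print line is just one way to inspect the results:

import requests
from bs4 import BeautifulSoup

contents = []
with open('websupplies.csv', 'r') as csvf:
    for line in csvf:
        url = line.strip()  # drop the trailing newline
        if url:             # skip any blank lines
            contents.append(url)

for url in contents:
    page = requests.get(url).content
    soup = BeautifulSoup(page, 'html.parser')

    price = soup.find('span', attrs={'itemprop': 'price'})
    availability = soup.find('div', attrs={'class': 'product-availability'})

    # .find() returns None when nothing matches, so guard before reading text
    print(url,
          price.get_text(strip=True) if price else 'n/a',
          availability.get_text(strip=True) if availability else 'n/a')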

Can you show the structure of your CSV?
Possible duplicate. @Mad, my URLs do have the http protocol.
There's a question in your answer; it looks like a comment. I know, but I don't have enough reputation, or I would have written a comment.
You mean I should read it as a txt file?