Web scraping: how to iterate over multiple links with Python, BeautifulSoup and requests, scrape each one in turn, and save the output to a CSV


I have this code, but I don't know how to read the links from a CSV file or a list. I want to read the links, scrape the details from each one, and save the data for each link into its own columns in an output CSV.

Below is the code I built to extract the specific data:

from bs4 import BeautifulSoup
import requests

url = "http://www.ebay.com/itm/282231178856"
r = requests.get(url)

x = BeautifulSoup(r.content, "html.parser")

# print(x.prettify().encode('utf-8'))

# time to find some tags!!

# y = x.find_all("tag")

z = x.find_all("h1", {"itemprop": "name"})

# print(z)
# for loop to extract the title
for item in z:
    try:
        print(item.text.replace('Details about ', ''))
    except:
        pass
# category extraction

m = x.find_all("span", {"itemprop": "name"})

# print(m)

for item in m:
    try:
        print(item.text)
    except:
        pass

# item condition extraction
n = x.find_all("div", {"itemprop": "itemCondition"})

# print(n)

for item in n:
    try:
        print(item.text)
    except:
        pass

# sold number extraction

k = x.find_all("span", {"class": "vi-qtyS vi-bboxrev-dsplblk vi-qty-vert-algn vi-qty-pur-lnk"})

# print(k)

for item in k:
    try:
        print(item.text)
    except:
        pass

# watchers extraction

u = x.find_all("span", {"class": "vi-buybox-watchcount"})

# print(u)

for item in u:
    try:
        print(item.text)
    except:
        pass

# returns details extraction

t = x.find_all("span", {"id": "vi-ret-accrd-txt"})

# print(t)

for item in t:
    try:
        print(item.text)
    except:
        pass

# per-hour / per-day view count extraction
a = x.find_all("div", {"class": "vi-notify-new-bg-dBtm"})

# print(a)

for item in a:
    try:
        print(item.text)
    except:
        pass

# trending-at price extraction
b = x.find_all("span", {"class": "mp-prc-red"})

# print(b)

for item in b:
    try:
        print(item.text)
    except:
        pass

Your question is a bit vague.

Which links are you talking about? There are a hundred of them on a single eBay page. And which pieces of information do you want to scrape? Again, there are a ton.

But anyway, here goes:

# First, create a list of urls you want to iterate on
import requests
from bs4 import BeautifulSoup

urls = []
r = requests.get("http://www.ebay.com/itm/282231178856")
soup = BeautifulSoup(r.text, "html.parser")

# Assuming your links of interest are values of "href" attributes within <a> tags
a_tags = soup.find_all("a")
for tag in a_tags:
    if tag.get("href"):
        urls.append(tag["href"])

# Second, start to iterate while storing the info
info_1, info_2 = [], []
for link in urls:
    # Do stuff here; maybe it's time to define your existing loops as functions?
    r = requests.get(link)
    soup = BeautifulSoup(r.text, "html.parser")
    info_a, info_b = YourFunctionReturningValues(soup)
    info_1.append(info_a)
    info_2.append(info_b)
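
YourFunctionReturningValues is only a placeholder name. A minimal sketch of what it could look like, reusing two of the extraction loops from your question (title and item condition):

from bs4 import BeautifulSoup

def YourFunctionReturningValues(soup):
    # Title, stripping the "Details about " prefix eBay prepends
    h1 = soup.find("h1", {"itemprop": "name"})
    title = h1.text.replace("Details about ", "") if h1 else ""

    # Item condition
    cond = soup.find("div", {"itemprop": "itemCondition"})
    condition = cond.text if cond else ""

    return title, condition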
Hope this helps!

And of course, don't hesitate to provide more information so you can get a more detailed answer.

On attributes with BeautifulSoup:
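
A quick illustration of attribute access on a tag (the markup here is made up):

from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="http://example.com">link</a>', "html.parser")
tag = soup.find("a")

print(tag["href"])      # raises KeyError if the attribute is missing
print(tag.get("href"))  # returns None instead of raising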


About the csv module:
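
A minimal, self-contained illustration of csv.writer (the file name and row values are made up):

import csv

rows = [("http://www.ebay.com/itm/282231178856", "some title", "some condition")]

with open("example.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)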

Thanks for your help; not sharing the links was a small mistake on my part. I meant the main URL of the item page itself, which follows this structure --> ebay.com/itm/. I have thousands of item IDs, so I can put each item ID in a list using sample code like yours above, concatenate each ID onto that URL following a URL-joining tutorial, loop over them to fetch the data I need, and then save the data into a new CSV created by the code itself. I will write up a sample, post it here for your comments, and learn more from that.

Hey, maybe you should close this question now. I think your structure is right, so start breaking the problem into parts, i.e. (1) define a function that uses BeautifulSoup to detect the HTML elements of the page you are interested in, (2) decide how to store the collected information, (3) write it to a CSV file :) Give it a try, and come back later with a more precise question if needed! Trust me, you already understand how it works.
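
A minimal sketch of that ID-to-URL plan, assuming the item IDs sit one per row in the first column of a hypothetical file named item_ids.csv:

import csv

# Hypothetical input file: one eBay item ID per row, first column
with open("item_ids.csv", newline="") as f:
    item_ids = [row[0] for row in csv.reader(f) if row]

# Build the full item URLs by concatenating onto the base structure above
urls = ["http://www.ebay.com/itm/" + item_id for item_id in item_ids]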
# Don't forget to import the csv module
import csv

# In Python 3, open in text mode with newline="" (the csv module handles line endings)
with open(r"path_to_file.csv", "w", newline="") as my_file:
    csv_writer = csv.writer(my_file, delimiter=",")
    csv_writer.writerows(zip(urls, info_1, info_2))