Python web scraping: ResultSet object has no attribute 'findAll'

python web-scraping

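I'm having a problem with bs4 when reading the second value of an array inside a for loop. I'll paste the code below.

When I use line 19 (the commented-out list that holds only IEL_IDS), I get no error. When I swap it for the whole array (line 18, URL_Array with all three URLs), it fails as soon as it tries to collect the second value. Note that the second value in the array is the same as the value on line 19.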

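The problem is that you import BeautifulSoup as soup and then also define a variable with the same name:

soup = soup(page_html, "html.parser")

On the first pass through the loop, soup still refers to the BeautifulSoup class, so the call parses the page. After that assignment, soup is a parsed document, and calling a parsed document is bs4's shortcut for find_all. On the second pass the same line therefore returns a ResultSet, which has no findAll method. That is why a single-URL list works while the three-URL list fails on its second entry.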
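The rebinding can be reproduced without any network access. The sketch below is only an illustration: FakeSoup and FakeResultSet are made-up stand-ins, not part of bs4, and they imitate just the two behaviours that matter here (calling a parsed object returns a result list, and that list has no findAll).

```python
class FakeResultSet(list):
    """Hypothetical stand-in for bs4's ResultSet: a list without findAll."""


class FakeSoup:
    """Hypothetical stand-in for BeautifulSoup, only to show the name clash."""

    def __init__(self, markup, parser):
        self.markup = markup

    def __call__(self, *args, **kwargs):
        # Mimics bs4, where calling a parsed object is a find_all shortcut
        # that returns a result list instead of parsing new markup.
        return FakeResultSet()

    def findAll(self, name):
        return FakeResultSet([name])


def scrape_with_shadowing(pages):
    soup = FakeSoup  # mimics "from bs4 import BeautifulSoup as soup"
    outcomes = []
    for page in pages:
        try:
            soup = soup(page, "html.parser")  # rebinds the name on every pass
            soup.findAll("div")
            outcomes.append("ok")
        except (AttributeError, TypeError) as exc:
            outcomes.append(type(exc).__name__)
    return outcomes


def scrape_without_shadowing(pages):
    outcomes = []
    for page in pages:
        page_soup = FakeSoup(page, "html.parser")  # the class name is untouched
        page_soup.findAll("div")
        outcomes.append("ok")
    return outcomes
```

With two pages, the shadowing version succeeds on the first and raises AttributeError on the second, while the version that uses a separate name (page_soup here) succeeds on both.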

I refactored your code a bit. Please let me know whether it works as expected.

import csv
import requests
from bs4 import BeautifulSoup

smart_living_ID = "https://www.hayneedle.com/search/index.cfm?categoryID=&page=1&searchQuery=Smart%20Living&selectedFacets=Brand%7CSmart%20Living&sortBy="
IEL_ID = "https://www.hayneedle.com/search/index.cfm?categoryID=&page=1&searchQuery=IEL&selectedFacets=Brand%7CIts%20Exciting%20Lighting&sortBy="
TD_ID = "https://www.hayneedle.com/search/index.cfm?categoryID=&page=1&searchQuery=two%20dogs&selectedFacets=Brand%7CTwo%20Dogs%20Designs&sortBy="

site_URLs = [smart_living_ID, IEL_ID, TD_ID]
sess = requests.Session()  # one session reuses the connection across requests
prod_data = []

for curr_URL in site_URLs:
    req = sess.get(url=curr_URL)
    # The class keeps its own name, so nothing is overwritten inside the loop.
    soup = BeautifulSoup(req.content, "lxml")
    containers = soup.find_all("div", {"class": "product-card__container___1U2Sb"})
    for curr_container in containers:
        prod_title = curr_container.div.img["alt"]
        prod_URL = curr_container.a["href"]
        price_container = curr_container.find(
            "div",
            {"class": "product-card__productInfo___30YSc body no-underline txt-black"},
        )
        # The price is split across two adjacent spans: dollars, then cents.
        dollars_elem = price_container.find("span", {"class": "main-price-dollars"})
        cents_elem = dollars_elem.find_next("span")
        prod_price = dollars_elem.get_text() + cents_elem.get_text()
        prod_price = float(prod_price[1:])  # drop the leading currency sign
        prod_data.append((prod_title, prod_URL, prod_price))

CSV_headers = ("title", "URL", "price")

# Write everything in one go once all pages have been scraped.
with open("../out/haynedle_prices.csv", "w", newline="") as file_out:
    writer = csv.writer(file_out)
    writer.writerow(CSV_headers)
    writer.writerows(prod_data)
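The price conversion in the loop can be isolated into a small helper, which makes it easier to check by hand. This is a sketch under assumptions: the page markup is not shown here, so the example inputs ("$12" for the dollars span, ".99" for the cents span) are guesses, parse_price is a hypothetical name, and the handling of thousands separators is an addition that the refactored code does not have.

```python
def parse_price(dollars_text, cents_text):
    """Join the two price fragments and convert the result to a float."""
    combined = dollars_text.strip() + cents_text.strip()
    # Drop the currency sign and any thousands separators before converting.
    cleaned = combined.lstrip("$").replace(",", "")
    return float(cleaned)
```

For example, parse_price("$12", ".99") gives 12.99. If the real spans turn out to carry different text, only this helper would need to change.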

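I tested it by repeating the current URL list 10 times, and it took longer than I expected. There is certainly room for improvement: I may rewrite it to use lxml directly in the next few days, and multiprocessing might also be a good option. Of course, it all depends on how you use it :)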
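One possible way to shorten the run is to fetch the pages concurrently. The sketch below uses a thread pool from the standard library rather than multiprocessing, on the assumption that most of the time is spent waiting on the network. fetch_all is a hypothetical helper, and the fetch argument stands for whatever function downloads a single URL.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

It could be called as fetch_all(site_URLs, lambda url: requests.get(url).content), with parsing done afterwards in the main thread. Whether this actually helps depends on the site and on how many URLs there are.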

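Please share the whole error message. What did you understand from "ResultSet object has no attribute 'findAll'"?

I don't understand why I get this error in one instance and not in the next.

By the way, variable and function names should generally follow the lowercase_with_underscores form. The lack of consistent naming makes the code harder to understand.

I don't know either. Could you make the lines and the changes more obvious? Please edit your question and add the full traceback to it.

I really appreciate you taking the trouble to rewrite this for me. This is my first script in Python, so I need a little time to digest what you gave me. Simply copying and pasting results in an error. I think I need to install another package. The error is on line 18: soup = BeautifulSoup(req.content, "lxml")

@Wightbread727 What is the error? Feel free to ask anything about the code. I foolishly forgot to comment it...

Traceback (most recent call last):
  File "C:\Users\Wightbread\Contacts\Desktop\Webscrape\HNPrices_v2.py", line 18, in <module>
    soup = BeautifulSoup(req.content, "lxml")
  File "C:\Users\Wightbread\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\__init__.py", line 225, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

@Wightbread727 The problem is exactly what it says! I replaced html.parser with lxml as the parser because I much prefer the latter. Please try going back to html.parser. Speaking of lxml, that is actually a potential future change: I could drop BeautifulSoup entirely in favour of lxml.
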
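If the script has to run on machines where lxml may or may not be installed, the parser name can be chosen at run time instead of being hard-coded. This is only a sketch: pick_parser is a hypothetical helper, and it relies on the assumption that the bs4 feature name matches the importable module name, which holds for "lxml" but not for every parser feature.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return preferred if that module can be imported, otherwise fallback."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback
```

The result would then be passed as the second argument, for example BeautifulSoup(req.content, pick_parser()). Installing lxml with pip remains the more direct fix when the faster parser is wanted.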
import requests
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

 
SmartLiving_IDS = "https://www.hayneedle.com/search/index.cfm?categoryID=&page=1&searchQuery=Smart%20Living&selectedFacets=Brand%7CSmart%20Living&sortBy="
IEL_IDS = "https://www.hayneedle.com/search/index.cfm?categoryID=&page=1&searchQuery=IEL&selectedFacets=Brand%7CIts%20Exciting%20Lighting&sortBy="
TD_IDS = "https://www.hayneedle.com/search/index.cfm?categoryID=&page=1&searchQuery=two%20dogs&selectedFacets=Brand%7CTwo%20Dogs%20Designs&sortBy="

Headers = "Description, URL, Price \n"

text_file = open("HayneedlePrices.csv", "w")
text_file.write(Headers)
text_file.close()


URL_Array = [SmartLiving_IDS, IEL_IDS, TD_IDS]
#URL_Array = [IEL_IDS]
for URL in URL_Array:
  print("\n" + "Loading New URL:" "\n" + URL + "\n" + "\n")
  
  uClient = uReq(URL)
  page_html = uClient.read()
  uClient.close() 
  soup = soup(page_html, "html.parser")
  
  Containers = soup.findAll("div", {"product-card__container___1U2Sb"})
  for Container in Containers:

    
    Title             = Container.div.img["alt"]    
    Product_URL       = Container.a["href"]
    
    Price_Container   = Container.findAll("div", {"class":"product-card__productInfo___30YSc body no-underline txt-black"})[0].findAll("span", {"style":"font-size:20px"})

    Price_Dollars     = Price_Container[0].get_text()
    Price_Cents       = Price_Container[1].get_text()


    print("\n" + "#####################################################################################################################################################################################################" + "\n")
    # print("   Container: " + "\n" + str(Container))
    # print("\n" + "-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------" + "\n")
    print(" Description: " + str(Title))
    print(" Product URL: " + str(Product_URL))
    print("       Price: " + str(Price_Dollars) + str(Price_Cents))
    print("\n" + "#####################################################################################################################################################################################################" + "\n")
 
    text_file = open("HayneedlePrices.csv", "a")
    text_file.write(str(Title) +  ", " + str(Product_URL) + ", " + str(Price_Dollars) + str(Price_Cents) + "\n")
    text_file.close()

  print("Information gathered and Saved from URL Successfully.")
  print("Looking for Next URL..")
print("No Additional URLs to Gather. Process Completed.")
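Separately from the naming problem, the script above builds each CSV row by joining strings with ", ", so a product title that itself contains a comma would spill into an extra column. A minimal sketch of the same output written through the csv module follows; rows_to_csv is a hypothetical helper, and the sample row in the note is made-up data.

```python
import csv
import io


def rows_to_csv(rows, headers=("Description", "URL", "Price")):
    """Serialise rows to CSV text, quoting any field that contains a comma."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing a row such as ("Lamp, 2-pack", "/p/1", "$12.99") keeps the title in a single quoted field, so reading the file back with csv.reader returns three columns again.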