Python web scraping with BeautifulSoup to extract variables

python web-scraping

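I've just started using BeautifulSoup, and I want to extract the variables name, brand, and price from a website, but I can't get these variables to work properly

…but this doesn't work

Does anyone have a suggestion for how to extract all the name, brand, and price information into a list of data frames?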

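I just noticed that you want a "list of data frames". If you really want a "data frame", this gives you a list, and it should be easy to adapt from that result: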
from bs4 import BeautifulSoup
import requests
import ast  # abstract syntax tree to parse dictionary text

url = 'http://www.mediamarkt.nl/nl/category/_laptops-482723.html'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

scripts = soup.find_all('script')
infos = []

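for s in scripts:
    text = (s.string or '').strip()              # script body, or '' if the tag is empty
    if text.startswith('var product'):           # find the scripts of interest
        # keep everything after the first ' = ' and drop the trailing ';'
        d = text.split(' = ', 1)[1].rstrip(';')
        # parse the dictionary text into a Python dict
        data = ast.literal_eval(d)

        infos.append(data)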

# Here's the list
# print(infos)  #  [{'category': 'Computer', 'name': 'HP Pavilion X360 14-BA081ND', ... 'dimension9': 'Laptops', 'dimension10': 'Windows-laptops', 'brand': 'LENOVO'}]

# for i in infos:
#     print(i['name'])   # HP Pavilion X360 14-BA081ND
#     print(i['brand'])  # HP
#     print(i['price'])  # 629.00

There may be a better way, but I hope this helps.
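Since the question asks for a data frame, here is a minimal sketch of how the infos list from the answer might be turned into one with pandas. The sample rows are made up for illustration (only the first name, brand, and price appear in the answer's output), and it assumes each dictionary has name, brand, and price keys with the price stored as text.

```python
import pandas as pd

# Hypothetical sample shaped like the `infos` list built by the answer's code;
# the real list would come from the scraped page.
infos = [
    {'category': 'Computer', 'name': 'HP Pavilion X360 14-BA081ND',
     'brand': 'HP', 'price': '629.00'},
    {'category': 'Computer', 'name': 'Example Laptop',
     'brand': 'LENOVO', 'price': '499.00'},
]

# One row per product; `columns` keeps only the three fields of interest.
df = pd.DataFrame(infos, columns=['name', 'brand', 'price'])

# Assumption: prices arrive as text, so convert them for sorting or arithmetic.
df['price'] = df['price'].astype(float)

print(df)
```

If some products lack one of the keys, pandas fills the missing cells with NaN rather than raising, which is usually acceptable for a first pass.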
I'm not very familiar with bs4, but scripts.find_all('var') sounds like it is looking for <var> tags under the script, which is probably not what you want.
Thanks. Do you have a suggestion for which command to use to extract these three variables?
Where on the website are name, brand, and price?
If you know the JavaScript will be static, you can extract the JSON with a regular expression and then parse the JSON. Otherwise, you will need to actually interpret the JavaScript.
Thank you very much! This was certainly helpful.
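The regex-then-JSON suggestion from the comments could look roughly like the sketch below. The script text is a hypothetical example, not copied from the site, and the pattern assumes a flat object literal written as strict JSON (double-quoted keys and strings). If the page uses single quotes, ast.literal_eval as in the answer is the safer parser.

```python
import json
import re

# Hypothetical script text in the shape the answer's code expects;
# the real page content may differ.
script_text = (
    'var product1 = {"name": "HP Pavilion X360 14-BA081ND", '
    '"brand": "HP", "price": "629.00"};'
)

# Capture the object literal assigned to a variable whose name starts with
# "product". The non-greedy match stops at the first "}" followed by ";",
# so nested objects would need a more careful pattern.
pattern = re.compile(r'var\s+product\w*\s*=\s*(\{.*?\})\s*;', re.DOTALL)

products = [json.loads(m.group(1)) for m in pattern.finditer(script_text)]

for p in products:
    print(p['name'], p['brand'], p['price'])
```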