Python: receiving requests

Tags: python, html, pandas, web-scraping, beautifulsoup

I'm trying to fetch a page with requests:

from bs4 import BeautifulSoup
import requests 
import pandas as pd

html_page = requests.get('"https://www.dataquest.io"')

soup = BeautifulSoup(html_page, "lxml")
soup.find_all('<\a>')

However, this just returns an empty list.
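For reference, there are three separate problems in that snippet: the URL string contains literal quote characters (requests will not accept '"https://..."' as a URL), the raw Response object is passed to BeautifulSoup instead of the response text, and '<\a>' is not a valid tag name (find_all takes the bare tag name). A minimal corrected sketch:

from bs4 import BeautifulSoup
import requests

html_page = requests.get('https://www.dataquest.io')  # No extra quotes inside the URL string
soup = BeautifulSoup(html_page.text, 'lxml')          # Parse the response body, not the Response object
links = soup.find_all('a')                            # Bare tag name, not '<\a>'
print(len(links))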

This pulls the rows from the table and assigns each row to a dictionary, which gets appended to a list. You may need to tweak the selectors slightly.

from bs4 import BeautifulSoup
import requests
from pprint import pprint

output_data = []  # List of dicts holding all of the table data

for i in range(1, 453):  # Loop over the 453 pages of the table
    # NOTE: the pagination query string was lost from the original answer;
    # the page number i needs to be appended to the URL here
    data_page = requests.get(f'https://www.dataquest.io?')
    print(data_page)  # Prints the response status, e.g. <Response [200]>

    soup = BeautifulSoup(data_page.text, "lxml")

    # Find all of the table rows
    elements = soup.select('div.head_table_t')
    try:
        secondary_elements = soup.select('div.list_table_subs')
        elements = elements + secondary_elements
    except Exception:
        pass
    print(len(elements))

    # Iterate through the rows, select the individual columns, and assign
    # them to a dictionary under the matching header
    for element in elements:
        data = {}
        data['Name'] = element.select_one('div.col_1 a').text.strip()
        data['Page URL'] = element.select_one('div.col_1 a')['href']
        output_data.append(data)  # Append the dictionary (one row) to the list
        pprint(data)  # Pretty-print the row to see what you're receiving; safe to remove
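Since the question imports pandas and the comments below ask about getting the table into a df, the list of dicts built above converts directly into a DataFrame. A minimal sketch to follow the loop (the CSV filename is just an example):

import pandas as pd

df = pd.DataFrame(output_data)  # One row per scraped dictionary: columns 'Name' and 'Page URL'
print(df.head())
df.to_csv('scraped_table.csv', index=False)  # Optionally persist the table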

Try soup.findAll('a').

That just returns all of the data; I'm trying to work out how to pull the data in the table into a df. Thank you so much. How would I do this for the entire website, i.e. how do I pull the table from the next page, and so on?

Since the table being extracted spans 453 pages, I've amended my original answer to include the multi-page loop. You'll need to clean it up a little and move the request into a try/except. If this works for you, could you mark it as the answer?

Your original question has been answered, so could you mark this as the solution? As for the remaining issue, you need to add the secondary elements to the elements list; I'll update it once my answer is marked as the solution.

I've updated it with the additional elements added. Depending on how you're doing this, you may want to add them to the associated dictionary; e.g. 2M Holdings Ltd would have the other companies listed under it rather than in their own dictionaries.
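The suggestion above to move the request into a try/except could look something like the sketch below. It assumes you want to skip a failing page and carry on rather than abort the whole run; the timeout value is an assumption, not part of the original answer.

import requests

for i in range(1, 453):
    try:
        # The pagination query string is elided here, as in the answer above
        data_page = requests.get('https://www.dataquest.io?', timeout=10)  # Timeout is an assumption
        data_page.raise_for_status()  # Raise for HTTP 4xx/5xx responses
    except requests.exceptions.RequestException as exc:
        print(f'Request for page {i} failed: {exc}')
        continue  # Skip this page rather than aborting the whole run
    # ... parse data_page.text with BeautifulSoup as in the answer above ...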