Can't get data in tabular form using Selenium Python

I'm new to scraping with Selenium Python. I can retrieve some of the data, but I'd like it in the tabular form shown on the web page:

Here's what I have so far:

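import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = 'https://definitivehc.maps.arcgis.com/home/item.html?id=1044bb19da8d4dbfb6a96eb1b4ebf629&view=list&showFilters=false#data'

browser = webdriver.Chrome(service=Service(r"C:\task\chromedriver"))
browser.get(url)
time.sleep(25)  # crude wait for the JavaScript-rendered grid to load

rows_in_table = browser.find_elements(By.XPATH, '//table[@class="dgrid-row-table"]//tr[th or td]')
for element in rows_in_table:
    print(element.text.replace('\n', ''))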
Output snippet:

Hospital NameHospital TypeCityState AbrvZip CodeCounty NameState Name
Phoenix VA Health Care System (AKA Carl T Hayden VA Medical Center)VA HospitalPhoenixAZ85012MaricopaArizona040130401362620000.001
Southern Arizona VA Health Care SystemVA HospitalTucsonAZ85723PimaArizona04019040192952952202.002
VA Central California Health Care SystemVA HospitalFresnoCA93703FresnoCalifornia060190601954542202.003
VA Connecticut Healthcare System - West Haven Campus (AKA West Haven VA Medical Center)VA HospitalWest HavenCT6516New HavenConnecticut09009090092162161102.004

I'd really appreciate some expert help with this. Thanks.
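The cells run together in the output above because .replace('\n', '') deletes the line break that separates one cell from the next in each row's element.text. A minimal sketch, assuming every cell's text sits on its own line (which the concatenated output suggests), splits on the line break instead and writes the rows as CSV with the standard csv module. rows_to_csv and the sample strings are hypothetical; with Selenium you would pass [el.text for el in rows_in_table].

```python
import csv
import io


def rows_to_csv(row_texts):
    """Convert row texts whose cells are separated by line breaks into CSV text.

    Assumption: each cell's text is on its own line; an empty cell or a
    cell containing a line break would shift the columns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    for text in row_texts:
        # one line per cell -> one CSV field per cell
        writer.writerow(text.split('\n'))
    return buf.getvalue()


# Hypothetical sample shaped like element.text for two rows of the grid
sample = [
    'Hospital Name\nHospital Type\nCity\nState Abrv',
    'Southern Arizona VA Health Care System\nVA Hospital\nTucson\nAZ',
]
print(rows_to_csv(sample))
```

The same list of split cells could also be fed to pandas.DataFrame if a table object is preferred over CSV text.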

The data is loaded dynamically with JavaScript. You can simulate those requests with the requests module:

import json
import requests

config_url = 'https://definitivehc.maps.arcgis.com/sharing/rest/portals/self?culture=en-us&f=json'
page_url = 'https://services7.arcgis.com/{_id}/arcgis/rest/services/Definitive_Healthcare_USA_Hospital_Beds/FeatureServer/0/query?f=json&where=1%3D1&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=OBJECTID%20ASC&resultOffset={offset}&resultRecordCount=50&cacheHint=true&quantizationParameters=%7B%22mode%22%3A%22edit%22%7D'

# portal (organization) id, used in the FeatureServer URL path
_id = requests.get(config_url).json()['id']

offset = 0
while True:
    data = requests.get(page_url.format(_id=_id, offset=offset)).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    features = data['features']
    for i, f in enumerate(features, offset + 1):
        print(i, f['attributes'])
        print('-' * 160)

    # a page with fewer than 50 features (possibly none) is the last one
    if len(features) < 50:
        break

    offset += 50
This prints all 6624 records:

...

6614 {'OBJECTID': 6614, 'HOSPITAL_NAME': 'Walter E Washington Convention Center Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': '801 Mount Vernon Pl Nw', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Washington', 'HQ_STATE': 'DC', 'HQ_ZIP_CODE': '20001', 'COUNTY_NAME': 'District of Columbia', 'STATE_NAME': 'District of Columbia', 'STATE_FIPS': '11', 'CNTY_FIPS': '001', 'FIPS': '11001', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}
----------------------------------------------------------------------------------------------------------------------------------------------------------------
6615 {'OBJECTID': 6615, 'HOSPITAL_NAME': 'Joint Base Cape Cod Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': 'Connery Ave', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Buzzards Bay', 'HQ_STATE': 'MA', 'HQ_ZIP_CODE': '2542', 'COUNTY_NAME': 'Barnstable', 'STATE_NAME': 'Massachusetts', 'STATE_FIPS': '25', 'CNTY_FIPS': '001', 'FIPS': '25001', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}
----------------------------------------------------------------------------------------------------------------------------------------------------------------
6616 {'OBJECTID': 6616, 'HOSPITAL_NAME': 'UMass Lowell Recreation Center Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': '322 Aiken St', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Lowell', 'HQ_STATE': 'MA', 'HQ_ZIP_CODE': '1854', 'COUNTY_NAME': 'Middlesex', 'STATE_NAME': 'Massachusetts', 'STATE_FIPS': '25', 'CNTY_FIPS': '017', 'FIPS': '25017', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}
----------------------------------------------------------------------------------------------------------------------------------------------------------------
6617 {'OBJECTID': 6617, 'HOSPITAL_NAME': 'Miami Beach Convention Center Field Hospital (Temporarily Open due to COVID-19)', 'HOSPITAL_TYPE': 'Short Term Acute Care Hospital', 'HQ_ADDRESS': '1901 Convention Center Dr', 'HQ_ADDRESS1': None, 'HQ_CITY': 'Miami Beach', 'HQ_STATE': 'FL', 'HQ_ZIP_CODE': '33139', 'COUNTY_NAME': 'Miami-Dade', 'STATE_NAME': 'Florida', 'STATE_FIPS': '12', 'CNTY_FIPS': '086', 'FIPS': '12086', 'NUM_LICENSED_BEDS': None, 'NUM_STAFFED_BEDS': None, 'NUM_ICU_BEDS': 0, 'ADULT_ICU_BEDS': 0, 'PEDI_ICU_BEDS': None, 'BED_UTILIZATION': None, 'Potential_Increase_In_Bed_Capac': 0, 'AVG_VENTILATOR_USAGE': None}

...
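For offset pagination like this, a robust stop condition is a short or empty page, rather than the running index, which keeps its old value when a page comes back empty. A minimal sketch of that loop as a generator; fetch_page(offset) is a hypothetical callable standing in for the HTTP request, so the logic can be checked without the network, and make_fake_fetch is a hypothetical test double.

```python
PAGE_SIZE = 50  # matches resultRecordCount=50 in page_url


def iter_records(fetch_page, page_size=PAGE_SIZE):
    """Yield every record from an offset-paginated source.

    fetch_page(offset) is a hypothetical callable returning the list of
    records starting at offset (at most page_size of them).
    """
    offset = 0
    while True:
        page = fetch_page(offset)
        yield from page
        # a short or empty page means there is nothing left to fetch
        if len(page) < page_size:
            return
        offset += page_size


def make_fake_fetch(total):
    """Hypothetical stand-in for the HTTP call: serves records 1..total."""
    records = list(range(1, total + 1))
    return lambda offset: records[offset:offset + PAGE_SIZE]


# 100 is an exact multiple of the page size: the loop ends after one empty page
print(len(list(iter_records(make_fake_fetch(100)))))  # 100
```

With the real endpoint, fetch_page could be lambda offset: requests.get(page_url.format(_id=_id, offset=offset)).json()['features'], taking f['attributes'] from each yielded feature.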

Here is an updated version of @Andrej's answer. Instead of printing, this code downloads the table and saves it as an Excel document:

import json
import requests
import pandas as pd

config_url = 'https://definitivehc.maps.arcgis.com/sharing/rest/portals/self?culture=en-us&f=json'
page_url = 'https://services7.arcgis.com/{_id}/arcgis/rest/services/Definitive_Healthcare_USA_Hospital_Beds/FeatureServer/0/query?f=json&where=1%3D1&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=OBJECTID%20ASC&resultOffset={offset}&resultRecordCount=50&cacheHint=true&quantizationParameters=%7B%22mode%22%3A%22edit%22%7D'

_id = requests.get(config_url).json()['id']
required = []
offset = 0
while True:
    data = requests.get(page_url.format(_id=_id, offset=offset)).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    features = data['features']
    required.extend(f['attributes'] for f in features)

    # a page with fewer than 50 features (possibly none) is the last one
    if len(features) < 50:
        break

    offset += 50

# one row per hospital, one column per attribute
df = pd.json_normalize(required)
# writing .xlsx requires an Excel engine such as openpyxl
df.to_excel('dataFunction.xlsx')
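If a CSV file is enough, the standard library can write the same list of attribute dictionaries without pandas, using csv.DictWriter with the column names taken from the first record. A minimal sketch; records_to_csv is a hypothetical helper, the two records are trimmed copies of attributes printed earlier in this thread, and it writes to an in-memory buffer so the result is easy to inspect.

```python
import csv
import io


def records_to_csv(records, out):
    """Write a list of flat dicts (the f['attributes'] values) as CSV to out."""
    if not records:
        return
    # Column order follows the keys of the first record
    writer = csv.DictWriter(out, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)  # None values are written as empty fields


# Records trimmed from the attributes printed above
required = [
    {'OBJECTID': 6614, 'HQ_CITY': 'Washington', 'HQ_STATE': 'DC', 'NUM_LICENSED_BEDS': None},
    {'OBJECTID': 6615, 'HQ_CITY': 'Buzzards Bay', 'HQ_STATE': 'MA', 'NUM_LICENSED_BEDS': None},
]
buf = io.StringIO()
records_to_csv(required, buf)
print(buf.getvalue())
```

To write a real file, open it with open('dataFunction.csv', 'w', newline='', encoding='utf-8') and pass the file object as out.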

I tried it and uploaded the resulting Excel sheet.

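Many thanks @Andrej Kesely, I really appreciate your help. I'd like to ask where you got these URLs (config_url and page_url). They aren't my original URL and I can't find them. Thanks.

This is a question I wanted to ask @Andrej as well. I'm a noob and don't have enough reputation to ask him myself; could you ask on my behalf? It's something I came across while learning.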
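For reference, page_url is the layer's FeatureServer query endpoint followed by a set of query parameters, and building the query string from a dict makes each one readable. A minimal sketch using urllib.parse.urlencode; the parameter values are copied from page_url, while build_page_url and the 'ORG_ID' placeholder are hypothetical. With requests, requests.get(base_url, params=params) encodes such a dict for you.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Endpoint from page_url; {org_id} is the 'id' returned by config_url
BASE_URL = ('https://services7.arcgis.com/{org_id}/arcgis/rest/services/'
            'Definitive_Healthcare_USA_Hospital_Beds/FeatureServer/0/query')


def build_page_url(org_id, offset, page_size=50):
    """Hypothetical helper rebuilding page_url from named parameters."""
    params = {
        'f': 'json',
        'where': '1=1',                  # no filter: every row
        'returnGeometry': 'false',       # attributes only
        'spatialRel': 'esriSpatialRelIntersects',
        'outFields': '*',                # all columns
        'orderByFields': 'OBJECTID ASC',
        'resultOffset': offset,          # first record of this page
        'resultRecordCount': page_size,  # records per page
        'cacheHint': 'true',
        'quantizationParameters': json.dumps({'mode': 'edit'}, separators=(',', ':')),
    }
    # quote (not the default quote_plus) encodes spaces as %20, as in page_url
    query = urlencode(params, quote_via=quote)
    return BASE_URL.format(org_id=org_id) + '?' + query


url = build_page_url('ORG_ID', offset=100)
print(parse_qs(urlsplit(url).query)['resultOffset'])  # ['100']
```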