Python: scraping data from a table using Selenium

I want to scrape all the company information from the page below, including Symbol/Name and Earnings Call Time:


So far I have only tried to get the company names, but I get an error:

NoSuchElementException: no such element: Unable to locate element: {method: xpath, selector: //*[@id='cal-res-table']]/div[1]/table/tbody/tr[1]/td[2]} (Session info: chrome=86.0.4240.198)

from selenium import webdriver
import datetime

tomorrow = (datetime.date.today() + datetime.timedelta(days=1)).isoformat() #get tomorrow in iso format as needed
url = "https://finance.yahoo.com/calendar/earnings?day="+tomorrow
print ("url: " + url)

driver = webdriver.Chrome("C:/Users/jrod94/Downloads/chromedriver_win32/chromedriver.exe")
driver.get(url)
element = driver.find_element_by_xpath("//*[@id='cal-res-table']")
Companies = [a.get_attribute("Company") for a in element]

driver.close()

How about using pandas?

import datetime
import pandas as pd

pd.set_option('display.max_columns', None)
tomorrow = (datetime.date.today() + datetime.timedelta(days=1)).isoformat() # get tomorrow in iso format as needed
tables = pd.read_html("https://finance.yahoo.com/calendar/earnings?day=" + tomorrow, header=0)  # a list with one DataFrame per <table>
table = tables[0]
print(table)
Output:

  Symbol                         Company  Earnings Call Time EPS Estimate  \
0    WBAI                     500.Com Ltd  After Market Close            -   
1    BRBR             Bellring Brands Inc                 TAS         0.19   
2     BKE                      Buckle Inc  Before Market Open         0.54   
3     BNR        Burning Rock Biotech Ltd                 TAS        -0.12   
4     IEC            IEC Electronics Corp                 TAS            -   
5    GEOS      Geospace Technologies Corp                 TAS            -   
6    DREM  Dream Homes & Development Corp   Time Not Supplied            -   
7    DXLG        Destination XL Group Inc  Before Market Open            -   
8      FL                 Foot Locker Inc  Before Market Open         0.61   
9     HHR            HeadHunter Group PLC                 TAS         0.14   
10    HHR            HeadHunter Group PLC  Before Market Open         0.14   
11    RMR                   RMR Group Inc  Before Market Open         0.39   
12    GSX                 GSX Techedu Inc  Before Market Open        -0.31   
13    GSX                 GSX Techedu Inc                 TAS        -0.31   
14   HIBB              Hibbett Sports Inc  Before Market Open         0.45   
15   HAYN        Haynes International Inc                 TAS         -0.7   
16   IIIV                i3 Verticals Inc                 TAS         0.18   
17   AIHS          Senmiao Technology Ltd  Before Market Open           
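The question only asks for the symbol/name and the earnings call time, so the other columns can be dropped. A minimal sketch, assuming the column headers are exactly the ones printed above (`WANTED` and `call_times` are hypothetical names):

```python
import pandas as pd

# Column names as they appear in the read_html output above (an assumption:
# Yahoo may rename them at any time).
WANTED = ["Symbol", "Company", "Earnings Call Time"]


def call_times(table: pd.DataFrame) -> pd.DataFrame:
    """Return only the columns the question asks for, as an independent copy."""
    return table.loc[:, WANTED].copy()
```

Usage: `print(call_times(table).to_string(index=False))`. The `.copy()` keeps later edits to the result from touching the original frame.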
         


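Actually, your code does give me an error too, but not on the same line as yours; it fails later. The problem is probably that the page has not finished loading when you try to access the element. Adding a short delay before the line where the error occurs may fix it.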

from selenium import webdriver
import datetime
import time

tomorrow = (datetime.date.today() + datetime.timedelta(days=1)).isoformat() #get tomorrow in iso format as needed
url = "https://finance.yahoo.com/calendar/earnings?day="+tomorrow
print ("url: " + url)

driver = webdriver.Chrome("C:/Users/jrod94/Downloads/chromedriver_win32/chromedriver.exe")
driver.get(url)
time.sleep(1) # you can increase 1 if it still does not work
element = driver.find_element_by_xpath("//*[@id='cal-res-table']")
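# element is a single WebElement, which is not iterable, so collect the company cells inside it
Companies = [a.text for a in element.find_elements_by_css_selector("td[aria-label='Company']")]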

driver.close()
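A fixed `time.sleep(1)` either waits longer than necessary or not long enough. A common alternative is to poll a condition until it holds or a timeout expires, which is the idea behind Selenium's explicit waits. A plain-Python sketch of that idea; `wait_until` is a hypothetical helper, not part of Selenium:

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns something truthy, then return it.

    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)  # back off briefly before checking again
```

With a Selenium driver, something like `wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, "td[aria-label='Company']"))` would return the company cells as soon as at least one is present (assuming `By` is imported from `selenium.webdriver.common.by`), because `find_elements` returns an empty, falsy list until then.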


Since your question is about Selenium:

You should look into waits. The following code waits until the matching elements are present in the HTML source before reading them:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


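def main(url):
    driver = webdriver.Firefox()
    cnames = []  # defined before the wait so the finally block can always print it
    driver.get(url)
    try:
        # wait up to 10 s for the company cells to be present, then read their text
        cnames = [x.text for x in WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, "td[aria-label='Company']"))
        )]
    finally:
        print(cnames)
        driver.quit()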


main("https://finance.yahoo.com/calendar/earnings")
Output:

['111 Inc', '360 DigiTech Inc', 'American Software Inc', 'American Software Inc', 'Corporacion America Airports SA', 'Atkore International Group Inc', 'Atkore International Group Inc', 'Helmerich and Payne Inc', 'Amtech Systems Inc', 'Amtech Systems Inc', 'Delta Apparel Inc', 'Delta Apparel Inc', 'Bellring Brands Inc', 'Berry Global Group Inc', 'Beacon Roofing Supply Inc', 'Natural Grocers By Vitamin Cottage Inc', "BJ's Wholesale Club Holdings Inc", 'Entera Bio Ltd', 'SG Blocks Inc', 'SG Blocks Inc', 'BEST Inc', 'Brady Corp', 'BioHiTech Global Inc', 'BioHiTech Global Inc', 'Oaktree Strategic Income Corporation', 'Caleres Inc', 'Pennantpark Investment Corp', 'Geospace Technologies Corp', 'Canadian Solar Inc', 'Oaktree Specialty Lending Corp', 'Matthews International Corp', 'Clearsign Technologies Corp', "Children's Place Inc", 'Elys Game Technology Corp', 'Dada Nexus Ltd', 'ESCO Technologies Inc', 'Euroseas Ltd', 'Fangdd Network Group Ltd', 'Fangdd Network Group Ltd', 'Golden Ocean Group Ltd', 'Hoegh LNG Partners LP', 'Post Holdings Inc', 'Huize Holding Ltd', 'Haynes International Inc', "Macy's Inc", 'OneWater Marine Inc', 'OneWater Marine Inc', 'Woodward Inc', 'StealthGas Inc', 'Maximus Inc', 'Ross Stores Inc', 'Intuit Inc', 'Ooma Inc', 'Williams-Sonoma Inc', 'Precipio Inc', 'NetEase Inc', 'Workday Inc', 'i3 Verticals Inc', 'Knot Offshore Partners LP', 'Maxeon Solar Technologies Ltd', 'Opera Ltd', 'Puxin Ltd', 'Puxin Ltd']
Note: you don't need Selenium for this; it only slows your task down.

Also, I see no reason to import a huge library (like pandas) just to read one HTML table.

You can get the exact earnings call times simply by selecting the target cells directly.
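A minimal sketch of that approach using only the standard library (no Selenium, no pandas). It assumes the table cells carry the same `aria-label` attributes that the Selenium selector above relies on, and that the server returns the table in the initial HTML; the live page markup may differ, and the `User-Agent` value is an illustrative assumption. `AriaCellParser`, `parse_call_times` and `fetch` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class AriaCellParser(HTMLParser):
    """Collect the text of <td> cells whose aria-label is one of `labels`."""

    def __init__(self, labels):
        super().__init__()
        self.labels = set(labels)
        self.current = None  # aria-label of the <td> we are inside, if tracked
        self.buffer = []     # text fragments of the current cell
        self.cells = {label: [] for label in labels}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            label = dict(attrs).get("aria-label")
            if label in self.labels:
                self.current = label
                self.buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self.current is not None:
            self.cells[self.current].append("".join(self.buffer).strip())
            self.current = None

    def handle_data(self, data):
        if self.current is not None:  # also catches text inside nested <a> tags
            self.buffer.append(data)


def parse_call_times(html):
    """Return (company, earnings call time) pairs in table order."""
    parser = AriaCellParser(["Company", "Earnings Call Time"])
    parser.feed(html)
    return list(zip(parser.cells["Company"], parser.cells["Earnings Call Time"]))


def fetch(url):
    """Download a page; the User-Agent header is an assumption, some sites reject urllib's default."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `for company, call_time in parse_call_times(fetch("https://finance.yahoo.com/calendar/earnings")): print(company, "|", call_time)`. Parsing is kept separate from downloading so it can be tried on saved HTML.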




In what way is pandas a huge library? Isn't it particularly useful for reading tables from a site?

@abrahmed pandas pulls in several libraries in the background, such as numpy. Even pd.read_html uses the requests library behind the scenes. So importing pandas just to read HTML isn't logical. Also, I think the logical way to answer is to answer what was asked first, and then offer the other person an alternative.

Sorry, that makes no sense to me... but... good to know. Cheers.

@ahmedamerican I used to think so too, so I asked on meta. There is no such rule here.

@Abrahamed I just noticed your post is from 2 days ago. By the way, the responses you received were based on the different examples you shared with the audience. But let me confirm it for you: if you ask me about x, then I should give you an answer about x before giving you y. That way the OP and the viewers get the exact answer.

This is exactly what I wanted, thanks! I want to append the information for two dates into the same DataFrame. I have tried this code but can't get it to work. Can you help?

import datetime
import pandas as pd

date = (datetime.date.today() + datetime.timedelta(days=1)).isoformat() # get tomorrow in iso format as needed
for i in range(2):
    try:
        date = (datetime.date.today() + datetime.timedelta(days=i)).isoformat() # get tomorrow in iso format as needed
        pd.set_option('display.max_column', None)
        url = pd.read_html(…, header=0)
        table = url[0]
        table.append(table)
        print(table)
    except ValueError:
        continue

@JuneSmith That's easy to do, but I suggest asking a new question and attaching a screenshot of the table. I don't think the comments are the right place to answer it.

Thanks, I asked a question here: Scraping and appending data while looping through html tables

@JuneSmith Check the URL again, and make sure you use the pandas tag in your question.

Thank you so much, dear Berdan!!! Amazing!
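In the comment's loop, `table.append(table)` returns a new DataFrame that is never stored, and `DataFrame.append` was removed in pandas 2.0. A minimal sketch of collecting several days and joining them once with `pd.concat`; the `Date` column, the function name and the injectable `reader` parameter are assumptions added for illustration, and Yahoo may block or change the page:

```python
import datetime

import pandas as pd

BASE_URL = "https://finance.yahoo.com/calendar/earnings?day="


def earnings_for_days(n_days, start=None, reader=pd.read_html):
    """Read the earnings table for n_days consecutive days and stack them.

    `reader` defaults to pd.read_html; it is a parameter only so the loop can
    be tried without network access (an assumption of this sketch).
    """
    if start is None:
        start = datetime.date.today()
    frames = []
    for offset in range(n_days):
        day = (start + datetime.timedelta(days=offset)).isoformat()
        try:
            table = reader(BASE_URL + day, header=0)[0]
        except ValueError:  # pd.read_html raises ValueError when it finds no table
            continue
        table["Date"] = day  # remember which day each row came from
        frames.append(table)
    if not frames:
        return pd.DataFrame()
    # one concat at the end instead of appending inside the loop
    return pd.concat(frames, ignore_index=True)
```

Usage: `print(earnings_for_days(2))` reads today and tomorrow; `ignore_index=True` renumbers the rows 0..n-1 so the indexes of the per-day tables do not collide.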