Python: How to extract the text 3127 N University St, Peoria, IL 61604 from a span tag (Python, Selenium) - Fatal编程技术网

I'm trying to build a web scraper with Python and Selenium. When I try to parse the page with the code below, I don't get back the element I expect:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

capabilities = webdriver.DesiredCapabilities().FIREFOX
capabilities["marionette"] = True
binary = FirefoxBinary('C:/Program Files/Mozilla Firefox/firefox.exe')
driver = webdriver.Firefox(firefox_binary=binary, capabilities=capabilities, executable_path="C:\\Users\\19548\\AppData\\Local\\Programs\\Python\\Python37\\geckodriver.exe")
driver.get("https://www.google.com/search?sxsrf=ACYBGNT9OH8ZZcClzMK-BMwxesqsKeHyTg:1575693566606&q=google+maps+secure+dental&npsic=0&rflfq=1&rlha=0&rllag=41148676,-90063976,60206&tbm=lcl&ved=2ahUKEwjHtb_626LmAhXjzVkKHTpMCLAQtgN6BAgLEAQ&tbs=lrf:!1m4!1u3!2m2!3m1!1e1!1m5!1u15!2m2!15m1!1shas_1wheelchair_1accessible_1entrance!4e2!2m1!1e3!3sIAE,lf:1,lf_ui:4&rldoc=1#rlfi=hd:;si:16368180629414227255,l,Chlnb29nbGUgbWFwcyBzZWN1cmUgZGVudGFsIgOIAQFIxLbOi6yPgIAIWiYKDXNlY3VyZSBkZW50YWwQABABGAAYASINc2VjdXJlIGRlbnRhbA;mv:[[41.6797015,-86.9763612],[39.655607599999996,-90.7386324]]")
element=driver.find_element_by_xpath("""//*[@id="akp_tsuid2"]/div/div/div/div/div/div[1]/div/div[1]/div/div[2]/div/div[2]/div/div/span[2]""")
paragraphs=driver.find_element_by_xpath("""//*[@id="akp_tsuid2"]/div/div/div/div/div/div[1]/div/div[1]/div/div[2]/div/div[2]/div/div/span[2]""")
print (paragraphs.text)
To extract the text 3127 N University St, Peoria, IL 61604, United States, you have to use WebDriverWait for visibility_of_element_located(). You can use either of the following:

  • Using CSS_SELECTOR and the text attribute:

    driver.get('https://www.google.com/search?sxsrf=ACYBGNT9OH8ZZcClzMK-BMwxesqsKeHyTg:1575693566606&q=google+maps+secure+dental&npsic=0&rflfq=1&rlha=0&rllag=41148676,-90063976,60206&tbm=lcl&ved=2ahUKEwjHtb_626LmAhXjzVkKHTpMCLAQtgN6BAgLEAQ&tbs=lrf:!1m4!1u3!2m2!3m1!1e1!1m5!1u15!2m2!15m1!1shas_1wheelchair_1accessible_1entrance!4e2!2m1!1e3!3sIAE,lf:1,lf_ui:4&rldoc=1#rlfi=hd:;si:16368180629414227255,l,Chlnb29nbGUgbWFwcyBzZWN1cmUgZGVudGFsIgOIAQFIxLbOi6yPgIAIWiYKDXNlY3VyZSBkZW50YWwQABABGAAYASINc2VjdXJlIGRlbnRhbA;mv:[[41.6797015,-86.9763612],[39.655607599999996,-90.7386324]]')
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.mod[data-attrid='kc:/location/location:address']>div>div>span:nth-child(2)"))).text)
    
  • Using XPATH and get_attribute():

  • Console output:

    3127 N University St, Peoria, IL 61604, United States
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
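Both answers depend on the explicit wait. Selenium needs a live browser, so the block below is only a standard-library model of what `WebDriverWait(driver, 5).until(EC.visibility_of_element_located(...))` does. It is not Selenium's own code.

In the model:
- a condition is re-evaluated every `poll` seconds (Selenium's default is 0.5 s);
- a "not found" error counts as "not yet";
- the wait gives up once `timeout` expires. Selenium raises `TimeoutException` at that point; the model raises Python's built-in `TimeoutError`.

`NotFound`, `FakeElement`, `FakePage`, `visibility_of` and `wait_until` are hypothetical names. The assumption that the address span simply shows up a few polls late is illustrative only.

```python
import time
from typing import Callable, TypeVar, Union

T = TypeVar("T")


class NotFound(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def wait_until(condition: Callable[[], T], timeout: float = 5.0, poll: float = 0.5) -> T:
    """Call `condition` until it returns something truthy, else raise TimeoutError.

    Simplified model of WebDriverWait(driver, timeout).until(...):
    a NotFound raised by `condition` is treated as "not yet" and retried.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except NotFound:
            pass  # element not in the page yet; poll again
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


class FakeElement:
    """Hypothetical element exposing only what the model needs."""

    def __init__(self, text: str, displayed: bool = True) -> None:
        self.text = text
        self._displayed = displayed

    def is_displayed(self) -> bool:
        return self._displayed


class FakePage:
    """Hypothetical page whose address span exists only from the Nth lookup on."""

    def __init__(self, ready_after: int) -> None:
        self.ready_after = ready_after
        self.calls = 0

    def find_address_span(self) -> FakeElement:
        self.calls += 1
        if self.calls < self.ready_after:
            raise NotFound("address span not rendered yet")
        return FakeElement("3127 N University St, Peoria, IL 61604")


def visibility_of(locate: Callable[[], FakeElement]) -> Callable[[], Union[FakeElement, bool]]:
    """Model of EC.visibility_of_element_located: the element if displayed, else False."""

    def check() -> Union[FakeElement, bool]:
        element = locate()
        return element if element.is_displayed() else False

    return check


# A single immediate lookup, like the question's find_element_by_xpath call, fails.
try:
    FakePage(ready_after=3).find_address_span()
except NotFound as exc:
    print("immediate lookup:", exc)

# Polling picks the element up once it appears.
page = FakePage(ready_after=3)
element = wait_until(visibility_of(page.find_address_span), timeout=5, poll=0.01)
print(element.text)  # 3127 N University St, Peoria, IL 61604
```

The 0.01 s poll interval only keeps the demo fast. Because the loop takes any zero-argument condition, the same shape also covers other checks, which is roughly how Selenium pairs `until` with its expected conditions.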
    

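I usually approach locators like this by finding the label, e.g. "Address:", and then the actual street address that follows it. This makes the locator cleaner and easier to read.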

For the address here, you can use the XPath

//a[.='Address']//following::span
Explanation

The relevant HTML looks like this

<div class="zloOqf PZPZlf" data-dtype="d3ifr" data-local-attribute="d3adr" data-ved="2ahUKEwiF7tvpzKTmAhVPOq0KHUoBD9wQghwoADAEegQIARAh">
    <span class="w8qArf">
        <a class="fl" href="..." data-ved="2ahUKEwiF7tvpzKTmAhVPOq0KHUoBD9wQ6BMwBHoECAEQIg">Address</a>:
    </span>
    <span class="LrzXr">3127 N University St, Peoria, IL 61604</span>
</div>
We find the A tag whose text is "Address", //a[.='Address'], and then the first SPAN tag that follows it

//a[.='Address']//following::span
That's the locator. FYI, the less you specify in a locator (within reason), the less likely it is to break when the page changes.
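To see why the label-anchored locator holds up, here is an offline sketch that uses Python's standard-library ElementTree rather than Selenium.

ElementTree has no `following::` axis. The hypothetical helper `first_span_after_label` therefore approximates `//a[.='Address']//following::span` by walking the tree in document order.

The sketch runs against two inputs:
- the HTML quoted above, with the long `data-ved` attributes dropped;
- a hypothetical `REDESIGNED` variant that adds one wrapper `div`.

On each input, it compares the label lookup with a position-based `span[2]` lookup, in the spirit of the question's absolute XPath.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The HTML quoted above, minus the data-ved attributes.
# It happens to be well-formed XML, so the stdlib parser can read it.
SNIPPET = """
<div class="zloOqf PZPZlf" data-dtype="d3ifr" data-local-attribute="d3adr">
    <span class="w8qArf">
        <a class="fl" href="...">Address</a>:
    </span>
    <span class="LrzXr">3127 N University St, Peoria, IL 61604</span>
</div>
"""

# Hypothetical redesign: the same two spans, now inside an extra wrapper div.
REDESIGNED = """
<div class="zloOqf PZPZlf">
    <div class="wrapper">
        <span class="w8qArf"><a class="fl" href="...">Address</a>:</span>
        <span class="LrzXr">3127 N University St, Peoria, IL 61604</span>
    </div>
</div>
"""


def first_span_after_label(root: ET.Element, label: str) -> Optional[ET.Element]:
    """Approximate //a[.='<label>']//following::span for markup like the above.

    Walk the tree in document order, find the <a> whose full text equals
    `label`, and return the first <span> after it that is not inside it.
    """
    nodes = list(root.iter())  # root.iter() yields elements in document order
    for i, node in enumerate(nodes):
        if node.tag == "a" and "".join(node.itertext()) == label:
            inside = {id(n) for n in node.iter()}  # the link and its descendants
            for later in nodes[i + 1:]:
                if later.tag == "span" and id(later) not in inside:
                    return later
    return None


for name, markup in (("original", SNIPPET), ("redesigned", REDESIGNED)):
    root = ET.fromstring(markup.strip())
    by_position = root.find("span[2]")  # position-based, like the question's .../span[2]
    by_label = first_span_after_label(root, "Address")
    print(name,
          "| by position:", None if by_position is None else by_position.text,
          "| by label:", None if by_label is None else by_label.text)
```

On the original markup both lookups return the address. On the redesigned markup the position-based lookup returns `None`, while the label-based one still finds the `LrzXr` span.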

Now you can pull the .text of that element to get what you want. You may need to add a wait, e.g.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...

driver.get(...)
paragraph = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//a[.='Address']//following::span")))
print(paragraph.text)


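What is your problem? What do you need from us?
I can't get the script to return the address that is stored as text in the span with class LrzXr.
Awesome, I'm going to try this method too, thanks.
@CameronLong Sounds good, let me know how it goes.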