Python: how to get the text under a tag

Tags: python, python-3.x, selenium, dom, selenium-webdriver

I want to get the text under a tag.

I tried several different options:

dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]/text()")
dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]/b[1]/text()")
My code:

dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]/text()")
for spisok in dneyot:
    print("Период показов >3 дней", spisok.text)
UPD: I use the following expression to locate the item I need in the browser:

//*[starts-with(@id, 'popover-')]/text()[1]
But I get this error:

selenium.common.exceptions.InvalidSelectorException:
Message: invalid selector: The result of the xpath expression "//*[starts-with(@id, 'popover-')]/text()[1]" is: [object Text]. It should be an element.
Using BeautifulSoup:

Find the div with id=popover-34252127 inside the parent div:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.your_url_here.com/")

soup = BeautifulSoup(page.content, 'html.parser')
data = soup.find("div", {"id": "popover-34252127"})
print(data)

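find_elements_by_xpath() returns a list of WebElements, the basic objects Selenium actually works with. Your XPath ends with /text(), which returns the text content of a node in the XML document, not an element. So just drop that suffix so that the expression returns the elements themselves, and get each element's text by calling .text in Python: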

dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]")
for element in dneyot:
    print("Период показов >3 дней", element.text)
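To see why /text() yields text nodes rather than elements, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree instead of Selenium. The markup in SAMPLE is a hypothetical stand-in for the popover div (the real page's HTML is not shown in the question), and direct_text_nodes is an illustrative helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a popover div that mixes plain text with <b> children.
SAMPLE = (
    '<root><div id="popover-1">first <b>bold one</b> second '
    '<b>bold two</b> third</div></root>'
)


def direct_text_nodes(element):
    """Return the stripped text nodes that sit directly under element."""
    parts = []
    # Text before the first child lives in element.text.
    if element.text and element.text.strip():
        parts.append(element.text.strip())
    # Text after each child lives in that child's .tail.
    for child in element:
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


root = ET.fromstring(SAMPLE)
divs = [d for d in root.iter("div") if d.get("id", "").startswith("popover-")]
for div in divs:
    print(direct_text_nodes(div))    # text nodes only, <b> content excluded
    print("".join(div.itertext()))   # all text, roughly what .text gives in Selenium
```

The text nodes are plain strings hanging off the element, which is why Selenium, which can only hand back elements, rejects an expression that selects them.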

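text() returns a text node, which Selenium does not know how to handle; it can only work with WebElements. You need to get the text of the elements whose id starts with "popover-" and process the returned text: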

elements = driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]")
for element in elements:
    lines = element.text.split('\n')
    for line in lines:
        print("Период показов >3 дней", line)
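The line-splitting step can also be sketched without a browser. The helper name non_empty_lines and the sample strings below are assumptions made for illustration; the SimpleNamespace objects merely mimic the .text attribute of a WebElement.

```python
from types import SimpleNamespace


def non_empty_lines(elements):
    """Yield each non-blank, stripped line of text from objects exposing .text."""
    for element in elements:
        for line in (element.text or "").splitlines():
            line = line.strip()
            if line:
                yield line


# Hypothetical stand-ins for WebElements; a real run would pass the list
# returned by driver.find_elements_by_xpath(...).
fake_elements = [
    SimpleNamespace(text="2019-01-01 - 2019-01-05\n\n  shown 4 days  "),
    SimpleNamespace(text=""),
]
for line in non_empty_lines(fake_elements):
    print("Период показов >3 дней", line)
```

Using splitlines() and skipping blank lines avoids printing empty rows when a popover contains consecutive line breaks.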

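If you want to get the text without the <b> node text, you need to use the following XPath:

//div[starts-with(@id, 'popover-')]
It identifies the div nodes; then, with the find_elements_by_xpath() method, you can retrieve all the text from each div node. Try the following code: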

elements = driver.find_elements_by_xpath("//div[starts-with(@id, 'popover-')]") 
for element in elements:
    print(element.text)
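Update:

I suspect the approach above may not work, and we may not be able to identify or get that data with the usual methods. In that case, you need to use JavaScriptExecutor to get the data, as follows: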

from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
driver.get("file:///C:/NotBackedUp/SomeHTML.html")

xPath = "//div[starts-with(@id, 'popover-')]"
elements = driver.find_elements_by_xpath(xPath)
for element in elements:
    # Number of child nodes (elements and text nodes) under the div
    length = int(driver.execute_script("return arguments[0].childNodes.length;", element))
    for i in range(length):  # childNodes is zero-indexed
        try:
            text = driver.execute_script("return arguments[0].childNodes[arguments[1]].textContent;", element, i)
            data = (text or "").strip()
            if data:
                print(data)
        except Exception:
            print("=> Can't print some data...")
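Since your site is written in a language other than English, you may not be able to print or get some of the data.

To get the data of specific child nodes, do the following: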

from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
driver.get("file:///C:/NotBackedUp/SomeHTML.html")

xPath = "//div[starts-with(@id, 'popover-')]"
elements = driver.find_elements_by_xpath(xPath)
for element in elements:
    # Print the b1 text
    b1Text = driver.execute_script("return arguments[0].childNodes[2].textContent", element)
    print(b1Text)

    # Print the b2 text
    b2Text = driver.execute_script("return arguments[0].childNodes[6].textContent", element)
    print(b2Text)

print("=> Done...")
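If the hard-coded child indices turn out to be fragile, one possible variation is to let the browser filter the child nodes by type. This is only a sketch under assumptions: own_text_nodes, JS_TEXT_NODES and FakeDriver are hypothetical names introduced here, and FakeDriver exists only so the helper can be tried without a browser. With a real Selenium driver, execute_script is expected to return the JavaScript array as a Python list of strings.

```python
# JavaScript run in the browser: collect the text of the direct text-node
# children of the element passed in as arguments[0].
JS_TEXT_NODES = """
return Array.from(arguments[0].childNodes)
    .filter(function (node) { return node.nodeType === Node.TEXT_NODE; })
    .map(function (node) { return node.textContent; });
"""


def own_text_nodes(driver, element):
    """Return the stripped, non-empty direct text nodes of element."""
    raw = driver.execute_script(JS_TEXT_NODES, element) or []
    return [text.strip() for text in raw if text and text.strip()]


class FakeDriver:
    """Hypothetical stand-in so the helper can be tried without a browser."""

    def execute_script(self, script, *args):
        # Pretend the browser found these text nodes under the element.
        return ["\n  first  ", "\n", "second"]


print(own_text_nodes(FakeDriver(), element=None))
```

In a real session you would call own_text_nodes(driver, element) for each element returned by find_elements_by_xpath. Note that this variant returns only the text outside the child tags, so the b1 and b2 text itself would still need the element's own .text or a child lookup.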

I hope it helps.

You can use a regular expression to get the dates:

import re

#...

rePeriod = r'(.*)(\d{4}-\d{2}-\d{2} - \d{4}-\d{2}-\d{2})(.*)'

dneyot = driver.find_elements_by_css_selector('div[id^="popover-"]')
for spisok in dneyot:
    m = re.search(rePeriod, spisok.text)
    if m:  # skip elements whose text contains no date range
        print("Период показов >3 дней", m.group(2))
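The date extraction can be tried on its own, without Selenium. The sketch below assumes the popover text contains a range in the form YYYY-MM-DD - YYYY-MM-DD, as the pattern above implies; extract_period and the sample strings are hypothetical.

```python
import re

# Two capture groups: the start date and the end date of the period.
PERIOD_RE = re.compile(r"(\d{4}-\d{2}-\d{2}) - (\d{4}-\d{2}-\d{2})")


def extract_period(text):
    """Return (start, end) as strings, or None when no date range is found."""
    match = PERIOD_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_period("Период показов: 2019-03-01 - 2019-03-05"))
print(extract_period("no dates here"))
```

Returning None for a missing match keeps the caller from hitting an AttributeError on m.group(...) when a popover has no date range.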

Please read up on why this happens. Consider updating the question with the relevant HTML as formatted text, your code trials, and the error stack trace.
@keepomen, try this code: elements = driver.
dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]").text AttributeError: 'list' object has no attribute 'text'
@keepomen It should be driver.find_element_by_xpath
I need the elements (all of them)
@keepomen Updated my answer
It is the same locator you were using. If you only want the first <b> child node, append it to the XPath: //*[starts-with(@id, 'popover-')]/b[1]. By the way, you do know that your current approach will not print only the items whose period is longer than 3 days, right?
I am looking for the content to print.
::node() cannot be used with Selenium. It only supports XPath 1.0.
@JeffC You are right, Selenium does not support ::node()
@keepomen, added code to get the b1 and b2 text. Check it and let me know whether it works?