Python: how to get the text under a tag
Tags: python, python-3.x, selenium, dom, selenium-webdriver
I want to get the text under a tag. I tried a few different options:
dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]/text()")
dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]/b[1]/text()")
My code:
dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]/text()")
for spisok in dneyot:
    print("Период показов >3 дней", spisok.text)
UPD:
I use the following to find the desired item in the browser:
//*[starts-with(@id, 'popover-')]/text()[1]
But I get the error:
selenium.common.exceptions.InvalidSelectorException:
Message: invalid selector: The result of the xpath expression "//*[starts-with(@id, 'popover-')]/text()[1]" is: [object Text]. It should be an element.
Using BeautifulSoup:
Find the div with id=popover-34252127 inside the parent div:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.your_url_here.com/")
soup = BeautifulSoup(page.content, 'html.parser')
data = soup.find("div", {"id": "popover-34252127"})
print(data)
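The snippet above looks up one hard-coded id. As a rough sketch of the same idea for every id that starts with popover-, keeping only the text that sits directly inside the div (not inside nested tags such as b), the following uses the standard-library html.parser module instead of BeautifulSoup, so it is a swapped-in technique rather than the answer's own method. The names PopoverTextParser and direct_popover_texts and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PopoverTextParser(HTMLParser):
    """Collect text located directly inside <div id="popover-..."> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # 0 = outside a popover div, 1 = directly inside it
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif tag == "div" and (dict(attrs).get("id") or "").startswith("popover-"):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is a direct child of the popover div.
        if self.depth == 1 and data.strip():
            self.texts.append(data.strip())


def direct_popover_texts(html):
    parser = PopoverTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


# Hypothetical markup, only for illustration.
SAMPLE_HTML = (
    '<div id="popover-1">Period: <b>first</b> 2019-01-01 - 2019-01-05<br>'
    '<b>second</b> tail</div>\n<div id="other">ignored</div>'
)
print(direct_popover_texts(SAMPLE_HTML))
```

This only helps when the popover markup is present in the HTML returned by requests; content that is rendered by JavaScript still needs a browser driver such as Selenium.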
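find_elements_by_xpath() returns a list of WebElement objects, the basic object type Selenium actually works with. Your XPath ends in /text(), which returns the text content of the node in the XML document (a text node) rather than an element. So just drop that suffix, so that the XPath returns the element itself, and get the element's text by calling .text in Python:
dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]")
for element in dneyot:
    print("Период показов >3 дней", element.text)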
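For readers on a recent Selenium 4 release, where the find_elements_by_* helpers are no longer available, the same lookup can be sketched with the generic find_elements call. This is a minimal sketch, assuming driver is an already started Selenium WebDriver; the names POPOVER_XPATH and popover_texts are made up for illustration.

```python
POPOVER_XPATH = "//*[starts-with(@id, 'popover-')]"


def popover_texts(driver):
    """Return the visible text of every element whose id starts with 'popover-'."""
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH;
    # with Selenium imported, By.XPATH would normally be passed instead.
    elements = driver.find_elements("xpath", POPOVER_XPATH)
    # The XPath selects elements (not text nodes), so .text is available on each.
    return [element.text for element in elements]
```

With a real browser session this would be used as `for text in popover_texts(driver): print(text)`.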
text() returns a text node, which Selenium does not know how to handle; it can only work with WebElements. You need to get the text of the elements whose id starts with "popover-" and then process the returned text:
elements = driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]")
for element in elements:
    lines = element.text.split('\n')
    for line in lines:
        print("Период показов >3 дней", line)
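Splitting on '\n' can leave empty strings or stray whitespace when the popover contains blank lines. A small sketch of a more forgiving split, assuming the text comes from element.text; the helper name nonempty_lines is hypothetical.

```python
def nonempty_lines(text):
    """Split text into lines, dropping blank lines and surrounding whitespace."""
    # splitlines() handles "\n" as well as "\r\n" line endings.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Example with a made-up popover text.
for line in nonempty_lines("Период показов\n\n  2019-01-01 - 2019-01-05  \r\nend"):
    print(line)
```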
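If you want the text without the text of the child (b) nodes, you need to use the following XPath:
//div[starts-with(@id, 'popover-')]
It identifies the div nodes; calling .text on each element returned by find_elements_by_xpath() then retrieves all the text from that div node. Try the following code:
elements = driver.find_elements_by_xpath("//div[starts-with(@id, 'popover-')]")
for element in elements:
    print(element.text)
Update:
I suspect the approach above may not work, and that we may not be able to identify/get this data with the usual methods. In that case, you need to use JavaScriptExecutor to get the data, as follows: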
from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
driver.get("file:///C:/NotBackedUp/SomeHTML.html")
xPath = "//div[starts-with(@id, 'popover-')]"
elements = driver.find_elements_by_xpath(xPath)
for element in elements:
    # Number of child nodes (text nodes and elements) of the div
    length = int(driver.execute_script("return arguments[0].childNodes.length;", element))
    # childNodes is zero-based, so iterate over 0 .. length - 1
    for i in range(length):
        try:
            data = str(driver.execute_script("return arguments[0].childNodes[" + str(i) + "].textContent;", element)).strip()
            if data != '':
                print(data)
        except Exception:
            print("=> Can't print some data...")
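Since your site is written in a language other than English, you may not be able to print/get some of the data.
To get the data of a specific child node, do the following: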
from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
driver.get("file:///C:/NotBackedUp/SomeHTML.html")
xPath = "//div[starts-with(@id, 'popover-')]"
elements = driver.find_elements_by_xpath(xPath)
for element in elements:
    # Print the b1 text
    b1Text = driver.execute_script("return arguments[0].childNodes[2].textContent", element)
    print(b1Text)
    # Print the b2 text
    b2Text = driver.execute_script("return arguments[0].childNodes[6].textContent", element)
    print(b2Text)
print("=> Done...")
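Hard-coded indexes such as childNodes[2] depend on the exact markup. As a hedged alternative, the sketch below asks the browser for only the text nodes that are direct children of the element, in a single execute_script call. It assumes driver is a Selenium WebDriver and element is one of the popover elements found above; DIRECT_TEXT_JS and direct_text_nodes are hypothetical names.

```python
# JavaScript that returns the trimmed, non-empty direct text nodes of arguments[0].
DIRECT_TEXT_JS = """
return Array.from(arguments[0].childNodes)
    .filter(function (node) { return node.nodeType === Node.TEXT_NODE; })
    .map(function (node) { return node.textContent.trim(); })
    .filter(function (text) { return text.length > 0; });
"""


def direct_text_nodes(driver, element):
    """Return the text nodes that are direct children of element, as strings."""
    # execute_script converts the returned JavaScript array into a Python list.
    result = driver.execute_script(DIRECT_TEXT_JS, element)
    return [str(text) for text in (result or [])]
```

Text inside nested tags (for example the b elements) is skipped, because only nodes of type TEXT_NODE directly under the element are kept.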
I hope it helps.
You can use a regular expression to get the dates:
import re
#...
rePeriod = r'(.*)(\d{4}-\d{2}-\d{2} - \d{4}-\d{2}-\d{2})(.*)'
dneyot = driver.find_elements_by_css_selector('div[id^="popover-"]')
for spisok in dneyot:
    m = re.search(rePeriod, spisok.text)
    # Skip popovers whose text contains no date range
    if m:
        print("Период показов >3 дней", m.group(2))
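The label in the print call mentions periods longer than 3 days, but the regular expression alone does not check the length of the period. A minimal sketch of that check, assuming the dates always appear as YYYY-MM-DD - YYYY-MM-DD as in the pattern above; extract_period and period_days are hypothetical helper names and the sample text is made up.

```python
import re
from datetime import date

PERIOD_RE = re.compile(r"\d{4}-\d{2}-\d{2} - \d{4}-\d{2}-\d{2}")


def extract_period(text):
    """Return the first 'YYYY-MM-DD - YYYY-MM-DD' range in text, or None."""
    match = PERIOD_RE.search(text)
    return match.group(0) if match else None


def period_days(period):
    """Return the number of days between the start and end date of a period."""
    start, end = (date.fromisoformat(part) for part in period.split(" - "))
    return (end - start).days


sample = "Период показов 2019-01-01 - 2019-01-05"
period = extract_period(sample)
if period is not None and period_days(period) > 3:
    print("Период показов >3 дней", period)
```

In the Selenium loop, sample would be replaced by spisok.text.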
Please read why this is so. Consider updating the question with formatted, text-based relevant HTML, code trials, and the error stack trace.
@keepomen, try this code: elements = driver.…
dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]").text AttributeError: 'list' object has no attribute 'text'
@keepomen It should be driver.find_element_by_xpath
I need the elements (all of them)
@keepomen Updated my answer
It's the same locator you used. If you only want the first child node, append it to the XPath: //*[starts-with(@id, 'popover-')]/b[1]. By the way, you do know that your current approach will not print only the items whose period is longer than 3 days, right?
I am looking for the printed content.
::node() can't be used with Selenium. It only supports XPath 1.0.
@JeffC You are right, Selenium doesn't support ::node()
@keepomen, added code to get the b1 or b2 text. Check it and let me know whether it works?