Java: In WebDriver, how to extract text segments from a paragraph separated by icon images

java selenium-webdriver

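I'm writing a Selenium test against our web page. There is a label field that holds one or more text segments, separated by right-caret icons. I'm trying to extract the individual text segments from the label into a list.

This is what the HTML looks like in the DOM. In this case there are 3 separate text segments: "MainSchedule", "Container1" and "Container1.2".

However, when I call getText() on the label, it returns all 3 text segments in a single string, with no break where the icon images are.

Using the Chrome tools I can inspect the element's properties, and on "p.MuiTypography-root" I see that the "firstChild" text content is the first text segment, "MainSchedule". I tried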

label.findElement(By.xpath("first-child"))

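but it just threw an error. Starting from "firstChild", I can step through "nextSibling" in the Chrome tools and find the nodes that hold the individual text segments. But I haven't figured out how to write code that reads them.

I'm writing the tests in Java.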

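You can't do this directly in Selenium, because you need text fragments returned, and Selenium's finders only return web elements.

However, you can do it with an xpath selector that returns the specific text fragment you want. The basic approach is an xpath selector like this: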

//p[contains(@class, 'MuiTypography-root')]/text()[position() = 1]

This returns the first text fragment in the <p> element, which here (after trimming the surplus whitespace) is "MainSchedule".

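How do we use the xpath selector above? We change the "1" so that it is no longer hard-coded: we work out how many text fragments there may be to extract, and build a loop accordingly.

We use the xpath classes and the parser that ship with Java, like this: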

import java.io.IOException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

...

// assuming Firefox (I guess you are using Chrome):
System.setProperty("webdriver.gecko.driver", "your/path/here/geckodriver.exe");
WebDriver driver = new FirefoxDriver();
String uri = "your URL in here";
driver.navigate().to(uri);

// Here is where we use the Java parser and xpath classes:
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(uri);
XPath xPath = XPathFactory.newInstance().newXPath();

// count how many <svg> tags there are.
String svgCounter = "count(//p[contains(@class, 'MuiTypography-root')]/svg)";
String count = xPath.compile(svgCounter).evaluate(doc);
// There can be up to this many pieces of text we need to extract:
int max = Integer.parseInt(count) + 1;

String expressionOne = "//p[contains(@class, 'MuiTypography-root')]/text()[position() = %s]";

for (int i = 1; i <= max; i++) {
    String result = xPath.compile(String.format(expressionOne, i)).evaluate(doc).trim();
    if (!result.isBlank()) {
        System.out.println(result);
    }
}

driver.quit();
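
As a self-contained illustration of the same text() selector, here is a minimal sketch that can be run without a browser. It parses a hard-coded string instead of a URL and asks XPath for all matching text nodes at once (XPathConstants.NODESET) rather than looping over positions. The sample markup, the "icon" class and the names TextSegmentsDemo and extractSegments are assumptions made up for the example; they are not taken from the real page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextSegmentsDemo {

    // Hypothetical, well-formed markup standing in for the real label.
    static final String SAMPLE =
        "<div><p class=\"MuiTypography-root\">MainSchedule"
        + "<svg class=\"icon\"><path d=\"M0 0\"/></svg>Container1"
        + "<svg class=\"icon\"><path d=\"M0 0\"/></svg>Container1.2</p></div>";

    // Returns the trimmed, non-empty text nodes that are direct children of the <p>.
    static List<String> extractSegments(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();

        // NODESET returns every matching text node, so no count() is needed.
        NodeList textNodes = (NodeList) xPath.evaluate(
            "//p[contains(@class, 'MuiTypography-root')]/text()",
            doc, XPathConstants.NODESET);

        List<String> segments = new ArrayList<>();
        for (int i = 0; i < textNodes.getLength(); i++) {
            String text = textNodes.item(i).getNodeValue().trim();
            if (!text.isEmpty()) {
                segments.add(text);
            }
        }
        return segments;
    }

    public static void main(String[] args) throws Exception {
        // prints [MainSchedule, Container1, Container1.2]
        System.out.println(extractSegments(SAMPLE));
    }
}
```

Like the code above, this only works when the input is well-formed enough for the XML parser to accept it.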

Notes:

(1) This approach assumes you have a well-formed HTML document that can be parsed at this step:

Document doc = docBuilder.parse(uri);

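(2) The code above assumes there is one <p> element with an unspecified number of child <svg> tags. If the page contains several such <p> elements, you will need to adjust the code so that it processes each <p> element in turn.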

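(3) If you don't have a well-formed HTML document, the approach above may fail. In that case you can fall back on a hackier approach, but I don't really recommend it: it splits an HTML string with a regular expression, which is almost never a good idea. It tends to be brittle and to fail in surprising ways.

The hack is this: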

String expressionTwo = "//p[contains(@class, 'MuiTypography-root')]";
WebElement element1 = driver.findElement(By.xpath(expressionTwo));
String html = element1.getAttribute("innerHTML").replace('\n', ' ');
String[] items = html.split("<svg .*?</svg>");
for (String item : items) {
    System.out.println(item.trim());
}
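
To show what the split does without needing a browser, here is a small sketch that applies the same regular expression to a hard-coded innerHTML string and collects the non-empty pieces into a list. The sample string and the names SplitOnIconsDemo and splitOnIcons are hypothetical, chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOnIconsDemo {

    // Splits an innerHTML string on <svg ...>...</svg> blocks and
    // returns the trimmed, non-empty text pieces in document order.
    static List<String> splitOnIcons(String innerHtml) {
        // '.' does not match line breaks by default, so flatten them first.
        String flattened = innerHtml.replace('\n', ' ');
        List<String> segments = new ArrayList<>();
        for (String piece : flattened.split("<svg .*?</svg>")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                segments.add(trimmed);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Hypothetical innerHTML of the label element.
        String sample =
            "MainSchedule\n<svg class=\"icon\" viewBox=\"0 0 24 24\"><path d=\"M10 6\"></path></svg>"
            + "Container1<svg class=\"icon\"><path d=\"M10 6\"></path></svg>Container1.2";
        // prints [MainSchedule, Container1, Container1.2]
        System.out.println(splitOnIcons(sample));
    }
}
```

The brittleness mentioned in note (3) is visible in the pattern itself: it expects a space after "<svg", so an icon rendered as a bare <svg> tag with no attributes would not be split on.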

Thanks Andrew. The places that use this format will always follow it. (Unless the developers change their minds :D). I did solve it using innerHTML. Not quite the way you showed, but similar. As long as they don't change the format this should be fine, and if they do I would have to update this anyway. Thanks again.

@AllenAshe - Glad you found a solution! Generally, if you ask a question and then find your own solution, we encourage you to post it as well. If you like, you can even mark your own answer as "accepted".