How to read a page with infinite scroll using Selenium's HtmlUnit driver for Java?


java html selenium

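For example, if I wanted to use Selenium to find a post made on Facebook a year ago, how could I scroll down and get its text? I have already figured out how to scroll with Selenium, but whenever I try to get elements or the page source, the result contains only the initially loaded page, without the content loaded by scrolling down. I am not actually using this on Facebook; I am using it on a website that has no Java developer tools, StockTwits.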

The logic I followed in this example is to find a post by its text content:

  Let allPosts be an empty collection

    While the timeout has not elapsed
      Get all the currently visible posts that have text in them (currentPosts)
      Remove allPosts from currentPosts [so that we don't check the same post again]
      Add currentPosts to allPosts [to maintain a list of everything seen]
      For each post in currentPosts
        Check whether the post's text contains the given text
        If it does, stop
      Scroll to the bottom [which triggers the AJAX call that loads more posts]
      // Replace the scroll with a click on a button such as "Load more" if scrolling doesn't trigger the AJAX load
      Wait until the page has loaded
    Repeat

This worked very well for me: I found a post on my wall from my birthday (1 month ago).

It took 20 minutes [depending on the number of posts and how old the post is, it may take longer].
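
The loop described above can also be sketched without Selenium, which makes it easier to see the deduplication and stop conditions in isolation. This is only a minimal sketch under stated assumptions: `ScrollSearch`, `findPost`, `visiblePosts`, and `loadMore` are hypothetical names, posts are compared by their text rather than as `WebElement` objects, and the loop gives up as soon as a pass yields no new posts, which may be too eager on a real page where new content arrives with a delay.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

final class ScrollSearch {

    /**
     * Repeatedly reads the visible posts, checks only the unseen ones,
     * and asks for more until the text is found, nothing new loads,
     * or the time budget runs out.
     */
    static Optional<String> findPost(Supplier<List<String>> visiblePosts,
                                     Runnable loadMore,
                                     String needle,
                                     long timeoutMillis) {
        Set<String> seen = new LinkedHashSet<>();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            boolean sawNewPost = false;
            for (String post : visiblePosts.get()) {
                if (!seen.add(post)) {
                    continue; // already checked in an earlier pass
                }
                sawNewPost = true;
                if (post.contains(needle)) {
                    return Optional.of(post);
                }
            }
            if (!sawNewPost) {
                break; // assumption: no new posts means the feed is exhausted
            }
            loadMore.run(); // e.g. scroll to the bottom or click "Load more"
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Fake feed: each loadMore call reveals two more posts.
        List<String> feed = List.of("a", "b", "c", "True Story: ...", "e");
        int[] visible = {2};
        Optional<String> hit = findPost(
                () -> feed.subList(0, Math.min(visible[0], feed.size())),
                () -> visible[0] += 2,
                "True Story",
                5_000);
        System.out.println(hit.orElse("Not Found"));
    }
}
```

With Selenium, the supplier would presumably map the located elements to their text, and `loadMore` would perform the scroll followed by a wait.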

The following searches your Facebook news feed for the given text:

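import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

public static void fbSearch() {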
    System.setProperty("webdriver.chrome.driver", "D:\\Galen\\chromedriver.exe");
    WebDriver driver = new ChromeDriver();
    driver.get("http://www.facebook.com");
    driver.findElement(By.name("email")).sendKeys("phystem");
    driver.findElement(By.name("pass")).sendKeys("yyy");
    driver.findElement(By.id("loginbutton")).click();
    waitForPageLoaded(driver);
    fbPostSearch(driver, "True Story", 20); // timeout in minutes
}

public static Boolean fbPostSearch(WebDriver driver, String postContent, int timeOutInMins) {
    Set<WebElement> allPosts = new HashSet<>();
    int totalTime = timeOutInMins * 60000; // in milliseconds
    long startTime = System.currentTimeMillis();
    boolean timeEnds = false;
    while (!timeEnds) {
        List<WebElement> posts = getPosts(driver);
        posts.removeAll(allPosts);//to remove old posts as we already searched it
        allPosts.addAll(posts);//append new posts to all posts
        for (WebElement post : posts) {
            String content = post.getText();
            if (content.contains(postContent)) {
                //this is our element
                System.out.println("Found");
                new Actions(driver).moveToElement(post).build().perform();
                ((JavascriptExecutor) driver).executeScript("arguments[0].style.outline='2px solid #ff0';", post);
                return true;
            }
        }
        scrollToBottom(driver);
        waitForPageLoaded(driver);
        timeEnds = (System.currentTimeMillis() - startTime >= totalTime);
    }
    System.out.println("Not Found");
    return false;
}

public static List<WebElement> getPosts(WebDriver driver) {
    // find only posts that have text content, because some posts are image-only
    return driver.findElements(By.cssSelector("div._4-u2.mbm._5v3q._4-u8 div._5pbx.userContent"));
}

private static void scrollToBottom(WebDriver driver) {
    long longScrollHeight = (Long) ((JavascriptExecutor) driver).executeScript("return Math.max("
            + "document.body.scrollHeight, document.documentElement.scrollHeight,"
            + "document.body.offsetHeight, document.documentElement.offsetHeight,"
            + "document.body.clientHeight, document.documentElement.clientHeight);"
    );
    ((JavascriptExecutor) driver).executeScript("window.scrollTo(0, " + longScrollHeight + ");");
}

public static void waitForPageLoaded(WebDriver driver) {
    ExpectedCondition<Boolean> expectation = new ExpectedCondition<Boolean>() {
        @Override
        public Boolean apply(WebDriver driver) {
            return ((JavascriptExecutor) driver).executeScript(
                    "return document.readyState").equals("complete");
        }
    };
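    // Selenium 3 signature (timeout in seconds); Selenium 4 takes a java.time.Duration, e.g. Duration.ofSeconds(20)
    WebDriverWait wait = new WebDriverWait(driver, 20);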
    wait.until(expectation);
}
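
One caveat about `waitForPageLoaded`: `document.readyState` is normally already `complete` once the initial page load has finished, so after an AJAX-driven scroll the check may return immediately, before the new posts are attached. A possible alternative is to wait until the number of located posts grows. The helper below is a minimal sketch of that idea using only the standard library; `GrowthWait`, `waitForMore`, and its parameters are hypothetical names, not part of Selenium or of the answer above.

```java
import java.util.function.IntSupplier;

final class GrowthWait {

    /**
     * Polls currentCount until it exceeds previousCount or the timeout elapses.
     * Returns true if the count grew in time, false otherwise.
     */
    static boolean waitForMore(IntSupplier currentCount, int previousCount,
                               long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (currentCount.getAsInt() > previousCount) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake page: the post count jumps from 10 to 15 on the third poll.
        int[] polls = {0};
        boolean grew = waitForMore(() -> ++polls[0] >= 3 ? 15 : 10, 10, 2_000, 50);
        System.out.println(grew ? "more posts loaded" : "no new posts");
    }
}
```

In the search loop above, one might record `getPosts(driver).size()` before calling `scrollToBottom(driver)` and then pass `() -> getPosts(driver).size()` as the supplier; a `false` result would suggest that the end of the feed has been reached or that loading is slower than the timeout allows.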

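Generally, you won't find help on SO for questions that directly violate a website's T&Cs.

As I said... I am using it for a website without Java developer tools, StockTwits. I am only using Facebook as the example because it is the easiest to describe to most people.

@rmlan Also, please show me the text that says this violates the T&Cs. If I'm not mistaken, Facebook allows this kind of thing. I have seen some studies that used Facebook posts in a similar way.

That's fair. I have removed my downvote. However, in section 3.2 the T&Cs state: "You will not collect users' content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our prior permission." Using Selenium to find Facebook posts fits this definition.

@rmlan Thank you very much! Although it isn't fast enough to be of much use, this is the first working code I have used. Thanks a lot!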