How to get data from a Java web scraping API?

I am trying to fetch table data from the URL visited in the code below.

I wrote this code with the help of the Jaunt API:

package org.open.browser;

import com.jaunt.Element;
import com.jaunt.Elements;
import com.jaunt.JauntException;
import com.jaunt.UserAgent;

public class ICICIScraperDemo {

    public static void main(String ar[]) throws JauntException {

        UserAgent userAgent = new UserAgent();  // create new userAgent (headless browser)
        userAgent.visit("https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec");
        Elements links = userAgent.doc.findEvery("<div class=expander>").findEvery("<a>");  // find search result links
        String url = null;
        for (Element link : links) {
            if (link.innerHTML().equalsIgnoreCase("Company Details")) {
                url = link.getAt("href");
            }
        }
        /*userAgent = new UserAgent(); */  // create new userAgent (headless browser)
        userAgent.visit(url);
        System.out.println(userAgent.getSource());
        Elements results = userAgent.doc.findEvery("<tr>").findEvery("<td>");
        System.out.println(results);
    }
}
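But this does not return any results either.

Can anyone help me fetch the data from the above URL and the other anchor URLs in a single session?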

You can do this with HtmlUnit:

    String url = "https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec";

    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(1000);

        final DomNodeList<DomNode> divs = page.querySelectorAll("div.bigcoll");
        System.out.println(divs.get(1).asText());
    }
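Two things are worth mentioning:

  • You have to wait for a while after the getPage call, because some parts of the page are created by JavaScript/AJAX.
  • There are many ways to find elements on a page. I only made a quick hack to show that the code works.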
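
On the first point, a fixed one-second wait may be too short or too long depending on the connection. One alternative is to poll until the content you need shows up. The following is only a minimal sketch under stated assumptions: PollingSketch and pollUntilPresent are hypothetical names, not part of HtmlUnit or Jaunt, and the supplier in main is a stand-in for whatever page lookup you would actually run.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper; not part of HtmlUnit or Jaunt.
final class PollingSketch {

    // Calls the probe up to maxAttempts times, sleeping delayMillis between
    // attempts, and returns the first non-empty result (or empty if none).
    static <T> Optional<T> pollUntilPresent(Supplier<Optional<T>> probe,
                                            int maxAttempts,
                                            long delayMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = probe.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a page whose content only appears on the third look.
        // In real code the probe would query the loaded page instead.
        int[] calls = {0};
        Supplier<Optional<String>> probe =
                () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("table text");

        Optional<String> text = pollUntilPresent(probe, 5, 10);
        System.out.println(text.orElse("not loaded") + " after " + calls[0] + " attempts");
    }
}
```

With HtmlUnit, the probe could wrap the querySelectorAll call from the answer and return an empty Optional while the expected div is still missing.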

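What do you mean by "not working"?
The markup is all JavaScript. What are you trying to get? The real data is loaded in an iframe; use your browser's developer tools to look at the calls to, for example,
https://www.icicidirect.com/idirectcontent/basemasterpage/ContentDataHandler.ashx?icicicode=INFTEC
@stdunbar Thanks for the comment. Do you mean that the URL that produces the table data is different from the URL I posted? Can't these APIs fetch the data from the page as well? If you look at this URL, the data loaded into the table on the right side is what I want to store.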
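
If the handler URL mentioned in the comment is indeed where the table data comes from, it might be possible to request it directly instead of rendering the whole page. The following is a rough sketch, assuming Java 11+ (java.net.http). The class and method names are hypothetical, and it is an untested assumption that the endpoint responds without the cookies or headers a browser session would send, or that icicicode is the only parameter it needs. The response format is not known here, so the body is returned as a plain string.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class HandlerFetchSketch {

    // Endpoint copied from the comment thread.
    static final String HANDLER =
            "https://www.icicidirect.com/idirectcontent/basemasterpage/ContentDataHandler.ashx";

    // Builds the handler URI for a company code such as "INFTEC".
    static URI handlerUri(String iciciCode) {
        String encoded = URLEncoder.encode(iciciCode, StandardCharsets.UTF_8);
        return URI.create(HANDLER + "?icicicode=" + encoded);
    }

    // Sends a plain GET and returns the raw response body.
    // Assumption: the server answers without a prior browser session.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the URI; call fetch(...) yourself to hit the network.
        System.out.println(handlerUri("INFTEC"));
    }
}
```

If the plain request comes back empty or is rejected, that would suggest the page sets up session state first, in which case loading the page through HtmlUnit as in the answer remains the safer route.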