How do I get data with a Java web scraping API?
Tags: java, web-scraping, htmlunit, jaunt-api
I am trying to get the table data from the following URL (the one visited in the code below). I wrote this code with the help of the Jaunt API:
package org.open.browser;
import com.jaunt.Element;
import com.jaunt.Elements;
import com.jaunt.JauntException;
import com.jaunt.UserAgent;
public class ICICIScraperDemo {
public static void main(String ar[]) throws JauntException{
UserAgent userAgent = new UserAgent(); //create new userAgent (headless browser)
userAgent.visit("https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec");
Elements links = userAgent.doc.findEvery("<div class=expander>").findEvery("<a>"); //find every link inside the expander divs
String url = null;
for(Element link : links) {
if(link.innerHTML().equalsIgnoreCase("Company Details")){
url = link.getAt("href");
}
}
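if (url == null) { //no "Company Details" link was found, so there is nothing to visit
System.err.println("Company Details link not found");
return;
}
userAgent.visit(url); //reuse the same userAgent so both requests share one session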
System.out.println(userAgent.getSource());
Elements results = userAgent.doc.findEvery("<tr>").findEvery("<td>");
System.out.println(results);
}
}
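But this does not give any result either.
Can anyone help me get the data from the above URL and the other anchor URLs in a single session?
With HtmlUnit you can do this: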
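import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

String url = "https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
    HtmlPage page = webClient.getPage(url);
    // Give the page's JavaScript/AJAX calls time to finish before querying the DOM
    webClient.waitForBackgroundJavaScript(1000);
    final DomNodeList<DomNode> divs = page.querySelectorAll("div.bigcoll");
    System.out.println(divs.get(1).asText());
}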
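Two things are worth mentioning:
- You have to wait for some time after the getPage call, because parts of the page are created by JavaScript/AJAX.
- There are many ways to find elements on the page. I only made a quick hack to show that the code works.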
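On the first point: a fixed one-second wait can be too short on a slow connection and wasteful on a fast one. A possible alternative is to poll for a condition until it holds or a timeout expires. The following is only a minimal sketch in plain Java; the class name PollUntil, the method waitUntil, and the stand-in condition are hypothetical and not part of HtmlUnit. With HtmlUnit the condition could be a check that the expected element is present, as hinted in the comment.

```java
import java.util.function.BooleanSupplier;

/** Polls a condition until it holds or a timeout elapses (hypothetical helper, not an HtmlUnit class). */
final class PollUntil {

    /**
     * Returns true as soon as the condition is true, or false if the timeout
     * passes first or the waiting thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after roughly 300 ms.
        // With HtmlUnit it could instead be something like:
        //   () -> !page.querySelectorAll("div.bigcoll").isEmpty()
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2000, 50);
        System.out.println("ready = " + ready);
    }
}
```

The timeout keeps the scraper from hanging forever if the element never appears, which can happen when the site changes its markup.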
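The markup is all JavaScript. What are you trying to get? The real data is loaded in an iframe. Use your browser tools to look at the calls to, for example, https://www.icicidirect.com/idirectcontent/basemasterpage/ContentDataHandler.ashx?icicicode=INFTEC
@stdunbar Thanks for your comment. Do you mean that the URL that produces the table data is different from the one I posted? Can't these APIs get the data from the page either? If you open this URL, the data loaded into one of the tables on the right-hand side is what I want to store.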
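If the data really comes from the handler URL mentioned in the comment above, one option is to request that URL directly instead of rendering the whole page. The following is a rough sketch using the JDK's java.net.http client (Java 11+). The class name HandlerFetch and its methods are hypothetical, the User-Agent value is an arbitrary placeholder, and the response format is not known from this thread, so the body is simply printed. The site may also require cookies or headers that this sketch does not send.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Fetches the data handler URL directly (hypothetical helper class). */
final class HandlerFetch {

    // Endpoint taken from the comment above; it may change or need extra headers/cookies.
    static final String HANDLER =
            "https://www.icicidirect.com/idirectcontent/basemasterpage/ContentDataHandler.ashx";

    /** Builds the handler URI for a company code such as "INFTEC". */
    static URI handlerUri(String iciciCode) {
        String encoded = URLEncoder.encode(iciciCode, StandardCharsets.UTF_8);
        return URI.create(HANDLER + "?icicicode=" + encoded);
    }

    /** Sends a GET request and returns the raw response body. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(20))
                .header("User-Agent", "Mozilla/5.0") // placeholder value, an assumption
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        URI uri = handlerUri("INFTEC");
        System.out.println("Requesting " + uri);
        try {
            System.out.println(fetch(uri));
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Whether this works depends on how the server treats requests that do not come from the page itself, so treat it as a starting point for experiments with the browser's network tab open.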