Java: accessing an HTML table with HtmlUnit
I want to access a table contained in an HTML file. This is my code:
import java.io.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTable;
import com.gargoylesoftware.htmlunit.html.*;
import com.gargoylesoftware.htmlunit.WebClient;

public class test {
    public static void main(String[] args) throws Exception {
        WebClient client = new WebClient();
        HtmlPage currentPage = client.getPage("http://www.mysite.com");
        client.waitForBackgroundJavaScript(10000);
        final HtmlDivision div = (HtmlDivision) currentPage.getByXPath("//div[@id='table-matches-time']");
        String textSource = div.toString();
        //String textSource = currentPage.asXml();
        FileWriter fstream = new FileWriter("index.txt");
        BufferedWriter out = new BufferedWriter(fstream);
        out.write(textSource);
        out.close();
        client.closeAllWindows();
    }
}
The table has the following format:
<div id="table-matches-time" class="">
<table class=" table-main">
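How can I read this table?

It looks like your query returns a list of nodes rather than a single div. Are there multiple items with that id?

Replace this part of the code:
(HtmlDivision) currentPage.getByXPath("//div[@id='table-matches-time']");
with the equivalent getFirstByXPath call (the exact line is listed after the comments further down).
getByXPath always returns a collection of elements, even when only one element matches, whereas getFirstByXPath always returns a single element, even when more than one matches.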
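The difference between "all matches" and "first match" can be tried out without HtmlUnit. The sketch below is not HtmlUnit code: it uses the JDK's own javax.xml.xpath API on a small hypothetical XML string (the MARKUP constant and the class name are made up for illustration), assuming two div elements share the same id as in the question. Requesting a NODESET plays the role of getByXPath, and requesting a NODE plays the role of getFirstByXPath.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathListVersusFirst {
    // Hypothetical markup: two divs with the same id, as described in the thread.
    static final String MARKUP =
        "<body>"
        + "<div id='table-matches-time'><table class=' table-main'><tr><td>A</td></tr></table></div>"
        + "<div id='table-matches-time'><table class=' table-main'><tr><td>B</td></tr></table></div>"
        + "</body>";

    static final String QUERY = "//div[@id='table-matches-time']";

    // Analogous to getByXPath: every matching node, even if there is only one.
    static NodeList allMatches() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(MARKUP));
        return (NodeList) xpath.evaluate(QUERY, source, XPathConstants.NODESET);
    }

    // Analogous to getFirstByXPath: only the first match in document order.
    static Node firstMatch() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(MARKUP));
        return (Node) xpath.evaluate(QUERY, source, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("matches: " + allMatches().getLength());
        System.out.println("first match text: " + firstMatch().getTextContent());
    }
}
```

Casting the list result to a single element type, as the question's code does, is what fails at runtime; picking one item out of the list or asking for the first match avoids the cast problem.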
Edit:
Since two elements share the same id (which is not advisable), you should use the following instead:
(HtmlDivision) currentPage.getByXPath("//div[@id='table-matches-time']").get(0);
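This way you get the first element of the collection; .get(1) will give you the second.

This works (and returns a CSV file ;)). The complete program is listed at the end of this thread.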
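But there are two tables with the same name. This returns: HtmlDivision[] but I want to get the whole table.

@emanuele Do you mean that you want the table element rather than the DIV? The returned DIV contains the table.

I mean that I want a text file containing all the data in the table, but I don't know how to do it.

The getFirstByXPath call recommended in the answer above:
(HtmlDivision) currentPage.getFirstByXPath("//div[@id='table-matches-time']");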
import java.io.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTable;
import com.gargoylesoftware.htmlunit.html.HtmlTableRow;
import com.gargoylesoftware.htmlunit.html.*;
import com.gargoylesoftware.htmlunit.WebClient;

public class test {
    public static void main(String[] args) throws Exception {
        WebClient client = new WebClient();
        HtmlPage currentPage = client.getPage("http://www.mysite.com");
        // Give background JavaScript up to 10 seconds to finish building the page.
        client.waitForBackgroundJavaScript(10000);

        FileWriter fstream = new FileWriter("index.txt");
        BufferedWriter out = new BufferedWriter(fstream);

        // The page contains two tables with the same class, so read both.
        for (int i = 0; i < 2; i++) {
            // getByXPath returns a list; pick the i-th table from it.
            final HtmlTable table = (HtmlTable) currentPage.getByXPath("//table[@class=' table-main']").get(i);
            for (final HtmlTableRow row : table.getRows()) {
                // Write each cell followed by a comma, one table row per line.
                for (final HtmlTableCell cell : row.getCells()) {
                    out.write(cell.asText() + ',');
                }
                out.write('\n');
            }
        }

        out.close();
        client.closeAllWindows();
    }
}
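
The loop above writes a comma after every cell, so each line ends with a trailing comma, and a cell whose text itself contains a comma would split into two columns. A minimal sketch of one way to build each line instead, assuming the cell texts have already been collected into a list of strings; the class and method names (CsvRowSketch, escape, toCsvLine) are hypothetical and not part of HtmlUnit, and the quoting follows the common CSV convention of doubling embedded quotes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvRowSketch {
    // Quote a field only if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the escaped cells with commas; no trailing comma is produced.
    static String toCsvLine(List<String> cells) {
        List<String> escaped = new ArrayList<>();
        for (String cell : cells) {
            escaped.add(escape(cell));
        }
        return String.join(",", escaped);
    }

    public static void main(String[] args) {
        // Hypothetical cell texts standing in for one table row.
        List<String> row = Arrays.asList("Team A", "2,5", "say \"hi\"");
        System.out.println(toCsvLine(row));
    }
}
```

In the program above, the inner loop could collect cell.asText() values into such a list and write the joined line once per row.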