Java web search
I have this code for a Google search:
import java.net.URLDecoder;
import java.net.URLEncoder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

int num_risultati = 15; // number of results to request
// Restrict results to HTML-like file types (%3A is an encoded ':').
String only = "+filetype%3Ahtml+OR+filetype%3Ahtm+OR+filetype%3Axhtm+OR+filetype%3Axhtml";
String google = "http://www.google.com/search?lr=lang_en&num=" + num_risultati + "&q=" + only;
String search = "\"Java\" \"C\"";
String charset = "UTF-8";
String userAgent = "ExampleBot 1.0 (+http://example.com/bot)";
// The extra "+" keeps the search terms separate from the last filetype filter.
Elements links = Jsoup.connect(google + "+" + URLEncoder.encode(search, charset)).userAgent(userAgent).get().select("li.g>h3>a");
for (Element link : links) {
    String title = link.text();
    // Google returns URLs in the format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
    String url = link.absUrl("href");
    int start = url.indexOf('=') + 1;
    int end = url.indexOf('&', start);
    if (end < 0) {
        end = url.length(); // no further parameters after the target URL
    }
    url = URLDecoder.decode(url.substring(start, end), charset);
    if (!url.startsWith("http")) {
        continue; // Ads/news/etc.
    }
    System.out.println("Title: " + title);
    System.out.println("URL: " + url);
    System.out.println();
}
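You can restrict a Google search to a specific domain (such as en.wikipedia.org) with the site: operator:

site:en.wikipedia.org

Try this instead of something like as_lq=en.wikipedia.org. Also, you probably don't need the last OR operator before the site filter.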
String only="+filetype%3Ahtml+OR+filetype%3Ahtm+OR+filetype%3Axhtm+OR+filetype%3Axhtml+OR+as_lq=en.wikipedia.org";
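One way to apply the site: advice is to write the whole query as plain text and URL-encode it once, so that the site filter is appended without a preceding OR. The following is only a minimal sketch: the class name SiteQuery and the method buildUrl are hypothetical names introduced for illustration, and it assumes Google accepts site: next to the filetype: filters, which has not been verified here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of the original code): builds a Google search
// URL that restricts results to one site via the "site:" operator.
class SiteQuery {

    static String buildUrl(String terms, String site, int numResults) {
        // File-type filters from the question, written unencoded.
        String fileTypes = "filetype:html OR filetype:htm OR filetype:xhtm OR filetype:xhtml";
        // No OR before the site filter: it is appended as a further restriction.
        String query = fileTypes + " site:" + site + " " + terms;
        try {
            // Encode the complete query once (':' becomes %3A, ' ' becomes '+').
            return "http://www.google.com/search?lr=lang_en&num=" + numResults
                    + "&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("\"Java\" \"C\"", "en.wikipedia.org", 15));
    }
}
```

The returned string could then be passed to Jsoup.connect(...) in place of the concatenated URL used in the question.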