Parsing a web page with Java (Java, Selenium, HTML Parsing, Jsoup, HtmlUnit) - Fatal编程技术网



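Hi, I want to go through the pages of an online shop. So far I have been using Selenium. It works well, but it is very slow, so I'd like to find a better solution. I found HtmlUnit and Jsoup, but I think I will run into problems clicking links and moving to the next page.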

I wrote a simple example with HtmlUnit:

WebClient web = new WebClient();
HtmlPage page = web.getPage("http://news.yahoo.com/");
web.closeAllWindows();
But I get a lot of warnings and errors:

WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3604] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
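The messages above come from HtmlUnit's CSS parser reporting rules it does not understand; they are usually noise rather than a sign that the page failed to load. Their layout (a timestamp line, then `WARNING: ...`) looks like java.util.logging output, so a minimal sketch, assuming HtmlUnit's commons-logging calls are being routed to java.util.logging (its fallback when no other logging backend is on the classpath), is to switch off the `com.gargoylesoftware` logger before creating the `WebClient`. The class name `HtmlUnitLogSilencer` is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Silences HtmlUnit's log output via java.util.logging.
// Assumption: commons-logging (used by HtmlUnit 2.x) delegates to java.util.logging here.
final class HtmlUnitLogSilencer {
    // Keep a strong reference: the LogManager holds loggers weakly, so a level
    // set on an otherwise unreferenced logger can be lost after garbage collection.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware");

    private HtmlUnitLogSilencer() {}

    /** Turns off output for com.gargoylesoftware and every child logger without its own level. */
    static Logger silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
        return HTMLUNIT_LOGGER;
    }
}
```

Call `HtmlUnitLogSilencer.silence()` once at startup. HtmlUnit 2.x also has its own switches, such as `webClient.setCssErrorHandler(new SilentCssErrorHandler())` or `webClient.getOptions().setCssEnabled(false)`; check the Javadoc of the version you use, since these APIs have moved between releases.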
I couldn't find a way to click a link (by XPath). Jsoup is good for parsing a page, but not for moving between pages dynamically.
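Without a browser, "moving between pages" comes down to collecting each page's links, resolving them to absolute URLs, and fetching them in turn, which Jsoup can do. The traversal itself is an ordinary breadth-first search with a visited set. A minimal sketch, where `linksOf` is a hypothetical hook you would back with Jsoup (for example `doc.select("a[href]")` plus `absUrl("href")`), kept abstract here so the logic runs without network access; `LinkCrawler` is likewise a hypothetical name:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Breadth-first traversal of a site by following links, with no browser involved.
// `linksOf` is a hypothetical hook: given an absolute URL, it returns the absolute
// URLs that page links to (in practice backed by Jsoup, e.g. select("a[href]") + absUrl("href")).
final class LinkCrawler {
    private LinkCrawler() {}

    /** Returns visited URLs in visiting order, stopping after maxPages pages. */
    static List<String> crawl(String startUrl,
                              Function<String, List<String>> linksOf,
                              int maxPages) {
        Set<String> seen = new HashSet<>();        // every URL ever queued
        Deque<String> queue = new ArrayDeque<>();  // BFS frontier
        List<String> visited = new ArrayList<>();

        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            visited.add(url);
            for (String next : linksOf.apply(url)) {
                if (seen.add(next)) {  // add() returns false if already seen
                    queue.add(next);
                }
            }
        }
        return visited;
    }
}
```

The `maxPages` cap keeps a misconfigured crawl from running away; in practice you would also restrict `linksOf` to the shop's own pages.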


I need your help :) Is there a parser other than Selenium that would get me the same result?

For example:


See also:

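Yes, I know, but I can't get to the next page except by clicking a button or link on the page I'm viewing.

Is it possible to add the button's parameters to the query string (see: )?

Can you give me an example of the page data you need?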
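On the query-string question: if the "next" button is a plain link or a GET form, clicking it only loads another URL, so that URL can be requested directly. A minimal sketch, assuming the shop paginates with GET parameters; the parameter names `category` and `page` and the class `PageUrls` are hypothetical, so take the real names from the button's `href` or form fields:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds the URL a GET-based "next page" button would load, instead of clicking it.
// The parameter names used in pageUrl() are hypothetical placeholders.
final class PageUrls {
    private PageUrls() {}

    /** Appends URL-encoded query parameters, in iteration order, to baseUrl. */
    static String withQuery(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        if (query.length() == 0) {
            return baseUrl;  // nothing to append
        }
        String sep = baseUrl.contains("?") ? "&" : "?";  // extend an existing query
        return baseUrl + sep + query;
    }

    /** URL of one results page, e.g. page 2 of a category listing. */
    static String pageUrl(String baseUrl, String category, int page) {
        Map<String, String> params = new LinkedHashMap<>();  // keeps parameter order
        params.put("category", category);
        params.put("page", Integer.toString(page));
        return withQuery(baseUrl, params);
    }
}
```

For example, `PageUrls.pageUrl("http://first.com/list", "shoes", 2)` yields `http://first.com/list?category=shoes&page=2`. If the button instead runs JavaScript or submits a POST form, this shortcut does not apply and a JavaScript-capable client such as HtmlUnit or Selenium is still needed.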
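import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://first.com/").get(); // Connect to the 'root' page (throws IOException)
Elements links = doc.select("a[href]"); // Select all links that have an href attribute

// As an example, follow the first link on the page and parse its HTML.
// absUrl("href") turns a relative href into an absolute URL; links.first() is null if there are no links.
doc = Jsoup.connect(links.first().absUrl("href")).get();

// Continue with the new page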
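The `absUrl("href")` call is what makes following links work: most `href` values on a shop page are relative (`item?id=3`, `/cart`, `../about`) and must be resolved against the URL of the page they came from. For illustration, a JDK-only equivalent using `java.net.URI.resolve`, which applies the standard RFC resolution rules; Jsoup may treat unusual edge cases differently, and `LinkResolver` is a hypothetical name:

```java
import java.net.URI;

// Resolves a link's href against the URL of the page it was found on,
// similar in spirit to Jsoup's absUrl("href") for ordinary relative links.
final class LinkResolver {
    private LinkResolver() {}

    /** Returns the absolute URL for href (relative or absolute) found on pageUrl. */
    static String resolve(String pageUrl, String href) {
        // URI.resolve merges relative paths and normalizes "." and ".." segments.
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }
}
```

`URI.create` throws `IllegalArgumentException` for hrefs containing characters such as unescaped spaces, which real pages sometimes contain, so wrap the call in a try/catch when processing arbitrary pages.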