Java web crawler for Amazon: getting the contents of a span element

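I'm crawling Amazon's categories and already get the sales rank and the product URL. Now I want to crawl the categories and get all the information from the category span: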

<span class="zg_hrsr_ladder">in&nbsp;<a href="https://www.amazon.de/gp/bestsellers/books/ref=pd_zg_hrsr_b_1_1">B&uuml;cher</a> &gt; <a href="https://www.amazon.de/gp/bestsellers/books/287480/ref=pd_zg_hrsr_b_1_2">Krimis & Thriller</a> &gt; <b><a href="https://www.amazon.de/gp/bestsellers/books/419954031/ref=pd_zg_hrsr_b_1_3_last">Deutschland</a></b></span>

I get everything inside the span, but I only want the link texts of the a elements: "Bücher", "Krimis & Thriller" and "Deutschland". How can I get just these?

You want to select the a elements inside the span and then take the text of each resulting element.
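The question does not show how the span was read. A plausible cause of getting "everything inside the span" is taking the text of the span element itself. The sketch below contrasts that with selecting the a elements. The class SpanVsLinks and its methods are hypothetical names, and the HTML is the snippet from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SpanVsLinks {
    // Text of the whole ladder span, including the "in" prefix and the ">" separators
    public static String wholeSpanText(String html) {
        Element span = Jsoup.parse(html).select("span.zg_hrsr_ladder").first();
        return span == null ? "" : span.text();
    }

    // Text of each <a> inside the span: only the category names
    public static List<String> linkTexts(String html) {
        List<String> names = new ArrayList<>();
        for (Element a : Jsoup.parse(html).select("span.zg_hrsr_ladder a")) {
            names.add(a.text());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<span class=\"zg_hrsr_ladder\">in&nbsp;<a href=\"https://www.amazon.de/gp/bestsellers/books/ref=pd_zg_hrsr_b_1_1\">B&uuml;cher</a> &gt; "
                + "<a href=\"https://www.amazon.de/gp/bestsellers/books/287480/ref=pd_zg_hrsr_b_1_2\">Krimis & Thriller</a> &gt; "
                + "<b><a href=\"https://www.amazon.de/gp/bestsellers/books/419954031/ref=pd_zg_hrsr_b_1_3_last\">Deutschland</a></b></span>";
        System.out.println(wholeSpanText(html));
        System.out.println(linkTexts(html));
    }
}
```

main first prints the whole ladder as a single string, with the "in" prefix and the ">" separators mixed in, and then the list of link texts on their own.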

Sample code:

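import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class AmazonLadder {
    public static void main(String[] args) {
        String source = "<span class=\"zg_hrsr_ladder\">in&nbsp;<a href=\"https://www.amazon.de/gp/bestsellers/books/ref=pd_zg_hrsr_b_1_1\">B&uuml;cher</a> &gt; <a href=\"https://www.amazon.de/gp/bestsellers/books/287480/ref=pd_zg_hrsr_b_1_2\">Krimis & Thriller</a> &gt; <b><a href=\"https://www.amazon.de/gp/bestsellers/books/419954031/ref=pd_zg_hrsr_b_1_3_last\">Deutschland</a></b></span>";

        // Jsoup.parse(String, String) takes a base URI as its second argument, not a charset, so parse(html) is enough
        Document htmlDocument = Jsoup.parse(source);

        // Select only the <a> elements inside the ladder span; the "in" and ">" text around them is skipped
        Elements category = htmlDocument.select("span.zg_hrsr_ladder a");

        // text() returns the decoded link text, e.g. "Bücher" for B&uuml;cher
        category.forEach(aElement -> System.out.println(aElement.text()));
    }
}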

Output:
Bücher
Krimis & Thriller
Deutschland

Comment: Don't crawl, use the API instead...
Comment: Thanks a lot, this helped me!
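If a product page lists more than one bestseller ladder, a single select over all a elements flattens them into one list. The following is a sketch that keeps each span.zg_hrsr_ladder separate, joins its link texts into a breadcrumb string, and also reads the numeric category ID from each link's href. It assumes that several ladders appear as separate span.zg_hrsr_ladder elements and that the ID is the digit segment right before /ref= (as in the URLs in the question); Amazon's markup is not documented and may change. LadderPaths, ladderPaths, nodeIds and NODE_ID are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LadderPaths {
    // Assumption: the category ID is the run of digits directly before "/ref=" in the bestseller URLs
    private static final Pattern NODE_ID = Pattern.compile("/(\\d+)/ref=");

    // One breadcrumb string per ladder span, e.g. "Bücher > Krimis & Thriller > Deutschland"
    public static List<String> ladderPaths(String html) {
        Document doc = Jsoup.parse(html);
        List<String> paths = new ArrayList<>();
        for (Element ladder : doc.select("span.zg_hrsr_ladder")) {
            List<String> names = new ArrayList<>();
            for (Element a : ladder.select("a")) {
                names.add(a.text());
            }
            paths.add(String.join(" > ", names));
        }
        return paths;
    }

    // Category IDs from the links of every ladder; links whose URL has no numeric segment are skipped
    public static List<String> nodeIds(String html) {
        Document doc = Jsoup.parse(html);
        List<String> ids = new ArrayList<>();
        for (Element a : doc.select("span.zg_hrsr_ladder a[href]")) {
            Matcher m = NODE_ID.matcher(a.attr("href"));
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String html = "<span class=\"zg_hrsr_ladder\">in&nbsp;<a href=\"https://www.amazon.de/gp/bestsellers/books/ref=pd_zg_hrsr_b_1_1\">B&uuml;cher</a> &gt; "
                + "<a href=\"https://www.amazon.de/gp/bestsellers/books/287480/ref=pd_zg_hrsr_b_1_2\">Krimis & Thriller</a> &gt; "
                + "<b><a href=\"https://www.amazon.de/gp/bestsellers/books/419954031/ref=pd_zg_hrsr_b_1_3_last\">Deutschland</a></b></span>";
        ladderPaths(html).forEach(System.out::println);
        System.out.println(nodeIds(html));
    }
}
```

With the question's snippet, main prints "Bücher > Krimis & Thriller > Deutschland" followed by [287480, 419954031]. The top-level Bücher link has no numeric segment in its URL, so it contributes no ID.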