Java jsoup: get specific tags and the values associated with them [java, regex, jsoup]

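I am new to jsoup and would like to get more familiar with extracting information from websites. I am trying to do something simple: get some values from eBay.

I want to get the item name, HTML link, price and number sold from "Hot this week" (like here:)

However, I am not sure how to proceed.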

package application;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

import javax.swing.JOptionPane;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GetHotSellers {

    public static void main(String[] args) {
        Document doc =  Jsoup.parse(readURL("http://www.ebay.co.uk/sch/Action-Figures/246/bn_1632128/i.html"));

        Elements sold_items = doc.getElementsMatchingText("sold$");   
        for(Element sold : sold_items) {
                System.out.println(sold.text());
        }
    }


     public static String readURL(String url) {

     String fileContents = "";
     String currentLine = "";

     try {
         BufferedReader reader = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
         fileContents = reader.readLine();
         while (currentLine != null) {
             currentLine = reader.readLine();
             fileContents += "\n" + currentLine;
         }
         reader.close();
         reader = null;
     } catch (Exception e) {
         JOptionPane.showMessageDialog(null, e.getMessage(), "Error Message", JOptionPane.OK_OPTION);
         e.printStackTrace();

     }

     return fileContents;
    }

}
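This is as far as I have got. Do I need to improve my regex, or should I use a different function that is better suited to what I want?

My current output looks like this: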
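
A side note on the readURL helper above, separate from the selector question: its loop appends currentLine before checking it for null, so the returned text ends with the literal string "null". The following is a minimal sketch of a reading loop that avoids this, assuming the content arrives through any java.io.Reader. The class name ReadAllSketch and the method readAll are made up for illustration, and a StringReader stands in for the network stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadAllSketch {

    // Reads everything from the given reader, joining lines with '\n'.
    // The loop stops as soon as readLine() returns null, so no "null" is appended.
    public static String readAll(Reader source) {
        StringBuilder contents = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            boolean first = true;
            String line;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    contents.append('\n');
                }
                contents.append(line);
                first = false;
            }
        } catch (IOException e) {
            // Hypothetical error policy: rethrow unchecked instead of showing a dialog.
            throw new UncheckedIOException(e);
        }
        return contents.toString();
    }

    public static void main(String[] args) {
        // A StringReader stands in for new InputStreamReader(url.openStream()).
        System.out.println(readAll(new StringReader("<html>\n<body>87 sold</body>\n</html>")));
    }
}
```

The try-with-resources block closes the reader even when reading fails, and the StringBuilder avoids building a new String on every line.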

2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold 12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold 12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
381 sold
381 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
187 sold
187 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
174 sold
174 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
129 sold
129 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
101 sold
101 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
89 sold
89 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
88 sold
88 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
87 sold
87 sold
And an example of the output I want:

Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay || £7.99 || 87 sold || http://link.com
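
On the regex part of the question: if only the flattened text lines from the current output are available, one possible approach is to split each line into title, price and sold count with a pattern. The sketch below is an illustration under assumptions, not a recommendation over selecting the elements directly. It assumes prices look like £7.99 (no thousands separator), that titles contain no £ sign, and that the line ends in "N sold". The class name SoldLineParser and the method format are hypothetical. The link is not part of the text, so it cannot be recovered this way.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SoldLineParser {

    // Title (no pound sign), then a price such as "£7.99", then a count followed by "sold".
    // \u00A3 is the pound sign, escaped so the source compiles in any file encoding.
    private static final Pattern LINE = Pattern.compile(
            "^([^\u00A3]+?)\\s+(\u00A3\\d+(?:\\.\\d{2})?)\\s+(\\d+) sold$");

    // Returns "title || price || N sold", or empty if the line has a different shape
    // (for example the bare "87 sold" lines, or the long line with several items).
    public static Optional<String> format(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + " || " + m.group(2) + " || " + m.group(3) + " sold");
    }

    public static void main(String[] args) {
        String line = "Henry Hugglemonster Huggle House Playset. "
                + "From the Official Argos Shop on ebay \u00A37.99 87 sold";
        format(line).ifPresent(System.out::println);
    }
}
```

Because the title group excludes the £ sign, the long line that concatenates every item does not match, which filters out the ancestor elements that repeat the same text.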
Edit:

I tried something like this, but with no luck:

for(String categoryURL : categoryLinksArray) {
    Document doc = Jsoup.parse(readURL(categoryURL));
    Elements sold_items = doc.getElementsByClass("b-block-info-container");
    for(Element sold : sold_items) {
            System.out.println("NAME: " + sold.attr("b-block-info-container__title b-block-info-container__title__ListingSummary") + "\n" + 
                               "PRICE: " + sold.attr("b-block-info-container__price") + "\n" +
                               "SOLD/week: " + sold.attr("item_quantity__hotness") + "\n" +
                               "URL: " + sold.attr("abs:href"));
            System.out.println("--------------------------------------");
    }
}

I got it working, but it is not efficient, because it is very slow:

public static void main(String[] args) {

    ArrayList<String> categoryLinksArray = new ArrayList<>();

    Document links = Jsoup.parse(readURL("http://www.ebay.co.uk/sch/allcategories/all-categories"));
    Elements item_categories = links.getElementsByClass("ch");
    for (Element category : item_categories) {
        categoryLinksArray.add(category.attr("abs:href"));
    }

    for (String categoryURL : categoryLinksArray) {
        Document doc = Jsoup.parse(readURL(categoryURL));
        Elements hot_items = doc
                .getElementsByClass("b-module b-module-carousel b-module-deals topSold b-display--portrait");
        for (Element item : hot_items) {

            Elements hot_items_names = item.getElementsByClass(
                    "b-block-info-container__title b-block-info-container__title__ListingSummary");
            Elements hot_items_price = item.getElementsByClass("b-block-info-container__price");
            Elements hot_items_sold = item.getElementsByClass("item_quantity__hotness");
            Elements hot_items_url = item.getElementsByClass("b-block-tile");

            HashMap<String, String> hs_items = new HashMap<>();

            for (Element item_name : hot_items_names) {
                hs_items.put("Name", item_name.text());
            }
            for (Element item_price : hot_items_price) {
                hs_items.put("Price", item_price.text());
            }
            for (Element item_sold : hot_items_sold) {
                hs_items.put("Sold", item_sold.text());
            }
            for (Element item_url : hot_items_url) {
                hs_items.put("URL", item_url.attr("abs:href"));
            }

            System.out.println("Name: " + hs_items.get("Name") + "\n" +
                               "Price: " + hs_items.get("Price") + "\n" +
                               "Sold: " + hs_items.get("Sold") + "\n" +
                               "URL: " + hs_items.get("URL") + "\n" +
                               "----------------------------------");
        }
    }
}
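
One possible contributor to the slowness, stated here as an assumption rather than a measured cause, is that every category page is downloaded one after another. The sketch below shows how the downloads could be spread over a small thread pool using only the JDK. The class ParallelFetchSketch, the method fetchAll and the fetcher parameter are hypothetical; in the code above the fetcher would be something like the readURL helper. Whether the site tolerates concurrent requests is not something this sketch can answer, so a small pool size is the safer choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFetchSketch {

    // Applies fetcher to every URL on a fixed-size thread pool.
    // Results are returned in the same order as the input URLs.
    public static List<String> fetchAll(List<String> urls,
                                        Function<String, String> fetcher,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                pending.add(pool.submit(task));
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> result : pending) {
                pages.add(result.get()); // blocks until this page is ready
            }
            return pages;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A fake fetcher stands in for the real download.
        List<String> pages = fetchAll(Arrays.asList("a", "b"), url -> "page for " + url, 2);
        System.out.println(pages);
    }
}
```

Each returned page could then be handed to Jsoup.parse exactly as in the loop above; only the download step changes.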

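The page is divided into sections. The tags of these sections have ids running from id="w2", id="w3" ... up to id="w10". You can use these ids to walk through each section and select the data you are interested in. For example: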

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTest {
    public static void main(String argv[]) throws IOException {
        Document doc = Jsoup.connect("http://www.ebay.co.uk/sch/Action-Figures/246/bn_1632128/i.html").get();
        for(int i = 2; i<11;i++){
            Element category = doc.getElementById("w"+i); // select section with id = w2 , w3, w4 ...
            if(category == null) continue; // getElementById returns null when the page has no such id
            if(!category.select("div.b-module-carousel__title").isEmpty()){
                System.out.println(category.select("div.b-module-carousel__title").text()); // the title of the section is either here
            }
            else{
                System.out.println(category.select("div.b-block-list__header").text());  // or here
            }
            Elements items = category.select("li");            
            for(Element e : items){
                System.out.println(  e.select("div.b-block-info-container__title").text() 
                        // to get prices or trending-prices
                        // (some boolean expression which can be true or false)?return this if true:return this part if false
                         + " || " +  ((!e.select("div.b-block-info-container__price").isEmpty())?e.select("div.b-block-info-container__price").text():(e.select("div.b-block-info-container__trending-prices-group").text()))
                         + " || " +  e.select("div.item_quantity__hotness").text()
                         + " || " +  e.select("a").attr("href"));
            }
            System.out.println("************************************************************************************"); // just added to separate the categories
        }            
    } 
}
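I tried doing this for all categories, but I get a null pointer at the Jsoup.connect line. Do you think that is because "w6-2-x-carousel-items" is unique to the toys category?

Yes, the id is unique, so this will not work for the rest of the pages. But if you inspect the page's HTML code, you will see a certain structure. See my second answer and modify it as necessary.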