Java multithreaded parser

java multithreading parsing

I am writing a multithreaded parser. The parser class looks like this:

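import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {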

    private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
    private boolean h2Tag = false;
    private int count;
    private static int threadCount = 0;

    public static List<Item> parse() {
        for (int i = 1; i <= 1000; i++) { //1000 pages of the same type need to be parsed

            while (threadCount == 20) { //limit the number of simultaneous threads
                try {
                    Thread.sleep(50);
                } catch (InterruptedException ex) {
                    ex.printStackTrace();
                }
            }

            Thread thread = new Thread(new Parser());
            thread.setName(Integer.toString(i));
            threadCount++; //increase the number of working threads
            thread.start();            
        }

        return itemList;
    }

    public void run() {
        //The code here builds the page URL from the thread name
        //(which carries the loop index i), opens the connection,
        //starts the parsing, etc.
        //Nothing special in general, so I won't paste it here.

        threadCount--; //reduce the number of running threads when current stops
    }

    private static void addItem(Item item) {
        itemList.add(item);
    }

    //This method retrieves the necessary information after the H2 tag is detected
    @Override
    public void handleText(char[] data, int pos) {
        if (h2Tag) {
            String itemName = new String(data).trim();

            //Item - the item on which we receive information from a Web page
            Item item = new Item();
            item.setName(itemName);
            item.setId(count);
            addItem(item);

            //Display information about an item in the console
            System.out.println(count + " = " + itemName);
        }
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (HTML.Tag.H2 == t) {
            h2Tag = true;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (HTML.Tag.H2 == t) {
            h2Tag = false;
        }
    }
}

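After the parse() call returns, all 1000 of your threads have been started, but there is no guarantee that they have finished. In fact they have not, and that is the problem you are seeing. I strongly recommend not writing this yourself but using the tools the SDK provides for this kind of job.
The documentation for the thread pool classes in java.util.concurrent is a good starting point, for example. Again, do not implement this yourself unless you are absolutely sure you have to, because writing this kind of multithreaded code is pure pain.
Your code should look something like this: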
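ExecutorService executor = Executors.newFixedThreadPool(20); // at most 20 tasks run at once
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
   futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
   f.get(); // blocks until this task is done; throws InterruptedException, ExecutionException
}
executor.shutdown(); // let the pool threads exit once all tasks have finished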
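
The snippet above leaves the task body elided. As a rough sketch of how the pieces could fit together, the version below fills it with a hypothetical parsePage(int) placeholder and uses Callable so that each task returns its result instead of writing to a shared static list. The names PoolSketch, parseAll and parsePage are assumptions made for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolSketch {

    // Hypothetical stand-in for downloading and parsing page number i.
    static String parsePage(int i) {
        return "item-" + i;
    }

    // Runs one task per page on a fixed-size pool and waits for all of them.
    static List<String> parseAll(int pages, int poolSize) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>(pages);
            for (int i = 1; i <= pages; i++) {
                final int page = i;
                futures.add(executor.submit(new Callable<String>() {
                    @Override
                    public String call() {
                        return parsePage(page);
                    }
                }));
            }
            List<String> results = new ArrayList<>(pages);
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that task is done
            }
            return results; // same order as the tasks were submitted
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a parse task failed", e.getCause());
        } finally {
            executor.shutdown(); // pool threads exit once queued tasks finish
        }
    }

    public static void main(String[] args) {
        List<String> items = parseAll(1000, 20);
        System.out.println(items.size() + " items, first = " + items.get(0));
    }
}
```

Because the results are read back through the futures in submission order, the returned list is complete and ordered by the time parseAll returns, with no sleeping or hand-rolled thread counting.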

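There is nothing wrong with the code as such; it does exactly what you coded. The problem is in the last iterations. All the earlier iterations work fine, but in the final iterations, from 980 to 1000, the threads are created and the main thread returns the list without waiting for them to finish. So if you run 20 threads at a time, you end up with an arbitrary number of items somewhere between 980 and 1000.
For now, you can try adding Thread.sleep(50) before returning the list. In that case your main thread waits for a while, and by then the other threads may have finished processing.
Alternatively, instead of sleeping, you can use the Java synchronization API, namely a CountDownLatch, which lets you wait until the threads have finished processing before you create new ones.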
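
As an illustration of the latch approach, here is a minimal sketch. It assumes one latch count per worker thread and uses a placeholder string in place of real parsing; LatchSketch and parseAll are hypothetical names. Unlike the code in the question, it does not cap the number of threads running at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class LatchSketch {

    static List<String> parseAll(final int pages) {
        final List<String> items =
                Collections.synchronizedList(new ArrayList<String>());
        final CountDownLatch done = new CountDownLatch(pages);

        for (int i = 1; i <= pages; i++) {
            final int page = i;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        items.add("item-" + page); // placeholder for real parsing
                    } finally {
                        done.countDown(); // always signal, even if parsing fails
                    }
                }
            }).start();
        }

        try {
            done.await(); // returns only after every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(50).size()); // prints 50
    }
}
```

The countDown() call sits in a finally block so that a failing worker cannot leave the main thread waiting forever.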
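I don't know whether this is the problem, but threadCount is currently not updated in a thread-safe way. Increments and/or decrements can get lost.
How many cores do you have? If you are concerned about speed, the fastest approach is probably to use as many threads as cores (assuming the process is CPU-bound). Using an ExecutorService with a fixed thread pool size and Callables (to collect the task results) would probably be better.
The problem with using an arbitrary wait time is that the other threads only "may" have finished by then. That is not guaranteed. If you need to be sure that all threads have finished, you have to wait for every single one of them.
@Stephan: please check my update. It shows an approach using a CountDownLatch, which the process uses to wait for the other threads to finish processing.
@M.J. Thank you very much. You were right. After I added Thread.sleep(1000) before returning the list, the program works fine.
A general rule of thumb: any time you need a thread to sleep for some period to make the code work (an indefinite wait for a notify is different), you have a bug. The whole code above should be rewritten with a thread pool, as Stephan suggested.
@nachtgeschrei There is one more approach; you can also try the code here: [link]
Thank you. I know my code is terrible :( I will read the documentation you provided and make this code more correct.
Using Runtime.getRuntime().availableProcessors() is probably a better choice for the pool size.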
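
Two of the comments above can be sketched together: a counter that is safe to update from several threads, and a pool sized from the number of available processors. This is only an illustration under assumptions. CounterSketch and countWithAtomic are hypothetical names, and AtomicInteger is one possible replacement for the plain static int, not something the commenters named.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CounterSketch {

    static int countWithAtomic(int tasks, final int incrementsPerTask) {
        final AtomicInteger counter = new AtomicInteger();
        // Pool size taken from the machine instead of a hard-coded 20.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);

        for (int t = 0; t < tasks; t++) {
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < incrementsPerTask; i++) {
                        counter.incrementAndGet(); // atomic read-modify-write
                    }
                }
            });
        }

        executor.shutdown(); // no new tasks; queued ones still run
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(20, 10_000)); // prints 200000
    }
}
```

With a plain int and ++, concurrent updates can overwrite each other and the final value may come out lower; incrementAndGet() performs the whole update as one atomic step.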