Most time-efficient way to remove stop words in Java from an array of strings


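How can I remove these stop words in the most efficient way? The method below does not remove the stop words. What am I missing?

Is there another way to do it?

I want to do this in Java in the most time-efficient way.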

public static HashSet<String> hs = new HashSet<String>();


public static String[] stopwords = {"a", "able", "about",
        "across", "after", "all", "almost", "also", "am", "among", "an",
        "and", "any", "are", "as", "at", "b", "be", "because", "been",
        "but", "by", "c", "can", "cannot", "could", "d", "dear", "did",
        "do", "does", "e", "either", "else", "ever", "every", "f", "for",
        "from", "g", "get", "got", "h", "had", "has", "have", "he", "her",
        "hers", "him", "his", "how", "however", "i", "if", "in", "into",
        "is", "it", "its", "j", "just", "k", "l", "least", "let", "like",
        "likely", "m", "may", "me", "might", "most", "must", "my",
        "neither", "n", "no", "nor", "not", "o", "of", "off", "often",
        "on", "only", "or", "other", "our", "own", "p", "q", "r", "rather",
        "s", "said", "say", "says", "she", "should", "since", "so", "some",
        "t", "than", "that", "the", "their", "them", "then", "there",
        "these", "they", "this", "tis", "to", "too", "twas", "u", "us",
        "v", "w", "wants", "was", "we", "were", "what", "when", "where",
        "which", "while", "who", "whom", "why", "will", "with", "would",
        "x", "y", "yet", "you", "your", "z"};
public StopWords()
{
    int len= stopwords.length;
    for(int i=0;i<len;i++)
    {
        hs.add(stopwords[i]);
    }
    System.out.println(hs);
}

public List<String> removedText(List<String> S)
{
    Iterator<String> text = S.iterator();

    while(text.hasNext())
    {
        String token = text.next();
        if(hs.contains(token))
        {

                S.remove(text.next());
        }
        text = S.iterator();
    }
    return S;
}
Try the following suggested change:

public static List<String> removedText(List<String> S)
{
    Iterator<String> text = S.iterator();

    while(text.hasNext())
    {
        String token = text.next();
        if(hs.contains(token))
        {

            S.remove(token); // changed: text.next() --> token
        }
        // text = S.iterator(); // why the need to re-assign?
    }
    return S;
}

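You should not modify a list while iterating over it. In addition, you call next() twice in the same loop iteration, after evaluating hasNext() only once. Instead, use the iterator to remove the item: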

public static List<String> removedText(List<String> s) {
    Iterator<String> text = s.iterator();

    while (text.hasNext()) {
        String token = text.next();
        if (hs.contains(token)) {
            text.remove();
        }
    }
    return s;
}
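For anyone who wants to run this, here is a minimal self-contained sketch of the iterator-based removal, next to `Collection.removeIf` (Java 8+), which gives the same result in one call. The class name `StopWordFilter`, the method names, and the shortened stop-word set are assumptions for illustration, not taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class StopWordFilter {
    // Shortened stop-word set, for illustration only.
    public static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "is"));

    // Removes stop words in place through the iterator itself,
    // so the iterator never sees an unexpected modification.
    public static List<String> removeWithIterator(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        while (it.hasNext()) {
            if (STOP_WORDS.contains(it.next())) {
                it.remove();
            }
        }
        return tokens;
    }

    // Same result using Collection.removeIf (Java 8+).
    public static List<String> removeWithRemoveIf(List<String> tokens) {
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    public static void main(String[] args) {
        // Wrap in ArrayList: the fixed-size list from Arrays.asList does not support removal.
        List<String> tokens =
                new ArrayList<>(Arrays.asList("the", "quick", "fox", "is", "a", "fox"));
        System.out.println(removeWithIterator(tokens));
    }
}
```

Both methods need a list that supports removal. Passing the result of `Arrays.asList(...)` directly would throw `UnsupportedOperationException` as soon as a stop word is found.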

Maybe you could use org/apache/commons/lang/ArrayUtils inside the loop:

stopwords = ArrayUtils.removeElement(stopwords, element)

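Looks fine. How large will the list be? If it is especially large, one solution might be to not load the words into a list to begin with, and to do the processing at the input/output stream level. But I would only do that if the current implementation has performance or memory problems.

Instead of removing the strings from the list (which causes internal copying), you could set null at the positions where the stop words are. Then, when you output the list, ignore the nulls, or copy the list at the end and exclude the nulls at that point.

It does not remove the stop words from the list. I tried it.

It does not remove the token from it :( Also, I ran into a co-modification error earlier, mostly because the list was modified while iterating, which left the iterator in an inconsistent state: java.util.ConcurrentModificationException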
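The null-marking idea from the comment above could be sketched roughly as follows. The class name `StopWordCopyFilter`, the method names, and the shortened stop-word set are hypothetical, and the sketch assumes a random-access list such as `ArrayList`. Whether it is measurably faster than `Iterator.remove()` depends on the list size and on how many stop words it contains, so it is worth measuring before adopting it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordCopyFilter {
    // Shortened stop-word set, for illustration only.
    public static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "is"));

    // Pass 1: overwrite each stop word with null; no elements are shifted.
    public static void markStopWords(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            if (STOP_WORDS.contains(tokens.get(i))) {
                tokens.set(i, null);
            }
        }
    }

    // Pass 2: copy the surviving (non-null) tokens into a new list.
    public static List<String> copyWithoutNulls(List<String> tokens) {
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            if (token != null) {
                result.add(token);
            }
        }
        return result;
    }
}
```

If the tokens are only going to be written out once, the second pass can be skipped and the nulls ignored while writing.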