Using Java streams in parallel with collect(supplier, accumulator, combiner) does not give the expected result
I am trying to find the number of words in a given string. Below is the sequential algorithm for it, which works fine:
public int getWordcount() {
    boolean lastSpace = true;
    int result = 0;
    for (char c : str.toCharArray()) {
        if (Character.isWhitespace(c)) {
            lastSpace = true;
        } else {
            if (lastSpace) {
                lastSpace = false;
                ++result;
            }
        }
    }
    return result;
}
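However, when I try to "parallelize" this with the Stream.collect(supplier, accumulator, combiner) method, I get a word count of 0. I am using an immutable class (WordCountState) to maintain the state of the word count.
Code: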
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class WordCounter {
    private final String str = "Java8 parallelism helps if you know how to use it properly.";
    public int getWordCountInParallel() {
        Stream<Character> charStream = IntStream.range(0, str.length())
                .mapToObj(i -> str.charAt(i));
        WordCountState finalState = charStream.parallel()
                .collect(WordCountState::new,
                         WordCountState::accumulate,
                         WordCountState::combine);
        return finalState.getCounter();
    }
}
public class WordCountState {
    private final boolean lastSpace;
    private final int counter;
    private static int numberOfInstances = 0;

    public WordCountState() {
        this.lastSpace = true;
        this.counter = 0;
        //numberOfInstances++;
    }

    public WordCountState(boolean lastSpace, int counter) {
        this.lastSpace = lastSpace;
        this.counter = counter;
        //numberOfInstances++;
    }

    //accumulator
    public WordCountState accumulate(Character c) {
        if (Character.isWhitespace(c)) {
            return lastSpace ? this : new WordCountState(true, counter);
        } else {
            return lastSpace ? new WordCountState(false, counter + 1) : this;
        }
    }

    //combiner
    public WordCountState combine(WordCountState wordCountState) {
        //System.out.println("Returning new obj with count : " + (counter + wordCountState.getCounter()));
        return new WordCountState(this.isLastSpace(),
                (counter + wordCountState.getCounter()));
    }

    public boolean isLastSpace() {
        return lastSpace;
    }

    public int getCounter() {
        return counter;
    }
}
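I noticed the following problems with the code above:
1. The number of WordCountState objects created is greater than the number of characters in the string.
2. The result is always 0.
3. According to the accumulator/consumer documentation, shouldn't the accumulator return void? Even though my accumulator method returns an object, the compiler doesn't complain.
Any clue where I might be going off track?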
Update:
I used the following solution:
public int getWordCountInParallel() {
    Stream<Character> charStream = IntStream.range(0, str.length())
            .mapToObj(i -> str.charAt(i));
    WordCountState finalState = charStream.parallel()
            .reduce(new WordCountState(),
                    WordCountState::accumulate,
                    WordCountState::combine);
    return finalState.getCounter();
}
First of all, wouldn't it be easier to just use something like input.split("\\s+").length to get the word count?
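A minimal sketch of that split-based alternative, assuming plain ASCII whitespace is good enough (the class and method names are hypothetical). Note the trim(): without it, split("\\s+") returns an empty first element for input that starts with whitespace, which inflates the count by one.

```java
// Hypothetical helper illustrating the split-based word count.
final class SplitWordCount {
    // Counts whitespace-separated words. trim() comes first because
    // split("\\s+") on input with leading whitespace yields an empty first token.
    static int countWords(String input) {
        String trimmed = input.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

For example, countWords("  abc  abc ") returns 2. Because \\s only matches ASCII whitespace by default, the result can differ from the Character.isWhitespace-based loop for some Unicode space characters.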
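If this is an exercise in streams and collectors, let's discuss your implementation. You already pointed out the biggest mistake yourself: the accumulator and the combiner should not return new instances. The signature of collect tells you that it expects a BiConsumer, which does not return anything. Because you create new objects in the accumulator, you never increase the count of the WordCountState objects that the collector actually uses. And by creating new objects in the combiner, you throw away any progress you would have made. This is also why you create more objects than there are characters in your input: one per character, plus some more for the return values.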
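To see the effect in isolation, here is a trimmed-down copy of the question's immutable class (renamed ImmutableWordState, with a hypothetical helper countWithCollect). Because collect adapts accumulate and combine to BiConsumer, every returned state is dropped, so the count stays 0 whether the stream is sequential or parallel:

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Trimmed-down copy of the question's immutable WordCountState (renamed).
final class ImmutableWordState {
    final boolean lastSpace;
    final int counter;

    ImmutableWordState() { this(true, 0); }

    ImmutableWordState(boolean lastSpace, int counter) {
        this.lastSpace = lastSpace;
        this.counter = counter;
    }

    // Returns a new state; collect() silently ignores this return value.
    ImmutableWordState accumulate(Character c) {
        if (Character.isWhitespace(c)) {
            return lastSpace ? this : new ImmutableWordState(true, counter);
        }
        return lastSpace ? new ImmutableWordState(false, counter + 1) : this;
    }

    ImmutableWordState combine(ImmutableWordState other) {
        return new ImmutableWordState(lastSpace, counter + other.counter);
    }

    // Hypothetical helper: the containers created by the supplier are never
    // modified (their fields are final), so the result always has counter == 0.
    static int countWithCollect(String s, boolean parallel) {
        Stream<Character> chars = IntStream.range(0, s.length()).mapToObj(i -> s.charAt(i));
        if (parallel) {
            chars = chars.parallel();
        }
        ImmutableWordState result = chars.collect(ImmutableWordState::new,
                                                  ImmutableWordState::accumulate,
                                                  ImmutableWordState::combine);
        return result.counter;
    }
}
```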
See this adjusted implementation:
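public static class WordCountState
{
    // Plain mutable fields: collect() updates this very instance.
    private boolean lastSpace = true;
    private int counter = 0;

    // Accumulator: mutates the state and returns nothing, matching BiConsumer.
    public void accumulate(Character character)
    {
        if (!Character.isWhitespace(character))
        {
            if (lastSpace)
            {
                counter++;   // first non-whitespace character after whitespace starts a word
            }
            lastSpace = false;
        }
        else
        {
            lastSpace = true;
        }
    }

    // Combiner: merges another partial result into this one.
    public void combine(WordCountState wordCountState)
    {
        counter += wordCountState.counter;
    }
}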
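Here we don't create new objects at every step; instead, we change the state of the existing ones. I think you tried to create new objects because your ternary (?:) operators forced you to return something and/or because you couldn't change the instance fields, since they are final. But they don't need to be final; you can easily change them.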
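Running this adjusted implementation sequentially now works fine, since we look at the characters one by one, and we end up with 11 words.
In parallel, however, it fails. It seems to create a new WordCountState for every character, but it doesn't count all of them, and the result is 29 (at least for me). This shows a fundamental flaw in your algorithm: splitting at every character doesn't work in parallel. Imagine the input "abc abc", which should result in 2. If you run it in parallel and don't specify how to split the input, you might get the chunks "ab", "c a" and "bc", which add up to 4.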
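The problem is that by parallelizing between characters (i.e., in the middle of words), you make your separate WordCountStates depend on each other, because each one needs to know which chunk came before it and whether that chunk ended with a whitespace character. This defeats the parallelism and leads to wrong results.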
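The "abc abc" example with the chunks "ab", "c a" and "bc" can be reproduced by hand. This sketch (hypothetical names; the chunks are chosen manually rather than produced by a real spliterator) counts each chunk with the question's sequential rule, sums the counts naively, and then applies the boundary correction a combiner would need: subtract one whenever a chunk ends inside a word and the next chunk starts inside a word.

```java
import java.util.List;

// Hand-made chunks standing in for one way a parallel stream might split the input.
final class ChunkBoundaryDemo {
    // Same rule as the sequential loop in the question, applied to one chunk.
    static int countChunk(String chunk) {
        boolean lastSpace = true;
        int result = 0;
        for (char c : chunk.toCharArray()) {
            if (Character.isWhitespace(c)) {
                lastSpace = true;
            } else if (lastSpace) {
                lastSpace = false;
                result++;
            }
        }
        return result;
    }

    // What combining independently counted chunks by plain addition computes.
    static int naiveSum(List<String> chunks) {
        int sum = 0;
        for (String chunk : chunks) {
            sum += countChunk(chunk);
        }
        return sum;
    }

    // Corrected sum: a word cut by a chunk boundary was counted on both sides.
    static int boundaryAwareSum(List<String> chunks) {
        int sum = 0;
        String previous = null;
        for (String chunk : chunks) {
            if (chunk.isEmpty()) {
                continue;
            }
            sum += countChunk(chunk);
            if (previous != null
                    && !Character.isWhitespace(previous.charAt(previous.length() - 1))
                    && !Character.isWhitespace(chunk.charAt(0))) {
                sum--;
            }
            previous = chunk;
        }
        return sum;
    }
}
```

With the chunks above, naiveSum gives 4 while boundaryAwareSum gives 2, the same as counting "abc abc" in one piece.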
Apart from that, it might be easier to implement the Collector interface instead of providing the three methods separately:
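// Imports (at the top of the enclosing file):
// import java.util.AbstractMap.SimpleEntry;
// import java.util.Arrays;
// import java.util.HashSet;
// import java.util.Set;
// import java.util.concurrent.atomic.AtomicInteger;
// import java.util.function.BiConsumer;
// import java.util.function.BinaryOperator;
// import java.util.function.Function;
// import java.util.function.Supplier;
// import java.util.stream.Collector;
public static class WordCountCollector
    implements Collector<Character, SimpleEntry<AtomicInteger, Boolean>, Integer>
{
    @Override
    public Supplier<SimpleEntry<AtomicInteger, Boolean>> supplier()
    {
        // key = word count, value = "last character was whitespace"
        return () -> new SimpleEntry<>(new AtomicInteger(0), true);
    }

    @Override
    public BiConsumer<SimpleEntry<AtomicInteger, Boolean>, Character> accumulator()
    {
        return (count, character) -> {
            if (!Character.isWhitespace(character))
            {
                if (count.getValue())
                {
                    String before = count.getKey().get() + " -> ";
                    count.getKey().incrementAndGet();
                    System.out.println(before + count.getKey().get());   // debug output
                }
                count.setValue(false);
            }
            else
            {
                count.setValue(true);
            }
        };
    }

    @Override
    public BinaryOperator<SimpleEntry<AtomicInteger, Boolean>> combiner()
    {
        return (c1, c2) -> new SimpleEntry<>(new AtomicInteger(c1.getKey().get() + c2.getKey().get()), false);
    }

    @Override
    public Function<SimpleEntry<AtomicInteger, Boolean>, Integer> finisher()
    {
        return count -> count.getKey().get();
    }

    @Override
    public Set<java.util.stream.Collector.Characteristics> characteristics()
    {
        // CONCURRENT + UNORDERED: a parallel stream may feed one shared container from
        // several threads. The Boolean flag in SimpleEntry is not thread-safe, so threads
        // also race on "last character was whitespace".
        return new HashSet<>(Arrays.asList(Characteristics.CONCURRENT, Characteristics.UNORDERED));
    }
}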
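This collector parallelizes better than the original implementation, but because of the flaw in your approach described above, the results still vary (mostly between 14 and 16).
When you use stream().collect(supplier, accumulator, combiner), the accumulator and the combiner return void. The problem is that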
collect(WordCountState::new,
        WordCountState::accumulate,
        WordCountState::combine)
in your case actually means the following (shown for the accumulator only, but the same applies to the combiner):
(wc, c) -> {
    WordCountState state = wc.accumulate(c);   // the returned state is discarded
    return;
}
This is not a trivial point. Consider these two methods:
public void accumulate(Character c) {
    if (!Character.isWhitespace(c)) {
        counter++;   // simplified: this counts non-whitespace characters, not words
    }
}

public WordCountState accumulate2(Character c) {
    if (Character.isWhitespace(c)) {
        return lastSpace ? this : new WordCountState(true, counter);
    } else {
        return lastSpace ? new WordCountState(false, counter + 1) : this;
    }
}
Both of the following compile, even though accumulate2 returns a value:
BiConsumer<WordCountState, Character> cons = WordCountState::accumulate;
BiConsumer<WordCountState, Character> cons2 = WordCountState::accumulate2;
The second one is effectively equivalent to this anonymous class, which simply drops the returned state:
BiConsumer<WordCountState, Character> clazz = new BiConsumer<WordCountState, Character>() {
    @Override
    public void accept(WordCountState state, Character character) {
        WordCountState newState = state.accumulate2(character);
        return;
    }
};
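The same rule can be seen with JDK types only (the class and method names below are hypothetical). A value-returning method reference is accepted wherever a BiConsumer is expected, and its result is ignored. That is harmless for a mutable receiver like StringBuilder, but for an immutable one like String the only effect of the call is lost, exactly as with the immutable WordCountState:

```java
import java.util.function.BiConsumer;

// JDK-only illustration of a method reference whose return value is discarded.
final class DiscardedReturnDemo {
    // StringBuilder.append returns the builder but also mutates it,
    // so ignoring the return value loses nothing.
    static String appendViaConsumer(String first, String second) {
        BiConsumer<StringBuilder, String> append = StringBuilder::append;
        StringBuilder sb = new StringBuilder(first);
        append.accept(sb, second);
        return sb.toString();
    }

    // String.concat returns a new String and leaves the receiver unchanged,
    // so the BiConsumer throws away the only result.
    static String concatViaConsumer(String first, String second) {
        BiConsumer<String, String> concat = String::concat;
        concat.accept(first, second);
        return first;
    }
}
```

appendViaConsumer("a", "b") returns "ab", whereas concatViaConsumer("a", "b") returns "a".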
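So the combiner and the accumulator should return void and mutate the current state instead:
public void combine(WordCountState wordCountState) {
    counter = counter + wordCountState.getCounter();
}

public void accumulate(Character c) {
    if (!Character.isWhitespace(c)) {
        counter++;   // simplified: this counts non-whitespace characters, not words
    }
}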
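Alternatively, to keep the immutable WordCountState from the question, use map and reduce instead of collect:
Stream<Character> charStream = IntStream.range(0, str.length())
        .mapToObj(i -> str.charAt(i));
WordCountState finalState = charStream.parallel()
        .map(ch -> new WordCountState().accumulate(ch))   // one state per character
        // note: summing per-character states counts non-whitespace characters, not words
        .reduce(new WordCountState(), WordCountState::combine);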
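A correct solution has to remember, for every partial result, whether it starts and ends with whitespace, so that a word spanning two chunks is counted exactly once. An immutable version using reduce:
public int getWordCountInParallel() {
    return str.codePoints().parallel()
            .mapToObj(WordCountState::new)
            .reduce(WordCountState::new)
            .map(WordCountState::getResult).orElse(0);
}
public class WordCountState {
    private final boolean firstSpace, lastSpace;
    private final int counter;

    // state for a single code point
    public WordCountState(int character) {
        firstSpace = lastSpace = Character.isWhitespace(character);
        this.counter = 0;
    }

    // merges two adjacent states: a word starts at the boundary
    // if a ends with whitespace and b starts with a non-whitespace character
    public WordCountState(WordCountState a, WordCountState b) {
        this.firstSpace = a.firstSpace;
        this.lastSpace = b.lastSpace;
        this.counter = a.counter + b.counter + (a.lastSpace && !b.firstSpace ? 1 : 0);
    }

    // counter only counts words that follow whitespace; add one if the text starts with a word
    public int getResult() {
        return counter + (firstSpace ? 0 : 1);
    }
}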
A mutable variant for use with collect:
public int getWordCountInParallel() {
    return str.codePoints().parallel()
            .collect(WordCountState::new, WordCountState::accumulate, WordCountState::combine)
            .getResult();
}
public class WordCountState {
    private boolean firstSpace, lastSpace = true, initial = true;
    private int counter;

    public void accumulate(int character) {
        boolean white = Character.isWhitespace(character);
        if (lastSpace && !white) counter++;
        lastSpace = white;
        if (initial) {
            firstSpace = white;
            initial = false;
        }
    }

    public void combine(WordCountState b) {
        if (initial) {
            // this side has seen no characters: take over b's state
            this.initial = b.initial;
            this.counter = b.counter;
            this.firstSpace = b.firstSpace;
            this.lastSpace = b.lastSpace;
        }
        else if (!b.initial) {
            this.counter += b.counter;
            // a word split across the boundary was counted on both sides
            if (!lastSpace && !b.firstSpace) counter--;
            this.lastSpace = b.lastSpace;
        }
    }

    public int getResult() {
        return counter;
    }
}
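The same boundary bookkeeping can also be packaged as a reusable Collector via Collector.of, working directly on the Stream<Character> from the question. This is a sketch with hypothetical names (WordCountCollectorSketch, Acc). It deliberately declares no CONCURRENT characteristic, so each thread gets its own container and the combiner receives partial results in encounter order:

```java
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class WordCountCollectorSketch {
    // Mutable summary of one chunk (hypothetical helper type).
    static final class Acc {
        boolean empty = true;   // no characters seen yet
        boolean startsInWord;   // first character is non-whitespace
        boolean endsInWord;     // last character is non-whitespace
        int words;              // words that start inside this chunk
    }

    static Collector<Character, Acc, Integer> wordCounter() {
        return Collector.of(
            Acc::new,
            (acc, c) -> {
                boolean white = Character.isWhitespace(c);
                if (!white && !acc.endsInWord) {
                    acc.words++;               // a word starts at this character
                }
                if (acc.empty) {
                    acc.startsInWord = !white;
                    acc.empty = false;
                }
                acc.endsInWord = !white;
            },
            (a, b) -> {
                if (a.empty) return b;
                if (b.empty) return a;
                a.words += b.words;
                if (a.endsInWord && b.startsInWord) {
                    a.words--;                 // word cut by the split was counted twice
                }
                a.endsInWord = b.endsInWord;
                return a;
            },
            acc -> acc.words);
    }

    static int count(String s, boolean parallel) {
        Stream<Character> chars = IntStream.range(0, s.length()).mapToObj(i -> s.charAt(i));
        if (parallel) {
            chars = chars.parallel();
        }
        return chars.collect(wordCounter());
    }
}
```

With this collector, count("Java8 parallelism helps if you know how to use it properly.", true) returns 11, the same as the sequential loop.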