Counting word occurrences in Java 8 using array streams
Tags: java, arrays, java-8, java-stream
How can I calculate the frequency of words in a string using an array stream? I am using Java 8. Here is my code:
String sentence = "The cat has black fur and black eyes";
String[] bites = sentence.trim().split("\\s+");
String in = "black cat";
I want to count the frequency of the words "black" and "cat" in the sentence. The word "black" occurs 2 times and the word "cat" occurs 1 time, so the target output is 3.
Map<String, Long> count = Arrays.stream(bites)
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
How about:
Map<String, Long> counts = yourStringStream
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
This gives you a map from every word to its frequency count.
String sentence = "The cat has black fur and black eyes";
String[] bites = sentence.trim().split("\\s+");
Map<String, Long> counts = Arrays.stream(bites)
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
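The map above holds a count for every word, but the question asks for one total (3) over the words in `in`. A minimal sketch of that last step, assuming the target words are separated by whitespace and that a repeated target word should only be counted once (hence `distinct()`). The class name `TargetWordTotal` and the helper `totalFor` are hypothetical names used only for illustration:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TargetWordTotal {

    // Hypothetical helper: builds the word-to-count map, then adds up
    // the counts of the target words.
    static long totalFor(String sentence, String targets) {
        Map<String, Long> counts = Arrays.stream(sentence.trim().split("\\s+"))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return Arrays.stream(targets.trim().split("\\s+"))
                .distinct()                                        // assumption: count each target once
                .mapToLong(word -> counts.getOrDefault(word, 0L))  // words not in the sentence add 0
                .sum();
    }

    public static void main(String[] args) {
        // black (2) + cat (1)
        System.out.println(totalFor("The cat has black fur and black eyes", "black cat")); // prints 3
    }
}
```

Building the full map only pays off if the counts are reused for several lookups. For a single total, filtering the stream directly avoids the intermediate map.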
If I understand your question correctly, you can use this solution to get the expected result:
String sentence = "The cat has black fur and black eyes";
String in = "black cat";
List<String> bites = Arrays.asList(sentence.trim().split("\\s+"));
List<String> listIn = Arrays.asList(in.split("\\s"));
long count = bites.stream().filter(listIn::contains).count();
Another simple approach is to use the computeIfAbsent method introduced in Java 8:
Map<String, LongAdder> wordCount = new LinkedHashMap<>();
for (String word:sentence.split("\\s")){
wordCount.computeIfAbsent(word, (k) -> new LongAdder()).increment();
}
Output:
{The=1, cat=1, has=1, black=2, fur=1, and=1, eyes=1}
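A variation on the same loop, not taken from the answer above: `Map.merge`, also added in Java 8, can keep plain `Integer` counts instead of `LongAdder` instances. This is only a sketch. The class name `MergeWordCount` and the helper `countWords` are hypothetical, and it splits on `\\s+` after trimming, which is an assumption about how the input should be tokenized:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeWordCount {

    // Hypothetical helper: counts words in encounter order using Map.merge.
    static Map<String, Integer> countWords(String sentence) {
        // LinkedHashMap keeps the words in the order they first appear.
        Map<String, Integer> wordCount = new LinkedHashMap<>();
        for (String word : sentence.trim().split("\\s+")) {
            // Stores 1 for a new word, otherwise adds 1 to the existing count.
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(countWords("The cat has black fur and black eyes"));
        // prints {The=1, cat=1, has=1, black=2, fur=1, and=1, eyes=1}
    }
}
```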
The many examples showing how to do this with streams are great, but you still should not forget that Collections already has a method that does this for you:
List<String> list = Arrays.asList(bites);
System.out.println(Collections.frequency(list, "black")); // prints 2
System.out.println(Collections.frequency(list, "cat")); // prints 1
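To reach the combined total the question asks for, the per-word results of `Collections.frequency` can simply be added. A minimal sketch under the assumption that the target words are whitespace-separated. The names `FrequencyTotal` and `totalFor` are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FrequencyTotal {

    // Hypothetical helper: sums Collections.frequency over every target word.
    static int totalFor(String sentence, String targets) {
        List<String> words = Arrays.asList(sentence.trim().split("\\s+"));
        int total = 0;
        for (String target : targets.trim().split("\\s+")) {
            // Each call scans the whole list once.
            total += Collections.frequency(words, target);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalFor("The cat has black fur and black eyes", "black cat")); // prints 3
    }
}
```

Because every call walks the full list, this is convenient for a handful of target words but does more work than a single pass when there are many.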
final Collection<String> ins = Arrays.asList(in.split("\\s+"));
long count = Arrays.stream(bites)
        .filter(ins::contains)
        .mapToLong(bite -> 1L)
        .sum();
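Comments:
@YCF_L Thanks for the hint. I edited my answer, changing String.contains to List.contains.
Repeating the Arrays.asList(in.split("\\s")) operation for every word is a waste of resources.
This is not the right answer, because it does not compute the individual counts…
Pattern.compile("\\b(black|cat)\\b").splitAsStream(sentence).count()-1
@Holger: Great! But the sentence has to be padded with a dummy word, because it does not work when the last word is cat.
The collect part can be omitted entirely. You can just filter and then use count() to get exactly the same result.
Omit it entirely; the final aggregated result does not need any grouping (by the way). You can also store the list (Arrays.asList()) once instead of creating a new throwaway object in the stream every time.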
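The regex suggestion in the comments relies on `splitAsStream`, which discards trailing empty strings, so the `count()-1` trick comes out one short when the sentence ends with a matching word. One way around that, sketched here as an alternative rather than as what the commenters proposed, is to count the matches directly with `Matcher.find()`. The class name `RegexWordCount` and the helper `countMatches` are hypothetical, and the pattern is hard-coded to the two words from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWordCount {

    // Same pattern as in the comment: whole-word matches of "black" or "cat".
    private static final Pattern TARGETS = Pattern.compile("\\b(black|cat)\\b");

    // Hypothetical helper: counts every match, including one at the very end.
    static int countMatches(String sentence) {
        Matcher matcher = TARGETS.matcher(sentence);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("The cat has black fur and black eyes")); // prints 3
        System.out.println(countMatches("The dog chased the cat"));               // prints 1
    }
}
```

No dummy word needs to be appended with this approach, since the count no longer depends on how the split handles the end of the string.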