Extract Map<K, Multiset<V>> from Stream of Streams in Java 8


java java-8


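I have a stream of streams of words (this format is not set by me and cannot be changed).

What is a good way to turn this into a Map<K, Multiset<V>> in Java 8?

I tried using flatMap, but the inner stream greatly limits my options.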

 Map<String, List<Long>> map = docs.flatMap(
            inner -> inner.collect(
                    Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet()
                    .stream())
            .collect(Collectors.groupingBy(
                    Entry::getKey,
                    Collectors.mapping(Entry::getValue, Collectors.toList())));

System.out.println(map);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}
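
To try the snippet above on its own, it can be wrapped into a small program. This is only a sketch: the class name, the method name countsPerWord, and the three sample documents are assumptions made for illustration (the documents were chosen to be consistent with the output printed above), not part of the original post.

```java
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordCountsPerDocument {

    // Counts each word inside every inner stream (one stream per document),
    // then groups those per-document counts by word.
    static Map<String, List<Long>> countsPerWord(Stream<Stream<String>> docs) {
        return docs
                .flatMap(inner -> inner
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                        .entrySet()
                        .stream())
                .collect(Collectors.groupingBy(
                        Entry::getKey,
                        Collectors.mapping(Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, assumed for this example.
        Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
        Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
        Stream<String> doc3 = Stream.of("how", "are", "what", "how");

        Map<String, List<Long>> map = countsPerWord(Stream.of(doc1, doc2, doc3));
        System.out.println(map.get("how"));   // counts of "how" per document, in document order
        System.out.println(map.get("doing"));
    }
}
```

Each inner stream is consumed exactly once inside flatMap, which is why the per-document count has to happen there rather than after flattening.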


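Since you're using Guava, you can take advantage of its utilities for working with streams. The same goes for the Table structure. Here's the code: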

Table<String, Long, Long> result =
    Streams.mapWithIndex(docs, (doc, i) -> doc.map(word -> new SimpleEntry<>(word, i)))
        .flatMap(Function.identity())
        .collect(Tables.toTable(
            Entry::getKey, Entry::getValue, p -> 1L, Long::sum, HashBasedTable::create));

Note: you need Guava 21 for this to work.
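
For readers without Guava on the classpath, the same word / document index / count layout can be approximated with nested JDK maps. The sketch below is a plain-JDK analogue, not the Guava API: the class name, the method names countsByWordAndDoc and dropDocIndex, and the sample documents are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

class IndexedWordCounts {

    // word -> (document index -> count), the same shape as a table's row map.
    static Map<String, Map<Long, Long>> countsByWordAndDoc(Stream<Stream<String>> docs) {
        Map<String, Map<Long, Long>> table = new TreeMap<>();
        long[] nextIndex = {0}; // mutable counter usable from inside the lambda
        docs.forEachOrdered(doc -> {
            long i = nextIndex[0]++;
            doc.forEachOrdered(word -> table
                    .computeIfAbsent(word, w -> new TreeMap<>())
                    .merge(i, 1L, Long::sum));
        });
        return table;
    }

    // Drops the document index and keeps only the counts for each word.
    static Map<String, List<Long>> dropDocIndex(Map<String, Map<Long, Long>> table) {
        Map<String, List<Long>> result = new TreeMap<>();
        table.forEach((word, row) -> result.put(word, new ArrayList<>(row.values())));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, assumed for this example.
        Stream<Stream<String>> docs = Stream.of(
                Stream.of("how", "are", "you", "doing", "doing", "doing"),
                Stream.of("what", "what", "you", "upto"),
                Stream.of("how", "are", "what", "how"));

        Map<String, Map<Long, Long>> table = countsByWordAndDoc(docs);
        System.out.println(table.get("how"));               // {0=1, 2=2}
        System.out.println(dropDocIndex(table).get("how")); // [1, 2]
    }
}
```

Unlike a lazily transformed view, dropDocIndex copies the values eagerly, so later changes to the nested maps are not reflected in its result.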

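Here is a simple solution:

Map<String, List<Integer>> m = Stream.of(doc1, doc2, doc3)
          .flatMap(d -> d.toMultiset().stream()).collect(Collectors.toMap2());

Note that toMultiset() and Collectors.toMap2() are not part of the JDK's java.util.stream API, so this presumably relies on a third-party stream library that the answer does not name.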

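MultiSet seems like a strange choice: not only would it disallow the same frequency occurring in different documents, but the order is also undefined, so you don't know which count belongs to which document. The value of the map could be a list of MultiSets, or a Map.

@JornVernee I can accept losing the order. That's why it was chosen over a List.

OK, but we can have the same frequency of occurrence in different documents in a Multiset, right?

A Multiset of {1,1,1} is perfectly valid; it uses .count(Object) to get the number of times an element occurs, which seems unnecessary for the word counts themselves. If you only need values in arbitrary order with duplicates allowed, a List is still the best solution.

The formatting could use some improvement, though. I restructured it a bit; feel free to roll back if you don't like it.

You can replace Pair with new SimpleEntry(word, i). Also note that the OP doesn't care which document index the entries come from: he doesn't want how={0=1, 2=2}, but how={1,2}.

@Eugene Thanks for the feedback. I replaced Pair with SimpleEntry. Regarding the document index, you can use rowMap() on the table and then Maps.transformValues to lazily turn the table into what the OP wants.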


Map<String, Multiset<Integer>> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(Collectors.groupingBy(Multiset.Entry::getElement,
                Collectors.mapping(Multiset.Entry::getCount,
                        Collectors.toCollection(HashMultiset::create))));

// {upto=[1], how=[1, 2], doing=[3], what=[1, 2], are=[1 x 2], you=[1 x 2]}
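
The "1 x 2" notation in the output above means that the count 1 occurs twice, that is, in two documents. As a rough JDK-only illustration of that multiset semantics (not Guava's Multiset itself), the sketch below stores, for each word, how many documents share each count. The class name, the method name countFrequencies, and the sample documents are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountFrequencies {

    // word -> (per-document count -> number of documents with that count)
    static Map<String, Map<Long, Long>> countFrequencies(Stream<Stream<String>> docs) {
        Map<String, Map<Long, Long>> result = new TreeMap<>();
        docs.forEachOrdered(doc -> {
            // Word counts for this single document.
            Map<String, Long> perDoc = doc.collect(
                    Collectors.groupingBy(Function.identity(), Collectors.counting()));
            // Record one more document having this count for the word.
            perDoc.forEach((word, count) -> result
                    .computeIfAbsent(word, w -> new TreeMap<>())
                    .merge(count, 1L, Long::sum));
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, assumed for this example.
        Stream<Stream<String>> docs = Stream.of(
                Stream.of("how", "are", "you", "doing", "doing", "doing"),
                Stream.of("what", "what", "you", "upto"),
                Stream.of("how", "are", "what", "how"));

        Map<String, Map<Long, Long>> freq = countFrequencies(docs);
        System.out.println(freq.get("are")); // {1=2}: the count 1 was seen in two documents
        System.out.println(freq.get("how")); // {1=1, 2=1}
    }
}
```

As with a multiset, this representation keeps duplicates but no longer says which document a given count came from.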

ListMultimap<String, Integer> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(ArrayListMultimap::create,
                (r, e) -> r.put(e.getElement(), e.getCount()),
                Multimap::putAll);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}

Table<String, Long, Long> result =
    Streams.mapWithIndex(docs, (doc, i) -> doc.map(word -> new SimpleEntry<>(word, i)))
        .flatMap(Function.identity())
        .collect(Tables.toTable(
            Entry::getKey, Entry::getValue, p -> 1L, Long::sum, HashBasedTable::create));

Map<String, Multiset<Long>> map = Maps.transformValues(
    result.rowMap(),
    m -> HashMultiset.create(m.values()));
