Java: group similar strings in a CSV file and count their occurrences

Tags: java, csv, guava

I need to group the columns of a CSV file. Given this input:

User ID       Group
ABC           Group1
DEF           Group2
ABC           Group3
GHI           Group4
XYZ           Group2
UVW           Group5
XYZ           Group1
ABC           Group1
DEF           Group2

the output should look like this:

ABC   Group1 ->2
ABC   Group3 ->1
DEF   Group2 ->2
GHI   Group4 ->1
UVW   Group5 ->1
XYZ   Group2 ->1
XYZ   Group1 ->1

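I also need to group the data per user ID, for example ABC --> (Group1 occurs twice)/(total occurrences of ABC) + (Group3 occurs once)/(total occurrences of ABC). So ABC --> 2/3 + 1/3.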
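To make the arithmetic concrete, here is a minimal sketch that turns the per-group counts of a single user ID into the fraction string described above. The class name `GroupShares` and the method `describe` are hypothetical names chosen for illustration, and the counts are typed in by hand rather than read from the CSV file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GroupShares {
    // Builds "userId-->c1/total+c2/total+..." from the per-group counts of one user ID.
    static String describe(String userId, Map<String, Integer> groupCounts) {
        int total = 0;
        for (int count : groupCounts.values()) {
            total += count; // total occurrences of this user ID
        }
        StringBuilder sb = new StringBuilder(userId).append("-->");
        String separator = "";
        for (int count : groupCounts.values()) {
            sb.append(separator).append(count).append('/').append(total);
            separator = "+";
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-entered counts for ABC from the sample data: Group1 twice, Group3 once.
        Map<String, Integer> abc = new LinkedHashMap<>();
        abc.put("Group1", 2);
        abc.put("Group3", 1);
        System.out.println(describe("ABC", abc)); // prints ABC-->2/3+1/3
    }
}
```

The denominator is the sum of the counts, not the number of distinct groups, which is what distinguishes 2/3 + 1/3 from 2/2 + 1/2.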

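The first result was obtained with the Guava library:

// Requires com.google.common.collect.Multiset and TreeMultiset (Guava),
// plus java.io.BufferedReader and java.io.FileReader.
Multiset<String> set = TreeMultiset.create();
BufferedReader reader = null;
try {
    reader = new BufferedReader(new FileReader("test.csv"));
    String[] currLineSplitted;
    while (reader.ready()) {
        currLineSplitted = reader.readLine().split(",");
        // Count each "userID,group" pair as one multiset element.
        set.add(currLineSplitted[0] + "," + currLineSplitted[1]);
    }
    // A TreeMultiset iterates its distinct elements in sorted order.
    for (String key : set.elementSet()) {
        System.out.println(key + " : " + set.count(key));
    }
} finally {
    if (reader != null) {
        reader.close();
    }
}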
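For comparison, the same per-pair counting can be sketched with the JDK alone, using `Map.merge` on a `TreeMap` in place of a Guava multiset. This is only an illustration: `PairCounter` and `countPairs` are hypothetical names, and the sample rows are hard-coded (an assumption) instead of being read from `test.csv`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PairCounter {
    // Counts how often each "userID,group" pair occurs; TreeMap keeps the keys sorted.
    static Map<String, Integer> countPairs(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            String key = parts[0].trim() + "," + parts[1].trim();
            counts.merge(key, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed file contents, without the header row.
        List<String> lines = Arrays.asList(
            "ABC,Group1", "DEF,Group2", "ABC,Group3", "GHI,Group4", "XYZ,Group2",
            "UVW,Group5", "XYZ,Group1", "ABC,Group1", "DEF,Group2");
        countPairs(lines).forEach((key, count) -> System.out.println(key + " : " + count));
    }
}
```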

I am not sure how to obtain the second result with this kind of grouping.

You should use a map of collections (here, a map of maps) rather than a plain set. Something like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupCsv {
    public static void main(String[] args) throws IOException {
        // userID -> (group -> number of occurrences)
        Map<String, Map<String, Integer>> supermap = new HashMap<>();
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader("test.csv"));
            String[] currLineSplitted;
            while (reader.ready()) {
                currLineSplitted = reader.readLine().split(",");

                Map<String, Integer> innermap;

                if (supermap.containsKey(currLineSplitted[0])) {
                    innermap = supermap.get(currLineSplitted[0]);

                    if (innermap.containsKey(currLineSplitted[1])) {
                        innermap.put(currLineSplitted[1],
                                     innermap.get(currLineSplitted[1]) + 1);
                    } else {
                        innermap.put(currLineSplitted[1], 1); // EDITED
                    }
                } else {
                    innermap = new HashMap<>();
                    innermap.put(currLineSplitted[1], 1); // EDITED
                    supermap.put(currLineSplitted[0], innermap);
                }
            }
            // EDITED: keySet() is a Set and cannot be sorted in place, so copy it to a list.
            // Pass your own Comparator to sort() if you need a different order.
            List<String> userIDs = new ArrayList<>(supermap.keySet());
            Collections.sort(userIDs);

            for (String userID : userIDs) {
                Map<String, Integer> m = supermap.get(userID);
                //===========first result=============
                for (String group : m.keySet()) {
                    System.out.println(userID + " " + group + " : " + m.get(group));
                }
                //=====================================
            }
            for (String userID : userIDs) {
                Map<String, Integer> m = supermap.get(userID);
                //===========second result=============
                // Denominator: total occurrences of this user ID across all its groups.
                int total = 0;
                for (int count : m.values()) {
                    total += count;
                }

                StringBuilder sb = new StringBuilder();
                sb.append(userID + "-->");

                String separator = "";
                for (String group : m.keySet()) {
                    sb.append(separator + m.get(group) + "/" + total);
                    separator = "+";
                }
                System.out.println(sb.toString());
                //=====================================
            }
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
}

Edit: my mistake. The Integer must be created with 1 as its starting value. The entries can then be sorted as needed.
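The nested `containsKey` branches of the map-of-maps approach can also be collapsed with `computeIfAbsent` and `merge`, available since Java 8. The following is a hedged sketch rather than the answer's own code: `NestedCounter` and `count` are hypothetical names, the rows are hard-coded as an assumption, and `TreeMap` is used so that both levels iterate in sorted order without a separate sort step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NestedCounter {
    // Builds userID -> (group -> count) from "userID,group" lines.
    static Map<String, Map<String, Integer>> count(List<String> lines) {
        Map<String, Map<String, Integer>> supermap = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            String userId = parts[0].trim();
            String group = parts[1].trim();
            // Create the inner map on first sight of the user ID, then bump the group count.
            supermap.computeIfAbsent(userId, k -> new TreeMap<>())
                    .merge(group, 1, Integer::sum);
        }
        return supermap;
    }

    public static void main(String[] args) {
        // Assumed file contents, without the header row.
        List<String> lines = Arrays.asList(
            "ABC,Group1", "DEF,Group2", "ABC,Group3", "GHI,Group4", "XYZ,Group2",
            "UVW,Group5", "XYZ,Group1", "ABC,Group1", "DEF,Group2");
        count(lines).forEach((userId, groups) -> System.out.println(userId + " " + groups));
    }
}
```

Because `merge` starts a missing key at 1, the off-by-one mentioned in the edit note cannot occur in this form.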

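Very unclear. What do all the numbers mean? What exactly do you want? I don't follow the second grouping; can you explain the syntax? What does

XYZ-->1/2+1/2

mean? You wrote

2/2 (number of occurrences of ABC)

so I guess (though it is not clear) that the second number is the number of occurrences, but what is the first one? What does the number of occurrences refer to: global occurrences or occurrences per group? A better explanation of the second output would help in giving a solution.

For ABC --> (Group1 occurs twice)/(total occurrences of ABC) + (Group3 occurs once)/(total occurrences of ABC). So ABC --> 2/3 + 1/3.

XYZGroup1:0 ABCGroup1:0 DEFGroup2:0 GHIGroup4:0 UVWGroup5:0 XYZ-->0/1 ABC-->0/1 DEF-->0/1 GHI-->0/1 UVW-->0/1 This is the result I get from the above approach.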