Java: Group similar strings in a CSV file and count their occurrences

I need to group the columns of a CSV file such as

User ID       Group
ABC           Group1
DEF           Group2
ABC           Group3
GHI           Group4
XYZ           Group2
UVW           Group5
XYZ           Group1
ABC           Group1
DEF           Group2

The output should look like this:

ABC   Group1 ->2
ABC   Group3 ->1
DEF   Group2 ->2
GHI   Group4 ->1
UVW   Group5 ->1
XYZ   Group2 ->1
XYZ   Group1 ->1
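I also need to group the data per user, for example ABC --> (Group1 occurs twice)/(total occurrences of ABC) + (Group3 occurs once)/(total occurrences of ABC). So ABC --> 2/3 + 1/3.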

I got the first result using the Guava library:

Multiset<String> set = TreeMultiset.create();
BufferedReader reader = null;
try {
    reader = new BufferedReader(new FileReader("test.csv"));
    String[] currLineSplitted;
    while (reader.ready()) {
        currLineSplitted = reader.readLine().split(",");
        set.add(currLineSplitted[0] + "," + currLineSplitted[1]);
    }
    for (String key : set.elementSet()) {
        System.out.println(key + " : " + set.count(key));
    }
} finally {
    if (reader != null) {
        reader.close();
    }
}

I'm not sure how to get the second result with the grouping.

You should use a map of collections (here, a map of maps) instead of a plain set. Something like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Map;
import java.util.TreeMap;

// userID -> (group -> count). TreeMap keeps the keys sorted; pass your own
// Comparator to the TreeMap constructor if you need a different order. //EDITED
Map<String, Map<String, Integer>> supermap = new TreeMap<>();
BufferedReader reader = null;
try {
    reader = new BufferedReader(new FileReader("test.csv"));
    String line;
    while ((line = reader.readLine()) != null) {
        String[] currLineSplitted = line.split(",");

        Map<String, Integer> innermap;

        if (supermap.containsKey(currLineSplitted[0])) {
            innermap = supermap.get(currLineSplitted[0]);

            if (innermap.containsKey(currLineSplitted[1])) {
                // increment the existing count for this group
                innermap.put(currLineSplitted[1],
                             innermap.get(currLineSplitted[1]) + 1);
            } else {
                innermap.put(currLineSplitted[1], 1); //EDITED
            }
        } else {
            innermap = new TreeMap<>();
            innermap.put(currLineSplitted[1], 1); //EDITED
            supermap.put(currLineSplitted[0], innermap);
        }
    }

    for (String userID : supermap.keySet()) {
        Map<String, Integer> m = supermap.get(userID);
        //===========first result=============
        for (String group : m.keySet()) {
            System.out.println(userID + " " + group + " : " + m.get(group));
        }
        //=====================================
    }
    for (String userID : supermap.keySet()) {
        Map<String, Integer> m = supermap.get(userID);
        //===========second result=============
        // denominator: total occurrences of this user across all groups
        int totalOccurrences = 0;
        for (int count : m.values()) {
            totalOccurrences += count;
        }

        StringBuilder sb = new StringBuilder();
        sb.append(userID + "-->");

        String separator = "";
        for (String group : m.keySet()) {
            sb.append(separator).append(m.get(group)).append("/").append(totalOccurrences);
            separator = "+";
        }
        System.out.println(sb.toString());
        //=====================================
    }

} finally {
    if (reader != null) {
        reader.close();
    }
}

Edit: my mistake: the Integer has to be created with 1 as the starting value. The entries can be sorted accordingly.
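For comparison, here is a minimal sketch of the same nested-map idea using only the standard library (`computeIfAbsent` and `merge`), with the denominator taken as the user's total number of occurrences, as the question asks. The class name `GroupShares` and the methods `countGroups` and `shares` are hypothetical names introduced for illustration, and the input is assumed to be `userID,group` lines without a header row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupShares {

    /** Counts "userID,group" lines into userID -> (group -> count), sorted by key. */
    static Map<String, Map<String, Integer>> countGroups(List<String> lines) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            counts.computeIfAbsent(parts[0].trim(), k -> new TreeMap<>())
                  .merge(parts[1].trim(), 1, Integer::sum); // starts at 1, then adds 1
        }
        return counts;
    }

    /** Formats one user's counts as "count/total" terms joined by '+'. */
    static String shares(Map<String, Integer> groupCounts) {
        int total = 0;
        for (int c : groupCounts.values()) {
            total += c;
        }
        List<String> terms = new ArrayList<>();
        for (int c : groupCounts.values()) {
            terms.add(c + "/" + total);
        }
        return String.join("+", terms);
    }

    public static void main(String[] args) {
        // Sample rows from the question (assumed comma-separated in the real file)
        List<String> lines = Arrays.asList(
            "ABC,Group1", "DEF,Group2", "ABC,Group3", "GHI,Group4", "XYZ,Group2",
            "UVW,Group5", "XYZ,Group1", "ABC,Group1", "DEF,Group2");
        Map<String, Map<String, Integer>> counts = countGroups(lines);
        // First result: one line per (user, group) pair
        counts.forEach((user, groups) ->
            groups.forEach((group, n) -> System.out.println(user + " " + group + " ->" + n)));
        // Second result: per-user shares, e.g. ABC-->2/3+1/3
        counts.forEach((user, groups) -> System.out.println(user + "-->" + shares(groups)));
    }
}
```

In a real program the lines could come from `Files.readAllLines(Paths.get("test.csv"))`. Because both levels are `TreeMap`s, the output is sorted by user and then by group name, so the order differs slightly from the listing in the question.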

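Very unclear. What do all the numbers mean, and what exactly do you want? I don't get the second grouping; can you explain the syntax? What does XYZ-->1/2+1/2 mean? You wrote 2/2 (number of occurrences of ABC), so I guess (but it's not clear) that the second number is the number of occurrences, but what is the first one? And what does the number of occurrences refer to: global occurrences or occurrences per group? Explaining the second output better would help in giving a solution.

For ABC --> (Group1 occurs twice)/(total occurrences of ABC) + (Group3 occurs once)/(total occurrences of ABC). So ABC --> 2/3 + 1/3.

XYZGroup1:0 ABCGroup1:0 DEFGroup2:0 GHIGroup4:0 UVWGroup5:0 XYZ-->0/1 ABC-->0/1 DEF-->0/1 GHI-->0/1 UVW-->0/1 This is the result I get from the approach above.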
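The all-zero counts reported in the last comment are consistent with a counter that starts at 0 and is updated with a post-increment, although the thread does not confirm the exact code that was run. The sketch below isolates that assumption: `CounterPitfall`, `buggyCount` and `fixedCount` are hypothetical names, and the buggy variant is a guess at the failure mode, not the commenter's actual program.

```java
import java.util.HashMap;
import java.util.Map;

class CounterPitfall {

    /** Assumed buggy update: starts at 0 and stores the value before the post-increment. */
    static int buggyCount(int occurrences) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < occurrences; i++) {
            Integer old = m.get("Group1");
            if (old == null) {
                m.put("Group1", 0);   // first occurrence recorded as 0
            } else {
                int c = old;
                m.put("Group1", c++); // c++ evaluates to the old value, so the map never changes
            }
        }
        return m.getOrDefault("Group1", 0);
    }

    /** Corrected update: the first occurrence counts as 1, later ones add 1. */
    static int fixedCount(int occurrences) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < occurrences; i++) {
            m.merge("Group1", 1, Integer::sum);
        }
        return m.getOrDefault("Group1", 0);
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + buggyCount(3)); // prints "buggy: 0"
        System.out.println("fixed: " + fixedCount(3)); // prints "fixed: 3"
    }
}
```

Starting at 1 and storing `old + 1` (or using `merge` as above) avoids both problems.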