Java: group similar strings in a CSV file and count their occurrences
I need to group the columns of a CSV file. Given this input:
User ID Group
ABC Group1
DEF Group2
ABC Group3
GHI Group4
XYZ Group2
UVW Group5
XYZ Group1
ABC Group1
DEF Group2
The output should look like this:
ABC Group1 ->2
ABC Group3 ->1
DEF Group2 ->2
GHI Group4 ->1
UVW Group5 ->1
XYZ Group2 ->1
XYZ Group1 ->1
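The data also needs to be grouped per user. For example, ABC --> (number of times Group1 occurs, 2)/(total occurrences of ABC) + (number of times Group3 occurs, 1)/(total occurrences of ABC). So ABC-->2/3+1/3
I got the first result using the Guava library: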
Multiset<String> set = TreeMultiset.create();
BufferedReader reader = null;
try {
    reader = new BufferedReader(new FileReader("test.csv"));
    String[] currLineSplitted;
    while (reader.ready()) {
        currLineSplitted = reader.readLine().split(",");
        set.add(currLineSplitted[0] + "," + currLineSplitted[1]);
    }
    for (String key : set.elementSet()) {
        System.out.println(key + " : " + set.count(key));
    }
} finally {
    if (reader != null) {
        reader.close();
    }
}
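For comparison, the same pair counting can be sketched with the JDK alone. This is only an illustration, not part of the original post: the class name PairCounter and the method countPairs are made up for the example, and it assumes comma-separated rows, as the split(",") call above implies. A TreeMap keeps the keys sorted, much like TreeMultiset does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, named for this example only.
final class PairCounter {

    // Counts how often each "userID,group" pair occurs in the given rows.
    static Map<String, Integer> countPairs(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split(",");
            if (cols.length < 2) {
                continue; // skip blank or malformed rows
            }
            String key = cols[0].trim() + "," + cols[1].trim();
            // merge() stores 1 for a new key, otherwise adds 1 to the old count
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample rows from the question, assumed to be comma-separated
        List<String> lines = Arrays.asList(
                "ABC,Group1", "DEF,Group2", "ABC,Group3", "GHI,Group4",
                "XYZ,Group2", "UVW,Group5", "XYZ,Group1", "ABC,Group1",
                "DEF,Group2");
        countPairs(lines).forEach((k, v) -> System.out.println(k + " : " + v));
    }
}
```

Reading the rows from test.csv instead of a hard-coded list would work the same way, for example by passing the result of Files.readAllLines to countPairs.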
I am not sure how to get the second result, the per-user grouping.
You should use a map of maps rather than a plain set. Something like this:
// Needs: java.io.BufferedReader, java.io.FileReader, java.util.Map, java.util.TreeMap
// userID -> (group -> count). TreeMap keeps the keys sorted; pass your own
// Comparator to the TreeMap constructor if you need a different order. //EDITED
Map<String, Map<String, Integer>> supermap = new TreeMap<>();
BufferedReader reader = null;
try {
    reader = new BufferedReader(new FileReader("test.csv"));
    String line;
    while ((line = reader.readLine()) != null) {
        String[] currLineSplitted = line.split(",");
        if (currLineSplitted.length < 2) {
            continue; // skip blank or malformed lines
        }
        Map<String, Integer> innermap;
        if (supermap.containsKey(currLineSplitted[0])) {
            innermap = supermap.get(currLineSplitted[0]);
            if (innermap.containsKey(currLineSplitted[1])) {
                innermap.put(currLineSplitted[1],
                        innermap.get(currLineSplitted[1]) + 1);
            } else {
                innermap.put(currLineSplitted[1], 1); //EDITED
            }
        } else {
            innermap = new TreeMap<>();
            innermap.put(currLineSplitted[1], 1); //EDITED
            supermap.put(currLineSplitted[0], innermap);
        }
    }
    for (String userID : supermap.keySet()) {
        Map<String, Integer> m = supermap.get(userID);
        //===========first result=============
        for (String group : m.keySet()) {
            System.out.println(userID + group + " : " + m.get(group));
        }
        //=====================================
    }
    for (String userID : supermap.keySet()) {
        Map<String, Integer> m = supermap.get(userID);
        //===========second result=============
        // Denominator: total number of rows for this user, not the number of groups
        int totalOccurrences = 0;
        for (int count : m.values()) {
            totalOccurrences += count;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(userID + "-->");
        String separator = "";
        for (String group : m.keySet()) {
            sb.append(separator).append(m.get(group)).append("/").append(totalOccurrences);
            separator = "+";
        }
        System.out.println(sb.toString());
        //=====================================
    }
} finally {
    if (reader != null) {
        reader.close();
    }
}
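Edit: my mistake, the Integer counts must be created with 1 as the starting value. The entries can then be sorted as needed.
Very unclear. What do all the numbers mean? What exactly do you want?
I don't understand the second grouping. Can you explain the syntax? What does XYZ-->1/2+1/2 mean? You wrote 2/2 (number of occurrences of ABC), so I guess (but it is not clear) that the second number is the number of occurrences, but what is the first one? And occurrences of what: the global count or the count per group? A better explanation of the second output would help in giving a solution.
In ABC --> (number of times Group1 occurs, 2)/(total occurrences of ABC) + (number of times Group3 occurs, 1)/(total occurrences of ABC). So ABC-->2/3+1/3
This is the result I got from the above approach:
XYZGroup1:0 ABCGroup1:0 DEFGroup2:0 GHIGroup4:0 UVWGroup5:0 XYZ-->0/1 ABC-->0/1 DEF-->0/1 GHI-->0/1 UVW-->0/1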
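Going by the clarification above (each group's count divided by the total number of rows for that user), the second result could be computed roughly as follows. This is a minimal sketch under assumptions: the class GroupShares and its methods countByUser and shares are hypothetical names introduced here, the rows are assumed to be comma-separated, and the header row is assumed to have been removed already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, named for this example only.
final class GroupShares {

    // Builds userID -> (group -> occurrences); TreeMap keeps both levels sorted.
    static Map<String, Map<String, Integer>> countByUser(List<String> lines) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split(",");
            if (cols.length < 2) {
                continue; // skip blank or malformed rows
            }
            counts.computeIfAbsent(cols[0].trim(), k -> new TreeMap<>())
                  .merge(cols[1].trim(), 1, Integer::sum);
        }
        return counts;
    }

    // Formats one user's counts as "count/total" terms joined by "+".
    static String shares(Map<String, Integer> groupCounts) {
        int total = 0;
        for (int count : groupCounts.values()) {
            total += count; // total rows for this user
        }
        List<String> terms = new ArrayList<>();
        for (int count : groupCounts.values()) {
            terms.add(count + "/" + total);
        }
        return String.join("+", terms);
    }

    public static void main(String[] args) {
        // Sample rows from the question, assumed to be comma-separated
        List<String> lines = Arrays.asList(
                "ABC,Group1", "DEF,Group2", "ABC,Group3", "GHI,Group4",
                "XYZ,Group2", "UVW,Group5", "XYZ,Group1", "ABC,Group1",
                "DEF,Group2");
        countByUser(lines).forEach(
                (userID, groups) -> System.out.println(userID + "-->" + shares(groups)));
    }
}
```

With the sample rows, the ABC line comes out as ABC-->2/3+1/3 and the XYZ line as XYZ-->1/2+1/2. The denominator is the sum of the user's counts rather than the number of distinct groups, which is what the formula in the question asks for.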