Java: sort words by frequency? (least to greatest)
Does anyone know how to use the built-in Collections.sort and the Comparator interface to sort a list of words by frequency (least to greatest)?

I already have a method that gets the count of a given word in a text file. Now I just need a method that compares each word's count and puts the words in a list, sorted from least frequent to most frequent.

Any ideas and suggestions would be much appreciated. I'm having trouble getting started on this particular method.
    public class Parser implements Comparator<String> {

        public Map<String, Integer> wordCount;

        void parse(String filename) throws IOException {
            File file = new File(filename);
            Scanner scanner = new Scanner(file);
            //mapping of string -> integer (word -> frequency)
            Map<String, Integer> wordCount = new HashMap<String, Integer>();
            //iterates through each word in the text file
            while(scanner.hasNext()) {
                String word = scanner.next();
                if (scanner.next()==null) {
                    wordCount.put(word, 1);
                }
                else {
                    wordCount.put(word, wordCount.get(word) + 1);;
                }
            }
            scanner.next().replaceAll("[^A-Za-z0-9]"," ");
            scanner.next().toLowerCase();
        }

        public int getCount(String word) {
            return wordCount.get(word);
        }

        public int compare(String w1, String w2) {
            return getCount(w1) - getCount(w2);
        }

        //this method should return a list of words in order of frequency from least to greatest
        public List<String> getWordsInOrderOfFrequency() {
            List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
            //this part is unfinished.. the part i'm having trouble sorting the word frequencies
            List<String> result = new ArrayList<String>();
        }
    }
You can compare against the following and borrow ideas from it:
    // Note: StdIn, StdOut, Merge and Counter are not JDK classes;
    // they have to be supplied separately for this to compile.
    public class FrequencyCount {

        public static void main(String[] args) {

            // read in the words as an array
            String s = StdIn.readAll();
            // s = s.toLowerCase();
            // s = s.replaceAll("[\",!.:;?()']", "");
            String[] words = s.split("\\s+");

            // sort the words
            Merge.sort(words);

            // tabulate frequencies of each word
            Counter[] zipf = new Counter[words.length];
            int M = 0;                                        // number of distinct words
            for (int i = 0; i < words.length; i++) {
                if (i == 0 || !words[i].equals(words[i-1]))   // short-circuiting OR
                    zipf[M++] = new Counter(words[i], words.length);
                zipf[M-1].increment();
            }

            // sort by frequency and print
            Merge.sort(zipf, 0, M);                           // sorting a subarray
            for (int j = M-1; j >= 0; j--) {
                StdOut.println(zipf[j]);
            }
        }
    }
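First, your use of scanner.next() looks incorrect. next() returns the next word and advances past it every time it is called, so the following code:

    if(scanner.next() == null){ ... }

and also

    scanner.next().replaceAll("[^A-Za-z0-9]"," ");
    scanner.next().toLowerCase();

consumes words and then throws them away. What you probably want is:

    String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();

at the start of the while loop, so that the changes made to the word are kept in the word variable instead of being discarded.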
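To make the token-consumption point concrete, here is a small sketch (not from the original posts) that reads from a String instead of a file. The class and method names are made up for illustration. One deliberate difference from the line above, flagged as my own assumption: punctuation is replaced with "" rather than " ", because replacing it with a space would turn "fish," into "fish " (with a trailing space), which is a different map key from "fish".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerNextDemo {

    // Reads every token exactly once and normalizes it before use.
    static List<String> readNormalized(String text) {
        List<String> words = new ArrayList<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            // One call to next() per iteration; the result is kept in a variable.
            // Assumption: punctuation is dropped ("") instead of replaced by a space.
            String word = scanner.next().replaceAll("[^A-Za-z0-9]", "").toLowerCase();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Mimics the shape of the original loop: a second next() call skips a token.
    static List<String> readWithExtraNext(String text) {
        List<String> words = new ArrayList<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            String word = scanner.next();
            if (scanner.hasNext()) {
                scanner.next(); // consumed and thrown away
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "One fish, two fish.";
        System.out.println(readNormalized(text));    // prints [one, fish, two, fish]
        System.out.println(readWithExtraNext(text)); // prints [One, two]
    }
}
```

Note that the second method only avoids an exception because it checks hasNext() before the extra call; without that check, next() throws NoSuchElementException once the input runs out rather than returning null.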
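Second, the way you use the wordCount map is slightly off. What you want is to check whether word is already in the map, in order to decide which count to store. To do that, instead of checking scanner.next() == null, you should look in the map, for example:

    if(!wordCount.containsKey(word)){
        //no count registered for the word yet
        wordCount.put(word, 1);
    }else{
        wordCount.put(word, wordCount.get(word) + 1);
    }

Alternatively, you can do this:

    Integer count = wordCount.get(word);
    if(count == null){
        //no count registered for the word yet
        wordCount.put(word, 1);
    }else{
        wordCount.put(word, count+1);
    }

I prefer this approach because it is a bit cleaner and does only one map lookup per word, whereas the first approach does two lookups (containsKey and then get) whenever the word is already present.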
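If you are on Java 8 or later (an assumption, the posts above predate it), the same update can be written in one call with Map.merge. This is only a sketch of that variant; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class MergeCount {

    // Counts whitespace-separated words, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCount = new HashMap<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            String word = scanner.next().toLowerCase();
            // Stores 1 if the word is absent, otherwise stores oldCount + 1.
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so the printed order may vary.
        System.out.println(countWords("the cat and the hat"));
    }
}
```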
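Now, to get the list of words in descending order of frequency, you can first convert the map to a list and then apply Collections.sort() to it, following the approach suggested in a linked post. Here is a simplified version adapted to your needs: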
    static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {

        // Convert map to list of <String,Integer> entries
        List<Map.Entry<String, Integer>> list =
            new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

        // Sort list by integer values
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // compare o2 to o1, instead of o1 to o2, to get descending freq. order
                return (o2.getValue()).compareTo(o1.getValue());
            }
        });

        // Populate the result into a list
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : list) {
            result.add(entry.getKey());
        }
        return result;
    }
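Since the question asks for least to greatest, here is a sketch of the ascending variant. It assumes Java 8 or later and uses the built-in Map.Entry comparators instead of an anonymous class. The alphabetical tie-break is my own addition, not something the question requires; it just makes the order of equally frequent words predictable. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AscendingFrequency {

    static List<String> wordsByAscendingFrequency(Map<String, Integer> wordCount) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCount.entrySet());

        // Primary key: count, smallest first.
        // Secondary key: the word itself, so ties come out alphabetically.
        entries.sort(Map.Entry.<String, Integer>comparingByValue()
                .thenComparing(Map.Entry.comparingByKey()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("the", 3);
        wordCount.put("cat", 1);
        wordCount.put("hat", 1);
        wordCount.put("and", 2);
        System.out.println(wordsByAscendingFrequency(wordCount)); // prints [cat, hat, and, the]
    }
}
```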
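Hope this helps.

Edit: changed the comparison function as suggested by @dragon66. Thanks.

rodion's solution is generics hell, but I didn't manage to make it any simpler, just different. In the end, his solution is shorter and better.

At first glance a TreeMap looks suitable, but it sorts by key, which doesn't help with sorting by value, and we can't swap keys and values because we look entries up by key.

So the next idea is to build a HashMap and use Collections.sort, but that doesn't accept a Map; it needs a List to sort. From a Map you can get the entrySet, which gives you another collection, but that is a Set, not a List. So this is the different direction I took:

I implemented an Iterator. I iterate over the entrySet and return only the entries whose value is 1. Entries with a higher value (2, say) are buffered for later. When the iterator runs out, I look at the buffer; if it is not empty, I use the buffer's iterator from then on, increment the minimum value I'm looking for, and create a new buffer.

The advantage of an Iterator/Iterable pair is that the entries can be consumed with the simplified for loop.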
    import java.util.*;

    // a short little declaration :)
    public class WordFreq implements Iterator <Map.Entry <String, Integer>>, Iterable <Map.Entry <String, Integer>>
    {
        private Map <String, Integer> counter;
        private Iterator <Map.Entry <String, Integer>> it;
        private Set <Map.Entry <String, Integer>> buf;
        private int maxCount = 1;

        public Iterator <Map.Entry <String, Integer>> iterator () {
            return this;
        }

        // The iterator interface expects a "remove ()" - nobody knows why
        public void remove ()
        {
            if (hasNext ())
                next ();
        }

        public boolean hasNext ()
        {
            return it.hasNext () || ! buf.isEmpty ();
        }

        public Map.Entry <String, Integer> next ()
        {
            while (it.hasNext ()) {
                Map.Entry <String, Integer> mesi = it.next ();
                if (mesi.getValue () == maxCount)
                    return mesi;
                else
                    buf.add (mesi);
            }
            if (buf.isEmpty ())
                return null;
            ++maxCount;
            it = buf.iterator ();
            buf = new HashSet <Map.Entry <String, Integer>> ();
            return next ();
        }

        public WordFreq ()
        {
            it = fill ();
            buf = new HashSet <Map.Entry <String, Integer>> ();
            // The "this" here has to be an Iterable to make the foreach work
            for (Map.Entry <String, Integer> mesi : this)
            {
                System.out.println (mesi.getValue () + ":\t" + mesi.getKey ());
            }
        }

        public Iterator <Map.Entry <String, Integer>> fill ()
        {
            counter = new HashMap <String, Integer> ();
            Scanner sc = new Scanner (System.in);
            while (sc.hasNext ())
            {
                push (sc.next ());
            }
            Set <Map.Entry <String, Integer>> set = counter.entrySet ();
            return set.iterator ();
        }

        public void push (String word)
        {
            Integer i = counter.get (word);
            int n = 1 + ((i != null) ? i : 0);
            counter.put (word, n);
        }

        public static void main (String args[])
        {
            new WordFreq ();
        }
    }
cat WordFreq.java | java WordFreq
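For comparison, a different technique from the buffering iterator above: once counting is finished, the entries can be copied into a second map keyed by count. This does not contradict the point about TreeMap, because the original word-to-count map is still used for lookups while counting; the TreeMap is only built afterwards. A rough sketch, assuming Java 8 or later for computeIfAbsent, with hypothetical names throughout:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencyBuckets {

    // Builds a second map keyed by count; TreeMap keeps those keys in ascending order.
    static TreeMap<Integer, List<String>> bucketByCount(Map<String, Integer> wordCount) {
        TreeMap<Integer, List<String>> buckets = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
            // Create the list for this count on first use, then append the word.
            buckets.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                   .add(entry.getKey());
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("the", 3);
        wordCount.put("cat", 1);
        wordCount.put("hat", 1);
        wordCount.put("and", 2);

        // Counts are visited from least to greatest; the order of words
        // inside one bucket follows HashMap iteration and is unspecified.
        for (Map.Entry<Integer, List<String>> bucket : bucketByCount(wordCount).entrySet()) {
            System.out.println(bucket.getKey() + ":\t" + bucket.getValue());
        }
    }
}
```

Compared with the iterator, this makes a single pass over the entries instead of one pass per distinct count, at the cost of holding a second map in memory.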
    import java.util.*;

    public class Parser implements Comparator <String> {

        public Map<String, Integer> wordCount;

        void parse ()
        {
            Scanner scanner = new Scanner (System.in);

            // don't redeclare it here - your attribute wordCount will else be shadowed
            wordCount = new HashMap<String, Integer> ();

            //iterates through each word in the text file
            while (scanner.hasNext ()) {
                String word = scanner.next ();
                // operate on the word, not on next and next of next word from Scanner
                // (no leading space in the pattern: Scanner tokens never contain
                // whitespace, so a pattern starting with a space could never match)
                word = word.replaceAll ("[^A-Za-z0-9]", " ");
                word = word.toLowerCase ();
                // look into your map:
                if (! wordCount.containsKey (word))
                    wordCount.put (word, 1);
                else
                    wordCount.put (word, wordCount.get (word) + 1);
            }
        }

        public int getCount (String word) {
            return wordCount.get (word);
        }

        // orders two words by their counts, smaller count first
        public int compare (String w1, String w2) {
            return getCount (w1) - getCount (w2);
        }

        public List<String> getWordsInOrderOfFrequency () {
            List<String> justWords = new ArrayList<String> (wordCount.keySet());
            // "this" is the Comparator, so the words are sorted by count
            Collections.sort (justWords, this);
            return justWords;
        }

        public static void main (String args []) {
            Parser p = new Parser ();
            p.parse ();
            List<String> ls = p.getWordsInOrderOfFrequency ();
            for (String s: ls)
                System.out.println (s);
        }
    }