Java: Getting a random line from a large file

Tags: java, file, constraints

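I've already seen an answer to this, but the method described there (the accepted answer) runs extremely slowly. It is very slow on my 598KB text file, and still slow on a cut-down version of that file that keeps only one line in every 20, at 20KB. I never get past the "a" section (the file is a word list).

The original file has 64141 lines; the shortened one has 2138 lines. To generate these files, I took the Linux Mint 11 /usr/share/dict/american-english word list and used grep to remove anything containing an uppercase letter or an apostrophe (grep -v [[:upper:]] | grep -v \').

The code I'm using is: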

String result = null;
final Random rand = new Random();
int n = 0;
for (final Scanner sc = new Scanner(wordList); sc.hasNext();) {
    n++;
    if (rand.nextInt(n) == 0) {
        final String line = sc.nextLine();
        boolean isOK = true;
        for (final char c : line.toCharArray()) {
            if (!(constraints.isAllowed(c))) {
                isOK = false;
                break;
            }
        }
        if (isOK) {
            result = line;
        }
        System.out.println(result);
    }
}
return result;

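This is slightly adapted from the answer referred to above.

The object constraints is a KeyboardConstraints, which basically has a single method, isAllowed(char). The allowed keys and an allow-all flag are supplied in its constructor. The constraints variable used here has "aeouhtns".toCharArray() as its allowed keys, with the allow-all flag turned off.

Essentially, what I want the method to do is pick a random word that satisfies the constraints. For example, with these constraints a word made up only of the allowed letters would qualify, but "worker" would not, because 'w' is not in "aeouhtns".toCharArray().

How can I do this?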

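Your implementation has a bug. You should read the line before you pick a random number. Change this:

n++;
if (rand.nextInt(n) == 0) {
    final String line = sc.nextLine();

so that sc.nextLine() is called on every iteration, before rand.nextInt(n) is evaluated.

You should also check the constraints before drawing the random number. A line that fails the constraints should be skipped, like this: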

n++;

String line;
do {
    if (!sc.hasNext()) { return result; }
    line = sc.nextLine();
} while (!meetsConstraints(line));

if (rand.nextInt(n) == 0) {
    result = line; 
}
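
Putting the two fixes above together, a minimal self-contained sketch might look like the following. The class name RandomConstrainedLine, the pick method, and the string-based meetsConstraints helper are hypothetical stand-ins: the answer does not show meetsConstraints, and the question's KeyboardConstraints class is not reproduced here. The counter n only counts lines that pass the check, so each qualifying line should end up equally likely to be returned.

```java
import java.util.Random;
import java.util.Scanner;

public class RandomConstrainedLine {

    /** Hypothetical helper: true if every character of line appears in allowed. */
    public static boolean meetsConstraints(String line, String allowed) {
        for (char c : line.toCharArray()) {
            if (allowed.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns one qualifying line chosen at random, or null if no line qualifies. */
    public static String pick(Scanner sc, String allowed, Random rand) {
        String result = null;
        int n = 0; // number of qualifying lines seen so far
        while (sc.hasNextLine()) {
            String line = sc.nextLine(); // always consume the line first
            if (!meetsConstraints(line, allowed)) {
                continue; // rejected lines do not count towards n
            }
            n++;
            if (rand.nextInt(n) == 0) { // keep the n-th candidate with probability 1/n
                result = line;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Small in-memory sample instead of the real word list file.
        String words = "south\nworker\ntone\nzebra\nhaunt\n";
        try (Scanner sc = new Scanner(words)) {
            // Prints one of: south, tone, haunt
            System.out.println(pick(sc, "aeouhtns", new Random()));
        }
    }
}
```

The whole input is read exactly once, so the running time grows linearly with the number of lines.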

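I would read all the lines in, keep them in memory, and then pick one at random from there. This takes very little time, because a file under 1MB is small by today's standards.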

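import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

public class Main {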
    public static void main(String... args) throws IOException {
        long start = System.nanoTime();
        RandomDict dict = RandomDict.load("/usr/share/dict/american-english");
        final int count = 1000000;
        for (int i = 0; i < count; i++)
            dict.nextWord();
        long time = System.nanoTime() - start;
        System.out.printf("Took %.3f seconds to load and find %,d random words.", time / 1e9, count);
    }
}

class RandomDict {
    public static final String[] NO_STRINGS = {};
    final Random random = new Random();
    final String[] words;

    public RandomDict(String[] words) {
        this.words = words;
    }

    public static RandomDict load(String filename) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(filename));
        Set<String> words = new LinkedHashSet<String>();
        try {
            for (String line; (line = br.readLine()) != null; ) {
                if (line.indexOf('\'') >= 0) continue;
                words.add(line.toLowerCase());
            }
        } finally {
            br.close();
        }
        return new RandomDict(words.toArray(NO_STRINGS));
    }

    public String nextWord() {
        return words[random.nextInt(words.length)];
    }
}
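
The program above filters out apostrophes but not the question's key constraints. As a rough sketch of how the same load-once idea could be combined with such a filter, the following keeps only the words built from an allowed set of characters. FilteredWordList and its methods are hypothetical names, the allowed characters are passed as a plain String rather than through the question's KeyboardConstraints, and Java 8 or later is assumed for String.chars().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class FilteredWordList {
    private final List<String> words = new ArrayList<>();
    private final Random random = new Random();

    /** Keeps only the non-empty lines whose characters all appear in allowed. */
    public FilteredWordList(List<String> lines, String allowed) {
        for (String line : lines) {
            if (!line.isEmpty() && line.chars().allMatch(c -> allowed.indexOf(c) >= 0)) {
                words.add(line);
            }
        }
    }

    /** Number of words that passed the filter. */
    public int size() {
        return words.size();
    }

    /** Returns a random qualifying word, or null if none passed the filter. */
    public String nextWord() {
        return words.isEmpty() ? null : words.get(random.nextInt(words.size()));
    }

    public static void main(String[] args) {
        // In practice the lines could be read from the word list file instead.
        List<String> lines = Arrays.asList("south", "worker", "tone", "zebra", "haunt");
        FilteredWordList list = new FilteredWordList(lines, "aeouhtns");
        System.out.println(list.size() + " qualifying words, for example " + list.nextWord());
    }
}
```

Filtering once at load time means each later nextWord() call is a single array lookup, which suits repeated picks better than rescanning the file.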

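Oh, I see. When I don't call nextLine(), the scanner just stays on the same line until the random number happens to come up right, so it slows down exponentially. Edit: I think I understand the second part too. Let me test it.

Oh, wow, that really is much faster (assuming you're not using some kind of supercomputer)! My current implementation takes about 15 seconds to load 100 (4GB RAM). I'll definitely update my code. Thanks a lot!

It was run on a 5-year-old Core Duo Windows XP PC.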
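
The slowdown described in the first comment can be made concrete with a back-of-the-envelope estimate (a sketch, assuming rand.nextInt(n) is uniform on 0..n-1). In the question's loop a line is consumed only on iterations where rand.nextInt(n) == 0, which happens with probability 1/n, so the expected number of lines consumed after N iterations is a harmonic number:

```latex
\mathbb{E}\bigl[\text{lines consumed after } N \text{ iterations}\bigr]
  = \sum_{n=1}^{N} \frac{1}{n}
  = H_N \approx \ln N + 0.577
```

Even N = 2^{31} iterations consume only about 22 lines on average, and getting through all 2138 lines of the shortened file would take on the order of e^{2138} iterations, which matches the observation that the loop never gets past the "a" section.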

Running Main prints:

Took 0.091 seconds to load and find 1,000,000 random words.