Java: extracting letters and whitespace from a given text file that contains no sentinel
Tags: java, regex, string, replaceall


I have two text files:

1. Extract_tweet.txt - the file format is user_id tweet_text

12163922    5407952300  I think I just discovered the hour when the office thermostat changes. And it ain't a good time to be at work...brrrr   2009-11-03 19:22:54
2. locations.txt - the relevant field in the data below is column 3, which acts as the search string

asciiname: name of geographical point in plain ascii characters, varchar(200)

4045431 Point Poker Point Poker     52.89508    173.29911   T   CAPE    US      AK  016         0       9   America/Adak    2013-10-26
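I want to extract some data from these files. The data should contain only a-z, A-Z and whitespace. I first considered tokenizing the string, but since no sentinel (delimiter) is given, I thought of using a regex instead. Please find below the code snippet for extracting the 27 characters, i.e. a-z or A-Z or any whitespace. I only want lowercase text in the result, i.e. any uppercase characters should be converted to lowercase.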

I open file 1, Extract_tweet.txt, and take the complete text as a single string. I then try to replace every non-alphabetic character with an empty string.

   public void readfromFile() throws FileNotFoundException
    {
        Scanner inputStream;
        String source=null;
        FileInputStream file = new FileInputStream("Extract_tweet.txt");    
        inputStream = new Scanner(file);
        while(inputStream.hasNextLine())    //Read from file till the last line of the file.
        {
            source = inputStream.nextLine();
            System.out.println(source);
            replaceAll(source);

        }
        inputStream.close();
    }
    public String replaceAll(String source) 
    {
        String regex = "[A-Z]*"+"["+source.toLowerCase()+"|"+"[a-z]*"+"[\\s]";
        source = source.replaceAll(regex, "");
        System.out.println(source);
        return source;
    }

    public static void main(String[] args) {

        StringProcessing sp = new StringProcessing();
        try {
            sp.readfromFile();
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }
When I run this code, I get the error below:

60730027    6320951896  @thediscovietnam coo.  thanks. just dropped you a line. 2009-12-03 18:41:07
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 88
[A-Z]*[60730027 6320951896  @thediscovietnam coo.  thanks. just dropped you a line. 2009-12-03 18:41:07|[a-z]*[\s]
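The exception follows from how the pattern is built: the whole input line is concatenated after an opening `[`, so it is parsed as the contents of a character class, where a hyphen between two characters denotes a range. The date `2009-12-03` then contains `9-1`, a range whose end precedes its start, which `java.util.regex` rejects. A minimal sketch reproducing this in isolation (the class name `RangeErrorDemo` and the helper `compiles` are made up for illustration):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RangeErrorDemo {
    // Hypothetical helper: returns true if the given text compiles as a regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String date = "2009-12-03";
        // Raw text inside [...]: "9-1" is parsed as a reversed range.
        System.out.println(compiles("[" + date + "]"));    // prints false
        // Pattern.quote turns the text into a literal, so it compiles.
        System.out.println(compiles(Pattern.quote(date))); // prints true
    }
}
```

In general, input text should not be spliced into a pattern unescaped; `Pattern.quote` is the standard way to embed it literally when that is really needed.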

Please change the line like this:

String regex = "[A-Z]* |"+"[a-z]*"+"[\\s]";
It will work fine.
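It may help to check what this pattern actually matches. It compiles, so the `PatternSyntaxException` goes away, but it deletes runs of lowercase letters that are followed by whitespace (and uppercase runs followed by a space) rather than keeping them, and it leaves digits alone. A small sketch, with a made-up class name and a made-up input string:

```java
public class SuggestedPatternDemo {
    // The pattern from the suggestion above, unchanged.
    static final String REGEX = "[A-Z]* |" + "[a-z]*" + "[\\s]";

    // Applies the suggested replacement to one line of text.
    static String apply(String source) {
        return source.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        // "ello " and "orld " match and are removed; "H", "W" and "42" remain.
        System.out.println(apply("Hello World 42")); // prints "HW42"
    }
}
```

So the change removes the error, but on its own it does not yet produce letters-and-whitespace-only output.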

I made some changes. However, I want to convert uppercase to lowercase and replace all alphanumeric values with null.

Extend your method:

public String replaceAll(String source) throws FileNotFoundException {
    String regex = "[A-Z]* |[a-z]*\\s";
    source = source.replaceAll(regex, "")
                   .replaceAll("\\d", "")
                   .toLowerCase();

    System.out.println(source);
    writetoFile(source);
    return source;
}

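Yes, at least it works. However, I want to remove all alphanumeric values and convert uppercase values to lowercase. Any suggestions?

Please explain briefly with an example.

Welcome to Stack Overflow. Please take the tour.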
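For the behaviour described in the question (keep only a-z, A-Z and whitespace, and lowercase the result), one possible approach is a negated character class: remove everything that is not a letter or whitespace, then call `toLowerCase`. The following is only a sketch under that reading of the requirement; the class name `LettersOnly`, the method name `lettersAndSpaces`, and the tab separators in the sample line are assumptions, not taken from the thread.

```java
import java.util.Locale;

public class LettersOnly {
    // Removes every character that is not A-Z, a-z or whitespace,
    // then converts the remaining text to lowercase.
    static String lettersAndSpaces(String source) {
        return source.replaceAll("[^A-Za-z\\s]", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Sample line from the question; tab separators are an assumption.
        String line = "60730027\t6320951896\t@thediscovietnam coo.  thanks."
                + " just dropped you a line.\t2009-12-03 18:41:07";
        System.out.println(lettersAndSpaces(line));
    }
}
```

Because the pattern is a fixed string and never includes the input text, it cannot trigger the `PatternSyntaxException` shown earlier. Whitespace runs are left as they are; if single spaces are wanted, a further `replaceAll("\\s+", " ").trim()` could be chained on.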