Java: extracting letters and whitespace from a given text file when the file contains no sentinel
I have two text files:
1. Extract_tweet.txt
- The file format is user_id tweet_text
12163922 5407952300 I think I just discovered the hour when the office thermostat changes. And it ain't a good time to be at work...brrrr 2009-11-03 19:22:54
2. locations.txt
- The relevant field in the data below is column 3, which acts as the search string
asciiname: name of geographical point in plain ascii characters, varchar(200)
4045431 Point Poker Point Poker 52.89508 173.29911 T CAPE US AK 016 0 9 America/Adak 2013-10-26
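If the columns in locations.txt are tab-separated (an assumption; the separators are not visible in the sample row above), column 3 can be pulled out with a plain split. A minimal sketch, where LocationColumn and column are hypothetical names and the sample row is shortened to its first three fields:

```java
import java.util.Locale;

final class LocationColumn {
    // Returns the requested 1-based column of a tab-separated line,
    // or an empty string if the line does not have that many columns.
    static String column(String line, int oneBasedIndex) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (oneBasedIndex < 1 || oneBasedIndex > fields.length) {
            return "";
        }
        return fields[oneBasedIndex - 1];
    }

    public static void main(String[] args) {
        // Assumed layout: tabs between fields; only the first three fields are shown.
        String row = "4045431\tPoint Poker\tPoint Poker";
        System.out.println(column(row, 3).toLowerCase(Locale.ROOT)); // prints: point poker
    }
}
```

If the real file uses a different delimiter, only the argument to split would need to change.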
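I want to extract some data from these files. The data should contain only a-z, A-Z, and whitespace. I first considered tokenizing the string. However, since no sentinel (delimiter) is given, I decided to use a regular expression instead. Please find below the code snippet for extracting the 27 characters, i.e. A-Z or a-z or any whitespace. I only want lowercase text in the result, i.e. any uppercase character should be converted to lowercase.
I open file 1, Extract_tweet.txt, and read each line as a single string. Then I try to replace every non-alphabetic character with an empty string.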
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class StringProcessing {

    public void readfromFile() throws FileNotFoundException
    {
        Scanner inputStream;
        String source = null;
        FileInputStream file = new FileInputStream("Extract_tweet.txt");
        inputStream = new Scanner(file);
        while (inputStream.hasNextLine()) // Read from file till the last line of the file.
        {
            source = inputStream.nextLine();
            System.out.println(source);
            replaceAll(source);
        }
        inputStream.close();
    }

    public String replaceAll(String source)
    {
        String regex = "[A-Z]*"+"["+source.toLowerCase()+"|"+"[a-z]*"+"[\\s]";
        source = source.replaceAll(regex, "");
        System.out.println(source);
        return source;
    }

    public static void main(String[] args) {
        StringProcessing sp = new StringProcessing();
        try {
            sp.readfromFile();
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
When I run this code, I get the following error:
60730027 6320951896 @thediscovietnam coo. thanks. just dropped you a line. 2009-12-03 18:41:07
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 88
[A-Z]*[60730027 6320951896 @thediscovietnam coo. thanks. just dropped you a line. 2009-12-03 18:41:07|[a-z]*[\s]
Please change the line like this:
String regex = "[A-Z]* |"+"[a-z]*"+"[\\s]";
It will work fine.
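Some background on why the original pattern failed: it concatenated the whole input line into a character class, so the 9-1 inside the date 2009-12-03 was read as a reversed range, which is what "Illegal character range" reports. A minimal sketch that reproduces this (RangeErrorDemo and compiles are hypothetical names):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class RangeErrorDemo {
    // Returns true if the regex compiles, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String line = "just dropped you a line. 2009-12-03 18:41:07";
        // Input text inside [...]: "9-1" is treated as a range and rejected.
        System.out.println(compiles("[A-Z]*[" + line + "|[a-z]*[\\s]"));
        // The fixed-pattern version contains no input text, so it compiles.
        System.out.println(compiles("[A-Z]* |[a-z]*[\\s]"));
        // A negated class that names only the characters to keep also compiles.
        System.out.println(compiles("[^a-z\\s]"));
    }
}
```

Note that the pattern suggested above compiles, but as far as I can tell it matches runs of letters followed by whitespace, so replaceAll removes those letters rather than keeping them.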
I made some changes. However, I want to convert uppercase to lowercase and replace all alphanumeric values with an empty string.
Expand your method:
public String replaceAll(String source) throws FileNotFoundException {
    String regex = "[A-Z]* |[a-z]*\\s";
    source = source.replaceAll(regex, "")
                   .replaceAll("\\d", "")
                   .toLowerCase();
    System.out.println(source);
    writetoFile(source);
    return source;
}
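Yes, at least it runs. However, I want to remove all alphanumeric values and convert uppercase to lowercase. Any suggestions? Please explain briefly with an example.
Welcome to Stack Overflow. Please take the tour.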
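One way to get what the comment above asks for is to lowercase the line first and then delete everything that is not a-z or whitespace. A minimal sketch, assuming "alphanumeric values" here means digits and punctuation; LetterFilter, lettersAndSpaces, and normalize are hypothetical names:

```java
import java.util.Locale;

final class LetterFilter {
    // Lowercases the input, then removes every character that is not a-z or whitespace.
    static String lettersAndSpaces(String source) {
        return source.toLowerCase(Locale.ROOT).replaceAll("[^a-z\\s]", "");
    }

    // Optional extra step: collapses runs of whitespace into one space and trims the ends.
    static String normalize(String source) {
        return lettersAndSpaces(source).replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String tweet = "60730027 6320951896 @thediscovietnam coo. thanks. "
                + "just dropped you a line. 2009-12-03 18:41:07";
        System.out.println(normalize(tweet));
        // prints: thediscovietnam coo thanks just dropped you a line
    }
}
```

Negating the class ([^a-z\s]) lists what to keep instead of what to remove, so no text from the file ever ends up inside the pattern.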