TokenCleaner Java method
I need a TokenCleaner method for a WordCount project I'm working on. A token is a sequence of characters surrounded by whitespace (usually a word), and it needs to be "cleaned" of any punctuation and capital letters. I have a template, but I don't know how to do it or where to start:
public class TokenCleaner
{
    public static void main(String[] args)
    {
        String[] tokens = {"That's","empty-handed?","42","...idk...","\"quote\""};
        for(int i = 0; i < tokens.length; i++)
        {
            System.out.println("Original:\t"+tokens[i]);
            System.out.println("Cleaned:\t"+cleanToken(tokens[i]));
        }
    }
    private static String cleanToken(String token)
    {
        /** remove leading special characters and numbers **/
        // while the token's length is greater than zero AND the first character isn't a letter
        // remove the first character from the token
        /** remove trailing special characters and numbers **/
        // while the token's length is greater than zero AND the last character isn't a letter
        // remove the last character from the token
        // return a lowercase version of the token
        /** Note: It is possible for the cleaned token to be an empty String if the given token
        consisted of only non-letter characters */
        return null; // placeholder return statement
    }
}
Can anyone help? Thanks.
I'm not sure this is exactly what the requirements above ask for, but you can use the following method:
private static String cleanToken(String token)
{
    return token.replaceAll("\\P{L}", "").toLowerCase();
}
But this removes digits and special characters everywhere in the token, not just at its start and end.
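If only the leading and trailing non-letters should go, as the template's comments describe, a minimal sketch is to anchor the same \P{L} class to the two ends of the token. The class name EdgeRegexCleaner is made up for this example, and the regex assumes that anything which is not a Unicode letter counts as removable:

```java
public class EdgeRegexCleaner {
    // "^\\P{L}+" = one or more non-letters at the very start of the token;
    // "\\P{L}+$" = one or more non-letters at the very end of the token.
    static String cleanToken(String token) {
        return token.replaceAll("^\\P{L}+|\\P{L}+$", "").toLowerCase();
    }

    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String t : tokens) {
            System.out.println("Original:\t" + t);
            System.out.println("Cleaned:\t" + cleanToken(t));
        }
    }
}
```

Inner punctuation such as the apostrophe in "That's" and the hyphen in "empty-handed" is kept, and "42" becomes an empty string, which the template says is allowed.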
Please let me know if this helps.
You can do this with pattern matching. Start by reading the javadocs for Pattern (which implements Java regular expressions) and for the String.replaceAll method.
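As a rough sketch of the Pattern route: one option (an assumption about the intended behavior, not the only reading) is to compile a pattern that matches everything from the first letter to the last letter and keep just that span. PatternCleaner and LETTER_SPAN are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCleaner {
    // A letter, optionally followed by anything up to (and including) the last letter.
    private static final Pattern LETTER_SPAN = Pattern.compile("\\p{L}(?:.*\\p{L})?");

    static String cleanToken(String token) {
        Matcher m = LETTER_SPAN.matcher(token);
        // If the token contains no letter at all, the cleaned token is the empty string.
        return m.find() ? m.group().toLowerCase() : "";
    }

    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String t : tokens) {
            System.out.println("Original:\t" + t);
            System.out.println("Cleaned:\t" + cleanToken(t));
        }
    }
}
```

Compiling the Pattern once as a static field avoids recompiling the regex on every call, which String.replaceAll does internally.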
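Alternatively, you could create a new (empty) StringBuilder, then loop over the characters in the original String, copying the ones you want to keep into the StringBuilder. When you're done, create a String from the StringBuilder.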
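A minimal sketch of that StringBuilder loop, assuming "keep" means "is a letter" according to Character.isLetter. Note that this version drops non-letters everywhere, so "That's" becomes "thats"; LoopCleaner is a hypothetical name:

```java
public class LoopCleaner {
    static String cleanToken(String token) {
        StringBuilder sb = new StringBuilder(); // starts out empty
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // Copy only the characters we want to keep (letters), lowercased.
            if (Character.isLetter(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString(); // build the final String from the StringBuilder
    }

    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String t : tokens) {
            System.out.println("Original:\t" + t);
            System.out.println("Cleaned:\t" + cleanToken(t));
        }
    }
}
```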
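I'm not going to give you links to the relevant javadocs; finding them, searching them, and reading/understanding them are skills you need to learn.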
I'd suggest going through the token one character at a time: if a character is something you want to delete, drop it; otherwise, lowercase it. For example:
import java.util.ArrayList; // at the top of the file

// inside class TokenCleaner:
private static String cleanToken(String token) {
    // list of the characters that make up the new token
    ArrayList<String> newtoken = new ArrayList<String>();
    // list of the characters you want to delete
    ArrayList<String> todelete = new ArrayList<String>();
    todelete.add("@"); // add every character you want to delete
    // parse the token one character at a time
    for (int i = 0; i < token.length(); i++) {
        // todelete holds Strings, so compare a String (contains(char) would never match)
        String c = String.valueOf(token.charAt(i));
        if (todelete.contains(c)) {
            // skip it: deleted characters are simply not copied
        }
        else {
            // keep it, lowercased
            newtoken.add(c.toLowerCase());
        }
    }
    // merge all elements of the newtoken list into one String
    return String.join("", newtoken);
}
What have you actually tried so far? Also, look up the replaceAll method and regex. The template seems to describe the task at hand very specifically. Which particular part are you having trouble with?
@KevinAnderson - I don't know how to remove the special characters. Do I use a while loop or a for loop to remove the first and last characters that aren't letters? Where do I use replaceAll, and how?
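On the while-versus-for question: a while loop fits the template's comments, because you don't know in advance how many characters to strip from each end. One way to fill in the template, as a sketch that uses Character.isLetter as the "is a letter" test (the template doesn't say which test to use):

```java
public class TokenCleaner {
    static String cleanToken(String token) {
        // remove leading special characters and numbers
        while (token.length() > 0 && !Character.isLetter(token.charAt(0))) {
            token = token.substring(1); // drop the first character
        }
        // remove trailing special characters and numbers
        while (token.length() > 0 && !Character.isLetter(token.charAt(token.length() - 1))) {
            token = token.substring(0, token.length() - 1); // drop the last character
        }
        // return a lowercase version of the token ("" if it had no letters)
        return token.toLowerCase();
    }

    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println("Original:\t" + tokens[i]);
            System.out.println("Cleaned:\t" + cleanToken(tokens[i]));
        }
    }
}
```

For the sample tokens this gives that's, empty-handed, an empty string for 42, idk, and quote. The length check comes first in each condition so charAt is never called on an empty string.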