Token cleaner Java method

Tags: java, word-count

I need a TokenCleaner method for a WordCount project I'm working on. A token is a sequence of characters surrounded by whitespace (usually a word), and it needs to be "cleaned" of any punctuation and capital letters. I have a template, but I don't know how to do it or how to start.

public class TokenCleaner
{
    public static void main()
    {
        String[] tokens = {"That's","empty-handed?","42","...idk...","\"quote\""};
        for(int i = 0; i < tokens.length; i++)
        {
            System.out.println("Original:\t"+tokens[i]);
            System.out.println("Cleaned:\t"+cleanToken(tokens[i]));
        }
    }
    private static String cleanToken(String token)
    {
        /** remove leading special characters and numbers **/
        // while the token's length is greater than zero AND the first character isn't a letter
            // remove the first character from the token
        /** remove trailing special characters and numbers **/
        // while the token's length is greater than zero AND the last character isn't a letter
            // remove the last character from the token
        // return a lowercase version of the token
        /** Note: It is possible for the cleaned token to be an empty String if the given token
            consisted of only non-letter characters */
        return null; // placeholder return statement
    }
}
Can anyone help?

Thanks.

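I'm not sure whether this is exactly what is required above, but you could use the following:

private static String cleanToken(String token)
{
    // \P{L} matches any character that is not a Unicode letter
    return token.replaceAll("\\P{L}", "").toLowerCase();
}
Note that this removes digits and special characters everywhere in the token, not just at its start and end.


Do let me know if this helps.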
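
If only the leading and trailing non-letters should go, as the template in the question describes, the same replaceAll call can be anchored to the two ends of the token. This is a minimal sketch rather than a definitive solution; the class name TrimEndsSketch is hypothetical, and the sample tokens are the ones from the question.

```java
class TrimEndsSketch {
    // Strips non-letters from both ends of the token, then lowercases it.
    // ^\P{L}+ matches a run of non-letters at the start of the token,
    // \P{L}+$ matches a run of non-letters at the end.
    static String cleanToken(String token) {
        return token.replaceAll("^\\P{L}+|\\P{L}+$", "").toLowerCase();
    }

    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String token : tokens) {
            System.out.println("Original:\t" + token);
            System.out.println("Cleaned:\t" + cleanToken(token));
        }
    }
}
```

With this pattern, interior punctuation such as the apostrophe in "That's" is kept, and a token with no letters at all, such as "42", comes back as an empty string.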

"I don't know how to do it or how to start."

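You can do this with pattern matching. Start by reading the javadocs for Pattern (which implements Java regular expressions) and for the String.replaceAll method.

Alternatively, you can create a new (empty) StringBuilder, then loop over the characters in the original string, copying the ones you want to keep into the StringBuilder. When you are done, create a String from the StringBuilder.

I'm not going to give you links to the relevant javadocs. Finding them, searching them, and reading and understanding them are skills you need to learn.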
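
The StringBuilder route can follow the template from the question fairly directly. The following is a minimal sketch under a few assumptions: the class name BuilderCleanerSketch is hypothetical, and Character.isLetter is taken as the test for "is a letter". Two while loops narrow the range from each end, and only the characters in between are copied.

```java
class BuilderCleanerSketch {
    static String cleanToken(String token) {
        int start = 0;
        int end = token.length(); // exclusive upper bound

        // Skip leading characters that are not letters.
        while (start < end && !Character.isLetter(token.charAt(start))) {
            start++;
        }
        // Skip trailing characters that are not letters.
        while (end > start && !Character.isLetter(token.charAt(end - 1))) {
            end--;
        }

        // Copy the characters to keep, lowercasing each one.
        StringBuilder cleaned = new StringBuilder();
        for (int i = start; i < end; i++) {
            cleaned.append(Character.toLowerCase(token.charAt(i)));
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String token : tokens) {
            System.out.println("Original:\t" + token);
            System.out.println("Cleaned:\t" + cleanToken(token));
        }
    }
}
```

Tracking two indexes avoids repeatedly deleting characters from the string, and a token made up only of non-letters ends with start equal to end, so the result is an empty string.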


I suggest you go through the token character by character: if a character equals anything you want to remove, drop it; otherwise lowercase it and keep it. For example:

// requires: import java.util.ArrayList;
private static String cleanToken(String token) {
    // list holding the characters of the new token
    ArrayList<String> newtoken = new ArrayList<String>();
    // list of the characters you want to delete
    ArrayList<Character> todelete = new ArrayList<Character>();
    todelete.add('@'); // add every character you want to delete
    // walk through the token one character at a time
    for (int i = 0; i < token.length(); i++) {
        char c = token.charAt(i);
        if (todelete.contains(c)) {
            // deleted: the character is simply not copied
        }
        else {
            // lowercase it and keep it
            newtoken.add(String.valueOf(c).toLowerCase());
        }
    }
    // now merge all elements of the newtoken list into one String
    String NewToken = "";
    for (String t : newtoken) {
        NewToken = NewToken + t;
    }
    return NewToken;
}
What have you actually tried so far? Also, look up the replaceAll method and regex. The template seems to describe the task at hand quite specifically. Which specific parts are you having trouble with?
@KevinAnderson - I don't know how to remove the special characters. For the method, do I use a while rather than a for to remove the first and last characters that aren't letters? Where do I use replaceAll, and how?