Java: how do I clean \n, \t, etc. out of a string?

java string

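This is my sample string:

"Hello\n I am\t \n \n Marco\t\n"

I want to remove all of the escaped whitespace characters. Is there a general solution that works for more than just \n and \t?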

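This replaces each run of non-word characters with a single space. You don't have to know which characters you don't want; you only say which ones you do want: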

class Test {
    public static void main(String[] args) {

        String data = "Hello\n I am\t \n \n Marco\t\n";

        // Replace every run of non-word characters with a single space.
        data = data.replaceAll("[^\\w]+", " ");

        System.out.println(data);
    }
}

Result:

Hello I am Marco

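The regular expression "[^\\w]+" matches runs of non-word characters. Word characters are A-Z, a-z, 0-9 and "_". The call to replaceAll replaces each of these runs with a single space character.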
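To make the effect of the pattern concrete, here is a minimal sketch (the class name WordRunDemo and the extra inputs are made up for illustration). It also shows a side effect that is easy to miss: by default \w covers only ASCII, so punctuation and accented letters are replaced as well.

```java
class WordRunDemo {
    // Replaces each run of non-word characters with a single space.
    static String clean(String input) {
        return input.replaceAll("[^\\w]+", " ");
    }

    public static void main(String[] args) {
        // "Hello I am Marco " (the trailing "\t\n" becomes one trailing space)
        System.out.println(clean("Hello\n I am\t \n \n Marco\t\n"));
        // "snake_case really " (the underscore is a word character, punctuation is not)
        System.out.println(clean("snake_case, really?!"));
        // "Caf au lait" (the accented e is not an ASCII word character)
        System.out.println(clean("Caf\u00e9\tau lait"));
    }
}
```

If only whitespace should go, a whitespace-specific pattern is probably the safer choice.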

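If this isn't what you want, you can choose other options by adjusting the regular expression and the replacement string. For example, you could keep spaces by adding a space to the expression, "[^\\w ]+", and change the replacement string to "", but then there will be multiple spaces between some words.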
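A small sketch of that variant, assuming the space goes inside the character class as "[^\\w ]+" (the class and method names are hypothetical):

```java
class KeepSpacesDemo {
    // Deletes everything that is neither a word character nor a plain space.
    static String dropNonWordKeepSpaces(String input) {
        return input.replaceAll("[^\\w ]+", "");
    }

    public static void main(String[] args) {
        // The spaces that sat between the removed \t and \n characters remain,
        // so three spaces end up between "am" and "Marco".
        System.out.println(dropNonWordKeepSpaces("Hello\n I am\t \n \n Marco\t\n"));
    }
}
```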

By adding other characters to the "[^\\w]+" expression, you add them to the list of characters that are not removed.

Simply replace all whitespace (i.e. \s+) with "".

Output:

HelloIamMarco
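As a rough sketch of this answer (class and method names are invented here): by default \s in Java matches only space, \t, \n, vertical tab, \f and \r, so a no-break space survives. Prefixing the pattern with the embedded flag (?U), which turns on Pattern.UNICODE_CHARACTER_CLASS, makes \s follow the Unicode White_Space property instead.

```java
class WhitespaceRegexDemo {
    // Default \s: ASCII whitespace only.
    static String removeAsciiWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // (?U) enables UNICODE_CHARACTER_CLASS, so \s also matches
    // characters such as NO-BREAK SPACE and EM SPACE.
    static String removeUnicodeWhitespace(String s) {
        return s.replaceAll("(?U)\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(removeAsciiWhitespace("Hello\n I am\t \n \n Marco\t\n")); // HelloIamMarco
        System.out.println(removeAsciiWhitespace("a\u00a0b").length());             // 3, the no-break space is kept
        System.out.println(removeUnicodeWhitespace("a\u00a0b\u2003c"));             // abc
    }
}
```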

You can also use Java streams, which I find more readable:

String noWhitespace = "Hello\n I am\t \n \n Marco\t\n".chars()
    .filter(c -> !Character.isWhitespace(c))
    .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
    .toString();
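One caveat with the stream version: Character.isWhitespace deliberately returns false for non-breaking spaces such as U+00A0. A possible variation, sketched here under the assumption that those should be removed too, iterates over code points and also checks Character.isSpaceChar (the class and method names are hypothetical):

```java
class CodePointFilterDemo {
    // Drops every code point that Java treats as whitespace or as a Unicode space character.
    static String removeAllSpace(String s) {
        return s.codePoints()
                .filter(cp -> !Character.isWhitespace(cp) && !Character.isSpaceChar(cp))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAllSpace("a\u00a0b\tc\n")); // abc
    }
}
```

Using codePoints() rather than chars() only matters when the predicate is applied to characters outside the Basic Multilingual Plane, but it keeps each filtered value a complete code point.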

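I haven't had much luck handling whitespace with regular expressions in Java (I don't agree with Java's definition of whitespace, and it gets weird once you start dealing with Unicode characters). For fine-grained control, I use the following method: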

public static String strip(final String text)
{
    if ((text == null) || (text.length() == 0))
    {
        return text; // nothing to do
    }

    final StringBuilder str = new StringBuilder();

    for (char c : text.toCharArray())
    {
        switch (c)
        {
            // https://stackoverflow.com/a/4731164/2074605
            case ' ':  // '\u0020' SPACE
            case '\t': // '\u0009' CHARACTER TABULATION
            case '\n':
            case '\r':
            case '\f': // '\u000c'
            case '\u00a0': // NO-BREAK SPACE
            case '\u2002': // EN SPACE
            case '\u2003': // EM SPACE
            case '\u2009': // THIN SPACE
            case '\u200a': // HAIR SPACE
            case '\u000b': // vertical tab
            {
                break;
            }
            default:
            {
                str.append(c);
                break;
            }
        }
    }

    return str.toString();
}

This approach also makes it easy to build other core string utilities (trim, normalize, etc.).

For example:

/**
 * Normalizes text. This replaces multiple white spaces with a single character.
 * This preserves the first whitespace character but ignores following whitespace until a non-whitespace character is encountered.
 *
 * @param text The text to normalize.
 * @return The normalized text.
 */
public static String normalize(final String text)
{
    if (text == null)
    {
        return null;
    }

    final StringBuilder strbuf = new StringBuilder();

    boolean previousSpace = false;
    for (char c : text.toCharArray())
    {
        switch (c)
        {
            // https://stackoverflow.com/a/4731164/2074605
            case ' ':  // '\u0020' SPACE
            case '\t': // '\u0009' CHARACTER TABULATION
            case '\n':
            case '\r':
            case '\f': // '\u000c'
            case '\u00a0': // NO-BREAK SPACE
            case '\u2002': // EN SPACE
            case '\u2003': // EM SPACE
            case '\u2009': // THIN SPACE
            case '\u200a': // HAIR SPACE
            case '\u000b': // vertical tab
            {
                if (!previousSpace)
                {
                    strbuf.append(c);
                }
                previousSpace = true;
                break;
            }
            default:
            {
                strbuf.append(c);
                previousSpace = false;
                break;
            }
        }
    }

    return strbuf.toString();
}
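Since strip and normalize repeat the same case list, one option is to move that list into a single predicate. The sketch below is only a suggestion; the SpaceChars class and the isSpace helper are hypothetical and not part of the answer above, but the character list and the behaviour of both methods are meant to match it.

```java
class SpaceChars {
    // Same character list as the switch statements in strip and normalize.
    static boolean isSpace(char c) {
        switch (c) {
            case ' ': case '\t': case '\n': case '\r': case '\f':
            case '\u00a0': case '\u2002': case '\u2003':
            case '\u2009': case '\u200a': case '\u000b':
                return true;
            default:
                return false;
        }
    }

    // Removes every listed space character.
    static String strip(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (!isSpace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Keeps the first space character of each run and drops the rest.
    static String normalize(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        boolean previousSpace = false;
        for (char c : text.toCharArray()) {
            boolean space = isSpace(c);
            if (!space || !previousSpace) {
                sb.append(c);
            }
            previousSpace = space;
        }
        return sb.toString();
    }
}
```

With this, adding or removing a character from the definition of "space" becomes a one-line change.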

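And likewise for trimming and collapsing whitespace (see the trim and whitespaceToSingleSpace methods further down).

In my toolbox class I have: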

/**
     * This method formats a String. <br>
     * <br>
     * It places the first non-white space character at the left, and removes all extra spaces. <br>
     * So "&nbsp;a&nbsp;bc&nbsp;&nbsp;&nbsp;cd" will be returned as "a&nbsp;bc&nbsp;cd"
     * @param format
     */
    public static String stringLeftJustify( String theValue, JustifyFormat format )
    {
        char charArray[];

        try
        {
            charArray = theValue.toCharArray();
        }
        catch (NullPointerException e)
        {
            return "";
        }

        StringBuilder out = new StringBuilder( charArray.length + 1 );

        // remove any leading whitespace
        boolean isSpace = true;

        for (int c = 0; c < charArray.length; c++)
        {
            if (format == JustifyFormat.MULTI_LINE)
            {
                // leave CRLF for multi-line inputs
                if (!(charArray[c] == '\n' || charArray[c] == '\r') && Character.isWhitespace( charArray[c] ))
                {
                    if (!isSpace)
                        out.append( ' ' );

                    isSpace = true;
                }
                else
                {
                    out.append( charArray[c] );
                    isSpace = false;
                }
            }
            else
            {
                if (Character.isWhitespace( charArray[c] ))
                {
                    if (!isSpace)
                        out.append( ' ' );

                    isSpace = true;
                }
                else
                {
                    out.append( charArray[c] );
                    isSpace = false;
                }
            }
        }

        // remove trailing space
        if (isSpace && out.length() > 0)
        {
            String justified = out.toString();

            return justified.substring( 0, justified.length() - 1 );
        }

        return out.toString();
    }
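The JustifyFormat type is not shown in the answer, so the method cannot be compiled as posted. The following condensed sketch assumes it is a simple enum; only MULTI_LINE appears in the original, and SINGLE_LINE is a made-up name for the other case. The logic is intended to mirror the method above, with the two near-identical branches folded into one condition.

```java
class LeftJustifyDemo {
    // Hypothetical enum: only MULTI_LINE is referenced in the original method.
    enum JustifyFormat { SINGLE_LINE, MULTI_LINE }

    static String leftJustify(String value, JustifyFormat format) {
        if (value == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(value.length());
        boolean isSpace = true; // starts true so leading whitespace is dropped
        for (char ch : value.toCharArray()) {
            boolean lineBreak = ch == '\n' || ch == '\r';
            // In MULTI_LINE mode line breaks are copied through untouched.
            boolean collapse = Character.isWhitespace(ch)
                    && !(format == JustifyFormat.MULTI_LINE && lineBreak);
            if (collapse) {
                if (!isSpace) {
                    out.append(' ');
                }
                isSpace = true;
            } else {
                out.append(ch);
                isSpace = false;
            }
        }
        // Remove the single trailing space left by trailing whitespace, if any.
        if (isSpace && out.length() > 0) {
            out.setLength(out.length() - 1);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(leftJustify(" a bc   cd", JustifyFormat.SINGLE_LINE)); // a bc cd
    }
}
```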


Use a regular expression with String.replaceAll().

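Yes, but with a regular expression I need to know all the characters, and I'm not sure how many of them there are.

The first answer in my link works for your example.

@pawel033 - Did you have any problem with the \s+ that was mentioned?

@ArvindKumarAvinash Yes, I did. I think it may be related to the input, but I don't know what the cause is.

You can simplify the regular expression in the pattern to \\W+ (\\W is equivalent to [^\\w]).

You could improve your answer by explaining what you are doing and why: for example, what "\w" means, why each match is replaced with " ", and so on.

What if the text contains "\b" as context?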
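To illustrate two of the comments above, here is a short sketch (the names are hypothetical). It checks that \\W+ behaves like [^\\w]+ on the sample string and, assuming the last comment refers to the backspace character that "\b" denotes in a Java string literal, shows that the non-word pattern replaces it while \s+ leaves it alone.

```java
class NonWordShorthandDemo {
    static String withNegatedClass(String s) {
        return s.replaceAll("[^\\w]+", " ");
    }

    static String withShorthand(String s) {
        return s.replaceAll("\\W+", " ");
    }

    public static void main(String[] args) {
        String sample = "Hello\n I am\t \n \n Marco\t\n";
        System.out.println(withNegatedClass(sample).equals(withShorthand(sample))); // true

        // "\b" in a Java string literal is the backspace control character.
        System.out.println(withShorthand("a\bb"));                       // a b
        System.out.println("a\bb".replaceAll("\\s+", "").length());     // 3, backspace is not whitespace
    }
}
```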

/**
 * Trims leading and trailing whitespace.
 * This method understands more forms of white space than String.trim().
 *
 * @param text The text to trim.
 * @return The trimmed text.
 */
public static String trim(final String text)
{
    if ((text == null) || (text.length() == 0))
    {
        return text; // nothing to do
    }

    // Find the first and last non-space characters in the text.
    Integer firstNonSpaceIdx = null;
    Integer lastNonSpaceIdx = null;

    int currentIdx = 0;

    for (char c : text.toCharArray())
    {
        switch (c)
        {
            // https://stackoverflow.com/a/4731164/2074605
            case ' ':  // '\u0020' SPACE
            case '\t': // '\u0009' CHARACTER TABULATION
            case '\n':
            case '\r':
            case '\f': // '\u000c'
            case '\u00a0': // NO-BREAK SPACE
            case '\u2002': // EN SPACE
            case '\u2003': // EM SPACE
            case '\u2009': // THIN SPACE
            case '\u200a': // HAIR SPACE
            case '\u000b': // vertical tab
            {
                break;
            }
            default:
            {
                if (firstNonSpaceIdx == null)
                {
                    firstNonSpaceIdx = currentIdx;
                }

                lastNonSpaceIdx = currentIdx;
                break;
            }
        }

        ++currentIdx;
    }

    if (firstNonSpaceIdx == null)
    {
        return text; // nothing to do
    }

    return text.substring(firstNonSpaceIdx, lastNonSpaceIdx + 1);
}
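For comparison, String.trim() only removes leading and trailing characters up to U+0020, so the Unicode spaces handled by the method above are left in place. A regex-based alternative, offered here only as a sketch with hypothetical names, uses the (?U) flag so that \s follows the Unicode White_Space property; note that this covers a broader set of characters than the explicit case list above.

```java
class TrimComparisonDemo {
    // Removes leading and trailing Unicode whitespace using a regular expression.
    static String unicodeTrim(String s) {
        return s.replaceAll("(?U)^\\s+|\\s+$", "");
    }

    public static void main(String[] args) {
        String padded = "\u2003x y\u00a0"; // EM SPACE in front, NO-BREAK SPACE at the end
        System.out.println(padded.trim().length());       // 5, trim() removes nothing here
        System.out.println(unicodeTrim(padded).length()); // 3, only "x y" is left
    }
}
```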

/**
 * Normalizes text. This replaces multiple white spaces with a single space character.
 * It also trims any whitespace from the beginning and end of the string.
 *
 * @param text The text to normalize.
 * @return The normalized text.
 */
public static String whitespaceToSingleSpace(final String text)
{
    if (text == null)
    {
        return null;
    }

    final StringBuilder strbuf = new StringBuilder();

    boolean previousSpace = false;
    for (char c : text.toCharArray())
    {
        switch (c)
        {
            // https://stackoverflow.com/a/4731164/2074605
            case ' ':  // '\u0020' SPACE
            case '\t': // '\u0009' CHARACTER TABULATION
            case '\n':
            case '\r':
            case '\f': // '\u000c'
            case '\u00a0': // NO-BREAK SPACE
            case '\u2002': // EN SPACE
            case '\u2003': // EM SPACE
            case '\u2009': // THIN SPACE
            case '\u200a': // HAIR SPACE
            case '\u000b': // vertical tab
            {
                if (!previousSpace)
                {
                    strbuf.append(' ');
                }
                previousSpace = true;
                break;
            }
            default:
            {
                strbuf.append(c);
                previousSpace = false;
                break;
            }
        }
    }

    return trim(strbuf.toString());
}
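If the exact character list is not important, roughly the same result can be had with a precompiled pattern. This is a sketch under that assumption, not a drop-in replacement: Pattern.UNICODE_CHARACTER_CLASS makes \s match every Unicode White_Space character, which is a larger set than the case list above. The class and method names are made up.

```java
import java.util.regex.Pattern;

class CollapseWhitespaceDemo {
    // Compiled once and reused; \s follows the Unicode White_Space property.
    private static final Pattern UNICODE_WS =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    static String collapse(String text) {
        if (text == null) {
            return null;
        }
        // Every run becomes one ASCII space, so at most a single space
        // remains at either end, which trim() then removes.
        return UNICODE_WS.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello\n I am\t \n \n Marco\t\n")); // Hello I am Marco
    }
}
```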
