Java: tokenizing a string but ignoring delimiters inside quotes
I would like the following string
!cmd 45 90 "An argument" Another AndAnother "Another one in quotes"
to become an array of the following
{ "!cmd", "45", "90", "An argument", "Another", "AndAnother", "Another one in quotes" }
I have tried
new StringTokenizer(cmd, "\"")
but this returns "Another" and "AndAnother" as the single token "Another AndAnother", which is not the desired effect.
Thanks.
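Edit: I have changed the example yet again. This time I believe it explains the situation best, although it is no different from the second example.

The example here only needs to be split on the double-quote character. Try the following: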
String str = "One two \"three four\" five \"six seven eight\" nine \"ten\"";
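String[] strArr = str.split("\"|\\s");
This is a bit tricky because the double quote has to be escaped, and the backslash in \s has to be doubled inside a Java string literal. The regex splits the string on either a whitespace character (\s) or a double quote. Note that it also splits on whitespace inside the quoted sections, and it produces empty strings wherever a quote sits next to a space.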
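You should use the split method of String because it accepts a regular expression, whereas the delimiter argument of the StringTokenizer constructor does not. After the code above, you can add the following: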
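// Join the pieces again so that StringTokenizer can split them.
// The space separator is an assumption: without one, all pieces
// would run together into a single token.
StringBuilder s = new StringBuilder();
for (String k : strArr) {
    s.append(k).append(' ');
}
StringTokenizer strTok = new StringTokenizer(s.toString());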
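To make the behaviour of the quote-or-whitespace split concrete, here is a small check. The class and method names are made up for illustration. It shows that the quoted phrases are broken into separate words, which is why this approach alone does not give the array the question asks for.

```java
import java.util.Arrays;

public class SplitCheck {
    // Splits on a double quote or on any single whitespace character.
    static String[] splitOnQuoteOrSpace(String s) {
        return s.split("\"|\\s");
    }

    public static void main(String[] args) {
        String str = "One two \"three four\" five \"six seven eight\" nine \"ten\"";
        // prints [One, two, , three, four, , five, , six, seven, eight, , nine, , ten]
        System.out.println(Arrays.toString(splitOnQuoteOrSpace(str)));
    }
}
```

The empty strings appear wherever a quote and a space are adjacent, because each of the two characters counts as its own delimiter.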
Try this:
String str = "One two \"three four\" five \"six seven eight\" nine \"ten\"";
String[] strings = str.split("[ ]?\"[ ]?");
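Splitting on the quote character alone leaves the unquoted runs unsplit ("One two" stays a single element). One way to finish the job, sketched below, relies on the fact that after splitting on the quote, the odd-numbered pieces were inside quotes and the even-numbered pieces were outside. The class and method names are hypothetical, and the sketch assumes that quotes are balanced and never escaped.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSplit {
    // Assumes balanced, unescaped double quotes.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // Limit -1 keeps trailing empty pieces so the odd/even positions stay aligned.
        String[] parts = input.split("\"", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i % 2 == 1) {
                // Odd pieces were inside quotes: keep them verbatim.
                tokens.add(parts[i]);
            } else {
                // Even pieces were outside quotes: split them on whitespace.
                for (String word : parts[i].trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        tokens.add(word);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String cmd = "!cmd 45 90 \"An argument\" Another AndAnother \"Another one in quotes\"";
        // prints [!cmd, 45, 90, An argument, Another, AndAnother, Another one in quotes]
        System.out.println(tokenize(cmd));
    }
}
```

With an unbalanced quote, the text after the last quote would be treated as quoted, so input that may be malformed needs an explicit check.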
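I don't know the context of what you are trying to do, but it looks like you are trying to parse command-line arguments. In general this is pretty tricky because of all the escaping issues; if that is your goal, I would personally look at something like JCommander.

Do it the old-fashioned way. Write a function that looks at each character in a for loop. If the character is a space, take everything up to that space (excluding the space itself) and add it as an entry to the array. Note the position, then do the same again, adding the next part after the space to the array. When a double quote is encountered, set a boolean named "inQuote" to true, and ignore spaces while inQuote is true. When you hit a quote while inQuote is true, set it back to false and resume breaking things up whenever a space is encountered. You can then extend this as necessary to support escape characters and so on.

Could this be done with a regex? I don't know; I guess so. But the whole function would take less time to write than this reply did.

In an old-fashioned way: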
public static String[] split(String str) {
str += " "; // To detect last token when not quoted...
ArrayList<String> strings = new ArrayList<String>();
boolean inQuote = false;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (c == '"' || c == ' ' && !inQuote) {
if (c == '"')
inQuote = !inQuote;
if (!inQuote && sb.length() > 0) {
strings.add(sb.toString());
sb.delete(0, sb.length());
}
} else
sb.append(c);
}
return strings.toArray(new String[strings.size()]);
}
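I am assuming that nested quotes are illegal and that empty tokens can be omitted. In that case, using a java.util.regex.Matcher and calling find() in a loop is much easier than any kind of split.
That is, instead of defining a pattern for the delimiters between tokens, you define a pattern for the tokens themselves.
Here is an example: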
String text = "1 2 \"333 4\" 55 6 \"77\" 8 999";
// 1 2 "333 4" 55 6 "77" 8 999
String regex = "\"([^\"]*)\"|(\\S+)";
Matcher m = Pattern.compile(regex).matcher(text);
while (m.find()) {
if (m.group(1) != null) {
System.out.println("Quoted [" + m.group(1) + "]");
} else {
System.out.println("Plain [" + m.group(2) + "]");
}
}
The above prints:
Plain [1]
Plain [2]
Quoted [333 4]
Plain [55]
Plain [6]
Quoted [77]
Plain [8]
Plain [999]
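Since the question asks for an array rather than printed lines, the same pattern can also collect the tokens. This is a minimal sketch: the class name TokenMatcher and the method name tokenize are made up for illustration, and it keeps the assumption that quotes are neither nested nor escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMatcher {
    // Group 1: contents of a quoted token. Group 2: a plain run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Exactly one of the two groups is non-null for each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String cmd = "!cmd 45 90 \"An argument\" Another AndAnother \"Another one in quotes\"";
        // prints !cmd|45|90|An argument|Another|AndAnother|Another one in quotes
        System.out.println(String.join("|", tokenize(cmd)));
    }
}
```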
The pattern is essentially:
"([^"]*)"|(\S+)
 \_____/  \___/
    1       2
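There are two alternatives:
- The first alternative matches the opening double quote, then any sequence of characters other than a double quote (captured in group 1), then the closing double quote.
- The second alternative matches any sequence of non-whitespace characters, captured in group 2.
The order of the alternatives matters in this pattern: the quoted alternative has to come first, otherwise \S+ would consume the opening quote as part of a plain token.
Note that this does not handle escaped double quotes inside quoted segments. If that is needed, the pattern becomes more complicated, but the Matcher solution still works.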
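As a rough illustration of what such a more complicated pattern could look like, the sketch below lets a backslash escape the next character inside a quoted token. The choice of backslash as the escape character, the class name, and the method name are all assumptions made for this example; they are not taken from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedTokenMatcher {
    // As a plain regex: "((?:[^"\\]|\\.)*)"|(\S+)
    // Inside quotes, accept any character except quote or backslash,
    // or a backslash followed by any single character.
    private static final Pattern TOKEN =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"|(\\S+)");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                // Drop the backslash from each escape pair: \" becomes ", \\ becomes \
                tokens.add(m.group(1).replaceAll("\\\\(.)", "$1"));
            } else {
                tokens.add(m.group(2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The raw input is: say "a \"quoted\" word" end
        String text = "say \"a \\\"quoted\\\" word\" end";
        // prints [say, a "quoted" word, end]
        System.out.println(tokenize(text));
    }
}
```

An unterminated quote is not reported as an error here; the remaining text simply falls through to the plain-token alternative, so stricter input validation would need extra code.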
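Appendix: note that StringTokenizer is a legacy class. It is recommended to use String.split or the java.util.regex package instead, and a Matcher, as above, offers the most flexibility.

This is an old question, but here is my solution as a finite-state machine.
It is efficient, predictable, and uses no fancy tricks.
It has 100% coverage on tests.
Drop it into your code.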
/**
* Splits a command on whitespaces. Preserves whitespace in quotes. Trims excess whitespace between chunks. Supports quote
* escape within quotes. Failed escape will preserve escape char.
*
* @return List of split commands
*/
static List<String> splitCommand(String inputString) {
List<String> matchList = new LinkedList<>();
LinkedList<Character> charList = inputString.chars()
.mapToObj(i -> (char) i)
.collect(Collectors.toCollection(LinkedList::new));
// Finite-State Automaton for parsing.
CommandSplitterState state = CommandSplitterState.BeginningChunk;
LinkedList<Character> chunkBuffer = new LinkedList<>();
for (Character currentChar : charList) {
switch (state) {
case BeginningChunk:
switch (currentChar) {
case '"':
state = CommandSplitterState.ParsingQuote;
break;
case ' ':
break;
default:
state = CommandSplitterState.ParsingWord;
chunkBuffer.add(currentChar);
}
break;
case ParsingWord:
switch (currentChar) {
case ' ':
state = CommandSplitterState.BeginningChunk;
String newWord = chunkBuffer.stream().map(Object::toString).collect(Collectors.joining());
matchList.add(newWord);
chunkBuffer = new LinkedList<>();
break;
default:
chunkBuffer.add(currentChar);
}
break;
case ParsingQuote:
switch (currentChar) {
case '"':
state = CommandSplitterState.BeginningChunk;
String newWord = chunkBuffer.stream().map(Object::toString).collect(Collectors.joining());
matchList.add(newWord);
chunkBuffer = new LinkedList<>();
break;
case '\\':
state = CommandSplitterState.EscapeChar;
break;
default:
chunkBuffer.add(currentChar);
}
break;
case EscapeChar:
switch (currentChar) {
case '"': // Intentional fall through
case '\\':
state = CommandSplitterState.ParsingQuote;
chunkBuffer.add(currentChar);
break;
default:
state = CommandSplitterState.ParsingQuote;
chunkBuffer.add('\\');
chunkBuffer.add(currentChar);
}
}
}
if (state != CommandSplitterState.BeginningChunk) {
String newWord = chunkBuffer.stream().map(Object::toString).collect(Collectors.joining());
matchList.add(newWord);
}
return matchList;
}
private enum CommandSplitterState {
BeginningChunk, ParsingWord, ParsingQuote, EscapeChar
}
import org.apache.commons.text.StringTokenizer
import org.apache.commons.text.matcher.StringMatcher
import org.apache.commons.text.matcher.StringMatcherFactory
@Grab(group='org.apache.commons', module='commons-text', version='1.3')
def str = /is this 'completely "impossible"' or """slightly"" impossible" to parse?/
StringTokenizer st = new StringTokenizer( str )
StringMatcher sm = StringMatcherFactory.INSTANCE.quoteMatcher()
st.setQuoteMatcher( sm )
println st.tokenList
private static final AbstractStringMatcher.CharSetMatcher QUOTE_MATCHER = new AbstractStringMatcher.CharSetMatcher(
"'\"".toCharArray());
public StringTokenizer setQuoteMatcher(final StringMatcher quote) {
if (quote != null) {
this.quoteMatcher = quote;
}
return this;
}
private int readWithQuotes(final char[] srcChars ...
// If we've found a quote character, see if it's followed by a second quote. If so, then we need to actually put the quote character into the token rather than end the token.
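public static void main(String[] args) {
    String text = "One two \"three four\" five \"six seven eight\" nine \"ten\"";
    String[] splits = text.split(" ");
    List<String> list = new ArrayList<>();
    String token = null; // non-null while inside a quoted phrase
    for (String s : splits) {
        if (token == null && s.startsWith("\"")) {
            if (s.length() > 1 && s.endsWith("\"")) {
                list.add(s); // a single quoted word such as "ten"
            } else {
                token = s; // start of a quoted phrase
            }
        } else if (token != null && s.endsWith("\"")) {
            list.add(token + " " + s); // end of a quoted phrase
            token = null;
        } else if (token != null) {
            token = token + " " + s; // middle of a quoted phrase
        } else {
            list.add(s); // plain word
        }
    }
    // The quotes themselves are kept in the output.
    System.out.println(list);
}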
/opt/jboss-eap/bin/jboss-cli.sh
--connect
--controller=localhost:9990
-c
command="deploy /app/jboss-eap-7.1/standalone/updates/sample.war --force"
private static void findWords(String str) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (c != ' ' && c != '"') {
            sb.append(c);
            continue;
        }
        // A space or quote ends the current unquoted word, if there is one.
        if (sb.length() > 0) {
            System.out.println(sb);
            sb.setLength(0);
        }
        if (c == '"') {
            // Copy everything up to the closing quote (or the end of the input).
            i++;
            while (i < str.length() && str.charAt(i) != '"') {
                sb.append(str.charAt(i));
                i++;
            }
            System.out.println(sb);
            sb.setLength(0);
        }
    }
    // Print a trailing unquoted word that was not followed by a delimiter.
    if (sb.length() > 0) {
        System.out.println(sb);
    }
}
public final class StringUtilities {
private static final List<Character> WORD_DELIMITERS = Arrays.asList(' ', '\t');
private static final List<Character> QUOTE_CHARACTERS = Arrays.asList('"', '\'');
private static final char ESCAPE_CHARACTER = '\\';
private StringUtilities() {
}
public static String[] splitWords(String string) {
StringBuilder wordBuilder = new StringBuilder();
List<String> words = new ArrayList<>();
char quote = 0;
for (int i = 0; i < string.length(); i++) {
char c = string.charAt(i);
if (c == ESCAPE_CHARACTER && i + 1 < string.length()) {
wordBuilder.append(string.charAt(++i));
} else if (WORD_DELIMITERS.contains(c) && quote == 0) {
if (wordBuilder.length() > 0) { // skip empty words caused by repeated delimiters
words.add(wordBuilder.toString());
wordBuilder.setLength(0);
}
} else if (quote == 0 && QUOTE_CHARACTERS.contains(c)) {
quote = c;
} else if (quote == c) {
quote = 0;
} else {
wordBuilder.append(c);
}
}
if (wordBuilder.length() > 0) {
words.add(wordBuilder.toString());
}
return words.toArray(new String[0]);
}
}