How do I create a string in Java that contains a quoted string?
I want to pass a string of options to Weka. Inside the options string is the Weka tokenizer string, and inside the tokenizer string is the delimiters option string. I get the error message "No value given for -delimiters option". How should I format the string? Here is my code:
String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector "
+ "-R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer "
+ "-stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "
+ "\"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
StringToWordVector remove = new StringToWordVector();
It did not solve my problem.
The content of the string passed to splitOptions is:
weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters " \r\n\t.,;:\'\"()?!"
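I'm not sure what the argument to -tokenizer should be, but the string passed to it contains a -delimiters flag with no value, which is consistent with the error you reported.
Perhaps you meant to pass this to -tokenizer: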
"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""
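where the argument to -delimiters is a string.
The error message you received says that no value was found after the -delimiters option. The reason is that Weka sees the string end with a double quote immediately after the -delimiters parameter. The root cause is the rogue quotation mark that appears before the weka.core.tokenizers.NGramTokenizer term belonging to the -tokenizer parameter: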
String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector "
+ "-R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer "
+ "-stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "
+ "\"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
   ^^ rogue quotation mark. Bad.
Change the string to the following and everything should work:
String[] options =
weka.core.Utils.splitOptions(
"weka.filters.unsupervised.attribute.StringToWordVector "
+ "-R first-last -W 1000 -prune-rate -1.0 -N 0 "
+ "-stemmer weka.core.stemmers.NullStemmer "
+ "-stopwords-handler weka.core.stopwords.Null -M 1 "
+ "-tokenizer weka.core.tokenizers.NGramTokenizer -max 5 -min 1 "
+ "-delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
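To see why -delimiters ends up without a value in the question's string, here is a rough sketch of a quote-aware splitter. It is only a simplified model of what an option splitter such as splitOptions does, not Weka's actual implementation; the class name QuoteSplitSketch, the constant BROKEN and the method split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSplitSketch {

    // The -tokenizer part of the question's string, written exactly as in the question.
    static final String BROKEN =
        "-tokenizer \"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"";

    // Splits on whitespace. A token that starts with '"' runs to the next
    // unescaped '"'; inside it, \\ \" and \' lose their leading backslash.
    // Assumption: other escapes such as \r are left untouched in this model.
    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            StringBuilder tok = new StringBuilder();
            if (c == '"') {
                i++; // skip the opening quote
                while (i < s.length() && s.charAt(i) != '"') {
                    char q = s.charAt(i);
                    if (q == '\\' && i + 1 < s.length()) {
                        char next = s.charAt(i + 1);
                        if (next == '\\' || next == '"' || next == '\'') {
                            tok.append(next); // drop the escaping backslash
                            i += 2;
                            continue;
                        }
                    }
                    tok.append(q);
                    i++;
                }
                i++; // skip the closing quote
            } else {
                while (i < s.length() && !Character.isWhitespace(s.charAt(i))) {
                    tok.append(s.charAt(i));
                    i++;
                }
            }
            out.add(tok.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> outer = split(BROKEN);
        System.out.println(outer);               // top-level tokens
        System.out.println(split(outer.get(1))); // tokens of the -tokenizer value
    }
}
```

In this model the quoted -tokenizer value stops right after -delimiters, because the quote that was meant to open the delimiters string closes the tokenizer string instead. The delimiter characters are left over as a separate top-level token.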
Perhaps use \ :
String[] options = weka.core.Utils.splitOptions("\"weka.filters.unsupervised.attribute.StringToWordVector\"" + "\"-R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer\""+ "\"-stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer\""+ "\"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
Thanks, @Tim, @salc2 and @Scott. I'm still working on it. Below is the string copied from the Weka GUI, where it definitely works:
weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""
The problem seems to be that the tokenizer flag is followed by a string that itself has to contain a string for the delimiters flag. Thanks again, everyone.
I got it working using:
"-R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \\\" \\\\r\\\\n\\\\t.,;:\\\\\\'\\\\\\\"()?!\\\"\""
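Counting backslashes by hand for each nesting level is error-prone, so one option is to add each level of quoting in code. The following is a minimal sketch under assumptions: the class NestedQuoteSketch and its methods quote and options are hypothetical helpers, not part of Weka, and the escaping rule (a backslash in front of every backslash, double quote and single quote) is simply the one that reproduces the GUI string in this thread. I believe weka.core.Utils has its own quote and joinOptions helpers for this purpose, but the sketch does not depend on them.

```java
public class NestedQuoteSketch {

    // Wraps s in double quotes and puts a backslash in front of every
    // backslash, double quote and single quote, i.e. adds one quoting level.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '"' || c == '\'') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Builds the option string from the thread, adding the outer quoting in code.
    static String options() {
        // Innermost level: the delimiters value as it would be typed on its own,
        // already quoted once: " \r\n\t.,;:\'\"()?!"
        String delimiters = "\" \\r\\n\\t.,;:\\'\\\"()?!\"";

        // Middle level: the tokenizer spec, with the delimiters value embedded.
        String tokenizerSpec =
            "weka.core.tokenizers.WordTokenizer -delimiters " + delimiters;

        // Outer level: quote the whole tokenizer spec so it stays one argument.
        return "-R first-last -W 1000 -prune-rate -1.0 -N 0 "
            + "-stemmer weka.core.stemmers.NullStemmer "
            + "-stopwords-handler weka.core.stopwords.Null -M 1 "
            + "-tokenizer " + quote(tokenizerSpec);
    }

    public static void main(String[] args) {
        // Prints the string in the same form the Weka GUI shows it.
        System.out.println(options());
    }
}
```

The result of options() can then be passed to weka.core.Utils.splitOptions in place of a hand-escaped literal. Only the innermost delimiters value still has to be escaped manually.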