Passing a TXT file as a stop word list in Scala




I'm using the Stanford Topic Modeling Toolbox (TMT) and want to prepare my text dataset. I have a TXT file of stop words.

However, TermStopListFilter(), which filters the stop words out of my CSV dataset, only accepts a list written directly in the script, for example:

TermStopListFilter(List("positively","scrumptious"))

How can I import the stopwords.txt file and use it as the stop word list?

The full snippet of the code I'm using:

val source = CSVFile("filtered.csv"); 

val text = {
  source ~>                              
  Column(1) ~>                           
  TokenizeWith(tokenizer) ~>             
  TermCounter() ~>                       
  TermMinimumDocumentCountFilter(100) ~>   
  TermStopListFilter(TXTFile("stopwords.txt")) ~>
  TermDynamicStopListFilter(10) ~>       
  DocumentMinimumLengthFilter(5)
}
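For context: a stop-list stage like TermStopListFilter conceptually drops every token that appears in its list. That is why it wants an in-memory collection of strings (as the question notes, a List) rather than a file object. The plain-Scala sketch below illustrates only that idea. It is not TMT's implementation, and removeStopWords is a hypothetical name:

```scala
object StopListSketch {
  // Conceptual sketch only (NOT TMT's code): remove stop words from
  // already-tokenized documents. Each document is a sequence of tokens.
  def removeStopWords(docs: Seq[Seq[String]], stopWords: Iterable[String]): Seq[Seq[String]] = {
    val stop: Set[String] = stopWords.toSet // Set gives fast membership checks
    docs.map(_.filterNot(stop.contains))    // keep tokens not in the stop list
  }
}
```

For example, removeStopWords(Seq(Seq("the", "cat")), List("the")) keeps only "cat".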

Well, if your stop words are comma-separated, you can try this:

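  ...
  // Read stopwords.txt, split each line on commas and trim the words
  TermStopListFilter(
    scala.io.Source.fromFile("stopwords.txt").getLines()
      .flatMap(_.split(",")).map(_.trim).filter(_.nonEmpty).toList
  ) ~>
  ...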
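The file-reading part can also be pulled out into a small helper that closes the file after reading. This is a minimal sketch using only the Scala standard library. It assumes a UTF-8 file with comma-separated words; loadStopWords is a hypothetical helper name, not part of TMT:

```scala
import scala.io.Source

object StopWords {
  // Hypothetical helper: reads a stop word file into a List[String].
  // `sep` is a regex passed to String.split; "," suits comma-separated files.
  def loadStopWords(path: String, sep: String = ","): List[String] = {
    val src = Source.fromFile(path, "UTF-8")
    try {
      src.getLines()
        .flatMap(_.split(sep)) // a single line may hold several words
        .map(_.trim)           // drop spaces around separators
        .filter(_.nonEmpty)    // ignore blank entries such as ",,"
        .toList                // materialize before the file is closed
    } finally src.close()
  }
}
```

In the pipeline, the stop-list line would then read TermStopListFilter(StopWords.loadStopWords("stopwords.txt")) ~>. The toList call matters: getLines() is lazy, so the words must be collected before close() runs.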

If the stop words in stopwords.txt are separated by some other character, change split(",") accordingly. You should most likely also remove this line:

TermStopListFilter(List("positively","scrumptious"))
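If the file lists one word per line, or mixes commas with spaces and newlines, splitting the whole file on a regular expression handles all of these cases at once. This is a minimal sketch that assumes commas and whitespace are the only separators; parseStopWords is a hypothetical name:

```scala
object StopWordParsing {
  // Hypothetical helper: split on runs of commas and/or whitespace
  // (spaces, tabs, newlines), so "a, b\nc" and one-word-per-line both work.
  def parseStopWords(text: String): List[String] =
    text.split("[,\\s]+")  // regex: one or more commas or whitespace chars
      .filter(_.nonEmpty)  // a leading separator yields one empty string
      .toList
}
```

To use it with the file, pass the whole contents: StopWordParsing.parseStopWords(scala.io.Source.fromFile("stopwords.txt").mkString).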

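The data I want to analyze is in filtered.csv, and my stop word list is in stopwords.txt. Doesn't this code treat filtered.csv as my stop word list, when the stop words I need are actually in stopwords.txt?

Ah, I see. Give me a minute, I'll change the answer.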