Mallet works on Linux but not on Windows
OK, I'm trying to use Mallet to classify some documents on Windows. I have this working on Linux; I just can't get it to perform the task on Windows (the target environment). I imported the data into a .mallet file and then used that input data to create a classifier:
-rw-r--r-- 1 henry henry 15197116 Feb 23 15:56 nntp.classifier
However, when I run this on Linux:
bin/mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier
it iterates over every file in testfolder and prints the class it thinks each file belongs to.
But if I run the same command on Windows:
bin\mallet classify-dir --input ./testfolder --output - --classifier nntp.classifier
it just dumps the list of commands:
Mallet 2.0 commands:
import-dir load the contents of a directory into mallet instances (one per file)
import-file load a single file into mallet instances (one per line)
import-svmlight load a single SVMLight format data file into mallet instances (one per line)
train-classifier train a classifier from Mallet data files
train-topics train a topic model from Mallet data files
infer-topics use a trained topic model to infer topics for new documents
estimate-topics estimate the probability of new documents given a trained model
hlda train a topic model using Hierarchical LDA
prune remove features based on frequency or information gain
split divide data into testing, training, and validation portions
Include --help with any option for more information
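I did notice one thing: if I run bin/mallet classify-dir --help on Linux, I get the help text, i.e. a description of each option. The same thing on Windows, bin\mallet classify-dir --help, does not produce the same result, only the command list above. (The same happens if I enter junk as the command.)
By contrast, an earlier command such as bin/mallet import-dir --help and bin\mallet import-dir --help produces the same full help output on both systems.

There is a problem with the mallet.bat file in the bin directory: it has no entry for classify-dir, so the command falls through to the usage listing.
You should change it to the following: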
@echo off
rem This batch file serves as a wrapper for several
rem MALLET command line tools.
if not "%MALLET_HOME%" == "" goto gotMalletHome
echo MALLET requires an environment variable MALLET_HOME.
goto :eof
:gotMalletHome
set MALLET_CLASSPATH=%MALLET_HOME%\class;%MALLET_HOME%\lib\mallet-deps.jar
set MALLET_MEMORY=1G
set MALLET_ENCODING=UTF-8
set CMD=%1
shift
set CLASS=
if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
if "%CMD%"=="import-smvlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.Vectors2Topics
if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
if "%CMD%"=="estimate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
if "%CMD%"=="hlda" set CLASS=cc.mallet.topics.tui.HierarchicalLDATUI
if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
if "%CMD%"=="run" set CLASS=%1 & shift
if not "%CLASS%" == "" goto gotClass
echo Mallet 2.0 commands:
echo import-dir load the contents of a directory into mallet instances (one per file)
echo import-file load a single file into mallet instances (one per line)
echo import-svmlight load a single SVMLight format data file into mallet instances (one per line)
echo train-classifier train a classifier from Mallet data files
echo classify-dir classify the contents of a directory with a saved classifier
echo classify-file classify a file with a saved classifier
echo train-topics train a topic model from Mallet data files
echo infer-topics use a trained topic model to infer topics for new documents
echo estimate-topics estimate the probability of new documents given a trained model
echo hlda train a topic model using Hierarchical LDA
echo prune remove features based on frequency or information gain
echo split divide data into testing, training, and validation portions
echo Include --help with any option for more information
goto :eof
:gotClass
set MALLET_ARGS=
:getArg
if "%1"=="" goto run
set MALLET_ARGS=%MALLET_ARGS% %1
shift
goto getArg
:run
java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
:eof
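With this version you can classify in a Windows environment.
I hope this helps,
Ignazio

Note that the .bat file Ignazio provided has a typo on line 23 (which unfortunately is also in the file shipped with the mallet-2.0.7 download), causing it to look for "import-smvlight" rather than "import-svmlight", which is what the help message specifies. If you want to use this feature, make sure to swap the "m" and the "v".

Works great. And you signed up just to tell me this! I had installed Cygwin and compiled the whole lot, with names like Ant, Bant, Chant and Rant. This will make things much easier for the next Windows guy, though. The other scripts in ./bin are still Linux-only.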
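
Not part of the original answer, but here is a rough sketch of how one might catch this kind of mismatch automatically. It compares the commands that the wrapper dispatches (the `if "%CMD%"=="..." set CLASS=...` lines) with the commands its help text advertises (the `echo` lines). The names `audit_wrapper` and `SAMPLE` are made up for illustration, the sample is only an excerpt of the batch file above, and the regular expressions assume the line layout shown in this thread.

```python
import re

# Dispatch lines look like: if "%CMD%"=="classify-dir" set CLASS=...
DISPATCH_RE = re.compile(r'^if\s+"%CMD%"\s*==\s*"([^"]+)"\s+set\s+CLASS=')
# Help lines look like: echo classify-dir classify the contents ...
# Command names are lowercase, so "echo Mallet 2.0 commands:" is skipped.
HELP_RE = re.compile(r'^echo\s+([a-z]+(?:-[a-z]+)*)\s+\S')


def audit_wrapper(bat_text):
    """Return (advertised but not dispatched, dispatched but not advertised)."""
    dispatched, documented = set(), set()
    for raw in bat_text.splitlines():
        line = raw.strip()
        match = DISPATCH_RE.match(line)
        if match:
            dispatched.add(match.group(1))
            continue
        match = HELP_RE.match(line)
        if match:
            documented.add(match.group(1))
    return sorted(documented - dispatched), sorted(dispatched - documented)


# Hypothetical excerpt of mallet.bat, trimmed to a few lines for the example.
SAMPLE = '''\
if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
if "%CMD%"=="import-smvlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
echo Mallet 2.0 commands:
echo import-dir load the contents of a directory into mallet instances (one per file)
echo import-svmlight load a single SVMLight format data file into mallet instances (one per line)
echo classify-dir classify the contents of a directory with a saved classifier
echo Include --help with any option for more information
'''

missing, hidden = audit_wrapper(SAMPLE)
print("advertised but not dispatched:", missing)
print("dispatched but not advertised:", hidden)
```

To check a real install, read the contents of bin\mallet.bat into a string and pass it to `audit_wrapper` instead of `SAMPLE`. Commands such as bulk-load and run would show up as dispatched but not advertised, which is expected since the help text above does not list them.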