Passing double quotes through Weka machine learning


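I am using Weka's CLI, namely the Primer, and I have tried many different combinations of passing several parameters, without success. When I pass something like this: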

weka_options=("weka.classifiers.functions.SMOreg -C 1.0 -N 0")


the program runs without problems, but passing something like this:


weka_options=("weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I \"weka.classifiers.functions.supportVector.RegSMOImproved -L 1.0e-3 -W 1 -P 1.0e-12 -T 0.001 -V\" -K \"weka.classifiers.functions.supportVector.NormalizedPolyKernel -C 250007 -E 8.0\"")

with or without escape characters, or even with a single quote `, causes an error in my bash script:

bash ./weka.sh "$sub_working_dir" $train_percentage "$weka_options" $files_string > $predictions

where weka.sh contains:

java -Xmx1024m -classpath ".:$WEKAPATH" $weka_options -t "$train_set" -T "$test_set" -c 53 -p 53

Here is what I get:

---Registering Weka Editors---
Trying to add database driver (JDBC): jdbc.idbDriver - Error, not in CLASSPATH?

Weka exception: Can't open file No suitable converter found for '0.001'!.

Can anyone point out the problem?

Updated question: here is the code:

# Usage:
# 
# ./aca2_explore.sh working-dir datasets/*
# e.g.
# ./aca2_explore.sh "aca2-explore-working-dir/" datasets/*
#
# Place this script in the same folder as aca2.sh and the folder containing the datasets. 
# 
#
# Please note that:
#   - All the notes contained in aca2.sh apply
#   - This script will erase the contents of working-dir

# to properly sort negative floating numbers, independently of local language options

export LC_ALL=C

# parameters parsing

output_directory=$1

first_file_index=2
files=${@:$first_file_index}

# global constants

datasets=$(($# - 1))
output_row=$(($datasets + 3))
output_columns_range="2-7"
learned_model_mae_column=4
results_learned_model_mae_column=4

# parameters

working_dir="$output_directory"
if [ -d "$working_dir" ];
then
    rm -r "$working_dir"
fi
mkdir "$working_dir"

sub_working_dir="$working_dir""aca2-explore-sub-working-dir/"
path_to_results_file="$sub_working_dir""results.csv"

train_percentage=25

logfile="$working_dir""aca2_explore_log.csv"
echo "" > "$logfile"

reduced_log_header="Options,average_test_set_speedup,null_model_mae,learned_model_mae,learned_model_rmse,mae_ratio,R^2"

reduced_logfile="$working_dir""aca2_explore_reduced_log.csv"
echo "$reduced_log_header" > "$reduced_logfile"

sorted_reduced_logfile="$working_dir""aca2_explore_sorted_reduced_log.csv"

weka_options_list=(
"weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8"
"weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 100 -V 0 -S 0 -E 20 -H a"
"weka.classifiers.meta.AdditiveRegression -S 1.0 -I 10 -W weka.classifiers.trees.DecisionStump"
"weka.classifiers.meta.Bagging -P 100 -S 1 -num-slots 1 -I 10 -W weka.classifiers.trees.REPTree -- -M 2 -V 0.001 -N 3 -S 1 -L -1 -I 0.0"
"weka.classifiers.meta.CVParameterSelection -X 10 -S 1 -W weka.classifiers.rules.ZeroR"
"weka.classifiers.meta.MultiScheme -X 0 -S 1 -B \"weka.classifiers.rules.ZeroR \""
"weka.classifiers.meta.RandomCommittee -S 1 -num-slots 1 -I 10 -W weka.classifiers.trees.RandomTree -- -K 0 -M 1.0 -V 0.001 -S 1"
"weka.classifiers.meta.RandomizableFilteredClassifier -S 1 -F \"weka.filters.unsupervised.attribute.RandomProjection -N 10 -R 42 -D Sparse1\" -W weka.classifiers.lazy.IBk -- -K 1 -W 0 -A \"weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\"\""
"weka.classifiers.meta.RandomSubSpace -P 0.5 -S 1 -num-slots 1 -I 10 -W weka.classifiers.trees.REPTree -- -M 2 -V 0.001 -N 3 -S 1 -L -1 -I 0.0"
"weka.classifiers.meta.RegressionByDiscretization -B 10 -K weka.estimators.UnivariateEqualFrequencyHistogramEstimator -W weka.classifiers.trees.J48 -- -C 0.25 -M 2"
"weka.classifiers.meta.Stacking -X 10 -M \"weka.classifiers.rules.ZeroR \" -S 1 -num-slots 1 -B \"weka.classifiers.rules.ZeroR \""
"weka.classifiers.meta.Vote -S 1 -B \"weka.classifiers.rules.ZeroR \" -R AVG"
"weka.classifiers.rules.DecisionTable -X 1 -S \"weka.attributeSelection.BestFirst -D 1 -N 5\""
"weka.classifiers.rules.M5Rules -M 4.0"
"weka.classifiers.rules.ZeroR"
"weka.classifiers.trees.DecisionStump"
"weka.classifiers.trees.M5P -M 4.0"
"weka.classifiers.trees.RandomForest -I 100 -K 0 -S 1 -num-slots 1"
"weka.classifiers.trees.RandomTree -K 0 -M 1.0 -V 0.001 -S 1"
"weka.classifiers.trees.REPTree -M 2 -V 0.001 -N 3 -S 1 -L -1 -I 0.0")

files_string=""

for file in ${files[@]}
do
    files_string="$files_string""$file"" "
done

#echo $files_string

for weka_options in "${weka_options_list[@]}"
do
        echo "$weka_options"
        echo "$weka_options" >> "$logfile"
        bash ./aca2.sh "$sub_working_dir" $train_percentage "$weka_options" $files_string
        cat "$path_to_results_file" >> "$logfile"

        result_columns=$(tail -n +"$output_row" "$path_to_results_file"  | head -1 | cut -d, -f"$output_columns_range")
        echo "$weka_options"",""$result_columns" >> "$reduced_logfile"

        echo "" >> "$logfile"
done

tail -n +2 "$reduced_logfile" > "$sorted_reduced_logfile"
sort --field-separator=',' --key="$results_learned_model_mae_column" "$sorted_reduced_logfile" -o "$sorted_reduced_logfile"".tmp"
echo "$reduced_log_header" > "$sorted_reduced_logfile"
cat "$sorted_reduced_logfile"".tmp" >> "$sorted_reduced_logfile"
rm "$sorted_reduced_logfile"".tmp"

where the file aca2.sh is:

#!/bin/bash

# Run this script as ./script.sh working-directory train-set-filter-percentage "weka_options" datasets/*
#
# e.g.
# Place this script in a folder together with a directory containing your datasets. Call then the script as 
# ./aca2.sh "aca2-working-dir/" 25 "weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8" datasets_folder/*
#
# NOTE: the script will erase the content of working-directory
#       for correct behaviour $WEKAHOME environment variable must be set to the folder containing weka.jar, otherwise modify the call to the weka classifier below
#
# To define the error measures used in this script, I made use of some of the notions found in this article:
# http://scott.fortmann-roe.com/docs/MeasuringError.html


# parameters parsing

output_directory=$1
train_set_percentage=$2

if [ $train_set_percentage -lt 1 ] || [ $train_set_percentage -gt 100 ];
then
    echo "Invalid train set percentage: "$train_set_percentage
    exit 1
fi


weka_options=$3

first_file_index=4
files=${@:$first_file_index}

# global constants

predictions_characters_range_value="23-28"
predictions_characters_range_error="34-39"

tmp_dir="$output_directory"
if [ -d "$tmp_dir" ];
then
    rm -r "$tmp_dir"
fi
mkdir "$tmp_dir"

results_header="testfile,average_test_set_speedup,null_model_mae,learned_model_mae,learned_model_rmse,mae_ratio,R^2"
results_file=$tmp_dir"results.csv"

echo "$results_header" > "$results_file"

arff_header="% ARFF conversion of CSV dataset

@RELATION program

@ATTRIBUTE ...

@DATA"

# global constants

datasets_per_program=5
entries_per_dataset=128

train_set_instances_to_select=$((datasets_per_program*entries_per_dataset*train_set_percentage/100))

all_prediction="$tmp_dir""all_predictions.txt"

count=0
prediction_efficiency_ideal_avg=0

arff_header_file="$tmp_dir""arff_header.txt"
echo "$arff_header" > "$arff_header_file"



count=0

for filename in ${files[@]}
do
    echo "Test set: $filename"

    echo "$filename" >> "$all_prediction"

    cur_dir="$tmp_dir$filename.dir/"
    mkdir -p $cur_dir

    testfile=$filename

    train_set="$cur_dir""train_set.arff"

    echo "$arff_header" > $train_set

    selected_train_subset="$cur_dir""selected_train_subset.csv"

    for trainfile in ${files[@]}
    do
        if [ "$trainfile" != "$testfile" ]; then
#           filter train set to feed only top 25% for model generation
            sort --field-separator=',' --key=53 "$trainfile" -o "$selected_train_subset"
            head -$train_set_instances_to_select "$selected_train_subset" >> $train_set
        fi
    done

    test_set="$cur_dir""test_set.arff"

    #echo "$arff_header" > $test_set
    cp "$testfile" "$test_set"


    # This file will contain the full configuration space dataset relative to the test program
    complete_test_set="$cur_dir""complete_test_set.csv"
    cp "$test_set" "$complete_test_set"

    sort --field-separator=',' --key=53 "$test_set" -o "$test_set"
    head -8 "$test_set" > "$test_set"".tmp"
    mv "$test_set"".tmp" "$test_set"

    cur_prediction="$cur_dir""cur_prediction.tmp"

    # generate basis for predicted test set file by copying the actual test set, removing speedups
    predicted_test_set="$cur_dir""predicted_test_set.csv"
    cp "$test_set" "$predicted_test_set"
    cut -d, -f53 --complement "$predicted_test_set" > "$predicted_test_set"".tmp"
    mv "$predicted_test_set"".tmp" "$predicted_test_set"

    cat "$arff_header_file" "$test_set" > "$test_set"".tmp"
    mv "$test_set"".tmp" "$test_set"    

    java -Xmx1024m -classpath ".:$WEKAHOME/weka.jar:$WEKAJARS/*" $weka_options -t "$train_set" -T "$test_set" -c 53 -p 53 | tail -n +6 | head -8  > "$cur_prediction"

    predictions_file="$cur_dir""predictions.csv"
    cut -c"$predictions_characters_range_value" "$cur_prediction" | tr -d " " > "$predictions_file"

    paste -d',' "$actual_speedups" "$predictions_file" > "$predictions_file"".tmp"
    mv "$predictions_file"".tmp" "$predictions_file"
done

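You almost had this right. You were trying to do the right thing (or just accidentally got close).

You cannot use a plain string for arguments that need arbitrary quoting (that approach is wrong): quote characters stored inside a variable are not re-interpreted when the variable is expanded.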
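A minimal sketch of why the string form fails, using only the option string from the question (show_args is a hypothetical helper, not part of the original scripts). Once a quote character is stored inside a variable, bash treats it as ordinary data: an unquoted expansion splits the value on whitespace and never re-parses the quotes.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints each argument it receives on its own line.
show_args() { printf '<%s>\n' "$@"; }

weka_options="weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I \"weka.classifiers.functions.supportVector.RegSMOImproved -L 1.0e-3 -W 1 -P 1.0e-12 -T 0.001 -V\""

# Unquoted expansion, as in the java call from the question:
# the value is split on whitespace and the embedded quotes stay literal.
split_words=( $weka_options )
show_args "${split_words[@]}"
echo "word count: ${#split_words[@]}"
```

The value intended for -I arrives as ten separate words, the first of which begins with a literal " character, so -T 0.001 reaches the command line as its own pair of arguments. Since -T is also the flag the java call uses for the test set, that is a plausible source of the "No suitable converter found for '0.001'" message.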

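You need to use an array instead. But you need an array with a separate element for each argument, not an array with a single element that holds everything: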

weka_options=(weka.classifiers.functions.SMOreg -C 1.0 -N 0)

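or

weka_options=(weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -L 1.0e-3 -W 1 -P 1.0e-12 -T 0.001 -V" -K "weka.classifiers.functions.supportVector.NormalizedPolyKernel -C 250007 -E 8.0")

(I am assuming that the string weka.classifiers.functions.supportVector.RegSMOImproved -L 1.0e-3 -W 1 -P 1.0e-12 -T 0.001 -V is the argument to the -I flag and that the string weka.classifiers.functions.supportVector.NormalizedPolyKernel -C 250007 -E 8.0 is the argument to the -K flag. If that is not the case, those quotes may need to be removed as well.)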

Then, when you use the array, you need to write "${weka_options[@]}" so that each element of the array is expanded as a separate quoted word:

java -Xmx1024m -classpath ".:$WEKAPATH" "${weka_options[@]}" -t "$train_set" -T "$test_set" -c 53 -p 53
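To check how many arguments java would actually receive, the expansion can be tried against a small stand-in function first. This is a sketch under assumptions: count_args is a hypothetical helper, and the array assumes that the RegSMOImproved string is meant to be a single value for -I.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it reports how many arguments it was given.
count_args() { echo "$#"; }

# One array element per argument; the -I value is quoted so it stays one element.
weka_options=(weka.classifiers.functions.SMOreg -C 1.0 -N 0
              -I "weka.classifiers.functions.supportVector.RegSMOImproved -L 1.0e-3 -W 1 -P 1.0e-12 -T 0.001 -V")

quoted_count=$(count_args "${weka_options[@]}")    # each element stays one word
unquoted_count=$(count_args ${weka_options[@]})    # elements are split again on whitespace

echo "quoted: $quoted_count, unquoted: $unquoted_count"
```

Only the quoted form keeps the -I value together; dropping the double quotes around the expansion brings back the same word splitting as the string version.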

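Thanks for the answer, @Etan. What if my weka_options holds more than one option that I want to pass?

Aren't those more than one option already? Or is the whole string one long option to the java module? If the entire string really needs to be passed to java as one long argument, then what you have is probably what you want, and you likely just need to quote "$weka_options" where you use it in the script.

Sorry for confusing you. What I mean is: what if I want to explore 7-10 different machine learning options, as in this case:

weka_options_list=("weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8" "weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 100 -V 0 -S 0 -E 20 -H a" "weka.classifiers.meta.AdditiveRegression -S 1.0 -I 10 -W weka.classifiers.trees.DecisionStump" "weka.classifiers.meta.Bagging -P 100 -S 1 -num-slots 1 -I 10 -W weka.classifiers.trees.REPTree -- -M 2 -V 0.001 -N 3 -S 1 -L -1 -I 0.0" weka.classifiers.meta.CVParameterSelection -X 10 -S 1 -W)

By the way, I tested the answer you mentioned, but it does not work: it only takes the first string, before the first parameter, and passes that to java.

What would it look like if you wrote the java command (or the original command) out by hand?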
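For the follow-up about exploring several classifiers, one possible layout is to keep each option set in its own array and loop over the array names. This is only a sketch: the names opts_linear, opts_vote, option_sets, run_set and run_classifier are made up for illustration, and run_classifier stands in for the real java call.

```bash
#!/usr/bin/env bash
# Each option set is its own array, one element per argument.
# Values that Weka expects as a single argument stay quoted.
opts_linear=(weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8)
opts_vote=(weka.classifiers.meta.Vote -S 1 -B "weka.classifiers.rules.ZeroR " -R AVG)

# The list holds array *names*, since bash arrays cannot be nested.
option_sets=(opts_linear opts_vote)

# Hypothetical stand-in for the java call: prints the arguments it received.
run_classifier() {
    printf '%d args:' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

# Expands the array whose name is given in $1 via indirect expansion.
run_set() {
    local ref="$1[@]"
    run_classifier "${!ref}"
}

for set_name in "${option_sets[@]}"; do
    run_set "$set_name"
done
```

Because an array cannot be handed to a child script as one positional parameter, a script such as aca2.sh would have to receive the options as separate arguments (or define the arrays itself) for this to carry across the script boundary; how to arrange that is left open here.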