
Java ArrayIndexOutOfBoundsException:-1


I'm writing some RL behaviour for a Pac-Man bot, and I've messed something up with a list in one of my functions, arg_allmax or chooseAction.

Here is my class code:

package rl;

import java.util.ArrayList;
import java.util.Hashtable;

public class Qlearn {
    private double epsilon = 0.1; // Epsilon parameter for the Epsilon Greedy Strategy 
    private double alpha = 0.2; // Alpha parameter: learning rate used in the update of Q
    private double gamma = 0.9; // Gamma parameter: discount factor for the feedback of the next action; if = 0 -> no feedback

private int actions[];
private Hashtable< Tuple<Integer,Integer>, Double> q; // Q(s,a) : hashTable : <state,action> -> value of q


public Qlearn(int[] actions) {
    this.actions = actions;
    q = new Hashtable< Tuple<Integer,Integer>, Double>();
}

public Qlearn(int[] actions, double epsilon, double alpha, double gamma) {
    this.actions = actions;
    this.epsilon = epsilon;
    this.alpha = alpha;
    this.gamma = gamma;
    q = new Hashtable< Tuple<Integer,Integer>, Double>();
}

public Double getQ(int id_state, int id_action) {
    // get the value of Q for the state id_state and the action id_action (return 0 if the value is not in the hashtable)
    Tuple<Integer,Integer> t = new Tuple<Integer,Integer>(id_state, id_action); // we create a new Tuple object with the values of id_state and id_action
    Double v = q.get(t);
    if(v != null) return v;
    else return 0.0;
}

// get the argmax of a list
public int argmax(double[] list) {
    int arg=-1;
    double max= 0;
    for ( int i = 0; i<list.length; i++){
        if ( list[i]>max ){
            max = list[i];
            arg = i;
        }
    }
    return arg;
}

// get all the indices of the maximum if it occurs several times
public ArrayList<Integer> arg_allmax(double[] list) {
    ArrayList<Integer> args = new ArrayList<Integer>();
    int a = argmax(list);
    for ( int i = 0; i< list.length; i++){
        if (list[i] == list[a]){
            args.add(i);
        }
    }
    return args;
}

// get the max of the list
public double max(double[] list) {
    double max_ = -1e20;
    int a = argmax(list);
    max_ = list[a];
    return max_;
}


/*
 * Function that updates the hashtable
 *      for the action id_action and the state id_state
 *      if Q(s,a) had an old value, we assign old_value + alpha*(value - old_value)
 *      if Q(s,a) had no old value: we assign the reward
 */
public void learnQ(int id_state, int id_action, double reward, double value) {
    Tuple<Integer,Integer> t = new Tuple<Integer,Integer>(id_state,id_action);
    Double oldv = q.get(t);

    if(oldv == null) {

        q.put(t, reward);
    } else {

        q.put(t, oldv+alpha*(value-oldv));
    }
}

/*
 * Here is the Epsilon Greedy strategy
 *      with probability epsilon: we choose a random action
 *      with probability 1 - epsilon: we choose the most favorable action according to Q(s,a)
 */
public int chooseAction(int id_state) {
    int action = -1;
    if(Math.random() < epsilon) {

        int i = (int)(Math.random()*actions.length);
        action = actions[i];

    } else { 
        double[] tab = new double[actions.length];
        ArrayList<Integer> argmaxarray = new ArrayList<Integer>();
        for ( int i=0; i>actions.length; i++){
            tab[i]=actions[i];
        }
        argmaxarray=arg_allmax(tab);
        int i=(int)(Math.random()*argmaxarray.size());
        action=argmaxarray.get(i);

    }

    return action;
}


/*
 * Learning after the occurrence of a move
 *      1) get the most profitable potential action from  Q(s',a)
 *      2) call learnQ
 */
public void learn(int id_state1, int id_action1, double reward, int id_state2) {
    int futureAction = 0;
    futureAction = chooseAction(id_state2);
    double maxqnew = 0; // FILL IN
    maxqnew = getQ(futureAction, id_state2);


    learnQ(id_state1, id_action1, reward, reward + gamma*maxqnew);

}

// Print Q(s,a)
private void printQvalue(int id_state) {
    for(int action : actions) {
        Tuple<Integer,Integer> t = new Tuple<Integer,Integer>(id_state,action);
        Double v = q.get(t);
        System.out.print(v+" ");
    }
    System.out.println();
}
}
I think it comes from somewhere else in the chooseAction method that uses the arg_allmax function, but I can't find the exact error.

Here are the two methods involved (for easier reading):

arg_allmax:

public ArrayList<Integer> arg_allmax(double[] list) {
    ArrayList<Integer> args = new ArrayList<Integer>();
    int a = argmax(list);
    for ( int i = 0; i< list.length; i++){
        if (list[i] == list[a]){
            args.add(i);
        }
    }
    return args;
}
chooseAction:

public int chooseAction(int id_state) {
    int action = -1;
    if(Math.random() < epsilon) {

        int i = (int)(Math.random()*actions.length);
        action = actions[i];

    } else { 
        double[] tab = new double[actions.length];
        ArrayList<Integer> argmaxarray = new ArrayList<Integer>();
        for ( int i=0; i>actions.length; i++){
            tab[i]=actions[i];
        }
        argmaxarray=arg_allmax(tab);
        int i=(int)(Math.random()*argmaxarray.size());
        action=argmaxarray.get(i);

    }

    return action;
}

Your IndexOutOfBoundsException happens because of your argmax() method, either because the array is empty or because all the doubles in the list are negative.

In both cases the int arg = -1 variable is never set to anything other than -1, which is clearly out of bounds in any case, because -1 is not a valid array position.

The best approach is to check that the array is not empty before passing it to argmax, or to check that the returned value is valid (not -1) before using it. Also, change double max = 0 to double max = Double.NEGATIVE_INFINITY.
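A minimal sketch of what that fix could look like, assuming the same Qlearn class (the exact guard and comments are illustrative, not the asker's final code):

public int argmax(double[] list) {
    int arg = -1;
    double max = Double.NEGATIVE_INFINITY; // was 0, which ignored lists where every value is negative
    for (int i = 0; i < list.length; i++) {
        if (list[i] > max) {
            max = list[i];
            arg = i;
        }
    }
    return arg; // still -1 only when the list is empty
}

public ArrayList<Integer> arg_allmax(double[] list) {
    ArrayList<Integer> args = new ArrayList<Integer>();
    int a = argmax(list);
    if (a == -1) {
        return args; // empty input: return an empty list instead of indexing with -1
    }
    for (int i = 0; i < list.length; i++) {
        if (list[i] == list[a]) {
            args.add(i);
        }
    }
    return args;
}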

It would help if you could: a) translate all your comments into English; b) format your code; c) follow Java naming conventions in the sample code; d) reduce the problem to one that is minimal and complete (it is currently neither). I have now removed the "first post on Stack Overflow, thanks for reading my little question, my code has a problem I can't solve" part twice. It is irrelevant to the question, and worse, because it sits at the start of the question it is what shows up on the main questions page.

OK Jon Skeet, I'll do as you ask; sorry if my post is clumsily structured.

In arg_allmax(), int a = argmax(list) may return -1. Have you checked that list is definitely not empty? How do you create the QLearn object? The only case I can see happening is actions being an empty array (or all negative values), but we can't be sure.

Double.MIN_VALUE can cause unexpected behaviour (see), I would use Double.NEGATIVE_INFINITY for this use case.

@d.j.brown Coincidentally, I read that same answer just before your comment. Good feedback though, I have updated my answer.

@Gelunox Thank you, I just changed the double max value as you showed and added a small alert message in case the list is empty, and now everything works again!
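For context on that exchange: Double.MIN_VALUE is the smallest positive double, not the most negative value, so it would still miss lists where every entry is negative; Double.NEGATIVE_INFINITY is the safe sentinel. A tiny stand-alone illustration (hypothetical class name, not from the original post):

public class SentinelDemo {
    public static void main(String[] args) {
        System.out.println(Double.MIN_VALUE);                 // 4.9E-324: the smallest POSITIVE double
        System.out.println(-1.0 > Double.MIN_VALUE);          // false: a negative entry never beats this sentinel
        System.out.println(-1.0 > Double.NEGATIVE_INFINITY);  // true: any finite value beats negative infinity
    }
}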