Storing values with a multi-dimensional key pair in Python

python arrays numpy


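Is there a way to store values using a multi-dimensional key pair (for example, in a numpy array)?

The code below tries to store a reward value keyed by a pair of numpy arrays with shapes (1, 25) and (1, 3).

Many thanks.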

num_episodes=500

# this is the table that will hold our summed rewards for
# each action in each state
r_table = np.zeros((10000, 10000))
for g in range(num_episodes):
    s = np.array(state.sample(), dtype=int)
    done = False
    count = 0
    while not done:
        if np.sum(r_table[s, :]) == 0:
            # make a random selection of actions
            EUR_elec_sell = 0.050
            EUR_elec_buy = 0.100
            EUR_gas = 0.030
            rranges = ((0, 1250),(0, 2000),(0, 3000))
            res0 = brute(reward, rranges, finish=None)
            res1 = minimize(reward, res0, bounds=[(0, 1250),(0, 2000),(0, 3000)])
            a = res1.x
            a = list(map(int, a.round(decimals=-1)))
        else:
            # select the action with the highest cumulative reward
            a = np.argmax(r_table[s, :])
        s_t1 = model.predict([np.append(s, a)]).astype(int)
        new_s = np.append(s_t1, np.delete(s, 1))
        r = reward(a)
        count += 1
        if count == 1000: done=True
        r_table[s, a] += r
        s = new_s

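You could use something like tuple(s[0]) + tuple(a) as the key, but what you need is a little more involved, because you have to query all the values stored for a given state vector s. You can make r_table a dict of dicts, where tuple(s[0]) is the first key and tuple(a) is the second key: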

import numpy as np
from scipy.optimize import brute, minimize

num_episodes = 500

# this table holds the summed rewards for each action in each state:
# r_table[state_tuple][action_tuple] -> summed reward
r_table = {}
for g in range(num_episodes):
    # np.int was removed from recent NumPy releases; the builtin int is equivalent
    s = np.array(state.sample(), dtype=int)
    done = False
    count = 0
    while not done:
        s_key = tuple(s[0])
        if sum(r_table.setdefault(s_key, {}).values()) == 0:
            # no reward information for this state yet: search for an action
            EUR_elec_sell = 0.050
            EUR_elec_buy = 0.100
            EUR_gas = 0.030
            rranges = ((0, 1250), (0, 2000), (0, 3000))
            res0 = brute(reward, rranges, finish=None)
            res1 = minimize(reward, res0, bounds=rranges)
            a = res1.x
            a = list(map(int, a.round(decimals=-1)))
        else:
            # select the action with the highest cumulative reward
            a = max(r_table[s_key], key=r_table[s_key].get)
        s_t1 = model.predict([np.append(s, a)]).astype(int)
        new_s = np.append(s_t1, np.delete(s, 1))
        r = reward(a)
        count += 1
        if count == 1000:
            done = True
        a_key = tuple(a)
        r_table[s_key][a_key] = r_table[s_key].get(a_key, 0) + r
        s = new_s
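
As a small self-contained sketch of the dict-of-dicts layout described above (not the original poster's environment): nested lists stand in for the state and action arrays, and the helper names as_key and add_reward are made up for illustration. A NumPy array can be passed to as_key in the same way, because iterating it yields rows and then scalars.

```python
def as_key(values):
    """Flatten a nested sequence of numbers into a hashable tuple of ints."""
    try:
        items = list(values)
    except TypeError:          # a scalar: nothing left to flatten
        return (int(values),)
    key = ()
    for item in items:
        key += as_key(item)
    return key


def add_reward(table, s, a, r):
    """Add reward r to the running total for action a in state s."""
    actions = table.setdefault(as_key(s), {})
    a_key = as_key(a)
    actions[a_key] = actions.get(a_key, 0.0) + r


r_table = {}
state = [list(range(25))]            # stand-in for a (1, 25) state array
add_reward(r_table, state, [[10, 20, 30]], 1.5)
add_reward(r_table, state, [[10, 20, 30]], 0.5)
add_reward(r_table, state, [[40, 50, 60]], 1.0)

# All actions recorded for one state, with their summed rewards.
rewards_for_state = r_table[as_key(state)]
print(rewards_for_state)
```

Flattening the whole array into the key, rather than taking only s[0], is a choice made for this sketch: it keeps the key stable whether the state arrives as a (1, 25) array or as a flat vector of 25 values.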

Thanks for the quick reply! This raises "ValueError: max() arg is an empty sequence" on the very first count (so before ever getting through the loop).
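
It is not clear from the thread which call raised that error, but max() does raise ValueError whenever it is handed an empty collection, for example a state that has no recorded actions yet. One possible guard, sketched under that assumption, is to pass default=None and fall back to exploration. The function pick_action and its explore argument are hypothetical names, and exploring only when the state has no entries is a simplification of the sum-equals-zero test used above.

```python
def pick_action(actions, explore):
    """Return the best-known action, or explore() if none is recorded yet.

    actions: dict mapping action tuples to summed rewards (may be empty).
    explore: zero-argument callable that proposes a new action (hypothetical).
    """
    # default=None stops max() from raising ValueError on an empty dict.
    best = max(actions, key=actions.get, default=None)
    if best is None:
        return tuple(explore())
    return best


# Unguarded max() on an empty dict raises ValueError.
try:
    max({}, key={}.get)
except ValueError as exc:
    print("unguarded:", exc)

# The guarded version falls back to the exploration callable instead.
first = pick_action({}, explore=lambda: [10, 20, 30])
known = pick_action({(10, 20, 30): 2.0, (40, 50, 60): 3.5},
                    explore=lambda: [0, 0, 0])
print(first, known)
```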