Storing values under multidimensional key pairs in Python


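Is there a way to store values under a multidimensional key pair (for example, in a numpy array)?

The code below tries to store reward values keyed by a pair of numpy arrays with shapes (1,25) and (1,3).

Many thanks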

num_episodes=500

# this is the table that will hold our summated rewards for
# each action in each state
r_table = np.zeros((10000, 10000))
for g in range(num_episodes):
    s = np.array(state.sample(), dtype=int)
    done = False
    count = 0
    while not done:
        if np.sum(r_table[s, :]) == 0:
            # make a random selection of actions
            EUR_elec_sell = 0.050
            EUR_elec_buy = 0.100
            EUR_gas = 0.030
            rranges = ((0, 1250),(0, 2000),(0, 3000))
            res0 = brute(reward, rranges, finish=None)
            res1 = minimize(reward, res0, bounds=[(0, 1250),(0, 2000),(0, 3000)])
            a = res1.x
            a = list(map(int, a.round(decimals=-1)))
        else:
            # select the action with highest cumulative reward
            a = np.argmax(r_table[s, :])
        s_t1 = model.predict([np.append(s, a)]).astype(int)
        new_s = np.append(s_t1, np.delete(s, 1))
        r = reward(a)
        count += 1
        if count == 1000: done=True
        r_table[s, a] += r
        s = new_s

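You could use a key like tuple(s[0])+tuple(a), but what you actually need is a bit more involved, because you have to query all the values stored for a given s vector. You can make r_table a dict of dicts, where tuple(s[0]) is the first key and tuple(a) is the second: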

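import numpy as np
# brute and minimize are assumed to come from scipy.optimize
from scipy.optimize import brute, minimize

# state, model and reward are assumed to be defined elsewhere, as in the question
num_episodes = 500

# this is the table that will hold our summed rewards for
# each action in each state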
r_table = {}
for g in range(num_episodes):
    s = np.array(state.sample(), dtype=int)
    done = False
    count = 0
    while not done:
        s_key = tuple(s[0])
        if sum(r_table.setdefault(s_key, {}).values()) == 0:
            # make a random selection of actions
            EUR_elec_sell = 0.050
            EUR_elec_buy = 0.100
            EUR_gas = 0.030
            rranges = ((0, 1250),(0, 2000),(0, 3000))
            res0 = brute(reward, rranges, finish=None)
            res1 = minimize(reward, res0, bounds=[(0, 1250),(0, 2000),(0, 3000)])
            a = res1.x
            a = list(map(int, a.round(decimals=-1)))
        else:
            # select the action with highest cumulative reward
            a = max(r_table[s_key].items(), key=lambda it: it[1])[0]
        s_t1 = model.predict([np.append(s, a)]).astype(int)
        new_s = np.append(s_t1, np.delete(s, 1))
        r = reward(a)
        count += 1
        if count == 1000: done=True
        a_key = tuple(a)
        r_table[s_key][a_key] = r_table[s_key].get(a_key, 0) + r
        s = new_s
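
To see why the nested layout is convenient, here is a minimal standalone sketch comparing it with the flat tuple(s[0])+tuple(a) key. The state, the two actions and the reward values are made-up examples, not data from the question; the only assumption carried over is that s has shape (1, 25) and an action has 3 elements.

```python
import numpy as np

s = np.arange(25).reshape(1, 25)   # made-up state, shape (1, 25)
a1 = [100, 200, 300]               # made-up actions, 3 elements each
a2 = [0, 0, 10]

# Convert to plain Python ints so the key is hashable and prints cleanly.
s_key = tuple(int(v) for v in s[0])

# Flat layout: one dict keyed by state + action (a 28-element tuple).
flat = {}
flat[s_key + tuple(a1)] = 2.0
flat[s_key + tuple(a2)] = 1.0
# Collecting all actions for one state means scanning every key.
flat_actions = {k[25:]: v for k, v in flat.items() if k[:25] == s_key}

# Nested layout: outer key is the state, inner key is the action.
nested = {}
nested.setdefault(s_key, {})[tuple(a1)] = 2.0
nested.setdefault(s_key, {})[tuple(a2)] = 1.0
# All actions for one state are a single lookup.
nested_actions = nested[s_key]

print(flat_actions == nested_actions)
```

Both layouts hold the same information; the nested one just makes the per-state query (the equivalent of r_table[s, :] in the question) a direct lookup instead of a scan.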

Thanks for the quick reply! This raises "ValueError: max() arg is an empty sequence" on the first count (so without ever going around the loop).
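
One possible guard, assuming the error comes from calling max() on a state whose inner dict has no entries yet: the built-in max() accepts a default argument that is returned when the iterable is empty. The helper name best_action below is hypothetical and only for illustration.

```python
def best_action(actions):
    """Return the action with the highest summed reward, or None if empty.

    `actions` is the inner dict for one state: {action_tuple: summed_reward}.
    """
    # default=(None, None) is returned instead of raising ValueError
    # when `actions` has no entries.
    return max(actions.items(), key=lambda it: it[1], default=(None, None))[0]


print(best_action({}))
print(best_action({(0, 0, 10): 1.0, (100, 200, 300): 2.0}))
```

A None result can then be treated like the "no rewards recorded yet" case and routed to the random-selection branch.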