Python: How to randomly create a preference dataframe from a dataframe of choices?

python python-3.x dataframe random

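I have a dataframe of votes, and I would like to create a dataframe of preferences from it. For example, here is the number of votes for each party P1, P2, P3 in each community (Comm) of a city: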

    Comm    Votes   P1      P2      P3
0   comm1   1315.0  2.0     424.0   572.0
1   comm2   4682.0  117.0   2053.0  1584.0
2   comm3   2397.0  2.0     40.0    192.0
3   comm4   931.0   2.0     12.0    345.0
4   comm5   842.0   47.0    209.0   76.0
... ... ... ... ... ...
1524    comm1525    10477.0 13.0    673.0   333.0
1525    comm1526    2674.0  1.0 55.0    194.0
1526    comm1527    1691.0  331.0   29.0    78.0

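I want to check how these election results compare with a first-past-the-post voting system. For that, I need preferences for each party.

Since I do not know the preferences, I want to generate them with random numbers. I assume the voters are honest. For example, we know that in community comm1, party P1 received 2 votes and there are 1315 voters. I need to create preferences that say how many people would rank the party as their first, second, or third choice. That is, for each party: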

     Comm      Votes    P1_1        P1_2    P1_3    P2_1    P2_2    P2_3    P3_1     P3_2   P3_3
0    comm1      1315.0  2.0         1011.0  303.0   424.0   881.0   10.0    570.0    1.0    1.0
... ... ... ... ... ...
1526 comm1527   1691.0  331.0   1300.0  60.0    299.0   22.0    10.0    ...

Therefore, I have to:

# for each column in parties I create (parties -1) other columns
# I rename them all Party_i. The former 1 becomes Party_1.
# In the other columns I put a random number. 
# For a given row, the sum of all Party_i for i in [1, parties] must be equal to Votes
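
The four requirements above can be sketched for a few rows as follows. This is only an illustration under assumptions that are not in the original post: the helper name `split_row`, the use of `numpy.random.Generator.multinomial` with equal probabilities for the lower ranks, and the fixed seed are all hypothetical choices.

```python
import numpy as np
import pandas as pd

def split_row(row, parties, rng):
    """Hypothetical helper: keep each party's known count as Party_1 and
    spread the remaining voters at random over Party_2 .. Party_n."""
    n = len(parties)
    out = {}
    for p in parties:
        first = int(row[p])
        remaining = int(row['Votes']) - first
        # Equal probabilities are an assumption; the multinomial counts
        # always add up to `remaining`.
        rest = rng.multinomial(remaining, [1.0 / (n - 1)] * (n - 1))
        out[f'{p}_1'] = first
        for rank, count in enumerate(rest, start=2):
            out[f'{p}_{rank}'] = int(count)
    return pd.Series(out)

rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
parties = ['P1', 'P2', 'P3']
df = pd.DataFrame(
    [['comm1', 1315.0, 2.0, 424.0, 572.0],
     ['comm2', 4682.0, 117.0, 2053.0, 1584.0]],
    columns=['Comm', 'Votes'] + parties,
)
prefs = df.apply(split_row, axis=1, args=(parties, rng))
result = pd.concat([df, prefs], axis=1)
```

With this construction, `P1_1 + P1_2 + P1_3` equals `Votes` in every row, and `P1_1` equals the original `P1` column; the same holds for the other parties.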

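So far, I have tried:

parties = [item for item in df.columns if item not in ['Comm', 'Votes']]
for index, row in df_test.iterrows():
    # In the other columns I put a random number.
    for party in parties:
        # for each column in parties I create (parties - 1) other columns
        for i in range(0, len(parties) - 1):
            print(random.randrange(0, row['Votes']))
            # I rename them all Party_i. The former 1 becomes Party_1.
            row["{party}_{preference}".format(party=party, preference=i)] = random.randrange(0, row['Votes']) if row[party] < row['Votes'] else 0  # false because the sum of votes isn't equal to df['Votes']

The result is: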

     Comm      Votes    ... P1_1    P1_2   P1_3    P2_1    P2_2    P2_3    P3_1     P3_2   P3_3
0    comm1      1315.0  ... 1003    460    1588    1284    1482    1613    1429   345
1    comm2      1691.0  ... 1003    460    1588    1284    1482    1613    ...  
...

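But:

The numbers are the same in every row.
The value in column Pi_1 is not equal to the value in column Pi, which is given.
The sum of Pi_j over all j in [0, parties] is not equal to the number in the Votes column.

Update: I tried Antihead's answer with his own data and it works well. But when applied to my own data, it does not. It leaves me with an empty dataframe: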

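import collections

import numpy as np
import pandas as pd

# `parties` is the list of party columns, i.e. every column except 'Comm' and 'Votes'
def fill_cells(cell):
    v_max = cell['Votes']
    all_dict = {}
    # iterate over the parties
    for p in parties:
        # work on a copy so that `parties` itself is not modified
        tmp_l = parties.copy()
        tmp_l.remove(p)
        # sample one of the other parties, with equal chances, for each remaining voter
        sampled = np.random.choice(tmp_l, int(v_max-cell[p]))
        # transform into a dictionary: party -> number of draws
        c_sampled = dict(collections.Counter(sampled))
        c_sampled.update({p:cell[p]})
        # batch update of the dictionary keys; k[1] is the second character of the party name
        all_dict.update(
            dict(zip([p+'_%s' %k[1] for k in c_sampled.keys()], c_sampled.values()))
            )
    return pd.Series(all_dict)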

Indeed, with the following dataframe:

    Comm    Votes   LPC     CPC     BQ
0   comm1   1315.0  2.0     424.0   572.0
1   comm2   4682.0  117.0   2053.0  1584.0
2   comm3   2397.0  2.0     40.0    192.0
3   comm4   931.0   2.0     12.0    345.0
4   comm5   842.0   47.0    209.0   76.0
...     ...     ...     ...     ...     ...
1522    comm1523    23808.0     1588.0  4458.0  13147.0
1523    comm1524    639.0   40.0    126.0   40.0
1524    comm1525    10477.0     13.0    673.0   333.0
1525    comm1526    2674.0  1.0     55.0    194.0
1526    comm1527    1691.0  331.0   29.0    78.0

I get an empty dataframe.

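Does this work?

import collections

import numpy as np
import pandas as pd

columns = ['Comm', 'Votes', 'P1', 'P2', 'P3']
data = [['comm1', 1315.0, 2.0, 424.0, 572.0],
        ['comm2', 4682.0, 117.0, 2053.0, 1584.0],
        ['comm3', 2397.0, 2.0, 40.0, 192.0],
        ['comm4', 931.0, 2.0, 12.0, 345.0],
        ['comm5', 842.0, 47.0, 209.0, 76.0],
        ['comm1525', 10477.0, 13.0, 673.0, 333.0],
        ['comm1526', 2674.0, 1.0, 55.0, 194.0],
        ['comm1527', 1691.0, 331.0, 29.0, 78.0]]
df = pd.DataFrame(data=data, columns=columns)

def fill_cells(cell):
    v_max = cell['Votes']
    all_dict = {}
    # iterate over parties
    for p in ['P1', 'P2', 'P3']:
        tmp_l = ['P1', 'P2', 'P3']
        tmp_l.remove(p)
        # sample new data with equal choices
        sampled = np.random.choice(tmp_l, int(v_max - cell[p]))
        # transform into dictionary
        c_sampled = dict(collections.Counter(sampled))
        c_sampled.update({p: cell[p]})
        # batch update of the dictionary keys
        all_dict.update(
            dict(zip([p + '_%s' % k[1] for k in c_sampled.keys()], c_sampled.values()))
        )
    return pd.Series(all_dict)

# get back a data frame
df.apply(fill_cells, axis=1)

If you need to merge the result back into the original data frame, do:

new_df = df.apply(fill_cells, axis=1)
pd.concat([df, new_df], axis=1)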
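
As a small illustration of the sampling step used in `fill_cells`, the snippet below isolates it for one party in one row (P1 in comm1). The variable names and the fixed seed are assumptions made for the illustration only; the point is that one label is drawn per remaining voter, so the counts necessarily add up to Votes.

```python
import collections
import numpy as np

np.random.seed(0)  # fixed seed only so the illustration is repeatable

votes, own = 1315, 2                     # comm1: total voters, votes for P1
others = ['P2', 'P3']                    # the parties other than P1

# One label is drawn for each of the remaining voters.
sampled = np.random.choice(others, votes - own)
counts = dict(collections.Counter(sampled))   # label -> number of draws
counts['P1'] = own

total = sum(counts.values())             # equals `votes` by construction
```

A label that is never drawn does not appear in `counts` at all, which is presumably why some cells can come back as NaN after `df.apply`.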

Based on Antihead's answer, for the following dataset:

    Comm    Votes   LPC     CPC     BQ
0   comm1   1315.0  2.0     424.0   572.0
1   comm2   4682.0  117.0   2053.0  1584.0
2   comm3   2397.0  2.0     40.0    192.0
3   comm4   931.0   2.0     12.0    345.0
4   comm5   842.0   47.0    209.0   76.0
...     ...     ...     ...     ...     ...
1522    comm1523    23808.0     1588.0  4458.0  13147.0
1523    comm1524    639.0   40.0    126.0   40.0
1524    comm1525    10477.0     13.0    673.0   333.0
1525    comm1526    2674.0  1.0     55.0    194.0
1526    comm1527    1691.0  331.0   29.0    78.0

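I tried:

def fill_cells(cell):
    votes_max = cell['Votes']
    all_dict = {}
    # iterate over parties
    parties_temp = parties.copy()
    for p in parties_temp:
        preferences = ['1', '2', '3']
        for preference in preferences:
            preferences.remove(preference)
            # sample new data with equal choices
            sampled = np.random.choice(preferences, int(votes_max - cell[p]))
            # transform into dictionary
            c_sampled = dict(collections.Counter(sampled))
            c_sampled.update({p: cell[p]})
            c_sampled['1'] = c_sampled.pop(p)
            # batch update of the dictionary keys
            all_dict.update(
                dict(zip([p + '_%s' % k for k in c_sampled.keys()], c_sampled.values()))
            )
    return pd.Series(all_dict)

It returns: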

        LPC_2     LPC_3    LPC_1    CPC_2     CPC_3    CPC_1     BQ_2      BQ_3      BQ_1
0       891.0     487.0    424.0    743.0     373.0    572.0     1313.0    683.0     2.0
1       2629.0    1342.0   2053.0   3098.0    1603.0   1584.0    4565.0    2301.0    117.0
2       2357.0    1186.0   40.0     2205.0    1047.0   192.0     2395.0    1171.0    2.0
3       919.0     451.0    12.0     586.0     288.0    345.0     929.0     455.0     2.0
4       633.0     309.0    209.0    766.0     399.0    76.0      795.0     396.0     47.0
...     ...       ...      ...      ...       ...      ...       ...       ...       ...
1520    1088.0    536.0    42.0     970.0     462.0    160.0     1117.0    540.0     13.0
1521    4742.0    2341.0   219.0    3655.0    1865.0   1306.0    4705.0    2375.0    256.0
1522    19350.0   9733.0   4458.0   10661.0   5352.0   13147.0   22220.0   11100.0   1588.0
1523    513.0     264.0    126.0    599.0     267.0    40.0      599.0     306.0     40.0
1524    9804.0    4885.0   673.0    10144.0   5012.0   333.0     10464.0   5162.0    13.0

It is almost there. I would like the preferences to be built dynamically instead of being hard-coded as ['1', '2', '3'].
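
One possible way to build the preference labels dynamically is sketched below. It is not the code from either answer: the `parties` keyword argument, the `lower_ranks` name, and the choice to always store the party's own count in `_1` are assumptions made for this sketch, and the lower ranks are still drawn with equal chances.

```python
import collections
import numpy as np
import pandas as pd

def fill_cells(cell, parties):
    votes_max = int(cell['Votes'])
    # Ranks 2..n are derived from the number of parties, not hard-coded.
    lower_ranks = [str(i) for i in range(2, len(parties) + 1)]
    all_dict = {}
    for p in parties:
        own = int(cell[p])
        # One lower rank is drawn for every voter who did not pick p first.
        sampled = np.random.choice(lower_ranks, votes_max - own)
        counts = collections.Counter(sampled)
        all_dict[p + '_1'] = own
        for rank in lower_ranks:
            all_dict[p + '_' + rank] = counts.get(rank, 0)
    return pd.Series(all_dict)

np.random.seed(0)  # fixed seed only to make the sketch repeatable
parties = ['LPC', 'CPC', 'BQ']
df = pd.DataFrame(
    [['comm1', 1315.0, 2.0, 424.0, 572.0],
     ['comm2', 4682.0, 117.0, 2053.0, 1584.0],
     ['comm3', 2397.0, 2.0, 40.0, 192.0]],
    columns=['Comm', 'Votes'] + parties,
)
prefs = df.apply(fill_cells, axis=1, parties=parties)
```

Because the rank labels are generated from `len(parties)`, the same function should work unchanged when a fourth or fifth party column is added.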

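To understand better, could you rephrase the question? For example, do you want to compute the chance that a person in an unknown community votes for P1, P2, P3?

@Antihead Sure. For example, if we take the row comm1 and the party P1, I just need a random number in each cell P1_i for i in [1, number of parties]; their sum must equal Votes, and P1_1 must equal P1. Does that make sense?

Do I understand correctly: for each cell, you want to split the remaining votes, |Votes| - P_i, among the P_{j}, with j an element of [1, 2, 3] and i != j, by random chance?

@Antihead Yes, P_{j} is random; P means the number of votes for the party.

tmp_l = parties does not copy your list, it references it. You need to copy the list: instead of tmp_l = parties, use tmp_l = parties.copy()

Unfortunately no: I get ValueError: ('negative dimensions are not allowed', 'occurred at index 0') and KeyError: ('PI_3', 'occurred at index 0')

Are you using Python 3? Try with the dataframe I define at the start of my answer and see whether the code runs. It runs without problems for me.

Yes, I am using Python 3, and I tried your dataframe. I have updated my question with my attempt at your answer.

Thank you very much, your code works now! However, the column names are P1_P1, P1_P2, P1_P3, P2_P1, ... I would like to create P1_1, P1_2, ... to express preferences. For example, P1_2 would be the column with the number of people who chose party P1 as their second choice. I am working with your code to achieve this, but I have not got there yet.

I adjusted the code accordingly; the expression p + '_%s' % k[1] now gives you Pi_j instead of Pi_Pj.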
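
Two points from the comments above can be checked in isolation: assigning a list creates a reference rather than a copy, and `k[1]` is simply the second character of a party name. The variable names below are made up for this illustration.

```python
parties = ['LPC', 'CPC', 'BQ']

alias = parties            # same list object as `parties`
copied = parties.copy()    # independent list
copied.remove('LPC')       # does not touch `parties`

same_object = alias is parties
still_three = len(parties)

# k[1] picks the second character of each name.
suffix_numbered = [k[1] for k in ['P1', 'P2', 'P3']]
suffix_named = [k[1] for k in parties]
```

For names such as P1, P2, P3 the second character is the party number, but for LPC, CPC and BQ it is a letter, and LPC and CPC even share the same one. Column names built with `k[1]` therefore depend on the party names following the P1, P2, ... pattern.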