Python pandas: order values from a DataFrame

python pandas indexing dataframe

I have the following data:

Third party unique identifier   Qsex    Qage    Qfamilystatus       QeducationSingle    Qincomeevaluation   Qjobstatus  QRuCitySize QRuDistrict Qcountry
9ea3e3cb6719f3d336d324c446f486bd    1   32  1       5   1   1   1   1
cb570bb986808a5f4d2629287297b902    2   25          5   2   1   1   1
78b3a44eb7c7f7c687ffbcfed57647a4    1   30          4   1   3   6   1
1c728b223a4c2c267f3a3630b4a63f6e    2   45          4   1   1   1   1
8852ecd198fddfa557186c863f2c6fdf    2   41          4   1   7   7   1
1adc146b9ec35f7c632902f480d7e95c    1   70          5   3   1   1   1
0fb0c903a6b2b68f1b0a7cd1962f353c    1   29          5   1   5   7   1

And another df:

QRuDistrict 1   ЦФО
QRuDistrict 2   ЮФО
QRuDistrict 3   СЗФО
QRuDistrict 4   ДВФО
QRuDistrict 5   СФО
QRuDistrict 6   УФО
QRuDistrict 7   ПФО
QRuDistrict 8   СКФО
QRuDistrict 9   Крымский ФО

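I am trying to replace the values in the first df with the data from the second df, compute the percentages, and write the result to Excel.

I use: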

d = (df_1[df_1['sign']=='Qcountry'].set_index('number')['result'].to_dict())
df['Country'] = df.Qcountry.map(d)
df2 = pd.crosstab(df.Country, df.Qcountry, margins=True)
df3 = np.round(df2[["All"]] / df['Country'].count() * 100, 2).rename(columns={"All": '%'})
country = pd.concat([df2[["All"]], df3], axis=1)
less = country[country['%'] < 5]

country = country[country['%'] >= 5]
country['All'] = ((all_users * df3.divide(100)).astype(int))
country['%'] = country['%'].astype(str) + '%'
country.to_excel(writer, sheet_name=sheet_name, startrow=48, startcol=4)

But I want the rows in the same order as in the second DataFrame. I want to get:

Federal Districts   Россия  
N   %
ЦФО 764 31.08%
ЮФО 205 8.35%
СЗФО    420 17.09%
ДВФО    131 5.33%
СФО 259 10.53%
УФО 208 8.48%
ПФО 416 16.91%
СКФО    43  1.75%
Крымский ФО 11  0.48%
Total   2461    100.0%

How can I sort it in this order?

I think you can use reindex with the second DataFrame, but you need to append the last item, Total, to the list idx:
print (df)
               a            b
0  QRuDistrict 1          ЦФО
1  QRuDistrict 2          ЮФО
2  QRuDistrict 3         СЗФО
3  QRuDistrict 4         ДВФО
4  QRuDistrict 5          СФО
5  QRuDistrict 6          УФО
6  QRuDistrict 7          ПФО
7  QRuDistrict 8         СКФО
8  QRuDistrict 9  Крымский ФО

print (df1)
            Federal Districts  Россия  
                            N         %
ДВФО                      131     5.33%
Крымский ФО                11     0.48%
ПФО                       416    16.91%
СЗФО                      420    17.09%
СКФО                       43     1.75%
СФО                       259    10.53%
УФО                       208     8.48%
ЦФО                       764    31.08%
ЮФО                       205     8.35%
Total                    2461    100.0%


idx = df.b.tolist() + ['Total']
print (idx)
['ЦФО', 'ЮФО', 'СЗФО', 'ДВФО', 'СФО', 'УФО', 'ПФО', 'СКФО', 'Крымский ФО', 'Total']

df1 = df1.reindex(idx)
print (df1)
            Federal Districts  Россия  
                            N         %
ЦФО                       764    31.08%
ЮФО                       205     8.35%
СЗФО                      420    17.09%
ДВФО                      131     5.33%
СФО                       259    10.53%
УФО                       208     8.48%
ПФО                       416    16.91%
СКФО                       43     1.75%
Крымский ФО                11     0.48%
Total                    2461    100.0%

If you use sort_index instead, the order is different:

df1 = df1.sort_index(ascending=False)
print (df1)
            Federal Districts  Россия  
                            N         %
ЮФО                       205     8.35%
ЦФО                       764    31.08%
УФО                       208     8.48%
СФО                       259    10.53%
СКФО                       43     1.75%
СЗФО                      420    17.09%
ПФО                       416    16.91%
Крымский ФО                11     0.48%
ДВФО                      131     5.33%
Total                    2461    100.0%

Edit, following the comments:

I changed the column names. It seems you only need the values of column sign from the rows where the first column, number, contains QRuDistrict. Then you can use str.contains to build a boolean mask:

print (df)
          number         sign
0  QRuDistrict 1          ЦФО
1  QRuDistrict 2          ЮФО
2  QRuDistrict 3         СЗФО
3  QRuDistrict 4         ДВФО
4  QRuDistrict 5          СФО
5  QRuDistrict 6          УФО
6  QRuDistrict 7          ПФО
7  QRuDistrict 8         СКФО
8  QRuDistrict 9  Крымский ФО

# .loc replaces the .ix indexer, which was removed in pandas 1.0
idx = df.loc[df.number.str.contains('QRuDistrict'), 'sign'].tolist() + ['Total']
print (idx)
['ЦФО', 'ЮФО', 'СЗФО', 'ДВФО', 'СФО', 'УФО', 'ПФО', 'СКФО', 'Крымский ФО', 'Total']

Comments:

Did you have a look at this? Just use the second column of the second DataFrame. Append it and use it to sort your data, since it already runs from 1 to 9 in the order you need.

@EdChum If I understand correctly, I can use it for numeric values, but I don't know how to specify sorting by the values in another DataFrame.

@Ev.Kounis I tried df = df.sort_values('Qcountry') before df['Country'] = df.Qcountry.map(d), df2 = pd.crosstab(df.Country, df.Qcountry, margins=True), df3 = np.round(df2[["All"]] / df['Country'].count() * 100, 2).rename(columns={"All": '%'}), but it didn't help.

Can you tell me: if the first column of my df contains more than just QRuDistrict, how can I get only the QRuDistrict rows from it? But it returns KeyError: False.

What are the columns of the second DataFrame? What does print(df.columns) show?

Index([u'sign', u'number', u'result'], dtype='object'), and then I try idx = df['result' == 'QRuDistrict'].tolist() + ['Total']

Thanks. So the first column, with QRuDistrict 1, QRuDistrict 2, is result, and what is the name of the column with ЦФО, ЮФО?
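
For reference, here is a minimal self-contained sketch of the same idea: filter the lookup table with a boolean mask, then reindex the summary by the resulting labels. The names lookup, summary and ordered_labels are hypothetical and not part of the question. The three district counts are taken from the output above, while the extra Qsex row and the subset total are made up for illustration.

```python
import pandas as pd

# Lookup table shaped like the second DataFrame in the question.
# The last row is an invented non-district entry so the filter has work to do.
lookup = pd.DataFrame({
    "number": ["QRuDistrict 1", "QRuDistrict 2", "QRuDistrict 3", "Qsex 1"],
    "sign": ["ЦФО", "ЮФО", "СЗФО", "Мужской"],
})

# Summary table with its index in alphabetical order, as crosstab would return it.
# "Total" here is just the sum of the three illustrative rows.
summary = pd.DataFrame(
    {"N": [420, 764, 205, 1389]},
    index=["СЗФО", "ЦФО", "ЮФО", "Total"],
)


def ordered_labels(table, key, total_label="Total"):
    """Return the 'sign' values of rows whose 'number' contains key,
    in table order, followed by the label of the totals row."""
    # regex=False treats key as a plain substring rather than a pattern.
    mask = table["number"].str.contains(key, regex=False)
    return table.loc[mask, "sign"].tolist() + [total_label]


idx = ordered_labels(lookup, "QRuDistrict")
result = summary.reindex(idx)  # rows now follow the lookup-table order
print(result)
```

Note that reindex inserts a row of NaN for any label in idx that is missing from the summary's index, so a district with no respondents would appear as an empty row rather than being dropped.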