Python 3.x: How do I look up values in a row in pandas after filtering out some data and then applying a custom sort?


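I have a df with 8 currencies (CAD, AUD, NOK, SEK, NZD, EUR, GBP, JPY) and 2 different data points for each: 12M PR (12-month price return) and 12M ZS (12-month z-score).

I want to first apply a filter that retrieves the 4 currencies with the lowest 12M PR, along with their corresponding 12M ZS. After this first filter, df2 should look like this: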

df2 = {'Date': ('2015-10-6', '2015-10-7'),
      'CAD 12M PR': (-7.4, -4.9),
      'AUD 12M PR': (-2.3, -1.6),
      'EUR 12M PR': (2.2, 4.7),
      'GBP 12M PR': (-3.6, -2.5),
      'CAD 12M ZS': (3.1, 2.5),
      'AUD 12M ZS': (-1.7, 3.0),
      'EUR 12M ZS': (-3.8, -3.7),
      'GBP 12M ZS': (-1.6, -2.7),
     }
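After applying the filter, I want to retrieve the 2 currencies with the lowest 12M ZS from the filtered list. In the filtered list above, AUD and EUR have the lowest 12M ZS on both dates (but this can change). After sorting by 12M ZS, df3 should look like this: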

df3 = {'Date': ('2015-10-6', '2015-10-7'),
      'AUD 12M PR': (-2.3, -1.6),
      'EUR 12M PR': (2.2, 4.7),
      'AUD 12M ZS': (-1.7, 3.0),
      'EUR 12M ZS': (-3.8, -3.7),
     }
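So the first filter finds the 4 currencies with the lowest 12M PR, and the second filter finds the 2 currencies with the lowest 12M ZS among those 4. But I don't know how to get from df to df3.

I can get the 4 currencies with the lowest 12M PR (df2) using the following code: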

Short = {
              'Short 1':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(1).idxmax()).str[0:3],
             'Short 2':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(2).idxmax()).str[0:3],
             'Short 3':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(3).idxmax()).str[0:3],
             'Short 4':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].T.apply(lambda x: x.nsmallest(4).idxmax()).str[0:3],
             'Short 1 12M PR':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(1).values[-1],axis=1),
             'Short 2 12M PR':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(2).values[-1],axis=1),
             'Short 3 12M PR':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(3).values[-1],axis=1),
             'Short 4 12M PR':
             df[['CAD 12M PR', 'AUD 12M PR', 'NOK 12M PR', 'SEK 12M PR', 'NZD 12M PR', 'EUR 12M PR', 'GBP 12M PR', 'JPY 12M PR']].apply(lambda row: row.nsmallest(4).values[-1],axis=1),
                }
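
The eight expressions above differ only in the rank k. As a rough sketch (not the asker's code), the same table could be built in a loop. The 12M PR values here are the sample values used in the answer further down, and the names `pr`, `short` and `short_df` are made up for illustration.

```python
import pandas as pd

# 12M PR sample values (assumed: taken from the answer's sample data).
pr = pd.DataFrame({
    "Date": ("2015-10-6", "2015-10-7"),
    "CAD 12M PR": (-7.4, -4.9),
    "AUD 12M PR": (-2.3, -1.6),
    "NOK 12M PR": (2.6, 6.4),
    "SEK 12M PR": (6.7, 8.6),
    "NZD 12M PR": (3.1, 2.9),
    "EUR 12M PR": (2.2, 4.7),
    "GBP 12M PR": (-3.6, -2.5),
    "JPY 12M PR": (13.8, 15.7),
}).set_index("Date")

short = {}
for k in range(1, 5):
    # nsmallest(k) is sorted ascending, so its last entry is the k-th lowest.
    # The first 3 characters of the column label are the currency code.
    short[f"Short {k}"] = pr.apply(lambda row: row.nsmallest(k).index[-1][:3], axis=1)
    short[f"Short {k} 12M PR"] = pr.apply(lambda row: row.nsmallest(k).iloc[-1], axis=1)

# One row per date, one pair of columns per rank.
short_df = pd.DataFrame(short)
print(short_df)
```

Indexing by "Date" first keeps the date attached to every ranked value, which the original dictionary of separate Series does not do.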

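Once I have the 4 currencies with the lowest 12M PR (df2), I don't know how to apply the final sort based on the filtered list (df2) to arrive at df3.

Here is a solution, although it may not be the most efficient one: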

import pandas as pd


df = {
    "Date": ("2015-10-6", "2015-10-7"),
    "CAD 12M PR": (-7.4, -4.9),
    "AUD 12M PR": (-2.3, -1.6),
    "NOK 12M PR": (2.6, 6.4),
    "SEK 12M PR": (6.7, 8.6),
    "NZD 12M PR": (3.1, 2.9),
    "EUR 12M PR": (2.2, 4.7),
    "GBP 12M PR": (-3.6, -2.5),
    "JPY 12M PR": (13.8, 15.7),
    "CAD 12M ZS": (3.1, 2.5),
    "AUD 12M ZS": (-1.7, 3.0),
    "NOK 12M ZS": (2.0, 1.8),
    "SEK 12M ZS": (2.6, 2.6),
    "NZD 12M ZS": (-4.5, -5.6),
    "EUR 12M ZS": (-3.8, -3.7),
    "GBP 12M ZS": (-1.6, -2.7),
    "JPY 12M ZS": (3.0, 2.1),
}

df2 = {
    "Date": ("2015-10-6", "2015-10-7"),
    "CAD 12M PR": (-7.4, -4.9),
    "AUD 12M PR": (-2.3, -1.6),
    "EUR 12M PR": (2.2, 4.7),
    "GBP 12M PR": (-3.6, -2.5),
    "CAD 12M ZS": (3.1, 2.5),
    "AUD 12M ZS": (-1.7, 3.0),
    "EUR 12M ZS": (-3.8, -3.7),
    "GBP 12M ZS": (-1.6, -2.7),
}


df3 = {
    "Date": ("2015-10-6", "2015-10-7"),
    "AUD 12M PR": (-2.3, -1.6),
    "EUR 12M PR": (2.2, 4.7),
    "AUD 12M ZS": (-1.7, 3.0),
    "EUR 12M ZS": (-3.8, -3.7),
}

pd_df = pd.DataFrame(df)

# setup
n_PR = 4
n_ZS = 2
target_date = "2015-10-6"

# only look at target date data for now
pd_target_date = pd_df.loc[pd_df["Date"] == target_date]

# separate 12M PR and 12M ZS
pd_PR_df = pd_target_date.filter(regex=".*12M PR")
pd_ZS_df = pd_target_date.filter(regex=".*12M ZS")

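# first filter: keep the n_PR smallest PR values
# (after transposing, the only column is labelled with the target row's index)
pd_PR_df = pd_PR_df.transpose().nsmallest(n=n_PR, columns=pd_target_date.index[0]).transpose()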

# get the country names of those that passed the first filter
# 3 is hard coded for 3-letter symbol for currency
lowest_countries = [x[:3] for x in pd_PR_df.columns]

# get the lowest countries' ZS
regex_str = "(" + ".*|".join(lowest_countries) + ".*)"
pd_ZS_df = pd_ZS_df.filter(regex=regex_str)

# aggregate results back to original data frame and sanity check
pd_df2_test = pd_df[pd_PR_df.columns].join(pd_df[pd_ZS_df.columns], how="outer")
pd_df2 = pd.DataFrame(df2)
pd_df2 = pd_df2.drop(columns=["Date"])
# absurd assert to make sure they match, this was significantly more complicated than it should have been, there is probably a better way
assert set(pd_df2_test.columns) == set(pd_df2.columns) and all(
    [
        len(pd_df2[pd_df2[col] == pd_df2_test[col]]) == len(pd_df2[col])
        for col in pd_df2.columns
    ]
), "DataFrames did not match"


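# second filter: keep the n_ZS smallest ZS values among the survivors
# (after transposing, the only column is labelled with the target row's index)
pd_ZS_df = pd_ZS_df.transpose().nsmallest(n=n_ZS, columns=pd_target_date.index[0]).transpose()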

lowest_countries = [x[:3] for x in pd_ZS_df.columns]

# get the lowest countries' PR
regex_str = "(" + ".*|".join(lowest_countries) + ".*)"
pd_PR_df = pd_PR_df.filter(regex=regex_str)

# aggregate results back to original data frame and sanity check
pd_df3_test = pd_df[pd_PR_df.columns].join(pd_df[pd_ZS_df.columns], how="outer")
pd_df3 = pd.DataFrame(df3)
pd_df3 = pd_df3.drop(columns=["Date"])
# absurd assert to make sure they match, this was significantly more complicated than it should have been, there is probably a better way
assert set(pd_df3_test.columns) == set(pd_df3.columns) and all(
    [
        len(pd_df3[pd_df3[col] == pd_df3_test[col]]) == len(pd_df3[col])
        for col in pd_df3.columns
    ]
), "DataFrames did not match"

final_result = pd_df3_test
print(final_result)
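
The script above works on a single target_date. As a hedged sketch of one way to run the same two filters for every date, the helper below handles one row at a time. The names `two_stage_filter`, `wide`, `picks` and `expected_df3` are hypothetical and not part of the answer. It also tries `pandas.testing.assert_frame_equal` with `check_like=True` as a possible shorter replacement for the hand-written asserts, since that option ignores column order.

```python
import pandas as pd

# Sample data copied from the answer above (wide format, one row per date).
wide = pd.DataFrame({
    "Date": ("2015-10-6", "2015-10-7"),
    "CAD 12M PR": (-7.4, -4.9), "AUD 12M PR": (-2.3, -1.6),
    "NOK 12M PR": (2.6, 6.4), "SEK 12M PR": (6.7, 8.6),
    "NZD 12M PR": (3.1, 2.9), "EUR 12M PR": (2.2, 4.7),
    "GBP 12M PR": (-3.6, -2.5), "JPY 12M PR": (13.8, 15.7),
    "CAD 12M ZS": (3.1, 2.5), "AUD 12M ZS": (-1.7, 3.0),
    "NOK 12M ZS": (2.0, 1.8), "SEK 12M ZS": (2.6, 2.6),
    "NZD 12M ZS": (-4.5, -5.6), "EUR 12M ZS": (-3.8, -3.7),
    "GBP 12M ZS": (-1.6, -2.7), "JPY 12M ZS": (3.0, 2.1),
}).set_index("Date")


def two_stage_filter(row, n_pr=4, n_zs=2):
    """Hypothetical helper: currency codes that survive both filters for one date."""
    # Split the row into PR and ZS values, re-labelled by 3-letter currency code.
    pr = row[row.index.str.endswith("12M PR")].rename(lambda c: c[:3])
    zs = row[row.index.str.endswith("12M ZS")].rename(lambda c: c[:3])
    # First filter: the n_pr currencies with the lowest 12M PR.
    lowest_pr = pr.nsmallest(n_pr).index
    # Second filter: among those, the n_zs currencies with the lowest 12M ZS.
    return list(zs.loc[lowest_pr].nsmallest(n_zs).index)


# One list of surviving currency codes per date.
picks = {date: two_stage_filter(row) for date, row in wide.iterrows()}
print(picks)

# Rebuild a df3-style frame from the first date's picks and compare it with
# the expected df3 from the question, ignoring column order.
first = picks["2015-10-6"]
cols = [f"{c} 12M PR" for c in first] + [f"{c} 12M ZS" for c in first]
expected_df3 = pd.DataFrame({
    "Date": ("2015-10-6", "2015-10-7"),
    "AUD 12M PR": (-2.3, -1.6),
    "EUR 12M PR": (2.2, 4.7),
    "AUD 12M ZS": (-1.7, 3.0),
    "EUR 12M ZS": (-3.8, -3.7),
}).set_index("Date")
pd.testing.assert_frame_equal(wide[cols], expected_df3, check_like=True)
```

On the sample data this picks EUR and AUD for 2015-10-6 but NZD and GBP for 2015-10-7, so the expected df2 and df3 tables in the question appear to match only the first date.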

Here is another way to solve the problem, by reformatting the data. It was sent by a user on Quantopian (many thanks to him and to everyone who helped).


You are a hero, sir.