Python pandas: compare each row with a reference row, for certain columns only

python python-2.7 pandas comparison

I have the following DataFrame in Python ****

   Temp_Fact Oscillops_read         A         B         C         D         E         F         G         H         I         J
0          A          Today  0.710213  0.222015  0.814710  0.597732  0.634099  0.338913  0.452534  0.698082  0.706486  0.433162
1          B          Today  0.653489  0.452543  0.618755  0.555629  0.490342  0.280299  0.026055  0.138876  0.053148  0.899734
2          A          Aactl  0.129211  0.579690  0.641324  0.615772  0.927384  0.199651  0.652395  0.249467  0.262301  0.049795
3          A            DFE  0.743794  0.355085  0.637794  0.633634  0.810033  0.509244  0.470418  0.972145  0.647222  0.610636
4          C    Real_Mt_Olv  0.724282  0.332965  0.063078  0.004550  0.585398  0.869376  0.232148  0.630162  0.102206  0.232981
5          E         Q_Mont  0.221685  0.224834  0.110734  0.397999  0.814153  0.552924  0.981098  0.536750  0.251941  0.383994
6          D            DFE  0.655386  0.561297  0.305310  0.140998  0.433054  0.118187  0.479206  0.556546  0.556017  0.025070
7          F           Bryo  0.257884  0.228650  0.413149  0.285651  0.814095  0.275627  0.775620  0.392448  0.827725  0.935581
8          C          Aactl  0.017388  0.133848  0.939049  0.159416  0.923788  0.375638  0.331078  0.939089  0.098718  0.785569
9          C          Today  0.197419  0.595253  0.574718  0.373899  0.363200  0.289378  0.698455  0.252657  0.357485  0.020484
10         C           Pars  0.037771  0.683799  0.184114  0.545062  0.857000  0.295918  0.733196  0.613165  0.180642  0.254839
11         B           Pars  0.637346  0.090000  0.848710  0.596883  0.027026  0.792180  0.843743  0.461608  0.552165  0.215250
12         B           Pars  0.768422  0.017828  0.090141  0.108061  0.456734  0.803175  0.454479  0.501713  0.687016  0.625260
13         E       Tomorrow  0.860112  0.532859  0.091641  0.768896  0.635966  0.007211  0.656367  0.053136  0.482367  0.680557
14         D            DFE  0.801734  0.365921  0.243407  0.826373  0.904416  0.062448  0.801726  0.049983  0.433135  0.351150
15         F         Q_Mont  0.360710  0.330745  0.598830  0.582379  0.828019  0.467044  0.287276  0.470980  0.355386  0.404299
16         D      Last_Week  0.867126  0.600093  0.813257  0.005423  0.617543  0.657219  0.635255  0.314910  0.016516  0.689257
17         E      Last_Week  0.551499  0.724981  0.821087  0.175279  0.301397  0.304105  0.379553  0.971244  0.558719  0.154240
18         F           Bryo  0.511370  0.208831  0.260223  0.089106  0.121442  0.120513  0.099722  0.750769  0.860541  0.838855
19         E           Bryo  0.323441  0.663328  0.951847  0.782042  0.909736  0.512978  0.999549  0.225423  0.789240  0.155898
20         C       Tomorrow  0.267086  0.357918  0.562190  0.700404  0.961047  0.513091  0.779268  0.030190  0.460805  0.315814
21         B       Tomorrow  0.951356  0.570077  0.867533  0.365708  0.791373  0.232377  0.478656  0.003857  0.805882  0.989754
22         F          Today  0.963750  0.118826  0.264858  0.571066  0.761669  0.967419  0.565773  0.468971  0.466120  0.174815
23         B      Last_Week  0.291186  0.126748  0.154725  0.527029  0.021485  0.224272  0.259218  0.052286  0.205569  0.617701
24         F          Aactl  0.269308  0.655920  0.595518  0.404817  0.290342  0.447246  0.627082  0.306856  0.868357  0.979879

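I also have a series of values, one per column:

df_base = df[df['Oscillops_read'] == 'Last_Week']
df_base_val = df_base.mean(axis=0, numeric_only=True)

As you can see, this is a pandas Series holding the mean of each column over the rows where Oscillops_read == 'Last_Week'. Here is the series: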

[0.56993702256121603, 0.48394061768804786, 0.59635616273775061, 0.23591030688019868, 0.31347492150330231, 0.39519847430740507, 0.42467546792253791, 0.4461465888887961, 0.26026797943899194, 0.48706569569369912]

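I also have two lists:

One, range_name_list:

This list gives the values that must be added to the DataFrame df under certain conditions (described below).

Two, col_1 through col_7:

These are lists of column names. The columns they name in df must be compared with the mean series above. For example, for the 6th list, col_6, columns A and C of each row of df must be compared with the A and C entries of the series.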

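Question: As described above, I need to compare specific columns of the DataFrame df with the base series df_base_val. The columns to compare are listed in col_1, col_2, col_3, ..., col_7. Here is what I need to do:

If, for a given row, the values in the columns named in one of these lists (for example, columns A and C for col_6) are all greater than the corresponding values of the base series df_base_val, then for that row, put the matching value from range_name_list (the 6th value in the case of col_6) into a new column Range.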

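Example:
Take col_6. This is the 6th list, and it holds the column names A and C.
Step 1: For the first row of df, columns A and C are greater than df_base_val[A] and df_base_val[C].
Step 2: So for the first row, put the 6th element of range_name_list into a new column Range. The 6th element is Barometer_Output.
Sample output:
After doing this, the first row becomes:
0          A          Today  0.710213  0.222015  0.814710  0.597732  0.634099  0.338913  0.452534  0.698082  0.706486  0.433162  'Barometer_Output'

Now, if this row were not greater than the series in columns A and C, and also not greater than the series in the columns of col_1, col_2, and so on, then the Range column must be given the value 'Not_in_Range'. In that case the row would become:
0          A          Today  0.710213  0.222015  0.814710  0.597732  0.634099  0.338913  0.452534  0.698082  0.706486  0.433162  'Not_in_Range'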
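The check described in the example can be sketched for the first row alone. This is a minimal illustration, not the asker's code: the row values and base means are copied from the tables above, and row0, base, in_range and label are names introduced here.

```python
import pandas as pd

# Columns A and C of the first row of df, copied from the table above.
row0 = pd.Series({'A': 0.710213, 'C': 0.814710})
# Matching entries of the base series (means over the Last_Week rows).
base = pd.Series({'A': 0.56993702256121603, 'C': 0.59635616273775061})

col_6 = ['A', 'C']

# The row qualifies only if every listed column exceeds its base mean.
in_range = bool((row0[col_6] > base[col_6]).all())
label = 'Barometer_Output' if in_range else 'Not_in_Range'
print(label)
```

Both values exceed their means (0.710213 > 0.5699 and 0.814710 > 0.5964), so the row is labelled Barometer_Output.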

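Simplifications and question:
In this example:
I only compared the first row with the base series. I need to compare every row of df with the base series individually and add the appropriate value.
I only used the 6th list, col_6. Similarly, I need to go through each of the column-name lists col_1, col_2, ..., col_7.

If the row being compared is not greater than the series in the columns of any of the lists col_1 to col_7, then the Range column must be assigned the value 'Not_in_Range'.
Is there a way to do this? Perhaps using a loop?
****To create the above DataFrame, select it from above and copy it. Then use the following code:
import pandas as pd
df = pd.read_clipboard()
print(df)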
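If the clipboard is not available (for example on a headless machine), the same table can be parsed from a string. The sketch below is an alternative to read_clipboard and not part of the original question. It uses only the first three rows and the first three value columns, and leaves out the leading index column so that the header and data rows have the same number of fields.

```python
import io
import pandas as pd

# A small excerpt of the table above; the name `text` is introduced here.
text = """Temp_Fact Oscillops_read A B C
A Today 0.710213 0.222015 0.814710
B Today 0.653489 0.452543 0.618755
A Aactl 0.129211 0.579690 0.641324
"""

# Split on runs of whitespace, as the pasted table is space-aligned.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df.shape)
```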

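Edit:
If more than one condition is met, I need to list all of them. That is, if a row belongs to both 'Swg' and 'Curnt', then I need to list both in the Range column, or create a separate Range column for each match, or simply use a Python list. Range1 would then list 'Swg', Range2 would list 'Curnt', and so on.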
For starters, I would create a dictionary holding the sets of conditions, where the keys can be used as indices into range_name_list:
conditions = {0: list('DFA'),
              1: list('ACEF'),
              2: list('CEF'),
              3: list('ABDF'),
              4: list('DEF'),
              5: list('AC'),
              6: list('ABCDE')}

The following code does what I understand your task to be:
# Create the Range column, to be filled in below.
df['Range'] = '|'
for index, row in df.iterrows():
    for ix, cols in conditions.items():
        # Check, for each condition column, whether the row's value
        # is greater than the df_base_val average.
        truths = [row[column] > df_base_val[column] for column in cols]
        # If every check is True, append the matching range name
        # to the 'Range' value of the current row.
        if all(truths):
            df.loc[index, 'Range'] = df.loc[index, 'Range'] + range_name_list[ix] + '|'
# Rows where no condition was met get 'Not_in_Range'.
df.loc[df['Range'] == '|', 'Range'] = 'Not_in_Range'
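The same pipe-separated result can also be built without iterating over rows. The sketch below is an alternative and not part of the answer above. It uses three rows of the question's table (rows 0, 2 and 22), four of its columns, and two of the seven conditions; base, matches and join_names are names introduced here, and the base means are rounded.

```python
import pandas as pd

# Rows 0, 2 and 22 of the question's table, columns A, C, D and F only.
df = pd.DataFrame({
    'A': [0.710213, 0.129211, 0.963750],
    'C': [0.814710, 0.641324, 0.264858],
    'D': [0.597732, 0.615772, 0.571066],
    'F': [0.338913, 0.199651, 0.967419],
})
# Base means for those columns, rounded from the series in the question.
base = pd.Series({'A': 0.569937, 'C': 0.596356, 'D': 0.235910, 'F': 0.395198})

# Two of the seven conditions, keyed by range name for brevity.
conditions = {'Base': list('DFA'), 'Barometer_Output': list('AC')}

# One boolean column per range name: True where every listed
# column of the row is greater than its base mean.
matches = pd.DataFrame({
    name: df[cols].gt(base[cols], axis=1).all(axis=1)
    for name, cols in conditions.items()
})

def join_names(flags):
    """Join the names of all matched ranges with pipes."""
    names = [name for name, hit in flags.items() if hit]
    return '|' + '|'.join(names) + '|' if names else 'Not_in_Range'

df['Range'] = matches.apply(join_names, axis=1)
print(df['Range'].tolist())
```

The boolean matches frame is useful on its own: it shows at a glance which rows satisfy which condition before any strings are built.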


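Try the following code:
from io import BytesIO
import pandas as pd

# txt is assumed to hold the table from the question as a bytes string.
df = pd.read_csv(BytesIO(txt), sep=r'\s+')
df_base = df[df['Oscillops_read'] == 'Last_Week']
df_base_val = df_base.mean(axis=0, numeric_only=True)
columns = ['DFA', 'ACEF', 'CEF', 'ABDF', 'DEF', 'AC', 'ABCDE']
range_name_list = ['Base','Curnt','Prediction','Graph','Swg','Barometer_Output','Test_Cntr']

ranges = pd.Series("NOT_IN_RANGE", index=df.index)

for name, cols in zip(range_name_list, columns):
    cols = list(cols)
    # Index labels of the rows where every listed column exceeds the base mean.
    idx = df.index[(df[cols] > df_base_val[cols]).all(axis=1)]
    ranges.loc[idx] = name

print(ranges)

However, I don't know what you want when a row matches more than one range. As written, a later match overwrites an earlier one.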
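One way to handle rows that match several ranges, along the lines of the question's edit, is to collect the matching names in a Python list per row and then spread them into Range1, Range2, and so on. This is a sketch under assumptions and not the answerer's code. It uses rows 0, 2 and 3 of the question's table and two of the seven conditions; base, matched and max_hits are names introduced here, and the base means are rounded.

```python
import pandas as pd

# Rows 0, 2 and 3 of the question's table, columns A, C, D and F only.
df = pd.DataFrame({
    'A': [0.710213, 0.129211, 0.743794],
    'C': [0.814710, 0.641324, 0.637794],
    'D': [0.597732, 0.615772, 0.633634],
    'F': [0.338913, 0.199651, 0.509244],
})
base = pd.Series({'A': 0.569937, 'C': 0.596356, 'D': 0.235910, 'F': 0.395198})

columns = ['DFA', 'AC']                        # two of the seven groups
range_name_list = ['Base', 'Barometer_Output']

# Start every row with its own empty list of matches.
matched = pd.Series([[] for _ in range(len(df))], index=df.index)
for name, cols in zip(range_name_list, columns):
    cols = list(cols)
    mask = df[cols].gt(base[cols], axis=1).all(axis=1)
    for idx in df.index[mask]:
        matched.loc[idx].append(name)

# Spread the lists into Range1, Range2, ...; unused slots are left empty.
max_hits = max(len(names) for names in matched)
for i in range(max_hits):
    df['Range%d' % (i + 1)] = [names[i] if i < len(names) else None
                               for names in matched]

print(matched.tolist())
```

Keeping the list form (matched) alongside the wide columns makes it easy to switch to a single pipe-joined Range column later if that turns out to be more convenient.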
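Thanks. Could you post the output DataFrame? When I run this code and look at the first 3 rows of the output df, something seems off with df.loc[0:3, 'F']. The values in column F are less than the 6th value in range_name_list, which means the Range column should get 'Not_in_Range', but that does not seem to happen. Checking the first 3 rows of the output df, all the other columns work fine. Only column F seems to have a problem. For the first row, column F is 0.33891, while the 6th item of range_list is 0.39519867.

I misunderstood your question and was comparing against the wrong df_base_val figure. Edited; it should now fit your needs.

Could you explain what the break line does? If the if statement is satisfied, the 'Range' column is filled in appropriately and the code moves on to the next row. That all makes sense to me. But why is break needed?

What if I only need one Range column that contains all the matches, separated by pipes? I edited the post, using the code above as an example.

Yes, so I changed df['Range'] = '|' so that the default is a pipe instead of an empty string. That way, with the changes you noted (including removing the break statement), the strings in the Range column become |condition1|condition2| and so on (+ range_name_list[ix] + "|"). The last change is to fill in the rows whose value is '|' rather than '' (df['Range'][df['Range'] == '|'] = 'Not_in_Range').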