Python: concatenate values into multiple columns based on associated values

python pandas

Given a dataframe:

+----+-------+------+-----------+-----------+---------------+
|    |   Key | ID   | Status1   | Status2   | OrderID       |
|----+-------+------+-----------+-----------+---------------|
|  0 |     1 | A1   | False     | True      | 1234-USF-0025 |
|  1 |     1 | A1   | False     | True      | 1234-USF-0026 |
|  2 |     1 | A1   | False     | True      | 1234-USF-0027 |
|  3 |     2 | A1   | True      | True      | 1234-USF-0025 |
|  4 |     2 | A1   | True      | True      | 1234-USF-0026 |
|  5 |     2 | A1   | True      | True      | 1234-USF-0027 |
|  6 |     3 | A1   | Anything  | True      | 1234-USF-0025 |
|  7 |     3 | A1   | False     | True      | 1234-USF-0026 |
|  8 |     3 | A1   | False     | Anything  | 1234-USF-0027 |
|  9 |     4 | A2   | True      | True      | 1234-USF-0028 |
| 10 |     4 | A2   | True      | True      | 1234-USF-0029 |
| 11 |     4 | A2   | True      | True      | 1234-USF-0030 |
| 12 |     5 | A3   | True      | True      | 1234-USF-0031 |
| 13 |     5 | A3   | True      | True      | 1234-USF-0032 |
| 14 |     5 | A3   | True      | True      | 1234-USF-0033 |
| 15 |     6 | A4   | True      | True      | 1234-USF-0034 |
| 16 |     6 | A4   | True      | True      | 1234-USF-0035 |
| 17 |     6 | A4   | True      | True      | 1234-USF-0036 |
+----+-------+------+-----------+-----------+---------------+

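How can I transform this so that each OrderID is listed per ID, with the keys concatenated based on the statuses? If both statuses are True, the concatenated key should go in the TRUE column. If either one is False, the key should go in the FALSE column. If one (or both) of the statuses is neither True nor False, the key should be concatenated into the OTHER column.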
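The desired output below implies a precedence between these rules. Key 3 for 1234-USF-0027 has False/Anything and still lands in OTHER, so a status that is neither True nor False appears to win over False. The following is a minimal sketch of that reading as a plain function. The name classify_status is hypothetical, and the precedence is an assumption inferred from the table:

```python
def classify_status(status1, status2):
    """Map a pair of status values to 'TRUE', 'FALSE' or 'OTHER'.

    Assumption (inferred from the desired output): a status that is
    neither "True" nor "False" takes precedence, so ("False", "Anything")
    goes to OTHER rather than FALSE.
    """
    valid = {"True", "False"}
    # str() lets real booleans and the strings "True"/"False" behave the same
    pair = (str(status1), str(status2))
    if any(s not in valid for s in pair):
        return "OTHER"
    if pair == ("True", "True"):
        return "TRUE"
    return "FALSE"


print(classify_status("True", "True"))       # TRUE
print(classify_status("False", "True"))      # FALSE
print(classify_status("False", "Anything"))  # OTHER
```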
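Desired result df:
+---------------+------+--------+---------+---------+
| OrderID       | ID   | TRUE   | FALSE   | OTHER   |
|---------------+------+--------+---------+---------|
| 1234-USF-0025 | A1   | 2      | 1       | 3       |
| 1234-USF-0026 | A1   | 2      | 1,3     |         |
| 1234-USF-0027 | A1   | 2      | 1       | 3       |
| 1234-USF-0028 | A2   | 4      |         |         |
| 1234-USF-0029 | A2   | 4      |         |         |
| 1234-USF-0030 | A2   | 4      |         |         |
| 1234-USF-0031 | A3   | 5      |         |         |
| 1234-USF-0032 | A3   | 5      |         |         |
| 1234-USF-0033 | A3   | 5      |         |         |
| 1234-USF-0034 | A4   | 6      |         |         |
| 1234-USF-0035 | A4   | 6      |         |         |
| 1234-USF-0036 | A4   | 6      |         |         |
+---------------+------+--------+---------+---------+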
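The following is a minimal vectorized sketch that reproduces this table. It assumes the statuses are stored as strings. It also assumes that a status which is neither True nor False takes precedence, which is what the table implies for 1234-USF-0027. The combination of np.select with groupby and unstack is one possible approach, not the asker's:

```python
import numpy as np
import pandas as pd

# The question's data, with statuses stored as strings
df = pd.DataFrame({
    "Key": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6],
    "ID": ["A1"] * 9 + ["A2"] * 3 + ["A3"] * 3 + ["A4"] * 3,
    "Status1": ["False"] * 3 + ["True"] * 3 + ["Anything", "False", "False"] + ["True"] * 9,
    "Status2": ["True"] * 8 + ["Anything"] + ["True"] * 9,
    "OrderID": [f"1234-USF-{n:04d}" for n in [25, 26, 27] * 3 + list(range(28, 37))],
})

s1 = df["Status1"].astype(str)
s2 = df["Status2"].astype(str)
valid = {"True", "False"}
is_other = ~s1.isin(valid) | ~s2.isin(valid)
both_true = (s1 == "True") & (s2 == "True")
# np.select takes the first matching condition, so OTHER wins over TRUE/FALSE
df["Result"] = np.select([is_other, both_true], ["OTHER", "TRUE"], default="FALSE")

out = (
    df.assign(Key=df["Key"].astype(str))        # str.join needs strings
      .groupby(["OrderID", "ID", "Result"])["Key"]
      .agg(",".join)                             # e.g. "1,3"
      .unstack("Result", fill_value="")          # Result values become columns
      .reindex(columns=["TRUE", "FALSE", "OTHER"], fill_value="")
      .reset_index()
)
out.columns.name = None
print(out)
```

Adding Result as a third grouping key and then unstacking it is what splits one joined column into TRUE, FALSE and OTHER.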

What I have tried:
df = df.groupby(['OrderID', 'ID'])['Key'].apply(','.join).reset_index()

The above gets me close, but I'm not sure how to split the keys into their respective columns (TRUE, FALSE and OTHER).
Notes:
I had previously converted the Key column to strings.
An OrderID can be repeated for an ID, but with different keys.
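The string conversion matters because str.join only accepts strings, so joining integer keys directly fails. The following is a small sketch on hypothetical data, not the question's. It casts the keys inside the pipeline and shows that keys for an OrderID repeated under the same ID are simply joined:

```python
import pandas as pd

# Hypothetical miniature frame: integer keys, with OrderID 1234-USF-0025
# appearing twice for ID A1 under different keys
df = pd.DataFrame({
    "Key": [1, 2, 3],
    "ID": ["A1", "A1", "A2"],
    "OrderID": ["1234-USF-0025", "1234-USF-0025", "1234-USF-0028"],
})

joined = (
    df.assign(Key=df["Key"].astype(str))  # cast first: str.join rejects ints
      .groupby(["OrderID", "ID"])["Key"]
      .agg(",".join)                       # repeated OrderIDs are concatenated
      .reset_index()
)
print(joined)
```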
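This is a working solution, but there is surely a faster and cleaner way. First I add a column for the boolean logic, then do a groupby to compress the table, then iterate using the Key and Result columns to populate the True, False and Other columns. Finally, I delete the columns that are no longer needed and aggregate the rows.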
import pandas as pd

# Your dataframe for testing purposes (all values stored as strings)
df = pd.DataFrame({'Key': '1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6'.split(),
                   'ID': 'A1 A1 A1 A1 A1 A1 A1 A1 A1 A2 A2 A2 A3 A3 A3 A4 A4 A4'.split(),
                   'Status1': 'False False False True True True Anything False False True True True True True True True True True'.split(),
                   'Status2': 'True True True True True True True True Anything True True True True True True True True True'.split(),
                   'OrderID': '25 26 27 25 26 27 25 26 27 28 29 30 31 32 33 34 35 36'.split()})

# First we need to do this boolean logic.
# Assigning to `row` inside iterrows() only changes a copy,
# so write the result back through df.at instead.
df["Result"] = ""
for index, row in df.iterrows():
    stat1 = row["Status1"]
    stat2 = row["Status2"]

    if stat1 == "True" and stat2 == "True":
        df.at[index, "Result"] = "True"
    elif stat1 in ("True", "False") and stat2 in ("True", "False"):
        # Both are valid booleans, but not both True
        df.at[index, "Result"] = "False"
    else:
        # At least one status is neither "True" nor "False"
        df.at[index, "Result"] = "Other"

# Now we do your group by
df = df.groupby(['OrderID', 'ID', 'Result'])['Key'].apply(','.join).reset_index()

# Now we populate the columns you wanted populated: each row's joined keys
# go into the column named by its Result (each row is unique per Result)
df["True"] = ""
df["False"] = ""
df["Other"] = ""
for index, row in df.iterrows():
    df.at[index, row["Result"]] = row["Key"]
del df['Result']
del df['Key']

# Finally we aggregate the rows to flatten them into one row per OrderID/ID
df = df.groupby(['OrderID', 'ID'], as_index=False).agg(''.join)
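Both iterrows() loops can also be replaced by vectorized calls. The following sketch uses the same test data. It swaps in np.select for the boolean logic and pivot_table with a string-joining aggfunc for the groupby, populate and flatten steps. It assumes the same precedence as the answer's if/elif/else. Note that pivot_table sorts the new columns (False, Other, True):

```python
import numpy as np
import pandas as pd

# Same test data as the answer (all values stored as strings)
df = pd.DataFrame({'Key': '1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6'.split(),
                   'ID': 'A1 A1 A1 A1 A1 A1 A1 A1 A1 A2 A2 A2 A3 A3 A3 A4 A4 A4'.split(),
                   'Status1': 'False False False True True True Anything False False True True True True True True True True True'.split(),
                   'Status2': 'True True True True True True True True Anything True True True True True True True True True'.split(),
                   'OrderID': '25 26 27 25 26 27 25 26 27 28 29 30 31 32 33 34 35 36'.split()})

# Boolean logic without iterrows: a non-True/False status wins (Other),
# then both True (True), otherwise False
valid = ["True", "False"]
is_other = ~df["Status1"].isin(valid) | ~df["Status2"].isin(valid)
both_true = df["Status1"].eq("True") & df["Status2"].eq("True")
df["Result"] = np.select([is_other, both_true], ["Other", "True"], default="False")

# Group, spread Result into columns and join keys per cell in one call
out = df.pivot_table(index=["OrderID", "ID"], columns="Result", values="Key",
                     aggfunc=",".join, fill_value="").reset_index()
out.columns.name = None
print(out)
```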

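I can help, but I don't know what you want the final result to look like. Can you show what you want the dataframe to look like? Since each key has two statuses, and each OrderID can have multiple keys, I don't understand how you want the frame to look.
@Error - Syntactical Remorse, I edited my question to make it clearer, i.e. ... If both statuses are True, the concatenated key should go in the TRUE column. If either one is False, the key should go in the FALSE column. If one (or both) of the statuses is neither True nor False, the key is concatenated into the OTHER column.
So you don't want the Key column at all, just TRUE, FALSE and OTHER columns?
Yes, the Key column gets parsed into the TRUE, FALSE or OTHER columns.
One last clarifying question: are the "True" values in the dataframe booleans or strings?
Got a working solution (tested). I'll update my answer.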
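The boolean-vs-string question matters because a real boolean True never compares equal to the string "True". The following is a small sketch on a hypothetical mixed column. It normalizes the values with astype(str) before comparing, so the string-based logic works either way:

```python
import pandas as pd

# Hypothetical status column mixing real booleans and strings
status = pd.Series([True, "True", False, "False", "Anything"], dtype=object)

# A real boolean never equals the string "True"...
print((status == "True").tolist())              # [False, True, False, False, False]
# ...but after astype(str) both spellings compare equal
print((status.astype(str) == "True").tolist())  # [True, True, False, False, False]
```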