Python: consolidating rows using conditional logic

python pandas dataframe

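I'm new to Python, so please excuse any poor wording. I have some data in a dataframe, and I have applied drop_duplicates to it in order to identify state changes in the items. The data looks like this. My goal is to build some aging on Item Id. (Note: Created Date is the same on all records for a given Item Id.) I have edited this post to show what I tried and the results I got.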

    Item Id      State  Created Date     Date         Severity
0    327863        New   2019-02-11    2019-10-03         1
9    327863   Approved   2019-02-11    2019-12-05         1
12   327863  Committed   2019-02-11    2019-12-26         1
16   327863       Done   2019-02-11    2020-01-23         1
27   327864        New   2019-02-11    2019-10-03         1
33   327864  Committed   2019-02-11    2019-11-14         1
42   327864       Done   2019-02-11    2020-01-16         1
53   341283   Approved   2019-03-11    2019-10-03         1
57   341283       Done   2019-03-11    2019-10-31         1

I am doing the following to consolidate the rows:

s = dfdr.groupby(['Item Id','Created Date', 'Severity']).cumcount()
df1 = dfdr.set_index(['Item Id','Created Date', 'Severity', s]).unstack().sort_index(level=1, axis=1)

df1=df1.reset_index()
print(df1[['Item Id', 'Created Date', 'Severity', 'State','Date']])

The output shows me the thing I have been told to avoid, chained indexing:

       Item Id            Created Date   Severity      State                                   Date
                                                           0          1          2     3          0          1          2          3
0       194795 2018-09-18 16:11:25.330        3.0        New   Approved  Committed  Done 2019-10-03 2019-10-10 2019-10-17 2019-10-24
1       194808 2018-09-18 16:11:25.330        3.0  Duplicate        NaN        NaN   NaN 2019-10-03        NaT        NaT        NaT
2       270787 2018-11-27 15:55:02.207        1.0        New  Duplicate        NaN   NaN 2019-10-03 2019-10-10        NaT        NaT

To use the data in plots, I believe what I want is not nested data but something like the following, and I'm not sure how to get there:

Item Id    Created Date   Severity   New   NewDate      Approved      AppDate   Committed   CommDate   Done   Done Date
123456     3/25/2020         3       New   2019-10-03   Approved   2019-11-05         NaN        NaT   Done  2020-02-17
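For readers following along: the two-level header in the output above is a column MultiIndex (chained indexing is a different thing, namely back-to-back `[]` selections). The sketch below shows one way to flatten it, using two items from the sample output. The `State_0`, `Date_0` naming is an illustrative choice, not the asker's target layout.

```python
import pandas as pd

# Two items taken from the sample output in the question.
dfdr = pd.DataFrame({
    "Item Id": [194795] * 4 + [194808],
    "Created Date": pd.to_datetime(["2018-09-18"] * 5),
    "Severity": [3.0] * 5,
    "State": ["New", "Approved", "Committed", "Done", "Duplicate"],
    "Date": pd.to_datetime(["2019-10-03", "2019-10-10", "2019-10-17",
                            "2019-10-24", "2019-10-03"]),
})

keys = ["Item Id", "Created Date", "Severity"]

# Number the state changes within each item: 0, 1, 2, ...
s = dfdr.groupby(keys).cumcount()

# Same reshaping as in the question: one row per item, columns are
# (field, position) pairs such as ("State", 0) and ("Date", 0).
df1 = dfdr.set_index(keys + [s]).unstack().sort_index(level=1, axis=1)

# Collapse each (field, position) pair into a single string label.
df1.columns = [f"{field}_{pos}" for field, pos in df1.columns]

# Move the grouping keys back into ordinary columns.
flat = df1.reset_index()
print(flat)
```

This keeps the states in the order they occurred, which is the main difference from a pivot keyed on the state name.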

After adding pivot_table and reset_index as suggested in Sikan's answer I am closer, but I do not get the same output. This is the output I get:

State                                             Approved  Committed       Done  Duplicate        New
Item Id      Created Date            Severity                                                       
194795       2018-09-18              3.0        2019-10-10 2019-10-17 2019-10-24        NaT 2019-10-03
194808       2018-09-18              3.0               NaT        NaT        NaT 2019-10-03        NaT

This is my code now:

df = pd.read_excel(r'C:\Users\xxx\Documents\Excel\DataSample.xlsx')
df = df.drop_duplicates(subset=['Item Id', 'State','Created Date'], keep='first')
df['Severity'] = df['Severity'].replace(np.nan,3)
df = pd.pivot_table(df, index=['Item Id', 'Created Date', 'Severity'], columns=['State'], values='Date', aggfunc=lambda x: x)
df.reset_index()
print(df)

This is the output:

State                                     Approved  Committed       Done  Duplicate        New
Item Id      Created Date    Severity                                                       
194795       2018-09-18      3.0        2019-10-10 2019-10-17 2019-10-24        NaT 2019-10-03     
194808       2018-09-18      3.0               NaT        NaT        NaT 2019-10-03        NaT
270787       2018-11-27      1.0               NaT        NaT        NaT 2019-10-10 2019-10-03

Thanks

You can use pd.pivot_table to do this:

df = pd.pivot_table(dfdr, index=['Item Id', 'Created Date', 'Severity'], columns=['State'], values='Date', aggfunc=lambda x: x)
df = df.reset_index()

Output:

    ItemId  CreatedDate     Severity    Approved    Committed   Done        New
0   327863  2019-02-11      1           2019-12-05  2019-12-26  2020-01-23  2019-10-03
1   327864  2019-02-11      1           NaN         2019-11-14  2020-01-16  2019-10-03
2   341283  2019-03-11      1           2019-10-03  NaN         2019-10-31  NaN
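A self-contained sketch of the same idea, rebuilt from the sample rows in the question so it can be run as-is. Two points here are assumptions rather than part of the answer: `aggfunc="first"` is swapped in for the lambda (the rows are already de-duplicated, so each cell holds a single date), and the `wide` name is made up for illustration. Note that `reset_index()` returns a new DataFrame, so the result has to be assigned back; calling `df.reset_index()` on a line of its own leaves `df` unchanged, which may be why the headers in the question still print as hierarchical.

```python
import pandas as pd

# Sample rows copied from the question.
dfdr = pd.DataFrame({
    "Item Id": [327863, 327863, 327863, 327863,
                327864, 327864, 327864,
                341283, 341283],
    "State": ["New", "Approved", "Committed", "Done",
              "New", "Committed", "Done",
              "Approved", "Done"],
    "Created Date": pd.to_datetime(["2019-02-11"] * 7 + ["2019-03-11"] * 2),
    "Date": pd.to_datetime(["2019-10-03", "2019-12-05", "2019-12-26", "2020-01-23",
                            "2019-10-03", "2019-11-14", "2020-01-16",
                            "2019-10-03", "2019-10-31"]),
    "Severity": [1] * 9,
})

# One row per item, one column per state, holding that state's date.
wide = pd.pivot_table(
    dfdr,
    index=["Item Id", "Created Date", "Severity"],
    columns="State",
    values="Date",
    aggfunc="first",  # assumption: one date per (item, state) pair
)

# reset_index() is not in-place: assign the result back.
wide = wide.reset_index()

# Drop the leftover "State" label on the column axis so the header prints flat.
wide.columns.name = None

print(wide)
```

After these steps `wide` has ordinary single-level columns (`Item Id`, `Created Date`, `Severity`, then one column per state), which is the shape most plotting calls expect.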

Thanks for providing the original data and output.

Thanks @Sikan. pivot_table almost gets me there. I have edited the question with details about the result; it still gives me something hierarchical. At least, judging by how the headers are displayed when printed, it is hierarchical.

Thanks for showing the new result. Are you using pivot_table after the groupby? My answer is meant to replace the groupby.

I have posted the exact code and the exact output above, after removing the groupby. Same result.

Thanks @Sikan. The output I get from pivot_table works for plotting in Jupyter.
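Since the stated goal is aging per Item Id, here is a minimal sketch of how the pivoted frame could be used for that. It assumes a frame shaped like the answer's output; the `wide` name and the two derived column names are hypothetical, and "aging" is taken to mean elapsed days between two dates.

```python
import pandas as pd

# A frame shaped like the pivot output in the answer (subset of columns).
wide = pd.DataFrame({
    "Item Id": [327863, 327864, 341283],
    "Created Date": pd.to_datetime(["2019-02-11", "2019-02-11", "2019-03-11"]),
    "New": pd.to_datetime(["2019-10-03", "2019-10-03", None]),
    "Done": pd.to_datetime(["2020-01-23", "2020-01-16", "2019-10-31"]),
})

# Subtracting datetime columns gives timedeltas; .dt.days extracts whole days.
# Items missing either state get NaN instead of raising an error.
wide["Days New to Done"] = (wide["Done"] - wide["New"]).dt.days
wide["Days Created to Done"] = (wide["Done"] - wide["Created Date"]).dt.days

print(wide[["Item Id", "Days New to Done", "Days Created to Done"]])
```

Items that never reached a given state keep NaT in that column, so the derived values stay missing rather than misleading.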