Python: create a new column that finds the date difference based on a conditional value [python, pandas, dataframe, conditional, multiple-columns]


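I have the following dataframe:

df=

I want to create a new column that finds the date difference, conditional on Team1. When Chicago is Team1, I want the number of days since their previous game, regardless of whether they were Team1 or Team2 in that game.

df=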


Your expected output is close, but I would create a MultiIndex instead.

Use melt and diff, then pivot:

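# melt so that both team columns become a single 'value' column
# (Date must be datetime64; convert with pd.to_datetime if needed)
melt = df.melt('Date').sort_values('Date')

# group by team and take the difference between consecutive game dates
melt['diff'] = melt.groupby('value')['Date'].diff()

# pivot back to the original wide layout
# (index/columns are keyword-only in pandas 2.0 and later)
melt.pivot(index='Date', columns='variable')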

                  value              diff
variable     Team1    Team2     Team1     Team2
      Date              
2018-06-01  Boston   New York    NaT       NaT
2018-06-13  New York Chicago     12 days   NaT
2018-06-27  Boston   New York    26 days   14 days
2018-06-28  Chicago  Boston      15 days   1 days
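
For readers who want to run the steps above end to end, here is a minimal sketch. The question's original frame is not shown, so the four-game `df` below is reconstructed from the output table and should be treated as illustrative data; the names `wide` and `chicago_gap` are mine, not the answerer's.

```python
import pandas as pd

# Four-game frame reconstructed from the output table above (illustrative data).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-01", "2018-06-13", "2018-06-27", "2018-06-28"]),
    "Team1": ["Boston", "New York", "Boston", "Chicago"],
    "Team2": ["New York", "Chicago", "New York", "Boston"],
})

# Long format: one row per (date, team), sorted chronologically.
melt = df.melt("Date").sort_values("Date")

# Gap between each team's consecutive games, whichever column it appeared in.
melt["diff"] = melt.groupby("value")["Date"].diff()

# Back to one row per date, with ('value', ...) and ('diff', ...) column pairs.
wide = melt.pivot(index="Date", columns="variable")

# Gap for the Team1 side of the 2018-06-28 game (Chicago).
chicago_gap = wide.loc[pd.Timestamp("2018-06-28"), ("diff", "Team1")]
print(chicago_gap)
```

The `Date` column is parsed with `pd.to_datetime` so that `diff` yields timedeltas rather than failing on strings.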
Update: here is a revised version based on your comments:

# assume this df
    Date         Team1   Team2
0   2018-06-01  Boston    New York
1   2018-06-13  New York  Chicago
2   2018-06-27  Boston    New York
3   2018-06-28  Chicago   Boston
4   2018-06-28  New York  Detroit
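Code (the full listing appears at the end of this page):

Note that this time the difference is taken from the previous game of the team listed in Team1.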

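You're right, I have corrected the mistake in the original post. Unfortunately, when I apply this to a larger dataframe I get this error: ValueError: Index contains duplicate entries, cannot reshape... The error occurs specifically in the third and final step listed above.

@RyanG73 That error means the dates are not unique. Does your data contain multiple teams playing on the same date?

Yes. It covers a full season of games, and some nights have several games.

@RyanG73 See the update.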
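
The error discussed in these comments can be reproduced on a small frame. The sketch below assumes the five-game data from the update, where two games fall on 2018-06-28; `clashes` and `pivot_failed` are illustrative names. It lists the rows that share a (Date, variable) pair, which is what `pivot` cannot reshape.

```python
import pandas as pd

# Five-game frame from the update; two games share the date 2018-06-28.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-06-01", "2018-06-13", "2018-06-27", "2018-06-28", "2018-06-28"]
    ),
    "Team1": ["Boston", "New York", "Boston", "Chicago", "New York"],
    "Team2": ["New York", "Chicago", "New York", "Boston", "Detroit"],
})

melt = df.melt("Date").sort_values("Date")

# Rows whose (Date, variable) pair occurs more than once.
clashes = melt[melt.duplicated(["Date", "variable"], keep=False)]
print(clashes)

# pivot needs each (index, columns) pair to be unique, so it raises here.
try:
    melt.pivot(index="Date", columns="variable")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print(err)
```

Running a check like this on the full-season data before pivoting shows whether same-day games are the cause.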
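# code for the update, using the five-game df above
import pandas as pd

# melt df (same as in the first example)
melt = df.melt('Date').sort_values('Date')

# find each team's difference from its previous game
melt['diff'] = melt.groupby('value')['Date'].diff()

# use pivot_table, not pivot, because the dates are no longer unique
# ('first' keeps the single team name found in each cell)
piv = melt.pivot_table(index=['Date', 'diff'], columns='variable', values='value', aggfunc='first')

# move 'diff' out of the index and keep only rows that have a Team1 entry
piv = piv.reset_index(level=1)
piv = piv[piv['Team1'].notna()]

# merge the original df with the new one; a team's first game has no previous date, so fill with 0 days
pd.merge(df, piv[piv.columns[:-1]], on=['Date','Team1'], how='outer').fillna(pd.Timedelta(0))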

         Date   Team1     Team2     diff
0   2018-06-01  Boston    New York  0 days
1   2018-06-13  New York  Chicago   12 days
2   2018-06-27  Boston    New York  26 days
3   2018-06-28  Chicago   Boston    15 days
4   2018-06-28  New York  Detroit   1 days
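
As an alternative to the pivot_table and merge steps, the same Team1 column can be obtained by carrying the original row label through the melt and aligning on it afterwards. This is a sketch of a different technique, not the answerer's method; the names `game`, `slot`, `team`, `team1_gap` and `result` are assumptions made for illustration. A team's first game is left as NaT here instead of being filled with 0 days.

```python
import pandas as pd

# Five-game frame from the update above.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-06-01", "2018-06-13", "2018-06-27", "2018-06-28", "2018-06-28"]
    ),
    "Team1": ["Boston", "New York", "Boston", "Chicago", "New York"],
    "Team2": ["New York", "Chicago", "New York", "Boston", "Detroit"],
})

# Long format: one row per (game, team); "game" is the original row label.
long = (
    df.rename_axis("game")
      .reset_index()
      .melt(id_vars=["game", "Date"], var_name="slot", value_name="team")
      .sort_values(["Date", "game"])
)

# Gap since each team's previous game, whichever slot it played in.
long["diff"] = long.groupby("team")["Date"].diff()

# Keep only the Team1 side and align it back to df by row label.
team1_gap = long.loc[long["slot"] == "Team1"].set_index("game")["diff"]
result = df.assign(diff=team1_gap)
print(result)
```

Because the alignment uses the row label rather than the date, several games on the same date do not cause a reshape error.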