Python: How to eliminate duplicate values within each row of a DataFrame column

Tags: python, pandas, dataframe

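I have run into a problem: when Date_x and Date_y both contain duplicate dates, the output in the Holiday column appears to gain an extra instance of each holiday for every occurrence of a row with the exact same dates. Below are the code I am using and a small sample dataset that illustrates the problem.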

from pandas.tseries.holiday import USFederalHolidayCalendar
from datetime import date  # the seasons table below calls date(...)
import pandas as pd

cal = USFederalHolidayCalendar()
holidays = (pd.DataFrame(cal.holidays(return_name=True), columns=['Holiday'])
            .reset_index()
            .rename({'index': 'Date'}, axis=1))
holidays['Date'] = pd.to_datetime(holidays['Date'])
df= pd.DataFrame({'Date_x': {0: '2020-12-22', 1: '2020-06-20', 2: '2020-02-11', 3: '2020-05-22', 4: '2020-12-22', 5: '2020-12-20', 6: '2020-12-20', 7: '2020-12-22'},
                  'Date_y': {0: '2021-01-01', 1: '2020-07-11', 2: '2020-03-27', 3: '2020-06-27', 4: '2021-01-01', 5: '2020-12-26', 6: '2020-12-27', 7: '2021-01-01'}})
df['Date_x'] = pd.to_datetime(df['Date_x'])
df['Date_y'] = pd.to_datetime(df['Date_y'])

Y = 2000 # dummy leap year to allow input X-02-29 (leap day)
seasons = [('Winter', (date(Y,  1,  1),  date(Y,  3, 20))),
           ('Spring', (date(Y,  3, 21),  date(Y,  6, 20))),
           ('Summer', (date(Y,  6, 21),  date(Y,  9, 22))),
           ('Fall', (date(Y,  9, 23),  date(Y, 12, 20))),
           ('Winter', (date(Y, 12, 21),  date(Y, 12, 31)))]

def get_season(x):
    x = x.replace(year=Y).date()  # plain date, so it compares cleanly with the date bounds in seasons
    return next(season for season, (start, end) in seasons
                if start <= x <= end)


def get_holiday():
    return pd.DataFrame([(h,y,z) for (h,d) in zip(holidays['Holiday'], holidays['Date'])
     for (y, z) in zip(df['Date_x'], df['Date_y']) if y.date() <= d.date() if d.date() <= z.date()], columns=['Holiday', 'Date_x', 'Date_y'])


s1 = df['Date_x'].apply(lambda x: get_season(x))
s2 = df['Date_y'].apply(lambda x: get_season(x))
df['Season']= [', '.join(list(set([x,y]))) for (x,y) in zip(s1,s2)]
dft = get_holiday()
dft = dft.groupby(['Date_x', 'Date_y'])['Holiday'].apply(lambda x: ', '.join(list(x)))
df = pd.merge(df, dft, how='left', on=['Date_x', 'Date_y'])
Change this line:

dft = dft.groupby(['Date_x', 'Date_y'])['Holiday'].apply(lambda x: ', '.join(list(x)))
to:
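The replacement line is missing from this copy of the answer. As a minimal sketch, assuming the intent was to keep each holiday name only once per (Date_x, Date_y) pair, the join can run over the unique names in each group. The toy dft below is a hypothetical stand-in for the output of get_holiday(), not data from the question:

```python
import pandas as pd

# Hypothetical stand-in for get_holiday(): the same (Date_x, Date_y) pair
# shows up once per duplicated input row, so each holiday name repeats.
dft = pd.DataFrame({
    'Holiday': ['Christmas Day', 'Christmas Day',
                "New Year's Day", "New Year's Day"],
    'Date_x': pd.to_datetime(['2020-12-22'] * 4),
    'Date_y': pd.to_datetime(['2021-01-01'] * 4),
})

# x.unique() keeps the first occurrence of each name, in original order,
# so the joined string lists every holiday exactly once.
joined = (dft.groupby(['Date_x', 'Date_y'])['Holiday']
             .apply(lambda x: ', '.join(x.unique())))
print(joined.iloc[0])
```

Using unique() rather than set() keeps the holidays in the order they were found, which a set does not guarantee.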


@Larry, I just added a simple drop_duplicates() call to remove them:

df = pd.merge(df, dft, how='left', on=['Date_x', 'Date_y'])
df = df.drop_duplicates()
print(df)

I added drop_duplicates() after your last statement.

I think it will still have duplicates, because this happens after the merge.
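To make the concern about ordering concrete, here is a small sketch under stated assumptions: the two-row df and the dft frame are hypothetical miniatures of the question's data, and join_names is a helper introduced only for this illustration. It compares calling drop_duplicates() after the merge with calling it on the holiday table before the groupby:

```python
import pandas as pd

# Hypothetical miniature of the question's data: one date pair, repeated.
df = pd.DataFrame({'Date_x': pd.to_datetime(['2020-12-22', '2020-12-22']),
                   'Date_y': pd.to_datetime(['2021-01-01', '2021-01-01'])})

# Stand-in for get_holiday(): one row per (holiday, input row) match,
# so the holiday appears once for each duplicated input row.
dft = pd.DataFrame({'Holiday': ['Christmas Day', 'Christmas Day'],
                    'Date_x': pd.to_datetime(['2020-12-22', '2020-12-22']),
                    'Date_y': pd.to_datetime(['2021-01-01', '2021-01-01'])})

def join_names(frame):
    # Collapse each date pair's holiday names into one comma-separated string.
    return (frame.groupby(['Date_x', 'Date_y'])['Holiday']
                 .apply(lambda s: ', '.join(s))
                 .reset_index())

# drop_duplicates() after the merge only removes repeated rows;
# the name is still repeated inside the joined string.
late = pd.merge(df, join_names(dft), how='left',
                on=['Date_x', 'Date_y']).drop_duplicates()
print(late['Holiday'].tolist())

# drop_duplicates() before the groupby removes the repeated names,
# and every original row of df is kept.
early = pd.merge(df, join_names(dft.drop_duplicates()), how='left',
                 on=['Date_x', 'Date_y'])
print(early['Holiday'].tolist())
```

In this sketch the late variant also shrinks df to one row, which may not be wanted if the repeated input rows are meaningful.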