Python: accounting for NaT in pandas merge_asof

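My code works when none of the dates are missing, but as soon as it hits a single missing value I get an error. What is the best way to account for this so that the rows with a missing date are kept in the final result?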

fake_hol = '{"holiday_dt":{"0":"2000-04-23","1":"2001-04-15","2":"2002-03-31","3":"2000-01-01","4":"2000-01-17","5":"2000-05-29","6":"2000-07-04","7":"2000-09-04","8":"2000-10-09","9":"2000-11-11","10":"2000-11-23","11":"2000-11-24","12":"2000-12-25","13":"2000-12-23","14":"2001-01-01","15":"2001-01-15","16":"2001-05-28","17":"2001-07-04","18":"2001-09-03","19":"2001-10-08","20":"2001-11-11","21":"2001-11-22","22":"2001-11-23","23":"2001-12-25","24":"2001-12-22","25":"2002-01-01","26":"2002-01-21","27":"2002-05-27","28":"2002-07-04","29":"2002-09-02","30":"2002-10-14","31":"2002-11-11","32":"2002-11-28","33":"2002-11-29","34":"2002-12-25","35":"2002-12-21"},"holiday":{"0":"Easter","1":"Easter","2":"Easter","3":"New Year\'s Day","4":"Martin Luther King, Jr. Day","5":"Memorial Day","6":"Independence Day","7":"Labor Day","8":"Columbus Day","9":"Veterans Day","10":"Thanksgiving","11":"Black Friday","12":"Christmas Day","13":"Sat Before X-max","14":"New Year\'s Day","15":"Martin Luther King, Jr. Day","16":"Memorial Day","17":"Independence Day","18":"Labor Day","19":"Columbus Day","20":"Veterans Day","21":"Thanksgiving","22":"Black Friday","23":"Christmas Day","24":"Sat Before X-max","25":"New Year\'s Day","26":"Martin Luther King, Jr. Day","27":"Memorial Day","28":"Independence Day","29":"Labor Day","30":"Columbus Day","31":"Veterans Day","32":"Thanksgiving","33":"Black Friday","34":"Christmas Day","35":"Sat Before X-max"}}'

import numpy as np
import pandas as pd
from io import StringIO

dfA_nomissing = pd.DataFrame({'x1': [3,4,2,4,5,6], 'x2': ['A','Z','G','I','D','H'], 'dt': ['2001-01-23','2001-08-14','2001-04-23','2001-08-08','2001-09-17','2001-11-11'], 'y': [1,1,1,0,1,0]})
dfA_1missing = pd.DataFrame({'x1': [3,4,2,4,5,6], 'x2': ['A','Z','G','I','D','H'], 'dt': ['2001-01-23','','2001-04-23','2001-08-08','2001-09-17','2001-11-11'], 'y': [1,1,1,0,1,0]})
# wrap the JSON text in StringIO; newer pandas versions deprecate passing a literal JSON string
dfB = pd.read_json(StringIO(fake_hol))

dfA_nomissing
+----+----+------------+---+
| x1 | x2 |     dt     | y |
+----+----+------------+---+
|  3 | A  | 2001-01-23 | 1 |
|  4 | Z  | 2001-08-14 | 1 |
|  2 | G  | 2001-04-23 | 1 |
|  4 | I  | 2001-08-08 | 0 |
|  5 | D  | 2001-09-17 | 1 |
|  6 | H  | 2001-11-11 | 0 |
+----+----+------------+---+

dfB
+------------+-----------------------------+
| holiday_dt |           holiday           |
+------------+-----------------------------+
| 2000-04-23 | Easter                      |
| 2001-04-15 | Easter                      |
| 2002-03-31 | Easter                      |
| 2000-01-01 | New Year's Day              |
| 2000-01-17 | Martin Luther King, Jr. Day |
| ...        | ...                         |
| 2002-11-11 | Veterans Day                |
| 2002-11-28 | Thanksgiving                |
| 2002-11-29 | Black Friday                |
| 2002-12-25 | Christmas Day               |
| 2002-12-21 | Sat Before X-max            |
+------------+-----------------------------+
Below is the code that compares the 'dt' column in dfA against dfB and adds some time-related features from it:

def add_calendar_cols(dfMain, dfEvents, date_col_list, eventname_col='Name', eventdate_col='Date'):
    
    # don't modify the original, for testing purposes
    df = dfMain.copy(deep=True)
    
    # convert date cols to datetime
    for c in date_col_list:
        df[c] = pd.to_datetime(df[c])
    dfEvents[eventdate_col] = pd.to_datetime(dfEvents[eventdate_col])
    
    # function that calculates days until next event
    def calc_days(df, dfCal, direction, mainjoinkey, eventjoinkey):
        s = pd.merge_asof(df.sort_values(mainjoinkey), dfCal.sort_values(eventjoinkey), left_on=mainjoinkey, right_on=eventjoinkey, direction=direction)
        s = (s[eventjoinkey] - s[mainjoinkey]).dt.days.abs()
        return s
    
    # unique list of events
    unique_events = dfEvents[eventname_col].unique().tolist()
    
    # loop in case there are multiple date columns
    for dtcol in date_col_list:
        
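        # dataframe of unique dates, sorted with a fresh index so that the
        # sorted, re-indexed output of merge_asof lines up row for row
        dfDates = pd.DataFrame(df[dtcol].unique(), columns=[dtcol]).sort_values(dtcol).reset_index(drop=True)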
        
        # calc days until the next event
        dfDates['until_next'] = calc_days(dfDates, dfEvents, 'forward', dtcol, eventdate_col)

        # do the same for each specific event
        for e in unique_events:  
            dfDates[dtcol + '_days_until_' + e] = calc_days(dfDates, dfEvents[dfEvents[eventname_col].eq(e)], 'forward', dtcol, eventdate_col)

        # merge everything back to the original dataframe    
        df = df.merge(dfDates, how='left', left_on=dtcol, right_on=dtcol)
        
    # return only after every date column has been processed
    return df
This works:

result = add_calendar_cols(dfA_nomissing, dfB, ['dt'], eventname_col='holiday', eventdate_col='holiday_dt')
This gives me an error:
ValueError: Merge keys contain null values on left side

result = add_calendar_cols(dfA_1missing, dfB, ['dt'], eventname_col='holiday', eventdate_col='holiday_dt')



<ipython-input-93-67afeca366e9> in calc_days(df, dfCal, direction, mainjoinkey, eventjoinkey)
     12     # requires pandas >= 1.1.0
     13     def calc_days(df, dfCal, direction, mainjoinkey, eventjoinkey):
---> 14         s = pd.merge_asof(df.sort_values(mainjoinkey), dfCal.sort_values(eventjoinkey), left_on=mainjoinkey, right_on=eventjoinkey, direction=direction)
     15         s = (s[eventjoinkey] - s[mainjoinkey]).dt.days.abs()
     16         return s
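
For reference, the error can be reproduced on a much smaller input. The sketch below uses hypothetical toy frames (`left` and `right` are not the data from the question) and assumes only that `merge_asof` rejects null values in the left key, as the traceback above shows:

```python
import pandas as pd

# hypothetical toy frames, not the data from the question
left = pd.DataFrame({"dt": pd.to_datetime(["2001-01-23", None, "2001-04-23"])})
right = pd.DataFrame({
    "holiday_dt": pd.to_datetime(["2001-04-15", "2001-05-28"]),
    "holiday": ["Easter", "Memorial Day"],
})

try:
    # the NaT in left["dt"] is what merge_asof objects to
    pd.merge_asof(left.sort_values("dt"), right.sort_values("holiday_dt"),
                  left_on="dt", right_on="holiday_dt", direction="forward")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```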

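I found a way to work around the problem and am posting it here in case anyone else runs into this with merge_asof.

I believe the function itself cannot handle missing keys, so what I did was mask the missing values with a value it can operate on, then go back and manually set those rows to NaN. Based on the code above, the modification is as follows: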

    def calc_days(df, dfCal, direction, mainjoinkey, eventjoinkey):

        df = df.copy(deep=True)
        # mask missing values with the minimum (this is temporary to avoid an error)
        df['__temp'] = df[mainjoinkey].mask(df[mainjoinkey].isnull(), df[mainjoinkey].min())

        # calculate days until or since, based on direction (which is passed in)
        left = df.sort_values('__temp')
        s = pd.merge_asof(left, dfCal.sort_values(eventjoinkey), left_on='__temp', right_on=eventjoinkey, direction=direction)
        s = (s[eventjoinkey] - s['__temp']).dt.days.abs()

        # merge_asof returns the rows in sorted order with a fresh index, so restore the original labels
        s.index = left.index
        # set the rows whose original date was missing back to NaN
        return s.where(left[mainjoinkey].notnull())

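This doesn't feel very elegant and I'm not proud of it, but it seems to work for now. If anyone finds a better solution, please let me know; I couldn't find one.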
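
One possible alternative that avoids the placeholder value, offered as a sketch rather than a tested drop-in replacement: run merge_asof only on the rows that have a date, then reindex the result against the full frame so the missing rows come back as NaN. The function name `calc_days_dropna` and the toy frames `dates` and `events` are hypothetical; the arguments mirror those of `calc_days` above.

```python
import pandas as pd

def calc_days_dropna(df, dfCal, direction, mainjoinkey, eventjoinkey):
    """Days to the nearest event in `direction`; NaN where the date is missing."""
    # keep only the rows that have a date, sorted as merge_asof requires
    valid = df.loc[df[mainjoinkey].notna(), [mainjoinkey]].sort_values(mainjoinkey)
    merged = pd.merge_asof(valid, dfCal.sort_values(eventjoinkey),
                           left_on=mainjoinkey, right_on=eventjoinkey,
                           direction=direction)
    days = (merged[eventjoinkey] - merged[mainjoinkey]).dt.days.abs()
    # merge_asof returns a fresh 0..n-1 index; restore the original labels
    days.index = valid.index
    # reindex against the full frame so rows with a missing date become NaN
    return days.reindex(df.index)

# hypothetical toy data: one missing date, deliberately unsorted
dates = pd.DataFrame({"dt": pd.to_datetime(["2001-08-14", None, "2001-01-23"])})
events = pd.DataFrame({"holiday_dt": pd.to_datetime(["2001-05-28", "2001-09-03"])})
result = calc_days_dropna(dates, events, "forward", "dt", "holiday_dt")
print(result)
```

Because the returned Series carries the index of the input frame, assigning it to a column aligns by label and does not depend on the row order of the input.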

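Does this answer your question?
No. merge_asof does not do an outer join, and I actually do want an inner-style join (which is what it does); the problem is the blank value on the left side. I tried building a mask for the missing items but couldn't get it to work.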