Python: time-event study with pandas (tags: python, pandas, dataframe, pivot-table, reshape)


I have the following dataset, which gives the dates on which consumers bought and resold a product:

import numpy as np
import pandas as pd

data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]

df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])

     Date_buy   Date_sell
0  01/01/2000  06/03/2000
1  12/03/2000  15/08/2000
2  12/04/2000         NaN
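I need to convert it into a buy/sell time-event format that describes the buy/sell dynamics.

  • More precisely, I need to create columns that indicate how many months after the purchase the product is sold.
The final DataFrame I would like to create should look like this: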

           Date_buy   Date_sell  m_1  m_2  m_3  m_4  m_5  m_6  m_7 ...
0        01/01/2000  06/03/2000    0    0    1    1    1    1    1
1        12/03/2000  15/08/2000    0    0    0    0    0    1    1
2        12/04/2000         NaN    0    0    0    0    0    0    0

There must be a quick way to do this, but I have not found it yet.

Not the most elegant solution, but you can start from the following and improve on it:

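# Assumes Date_buy and Date_sell have already been converted with pd.to_datetime
diff_func = lambda row: row['Date_sell'].month - row['Date_buy'].month + 12*(row['Date_sell'].year - row['Date_buy'].year)
df['months_diff'] = df.apply(diff_func, axis=1).fillna(0).astype(int)  # calendar months between buy and sell (0 if unsold)

output_columns = ['m'+str(i+1) for i in range(12)]
df = df.join(pd.DataFrame(index=df.index, columns=output_columns, data=0))  # same names as output_columns

# Sets the first months_diff columns to 1, i.e. the months before the sale;
# the table in the question uses the opposite convention (0 before the sale, 1 from the sale on)
for i in df.index:
    df.loc[i, output_columns[:df.loc[i, 'months_diff']]] = 1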
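
One way the snippet above might be improved, as a rough sketch only: it expects datetime columns, and the month difference can then be computed for the whole column with the `.dt` accessor instead of `apply`. The helper name `months_between` and the extra row that crosses a year boundary are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data from the question plus one hypothetical row that crosses a year boundary
data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan],
        ['15/11/1999', '10/02/2000']]  # illustrative row, not in the question
df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])

# The dates are day-first strings, so parse them explicitly; missing values become NaT
df['Date_buy'] = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
df['Date_sell'] = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')

def months_between(start, end):
    """Calendar months from start to end; NaN where either date is missing."""
    return (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)

df['months_diff'] = months_between(df['Date_buy'], df['Date_sell'])
print(df)
```

The year term is what keeps the November-to-February row at 3 months rather than -9. Unsold products keep NaN here instead of being filled with 0, so they can still be told apart from products sold in the month of purchase.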

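I can't work out the logic of the time-event format from your example. Could you describe it more clearly?
@ItamarMushkin Sorry, I wasn't clear. Thanks in advance for your answer :)
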
import numpy as np
import pandas as pd

data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]

df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])

df['Date_buy'] = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
df['Date_sell'] = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')

# Calendar months between purchase and sale, including the year difference (NaN if unsold)
df['date_diff'] = ((df.Date_sell.dt.year - df.Date_buy.dt.year) * 12
                   + df.Date_sell.dt.month - df.Date_buy.dt.month)
# One column per month, up to one month past the latest sale
cols = [f'm_{x}' for x in range(1, int(df['date_diff'].max()) + 2)]

# Start with every indicator at 0, then fill row by row
res = df.join(pd.DataFrame(0, index=df.index, columns=cols))

for idx, val in res.date_diff.items():  # iteritems() was removed in pandas 2.0
    for k, col in enumerate(cols, start=1):
        # m_k is 1 once the product has been sold (k > date_diff); unsold rows stay 0
        res.at[idx, col] = int(pd.notna(val) and k > val)

print(res)
    Date_buy  Date_sell  date_diff  m_1  m_2  m_3  m_4  m_5  m_6
0 2000-01-01 2000-03-06        2.0    0    0    1    1    1    1
1 2000-03-12 2000-08-15        5.0    0    0    0    0    0    1
2 2000-04-12        NaT        NaN    0    0    0    0    0    0
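
For larger frames the row-by-row loop may get slow. Below is a rough sketch of a loop-free variant that uses NumPy broadcasting to build all indicator columns at once. It assumes the rule read off the table in the question (m_k is 1 when k is greater than the number of calendar months between purchase and sale, and 0 for unsold products). The names `n_months`, `diff`, `flags` and `indicator` are my own, and the fixed horizon of 7 months is only chosen to match the columns shown in the question.

```python
import numpy as np
import pandas as pd

data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]
df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])

buy = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
sell = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')

# Calendar months from purchase to sale; NaN for unsold products
diff = (sell.dt.year - buy.dt.year) * 12 + (sell.dt.month - buy.dt.month)

n_months = 7  # assumed horizon, matching m_1 ... m_7 in the question
k = np.arange(1, n_months + 1)

# Compare every row's diff (as a column vector) with every month index (as a row vector).
# Any comparison against NaN is False, so unsold rows end up all 0.
flags = (k[None, :] > diff.to_numpy()[:, None]).astype(int)

indicator = pd.DataFrame(flags, index=df.index, columns=[f'm_{i}' for i in k])
result = df.join(indicator)
print(result)
```

The comparison produces a rows-by-months boolean matrix in one step, so the cost no longer depends on Python-level loops. The original string columns are left untouched, which keeps the Date_buy and Date_sell display as in the question.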