Python pandas groupby: length mismatch with NaN

I am trying to apply a transform to a groupby object in pandas.

The code is as follows:

import uuid

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':['012', '013', '014', '014', '015', '015', '016', '016', '017', '017'],
    'date': pd.to_datetime(
        ['2008-11-05', 'NaT', 'NaT', '2008-11-05', 'NaT', '2008-11-05',
         'NaT', '2008-11-05', 'NaT', '2008-11-05']),
    'grade': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
              np.nan, np.nan],
    'length': [1, 2, 3, 4, 5, 6, 7, 8, np.nan, 10]})

df['uuid'] = np.nan

df

Out[7]: 
    id       date  grade  length  uuid
0  012 2008-11-05    NaN     1.0   NaN
1  013        NaT    NaN     2.0   NaN
2  014        NaT    NaN     3.0   NaN
3  014 2008-11-05    NaN     4.0   NaN
4  015        NaT    NaN     5.0   NaN
5  015 2008-11-05    NaN     6.0   NaN
6  016        NaT    NaN     7.0   NaN
7  016 2008-11-05    NaN     8.0   NaN
8  017        NaT    NaN     NaN   NaN
9  017 2008-11-05    NaN    10.0   NaN

In[8]:
df.groupby(['id', 'date']).uuid.transform(lambda g: uuid.uuid4())

Out[9]:
...
...
ValueError: Length mismatch: Expected axis has 5 elements, new values have 10 elements
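
As in a similar question, I assume the problem lies with the NaT values in the date column, so I tried df.fillna('nan'). Unfortunately, this raised the same error, because the date column interprets the string 'nan' as np.nan.

I then tried filling with the string 'nullv', which gave me 'ValueError: could not convert string to Timestamp'.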
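
One way to check the NaT suspicion is to compare the number of rows in the frame with the number of rows that actually end up in a group. The sketch below rebuilds only the two key columns from the example above. It assumes the default groupby behaviour of dropping rows whose key contains NaT, so the two counts should correspond to the 10 and 5 in the error message.

```python
import pandas as pd

# Only the two grouping columns from the example are needed here.
df = pd.DataFrame({
    'id': ['012', '013', '014', '014', '015', '015', '016', '016', '017', '017'],
    'date': pd.to_datetime(
        ['2008-11-05', pd.NaT, pd.NaT, '2008-11-05', pd.NaT, '2008-11-05',
         pd.NaT, '2008-11-05', pd.NaT, '2008-11-05']),
})

# size() reports one entry per group; rows with NaT in a key are left out.
sizes = df.groupby(['id', 'date']).size()

rows_total = len(df)             # every row in the frame
rows_grouped = int(sizes.sum())  # rows that landed in some group
print(rows_total, rows_grouped)
```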

So my current solution looks like this:

df['uuid'] = np.nan

df.date = df.date.astype('str')
df.uuid = df.groupby(['id', 'date']).uuid.transform(lambda g: uuid.uuid4())
df.date = pd.to_datetime(df.date)
df

Out[9]: 
    id       date  grade  length                                  uuid
0  012 2008-11-05    NaN     1.0  267b9c5f-41d9-4a8c-91af-aaa2dbddc911
1  013        NaT    NaN     2.0  0e7ae8fa-cf64-4c3a-abd8-85d40b6253a4
2  014        NaT    NaN     3.0  d1de91d8-099e-492c-8434-94ebd269280f
3  014 2008-11-05    NaN     4.0  91b42203-1a31-4dfe-8566-abba3686734f
4  015        NaT    NaN     5.0  6a83b025-98c4-4196-8bfb-1ca88e426d8b
5  015 2008-11-05    NaN     6.0  d0ba9dfc-fa2b-4a1f-995b-66f798bd9259
6  016        NaT    NaN     7.0  67a26331-03de-440e-8958-89a375007535
7  016 2008-11-05    NaN     8.0  ca94c6f2-1520-4162-94cf-cf4536fb8828
8  017        NaT    NaN     NaN  133da892-a0ef-4fa3-9557-14049e8f3b66
9  017 2008-11-05    NaN    10.0  4a19db2b-0166-45e0-aff0-54f83b479507

Surely there must be another way besides converting to a string and then converting back?

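This appears to be an open issue with groupby(), and the approach described in the question is indeed the current way to achieve this; see the corresponding pandas issue.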
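
For readers on newer pandas, a possible alternative that avoids the string round-trip is sketched below. It assumes pandas 1.1 or later, where groupby accepts dropna=False and keeps NaT/NaN keys as groups of their own. It is not the method from the answer above. The small frame and the names group_no and uuid_by_group are made up for illustration, and the example uses ngroup() plus a mapping rather than transform.

```python
import uuid

import pandas as pd

# Illustrative data: rows 1 and 2 share the key ('013', NaT).
df = pd.DataFrame({
    'id': ['012', '013', '013', '014', '014'],
    'date': pd.to_datetime(
        ['2008-11-05', pd.NaT, pd.NaT, pd.NaT, '2008-11-05']),
})

# dropna=False (pandas >= 1.1) keeps rows whose key contains NaT in a group.
group_no = df.groupby(['id', 'date'], dropna=False).ngroup()

# Draw one UUID per distinct group number and map it back onto the rows.
uuid_by_group = {n: str(uuid.uuid4()) for n in group_no.unique()}
df['uuid'] = group_no.map(uuid_by_group)

print(df)
```

Rows that share the same (id, date) key, including a NaT date, receive the same UUID, and the date column keeps its datetime dtype throughout.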