Pandas: join rows if at least one cell is NaN

python pandas

I have a pandas DataFrame built from text extracted from a PDF file. It looks like this:

index      date         description1        description2        value1        value2
   0       18-01-2019    some                  more                1             2
   1       NaN           text                  text                NaN           NaN
   2       NaN           here                   NaN                NaN           NaN
   3       19-01-2019    some                  some                3             4
   4       NaN           text                  more                NaN           NaN
   5       NaN           here                  text                NaN           NaN
   6       NaN            NaN                  here                NaN           NaN
   .
   .
   .

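Each record always has at least one row without NaN, and that row always contains the date and the values. Only the descriptions are spread over multiple rows.

Is there a way to merge each row with the rows below it, based on (say) the date, until a row whose values are not NaN is reached, concatenating the descriptions along the way?

Expected output: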

index      date         description1        description2           value1        value2
   0       18-01-2019    some text here      more text              1             2
   1       19-01-2019    some text here      some more text here    3             4
   .
   .
   .

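One approach is to build the grouping key by forward-filling date (or whichever column distinguishes the groups), then aggregate each group: take the first value if the column is numeric, otherwise join the strings after dropping missing values: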

import numpy as np
f = lambda x: x.iloc[0] if np.issubdtype(x.dtype, np.number) else ' '.join(x.dropna())
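The aggregation function on its own still has to be handed to a groupby. Here is a minimal sketch of the whole step, assuming the question's sample frame is rebuilt by hand. The frame construction and the names `groups` and `out` are illustrative and not part of the original answer. The numeric check uses `pd.api.types.is_numeric_dtype` instead of `np.issubdtype`; that is a substitution made here so the sketch also tolerates pandas extension dtypes.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to mirror the PDF extract).
df = pd.DataFrame({
    'date': ['18-01-2019', np.nan, np.nan, '19-01-2019', np.nan, np.nan, np.nan],
    'description1': ['some', 'text', 'here', 'some', 'text', 'here', np.nan],
    'description2': ['more', 'text', np.nan, 'some', 'more', 'text', 'here'],
    'value1': [1, np.nan, np.nan, 3, np.nan, np.nan, np.nan],
    'value2': [2, np.nan, np.nan, 4, np.nan, np.nan, np.nan],
})

# First value for numeric columns, space-joined non-missing strings otherwise.
f = lambda x: x.iloc[0] if pd.api.types.is_numeric_dtype(x) else ' '.join(x.dropna())

# Forward-fill the date so each continuation row inherits its record's date.
groups = df['date'].ffill()

# Group on the filled key; the original 'date' column is aggregated like any
# other text column, so the group index can simply be dropped afterwards.
out = df.groupby(groups, sort=False).agg(f).reset_index(drop=True)
print(out)
```

Grouping on a separate Series leaves `df` itself unmodified, which may be convenient if the raw extract is still needed later.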

Or specify the aggregation for each column in a dictionary:

f1 = lambda x: ' '.join(x.dropna())

f = {'date':'first', 'description1':f1, 'description2':f1, 'value1':'first', 'value2':'first'}

The dictionaries can also be created dynamically and merged together:

f1 = lambda x: ' '.join(x.dropna())

c =['description1','description2']
d1 = dict.fromkeys(c, f1)
d2 = dict.fromkeys(df.columns.difference(c), 'first')
f = {**d1, **d2}
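As a hedged illustration of how the merged dictionary might be applied, the sketch below again rebuilds the sample frame by hand (an assumption) and groups on the forward-filled date. The final column reordering and the name `out` are additions for readability, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed).
df = pd.DataFrame({
    'date': ['18-01-2019', np.nan, np.nan, '19-01-2019', np.nan, np.nan, np.nan],
    'description1': ['some', 'text', 'here', 'some', 'text', 'here', np.nan],
    'description2': ['more', 'text', np.nan, 'some', 'more', 'text', 'here'],
    'value1': [1, np.nan, np.nan, 3, np.nan, np.nan, np.nan],
    'value2': [2, np.nan, np.nan, 4, np.nan, np.nan, np.nan],
})

f1 = lambda x: ' '.join(x.dropna())

c = ['description1', 'description2']
d1 = dict.fromkeys(c, f1)                               # text columns: join
d2 = dict.fromkeys(df.columns.difference(c), 'first')   # the rest: first non-null
f = {**d1, **d2}

# Group on the forward-filled date and apply the per-column aggregations.
out = df.groupby(df['date'].ffill(), sort=False).agg(f).reset_index(drop=True)

# agg returns columns in dictionary order; restore the original order.
out = out[df.columns]
print(out)
```

Building the dictionary from `df.columns` means that only the list of text columns has to be maintained by hand when the frame gains new value columns.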

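Forward-fill the missing dates with ffill (the older fillna(method='ffill') spelling is deprecated in recent pandas), then group by the filled date column and join the descriptions inside agg. Dropping NaN before joining avoids a TypeError on rows where the description is missing:

df['date'] = df['date'].ffill()

df_new = df.groupby('date').agg({'description1': lambda x: ' '.join(x.dropna())})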

Update: to match the expected output format, you may need to adjust the index slightly, as follows:

df_new = df.groupby('date', as_index=False).agg({'description1': lambda x: ' '.join(x.dropna())}).reset_index(drop=True)
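To make the effect of as_index=False concrete, here is a small sketch on a reduced version of the question's sample data (rebuilt by hand, which is an assumption). The names `join_text`, `by_index` and `flat` are hypothetical. It uses `.ffill()` and `dropna()` so that it runs on current pandas and tolerates missing descriptions.

```python
import numpy as np
import pandas as pd

# Reduced sample frame rebuilt from the question (assumed).
df = pd.DataFrame({
    'date': ['18-01-2019', np.nan, np.nan, '19-01-2019', np.nan, np.nan, np.nan],
    'description1': ['some', 'text', 'here', 'some', 'text', 'here', np.nan],
    'description2': ['more', 'text', np.nan, 'some', 'more', 'text', 'here'],
})

df['date'] = df['date'].ffill()

# Join the non-missing strings of a group with single spaces.
join_text = lambda x: ' '.join(x.dropna())

# Default behaviour: 'date' becomes the index of the result.
by_index = df.groupby('date').agg({'description1': join_text})

# as_index=False: 'date' stays a regular column and the index is 0..n-1.
flat = df.groupby('date', as_index=False).agg({'description1': join_text})

print(by_index)
print(flat)
```

With as_index=False the result already carries a default integer index, so the trailing reset_index(drop=True) should not change anything further in this case.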

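as_index=False can also be used as an alternative to reset_index, right?

I get the following error: TypeError: list indices must be integers or slices, not str. The column dtypes are: date: datetime64; description1 and description2: str; value1 and value2: float64

@PetruTanas - Which solution?

@jezrael I tried the first one, and the dynamic one.

@PetruTanas - What is your pandas version? It works fine for me.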