Python: Concatenate column values in a DataFrame with "NaN" values

I am trying to concatenate Pandas DataFrame columns that contain NaN values.

In [96]:df = pd.DataFrame({'col1' : ["1","1","2","2","3","3"],
                'col2'  : ["p1","p2","p1",np.nan,"p2",np.nan], 'col3' : ["A","B","C","D","E","F"]})

In [97]: df
Out[97]: 
  col1 col2 col3
0    1   p1    A
1    1   p2    B
2    2   p1    C
3    2  NaN    D
4    3   p2    E
5    3  NaN    F

In [98]: df['concatenated'] = df['col2'] +','+ df['col3']
In [99]: df
Out[99]: 
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D          NaN
4    3   p2    E         p2,E
5    3  NaN    F          NaN

In this example, I would like to get "D" and "F" respectively, instead of the NaN values in the "concatenated" column. How can I do that?

I don't think your problem is trivial. However, here is a workaround that uses numpy vectorization:

In [49]: def concat(*args):
    ...:     strs = [str(arg) for arg in args if not pd.isnull(arg)]
    ...:     return ','.join(strs) if strs else np.nan
    ...: np_concat = np.vectorize(concat)
    ...: 

In [50]: np_concat(df['col2'], df['col3'])
Out[50]: 
array(['p1,A', 'p2,B', 'p1,C', 'D', 'p2,E', 'F'], 
      dtype='|S64')

In [51]: df['concatenated'] = np_concat(df['col2'], df['col3'])

In [52]: df
Out[52]: 
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F

[6 rows x 4 columns]
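
The same row-wise logic can also be written with DataFrame.apply along axis=1 instead of np.vectorize. This is only a sketch under assumptions: the helper name join_non_null and its sep parameter are hypothetical and do not come from the answer above. It also still loops over rows in Python, so it is not expected to be faster.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

def join_non_null(row, sep=','):
    # Hypothetical helper: keep only the non-null values of one row
    parts = [str(value) for value in row if not pd.isnull(value)]
    # Return NaN when every value in the row is missing
    return sep.join(parts) if parts else np.nan

# Apply the helper to each row of the two columns being joined
df['concatenated'] = df[['col2', 'col3']].apply(join_non_null, axis=1)
print(df)
```

Because the helper receives the whole row, it extends to more than two columns by changing only the column list.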

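You can first replace NaN with empty strings, either for the whole DataFrame or only for the columns you need. Note that the separator is still added, so rows where col2 was NaN come out as ",D" and ",F":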

In [6]: df = df.fillna('')

In [7]: df['concatenated'] = df['col2'] +','+ df['col3']

In [8]: df
Out[8]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2         D           ,D
4    3   p2    E         p2,E
5    3         F           ,F
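
To get "D" and "F" rather than ",D" and ",F", one option is to strip the separator from both ends after concatenating. The sketch below assumes that the values themselves never start or end with the separator, and it fills NaN only on the two Series being joined so that the original columns keep their NaN values. The variable names sep and joined are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

sep = ','
# Fill NaN on the operands only; df['col2'] and df['col3'] are left unchanged
joined = df['col2'].fillna('') + sep + df['col3'].fillna('')
# Remove the separator left dangling where one side was empty
df['concatenated'] = joined.str.strip(sep)
print(df)
```

With three or more columns, an empty value in the middle would leave a doubled separator that str.strip does not remove, so this sketch fits the two-column case best.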

We can use stack, which drops NaN, and then groupby.agg with ','.join to concatenate the strings:

df['concatenated'] = df[['col2', 'col3']].stack().groupby(level=0).agg(','.join)
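
For reference, here is a self-contained version of the stack approach. It is a sketch rather than the answerer's exact code: it adds an explicit dropna() after stack(), on the assumption that whether stack() discards NaN on its own can depend on the pandas version in use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

# Reshape the two columns into one long Series indexed by (row, column)
stacked = df[['col2', 'col3']].stack()
# Drop missing values explicitly so the join only ever sees strings
stacked = stacked.dropna()
# Group by the original row label (index level 0) and join each group's values
df['concatenated'] = stacked.groupby(level=0).agg(','.join)
print(df)
```

The assignment aligns on the index, so a row whose values are all missing would simply get NaN in the new column.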

Hey, thanks, Kiwi, it seems this is the simplest way to do it. :) I'm not sure why, but I had to modify it slightly, i.e.
strs = [str(arg) for arg in args if not arg == 'nan']
return ','.join(filter(None, strs)) if strs else ''
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F