Python: Concatenating column values in a DataFrame with "NaN" values
I am trying to concatenate Pandas DataFrame columns that contain NaN values:
In [96]: df = pd.DataFrame({'col1' : ["1","1","2","2","3","3"],
    ...:                    'col2' : ["p1","p2","p1",np.nan,"p2",np.nan],
    ...:                    'col3' : ["A","B","C","D","E","F"]})

In [97]: df
Out[97]:
  col1 col2 col3
0    1   p1    A
1    1   p2    B
2    2   p1    C
3    2  NaN    D
4    3   p2    E
5    3  NaN    F

In [98]: df['concatenated'] = df['col2'] +','+ df['col3']

In [99]: df
Out[99]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D          NaN
4    3   p2    E         p2,E
5    3  NaN    F          NaN
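In this example, how can I get "D" and "F" respectively in the "concatenated" column, instead of the NaN values?

I don't think your problem is trivial. However, here is a workaround using numpy vectorization: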
In [49]: def concat(*args):
    ...:     strs = [str(arg) for arg in args if not pd.isnull(arg)]
    ...:     return ','.join(strs) if strs else np.nan
    ...: np_concat = np.vectorize(concat)
    ...:

In [50]: np_concat(df['col2'], df['col3'])
Out[50]:
array(['p1,A', 'p2,B', 'p1,C', 'D', 'p2,E', 'F'],
      dtype='|S64')

In [51]: df['concatenated'] = np_concat(df['col2'], df['col3'])

In [52]: df
Out[52]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F

[6 rows x 4 columns]
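
For anyone who wants to run the vectorized workaround outside an IPython session, here is a minimal self-contained sketch of it. The only change from the session above is the otypes=[object] argument, which is my own addition: np.vectorize otherwise infers the output type from the first call, so pinning it to object seems safer in case the first row happens to be all NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

def concat(*args):
    # Keep only the non-null values, then join what is left with commas.
    strs = [str(arg) for arg in args if not pd.isnull(arg)]
    # If every value was null, return NaN instead of an empty string.
    return ','.join(strs) if strs else np.nan

# otypes=[object] is an assumption added here, not part of the original answer.
np_concat = np.vectorize(concat, otypes=[object])

df['concatenated'] = np_concat(df['col2'], df['col3'])
print(df)
```

Note that np.vectorize is essentially a Python-level loop, so this is a convenience rather than a performance optimization.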
You could first replace the NaNs with empty strings, either for the whole DataFrame or just for the columns you need:
In [6]: df = df.fillna('')

In [7]: df['concatenated'] = df['col2'] +','+ df['col3']

In [8]: df
Out[8]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2         D           ,D
4    3   p2    E         p2,E
5    3         F           ,F
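
The fillna approach leaves a leading comma (",D", ",F") in the rows where col2 was missing. If that is not wanted, one possible follow-up is to strip the separator from both ends of the result. This is only a sketch under the assumption that the values themselves never start or end with a comma; the intermediate name filled is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

# Fill only the columns being joined, leaving the original frame untouched.
filled = df[['col2', 'col3']].fillna('')

# Concatenate, then remove any comma left dangling at either end.
df['concatenated'] = (filled['col2'] + ',' + filled['col3']).str.strip(',')
print(df)
```

Unlike the vectorized version, a row where both columns are missing ends up as an empty string here rather than NaN.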
We can use stack, which drops the NaN values, and then groupby.agg with ','.join to concatenate the strings:
df['concatenated'] = df[['col2', 'col3']].stack().groupby(level=0).agg(','.join)
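
Here is the same stack/groupby idea as a self-contained sketch. As far as I know, whether stack drops NaN by default depends on the pandas version, so this version calls dropna() explicitly; that extra call is my addition and not part of the one-liner above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

# stack() turns the two columns into one long Series indexed by (row, column).
stacked = df[['col2', 'col3']].stack()

# Drop missing values explicitly instead of relying on stack() to do it.
stacked = stacked.dropna()

# Group by the original row label (index level 0) and join the remaining strings.
df['concatenated'] = stacked.groupby(level=0).agg(','.join)
print(df)
```

A row in which every selected column is NaN disappears from the grouped result, so it ends up as NaN in the new column after index alignment.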
Hey, thanks Kiwi, this looks like the simplest way. :) Not sure why, but I had to modify it slightly, i.e. strs = [str(arg) for arg in args if not arg == 'nan'] and return ','.join(filter(None, strs)) if strs else ''
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F