Python: how to concatenate multiple text fields in a DataFrame
How can I merge the unique values of certain text columns in a DataFrame into a single column? For example:
import pandas as pd

data = [[1,"US","California","Los Angeles"],
[1,"US","California","San Francisco"],
[1,"US","California","San Diego"],
[1,"US","Texas","Austin"],
[2,"IND","Maharashtra","Mumbai"],
[2,"IND","Maharashtra","Pune"],
[2,"IND","Maharashtra","Nagpur"]]
df = pd.DataFrame(data, columns = ['Country_Id', 'Country','State','Place'])
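From the DataFrame above, how do I produce output whose first field is Country_Id and whose second field is a text field containing the unique values of Country, State, and Place?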
For example:
- 1, US California Texas Los Angeles San Francisco San Diego Austin
- 2, IND Maharashtra Mumbai Pune Nagpur
Please ignore the meaning of the combined text field.
Use groupby and apply, with a double join over unique and a generator expression:
(df.groupby('Country_Id')
   .apply(lambda x: ' '.join(' '.join(x[col].unique()) for col in x))
   .to_frame('Country-State-Place'))
Out[434]:
            Country-State-Place
Country_Id
1           US California Texas Los Angeles San Francisco San Diego Austin
2           IND Maharashtra Mumbai Pune Nagpur
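For anyone who wants to run this end to end, here is a self-contained sketch of the same double-join idea. It assumes pandas is installed. The names text_cols and join_unique are illustrative and do not come from the answer above. Selecting the text columns explicitly is also an addition here: it keeps the integer Country_Id out of the string join, so the result should not depend on how apply treats the grouping column, which has varied between pandas versions.

```python
import pandas as pd

data = [[1, "US", "California", "Los Angeles"],
        [1, "US", "California", "San Francisco"],
        [1, "US", "California", "San Diego"],
        [1, "US", "Texas", "Austin"],
        [2, "IND", "Maharashtra", "Mumbai"],
        [2, "IND", "Maharashtra", "Pune"],
        [2, "IND", "Maharashtra", "Nagpur"]]
df = pd.DataFrame(data, columns=['Country_Id', 'Country', 'State', 'Place'])

# Illustrative name (not from the original answer): the text columns to combine.
text_cols = ['Country', 'State', 'Place']


def join_unique(group):
    # Inner join: unique values of one column, in order of first appearance.
    # Outer join: the per-column strings, in the order given by text_cols.
    return ' '.join(' '.join(group[col].unique()) for col in text_cols)


result = (
    df.groupby('Country_Id')[text_cols]   # only the text columns reach join_unique
      .apply(join_unique)                 # one combined string per Country_Id
      .to_frame('Country-State-Place')    # Series -> single-column DataFrame
)
print(result)
```

Note that unique() keeps values in order of first appearance, so the combined string follows row order within each group rather than alphabetical order.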