Python: How to update substrings in a DataFrame column of strings
Tags: python, pandas
I have a DataFrame `sp500news` that looks like this:
date_publish \
79944 2007-01-29 19:08:35
181781 2007-12-14 19:39:06
213175 2008-01-22 11:17:19
93554 2008-01-22 18:52:56
...
title
79944 Microsoft Vista corporate sales go very well
181781 Williams No Anglican consensus on Episcopal Church
213175 CSX quarterly profit rises
93554 Citigroup says 30 bln capital helps exceed target
...
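I am trying to replace each company name in the titles with the corresponding ticker from the "Symbol" column of the DataFrame `constituents`, which looks like this: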
Symbol Name Sector
0 MMM 3M Industrials
1 AOS A.O. Smith Industrials
2 ABT Abbott Health Care
3 ABBV AbbVie Health Care
...
116 C Citigroup Financials
...
This is what I have tried:
for item in sp500news['title']:
    for word in item:
        if word in constituents['Name']:
            indx = constituents['Name'].index(word)
            str.replace(word, constituents['Symbol'][indx])
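For reference, the attempt above cannot work for a few independent reasons: iterating over a string yields characters rather than words, `in` on a Series tests the index labels rather than the values, `Series.index` is an attribute and not a callable method, and `str.replace` returns a new string instead of modifying anything in place. A minimal sketch of the first, second, and fourth points, using made-up data (`names` is a hypothetical stand-in for `constituents['Name']`):

```python
import pandas as pd

# Hypothetical stand-in for constituents['Name'].
names = pd.Series(['3M', 'Citigroup'])

# Iterating over a string yields single characters, not words.
chars = list('CSX up')

# `in` on a Series checks the index labels (here 0 and 1), not the values.
label_hit = 1 in names
value_hit_naive = 'Citigroup' in names
# Checking the underlying values gives the intended membership test.
value_hit = 'Citigroup' in names.values

# Strings are immutable: replace() returns a new string and the
# original is unchanged unless the result is assigned somewhere.
title = 'Citigroup says'
title.replace('Citigroup', 'C')
```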
Try the following code:
import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well']})
constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})

# Replace each company name with its ticker symbol, one name at a time.
# regex=False treats the name as a literal string.
for name, symbol in zip(constituents['name'], constituents['symbol']):
    df['title'] = df['title'].str.replace(name, symbol, regex=False)
Output:
title
0 C says 30 bln capital helps exceed target
1 WLM No Anglican consensus on Episcopal Church
2 MCR Vista corporate sales go very well
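I basically just copied a few rows of your sp500news['title'] and made up some values for constituents['Name'], purely to demonstrate the transformation. Essentially, I access the string-methods accessor (`.str`) of the pd.Series that holds the title column, so that `replace` can be applied wherever a matching company name is found.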
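One caveat about the loop above: whether `Series.str.replace` treats the pattern as a regular expression depends on the pandas version (the default for `regex` changed from `True` to `False` in pandas 2.0), and names such as `A.O. Smith` contain regex metacharacters. A minimal sketch of the difference, assuming literal matching is what you want; the `titles` data is made up for illustration:

```python
import pandas as pd

# Hypothetical sample titles, not taken from the real sp500news frame.
titles = pd.Series(['A.O. Smith raises outlook', 'AXOY Smith is unrelated'])

# Literal replacement: the dots in 'A.O. Smith' only match real dots.
literal = titles.str.replace('A.O. Smith', 'AOS', regex=False)

# Regex replacement: each '.' matches any character,
# so the unrelated second title is rewritten as well.
as_regex = titles.str.replace('A.O. Smith', 'AOS', regex=True)
```

Passing `regex=False` explicitly keeps the behaviour the same across pandas versions.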
Try the following. Here are dummy DataFrames that represent the data:
import pandas as pd

df1 = pd.DataFrame({'Symbol': ['MV', 'AOS', 'ABT'],
                    'Name': ['Microsoft Vista', 'A.0.', 'Abbot']})
df1
Symbol Name
0 MV Microsoft Vista
1 AOS A.0.
2 ABT Abbot
df2 = pd.DataFrame({'title': [79944, 181781, 213175],
                    'comment': ['Microsoft Vista corporate sales go very well',
                                'Abbot consensus on Episcopal Church',
                                'A.O. says 30 bln captial helps exceed target']})
title comment
0 79944 Microsoft Vista corporate sales go very well
1 181781 Abbot consensus on Episcopal Church
2 213175 A.O. says 30 bln captial helps exceed target
Build a dictionary that maps each name to its symbol:
rep = dict(zip(df1.Name,df1.Symbol))
rep
{'Microsoft Vista': 'MV', 'A.0.': 'AOS', 'Abbot': 'ABT'}
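Use the `replace` method to substitute them:
df2['comment'] = df2['comment'].replace(rep, regex = True)
df2
title comment
0 79944 MV corporate sales go very well
1 181781 ABT consensus on Episcopal Church
2 213175 A.O. says 30 bln captial helps exceed target
The last row is unchanged because the name in df1 is spelled 'A.0.' (with a zero), while the text contains 'A.O.' (with the letter O).
Comments:
What do you want your output to look like?
The "title" column of sp500news, with every company name replaced by the ticker value from the "Symbol" column of "constituents".
Where are the ticker values?
In the "Symbol" column of "constituents".
How do I know which company name a symbol corresponds to? Your question is very unclear.
Please avoid using for loops on DataFrames, because loops are slow.
I'm a bit confused. What do you want to put in the function? What exactly do you mean by object?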
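With `regex=True`, the dictionary keys in the second answer are interpreted as regular expressions, and a short name can also match inside a longer word. A possible variant, sketched under the assumption that names should match literally and only as whole words, escapes each name and combines them into a single pattern. The data and the column names `Symbol`, `Name`, and `title` below are illustrative, and `news` is a hypothetical stand-in for `sp500news`:

```python
import re
import pandas as pd

# Hypothetical sample data for illustration only.
constituents = pd.DataFrame({'Symbol': ['AOS', 'ABT', 'C'],
                             'Name': ['A.O. Smith', 'Abbott', 'Citigroup']})
news = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                               'A.O. Smith and Abbott report earnings',
                               'Citigroups is not a company name']})

# Map each company name to its ticker symbol.
rep = dict(zip(constituents['Name'], constituents['Symbol']))

# Escape regex metacharacters and try longer names first, so a short
# name never pre-empts a longer one that starts with the same text.
names = sorted(rep, key=len, reverse=True)
pattern = r'(?<!\w)(?:' + '|'.join(re.escape(n) for n in names) + r')(?!\w)'

# One vectorised pass: the callable looks up the symbol for each match.
news['title'] = news['title'].str.replace(
    pattern, lambda m: rep[m.group(0)], regex=True)
```

The lookarounds `(?<!\w)` and `(?!\w)` are one way to express "whole word" here; drop them if partial matches are acceptable for your data.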