Python: how to update substrings in a DataFrame column of strings


python pandas


I have a DataFrame "sp500news" that looks like this:

        date_publish         title
79944   2007-01-29 19:08:35  Microsoft Vista corporate sales go very well
181781  2007-12-14 19:39:06  Williams No Anglican consensus on Episcopal Church
213175  2008-01-22 11:17:19  CSX quarterly profit rises
93554   2008-01-22 18:52:56  Citigroup says 30 bln capital helps exceed target
...

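I am trying to replace each company name with the corresponding ticker from the "Symbol" column of the DataFrame "constituents", which looks like this: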

     Symbol  Name        Sector
0    MMM     3M          Industrials
1    AOS     A.O. Smith  Industrials
2    ABT     Abbott      Health Care
3    ABBV    AbbVie      Health Care
...
116  C       Citigroup   Financials
...

I have already tried this:

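for item in sp500news['title']:
    for word in item:  # iterates over characters, not words
        if word in constituents['Name']:  # `in` on a Series checks the index labels, not the values
            indx = constituents['Name'].index(word)  # Series.index is an attribute, not list.index()
            str.replace(word, constituents['Symbol'][indx])  # unbound call is missing an argument; the result is never stored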
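Two of the pitfalls in that loop can be checked directly. The snippet below is a minimal sketch using a made-up two-row `constituents` frame (hypothetical data, not the asker's full table): it shows that `in` on a Series tests index labels rather than values, and that iterating over a string yields single characters rather than words.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `constituents` frame
constituents = pd.DataFrame({'Symbol': ['MMM', 'C'],
                             'Name': ['3M', 'Citigroup']})

# `in` on a Series looks at the index labels (0, 1), not the values
print('Citigroup' in constituents['Name'])             # False
print(0 in constituents['Name'])                       # True

# To test membership among the values, use .isin (or compare with .eq)
print(constituents['Name'].isin(['Citigroup']).any())  # True

# Iterating over a string yields characters, not words
title = 'Citigroup says 30 bln capital helps exceed target'
print(list(title)[:4])                                 # ['C', 'i', 't', 'i']
print(title.split()[:2])                               # ['Citigroup', 'says']
```

Even with these fixed, looping word by word would miss multi-word names such as "A.O. Smith", which is one reason the answers below work on whole strings instead.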

Try the following code:

import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well']})

constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})

# Replace every occurrence of each company name with its ticker;
# regex=False makes names such as 'A.O. Smith' match literally
for name, symbol in zip(constituents['name'], constituents['symbol']):
    df['title'] = df['title'].str.replace(name, symbol, regex=False)

Output:

                                           title
0      C says 30 bln capital helps exceed target
1  WLM No Anglican consensus on Episcopal Church
2         MCR Vista corporate sales go very well

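I basically just copied a few rows of your sp500news['title'] and made up some constituents['Name'] entries to demonstrate the conversion. Essentially, I use the `.str` string-method accessor of the title column's pd.Series from sp500news, so that wherever it finds a matching company name, I can apply `replace` to it.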
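One caveat with this loop: the real `constituents` table contains names such as `A.O. Smith`, where `.` is a regex metacharacter, and the default of the `regex` parameter of `Series.str.replace` changed between pandas versions (True before 2.0, False from 2.0 on). Passing `regex` explicitly removes the ambiguity. A minimal sketch with made-up headlines (hypothetical data) showing the difference:

```python
import pandas as pd

# Hypothetical headlines; only the company names matter here
df = pd.DataFrame({'title': ['A.O. Smith raises outlook',
                             'AxOx Smith is not a company']})

constituents = pd.DataFrame({'symbol': ['AOS'], 'name': ['A.O. Smith']})

literal = df['title'].copy()
as_regex = df['title'].copy()
for name, symbol in zip(constituents['name'], constituents['symbol']):
    literal = literal.str.replace(name, symbol, regex=False)   # '.' is a plain dot
    as_regex = as_regex.str.replace(name, symbol, regex=True)  # '.' matches any character

print(literal.tolist())   # ['AOS raises outlook', 'AxOx Smith is not a company']
print(as_regex.tolist())  # ['AOS raises outlook', 'AOS is not a company']
```

With `regex=True`, the second headline is rewritten even though it does not contain the company name, which is why the literal form is the safer default for plain company names.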

Try the following:

Here are dummy DataFrames representing the data:

import pandas as pd

df1 = pd.DataFrame({'Symbol': ['MV', 'AOS', 'ABT'],
                    'Name': ['Microsoft Vista', 'A.0.', 'Abbot']})
df1
  Symbol             Name
0     MV  Microsoft Vista
1    AOS             A.0.
2    ABT            Abbot

df2 = pd.DataFrame({'title': [79944, 181781, 213175],
                    'comment': ['Microsoft Vista corporate sales go very well',
                                'Abbot consensus on Episcopal Church',
                                'A.O. says 30 bln captial helps exceed target']})
df2
    title  comment
0   79944  Microsoft Vista corporate sales go very well
1  181781  Abbot consensus on Episcopal Church
2  213175  A.O. says 30 bln captial helps exceed target

Build a dictionary that maps each name to its symbol:

rep = dict(zip(df1.Name,df1.Symbol))
rep

{'Microsoft Vista': 'MV', 'A.0.': 'AOS', 'Abbot': 'ABT'}

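Comments on the question: "What do you want your output to look like?" "The 'title' column in sp500news, with all company names replaced by the ticker values from the 'Symbol' column in 'constituents'." "Where are the ticker values? In the 'Symbol' column of 'constituents'? How do I know which company name a symbol corresponds to? Your question is very unclear." "Please avoid using for loops over DataFrames, because loops are slow." "I'm a bit confused. What do you want to put in the function? What exactly do you mean by object?"

Then use the `replace` method with that dictionary to substitute them: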

df2['comment'] = df2['comment'].replace(rep, regex=True)
df2
   title    comment
0   79944   MV corporate sales go very well
1   181781  ABT consensus on Episcopal Church
2   213175  A.O. says 30 bln captial helps exceed target
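In the output above, the third row is left unchanged because the dummy name is `'A.0.'` (digit zero) while the headline says `A.O.` (letter O). Separately, with `regex=True` every dictionary key is used as a regular expression, so the dots in a name like `A.O.` match any character, and a short name can also match inside a longer word (a key `Abbot` would turn `Abbott` into `ABTt`). The sketch below is one way to guard against both; it is a swapped-in technique rather than part of the original answer, and it assumes names should only match when not glued to other letters or digits. It escapes each name, joins them into one alternation (longest first), and looks each match up in the dictionary. Lookarounds are used instead of `\b` because `\b` would fail after a name ending in `.`.

```python
import re
import pandas as pd

# Same dummy data as above, but with the name spelled 'A.O.' (letter O)
df1 = pd.DataFrame({'Symbol': ['MV', 'AOS', 'ABT'],
                    'Name': ['Microsoft Vista', 'A.O.', 'Abbot']})
df2 = pd.DataFrame({'title': [79944, 181781, 213175],
                    'comment': ['Microsoft Vista corporate sales go very well',
                                'Abbot consensus on Episcopal Church',
                                'A.O. says 30 bln captial helps exceed target']})

rep = dict(zip(df1.Name, df1.Symbol))

# Escape each name so '.' is literal, try longer names first, and require
# that a match is not preceded or followed by a word character.
names = sorted(rep, key=len, reverse=True)
pattern = r'(?<!\w)(?:' + '|'.join(map(re.escape, names)) + r')(?!\w)'

# A callable replacement receives the match object; map the matched name to its ticker
df2['comment'] = df2['comment'].str.replace(
    pattern, lambda m: rep[m.group(0)], regex=True)
print(df2['comment'].tolist())
```

This prints `['MV corporate sales go very well', 'ABT consensus on Episcopal Church', 'AOS says 30 bln captial helps exceed target']`. Building a single pattern also means each headline is scanned once, instead of once per company name.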