Python Pandas .apply on a spaCy Doc column returns None values
I am running the following on my "sp500news3" DataFrame, and it returns None values:
def extract_ticker(title):
    for word in title:
        if word in constituents['Symbol']:
            return word
sp500news3['tickers'] = sp500news3['title'].apply(extract_ticker)
#sp500news3 sample:
    index         date_publish                                                 title  tickers
0   79944  2007-01-29 19:08:35       (MSFT, Vista, corporate, sales, go, very, well)     None
1  181781  2007-12-14 19:39:06   (WMB, No, Anglican, consensus, on, Episcopal, Church)     None
2  213175  2008-01-22 11:17:19                       (CSX, quarterly, profit, rises)     None
3   93554  2008-01-22 18:52:56  (C, says, 30, bln, capital, helps, exceed, target)     None
constituents['Symbol'] sample:
0 TWX
1 C
2 MSFT
3 WMB ...
To reproduce the spaCy Doc column:
import pandas as pd
constituents = pd.DataFrame({"Symbol":["TWX","C","MSFT","WMB"]})
sp500news3 = pd.DataFrame({"title":["MSFT Vista corporate sales go very well","WMB No Anglican consensus on Episcopal Church","CSX quarterly profit rises",'C says 30 bln capital helps exceed target','TWX plans cable spinoff']})
import spacy
nlp = spacy.load('en_core_web_sm')
sp500news3['title'] = sp500news3['title'].apply(nlp)
You have to use word.text, because iterating over a spaCy Doc yields Token objects rather than strings. With your example:
In [11]: sp500news3['title'].apply(extract_ticker)
Out[11]:
0 MSFT
1 WMB
2 None
3 C
4 TWX
Name: title, dtype: object
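A minimal sketch of a fix consistent with the output above, under two assumptions that are not confirmed by the answer text. First, the comparison uses word.text, as the answer says. Second, membership is tested against a set of symbol strings rather than the Series itself, because `in` on a pandas Series checks index labels, not values. The `symbols` set is a name introduced here for illustration, and `spacy.blank("en")` stands in for `en_core_web_sm` only so the snippet runs without a model download.

```python
import pandas as pd
import spacy

# Assumption: a blank English pipeline (tokenizer only) replaces
# en_core_web_sm so the sketch runs without downloading a model.
nlp = spacy.blank("en")

constituents = pd.DataFrame({"Symbol": ["TWX", "C", "MSFT", "WMB"]})
sp500news3 = pd.DataFrame({"title": [
    "MSFT Vista corporate sales go very well",
    "WMB No Anglican consensus on Episcopal Church",
    "CSX quarterly profit rises",
    "C says 30 bln capital helps exceed target",
    "TWX plans cable spinoff",
]})
sp500news3["title"] = sp500news3["title"].apply(nlp)

# Hypothetical helper name: a set of plain strings gives value-based lookup.
symbols = set(constituents["Symbol"])

def extract_ticker(title):
    """Return the text of the first token that is a known symbol, else None."""
    for word in title:            # iterating a Doc yields Token objects
        if word.text in symbols:  # compare the token's string form
            return word.text
    return None

sp500news3["tickers"] = sp500news3["title"].apply(extract_ticker)
print(sp500news3["tickers"])
```

Building the set once outside the function avoids rescanning the symbol column for every token, which matters when constituents holds the full S&P 500 list.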
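What did you expect to happen?
None of the strings in constituents seem to be in your titles. What data type is title? Is it a string or a tuple?
Presumably MSFT is in constituents, and that is the expected result for the first row?
constituents is longer than the sample: it contains all S&P 500 tickers. The expectation is to extract the ticker from each title and add it to the tickers column of the sp500news3 DF.
The data type of title is spacy.tokens.doc.Doc.
I suspect that type does not implement the comparison needed to match plain strings, and pandas does not special-case it the way it does for strings.
This still returns None. I checked the constituents data type, and it is str.
@W.R constituents holds strings, but title does not (you said it is a Doc). Please share an example we can reproduce.
Added the df as it is before being passed to spaCy, and the code that produces the spaCy Doc.
Added the constituents df.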
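The data-type questions in the comments matter for a second reason that is easy to miss: the `in` operator on a pandas Series tests the index labels, not the stored values. The short check below uses only pandas and illustrates this with the sample symbols; the variable names are introduced here for illustration.

```python
import pandas as pd

symbols = pd.Series(["TWX", "C", "MSFT", "WMB"], name="Symbol")

in_series = "MSFT" in symbols           # checks index labels 0..3, so False
label_in_series = 2 in symbols          # 2 is an index label, so True
in_values = "MSFT" in symbols.tolist()  # checks the stored values
in_set = "MSFT" in set(symbols)         # same check, constant-time lookup

print(in_series, label_in_series, in_values, in_set)
# False True True True
```

So even with plain string titles, `word in constituents['Symbol']` would not match a ticker; converting the column to a list or set (or using `Series.isin`) gives the intended value-based test.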