Python Pandas - Iterating over a column and updating values
Because the TF-IDF vectorizer crashes when it encounters unseen words (labels), I am trying to remove those words from new input. How do I update the values of a DataFrame column? I'm doing:
def clean_unseen(dfcol, vectorizer):
    cleanedstring = ""
    for entry in dfcol:
        for word in entry.split():
            if word in vectorizer.vocabulary_:
                cleanedstring = cleanedstring + " " + word
        print(cleanedstring)
        entry = cleanedstring
        cleanedstring = ""
    return dfcol
For example:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

tfifgbdf_vect = TfidfVectorizer()
s2 = pd.Series(['the cat', 'awesome xyz', 'f_g_h lol asd'])
tfifgbdf_vect.fit_transform(s2)
s3 = pd.Series(['the dog the awesome xyz', 'xyz lol asd', 'f_g_h lol aha'])
clean_unseen(s3, tfifgbdf_vect)
However, this returns the original column unchanged:
Output:
0 the dog the awesome xyz
1 xyz lol asd
2 f_g_h lol aha
dtype: object
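The effect does not depend on the vectorizer at all. A minimal sketch, using a hypothetical toy Series s, compares assigning to the loop variable with writing back through Series.at:

```python
import pandas as pd

s = pd.Series(["a b", "c d"])  # hypothetical toy data

# Assigning to the loop variable only rebinds the local name `entry`
# to a new string; nothing is written back into the Series.
for entry in s:
    entry = entry.upper()
unchanged = s.tolist()
print(unchanged)  # ['a b', 'c d']

# Writing through the Series by index label does modify it in place.
for idx in s.index:
    s.at[idx] = s.at[idx].upper()
print(s.tolist())  # ['A B', 'C D']
```

Element-by-element .at writes work, but for a whole-column update it is usually simpler and faster to build a new Series from the cleaned values.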
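Assigning to the loop variable entry only rebinds that local name to a new string; it does not write anything back into the Series. (Python strings are immutable, so an entry cannot be modified in place either.) You therefore have to collect the cleaned strings explicitly and build a new Series from them: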
import pandas as pd

def clean_unseen(dfcol, vectorizer):
    cleaned = []
    for entry in dfcol:
        # keep only the words the fitted vectorizer has seen
        kept = [word for word in entry.split() if word in vectorizer.vocabulary_]
        # join with single spaces (avoids the leading space left by " " + word)
        cleaned.append(" ".join(kept))
    # return a new Series, reusing the input's index so it lines up with dfcol
    return pd.Series(cleaned, index=dfcol.index)
from sklearn.feature_extraction.text import TfidfVectorizer

tfifgbdf_vect = TfidfVectorizer()
s2 = pd.Series(['the cat', 'awesome xyz', 'f_g_h lol asd'])
tfifgbdf_vect.fit_transform(s2)
s3 = pd.Series(['the dog the awesome xyz', 'xyz lol asd', 'f_g_h lol aha'])
clean_unseen(s3, tfifgbdf_vect)
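The explicit loop can also be written with Series.apply, which calls a function once per entry and returns a new Series with the same index. The sketch below is a hypothetical variant (clean_unseen_apply is not part of any library) that takes the vocabulary directly; a plain set stands in for the keys of tfifgbdf_vect.vocabulary_, which, with default TfidfVectorizer settings, are the seven words of s2. As a side note, in scikit-learn TfidfVectorizer.transform already skips tokens that are not in vocabulary_, so this cleaning step may only matter if the cleaned text is used elsewhere.

```python
import pandas as pd


def clean_unseen_apply(dfcol, vocabulary):
    """Hypothetical variant of clean_unseen: keep only words in `vocabulary`.

    `vocabulary` can be any container that supports `in`, such as a fitted
    vectorizer's vocabulary_ dict. Words are split on whitespace, exactly
    as in the loop-based version.
    """
    # apply returns a new Series with the same index; dfcol is not modified.
    return dfcol.apply(
        lambda text: " ".join(w for w in text.split() if w in vocabulary)
    )


# Stand-in for tfifgbdf_vect.vocabulary_ after fitting on s2 with default
# settings (only the keys matter for the membership test).
vocab = {"the", "cat", "awesome", "xyz", "f_g_h", "lol", "asd"}
s3 = pd.Series(["the dog the awesome xyz", "xyz lol asd", "f_g_h lol aha"])

cleaned = clean_unseen_apply(s3, vocab)
print(cleaned.tolist())  # ['the the awesome xyz', 'xyz lol asd', 'f_g_h lol']
```

With the real vectorizer the call would be clean_unseen_apply(s3, tfifgbdf_vect.vocabulary_). Both versions split on whitespace, whereas the vectorizer lowercases and tokenizes with its own pattern, so a word such as "The" would be dropped here even though the vectorizer knows "the"; tokenizing with the analyzer returned by the vectorizer's build_analyzer() method would keep the two consistent for the default unigram settings.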