Python: storing list output as a DataFrame column when doing regex data cleaning

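I am cleaning the data in a column, and my task is then to store the cleaned information, intact, back in the same column so that it can be fed into a TF-IDF vectorizer. The code below works, but it stores the output in a list. I want the cleaned output stored in the same column rather than in a list. My goal is to keep my information intact and properly formatted.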

# Cleaning comment section
import re

import nltk
import pandas as pd

# The stopword corpus has to be downloaded once with nltk.download('stopwords')
stopwords = set(nltk.corpus.stopwords.words('english'))  # a set makes lookups fast

def text_cleaner(text, num):
    newString = text.lower()
    newString = re.sub(r'\([^)]*\)', '', newString)   # remove text in parentheses
    newString = re.sub(r"[0-9]", "", newString)        # remove digits
    newString = re.sub(',', '.', newString)
    newString = re.sub(r"'s\b", "", newString)         # remove possessive 's
    newString = re.sub("[^a-zA-Z]", " ", newString)    # turn every non-letter into a space
    # The substitutions below no longer change anything: the line above has
    # already replaced all of these characters with spaces.
    newString = re.sub(r"Ä¢", "", newString)
    newString = re.sub(r"¬∑", "", newString)
    newString = re.sub(r"\'", "", newString)
    newString = re.sub(r"\"", "", newString)
    newString = re.sub(r"\n", "", newString)
    newString = re.sub(r"\r", "", newString)
    if num == 0:
        tokens = [w for w in newString.split() if w not in stopwords]  # drop stopwords
    else:
        tokens = newString.split()
    long_words = []
    for i in tokens:
        if len(i) > 1:  # remove one-letter words
            long_words.append(i)
    return " ".join(long_words).strip()

# call the function
X = []
for t in df1['CHARACTERISTICS']:
    X.append(text_cleaner(t, 0))
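To see what each step of text_cleaner does, here is a minimal self-contained sketch of the same pipeline run on a sample string. It assumes a small hard-coded stopword set (STOPWORDS, hypothetical) in place of the NLTK list so that it runs without downloading the corpus, renames the num switch to a hypothetical drop_stopwords flag, and leaves out the substitutions that have no effect.

```python
import re

# Hypothetical stand-in for nltk.corpus.stopwords.words('english')
STOPWORDS = {"the", "a", "an", "and", "is", "of", "with", "in"}

def clean(text, drop_stopwords=True):
    s = text.lower()
    s = re.sub(r"\([^)]*\)", "", s)   # remove parenthesised text
    s = re.sub(r"[0-9]", "", s)       # remove digits without inserting spaces
    s = re.sub(r"'s\b", "", s)        # remove possessive 's
    s = re.sub(r"[^a-z]", " ", s)     # every remaining non-letter becomes a space
    tokens = s.split()
    if drop_stopwords:
        tokens = [w for w in tokens if w not in STOPWORDS]
    # keep words longer than one character, joined back into a single string
    return " ".join(w for w in tokens if len(w) > 1)

print(clean("The Sensor's housing (model X12) is made of 2 steel, and glass!"))
# → sensor housing made steel glass
```

Because the function returns one string per input, the result is already in the form a TF-IDF vectorizer expects; the only open question is how to get those strings back into the column.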

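The list X should be converted back into the DataFrame column C1, or the cleaning function should return the clean strings directly into column C1. I tried to do this with the following code, but it raised an error: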

df['C1'] = df['C1'].apply(text_cleaner(t,0))
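That call fails because Python evaluates text_cleaner(t, 0) before apply is ever called, so apply receives a single returned string (or the line fails with a NameError if t is undefined) instead of a function. A minimal sketch of the usual fix is to pass the function object itself and supply the extra argument through the args parameter of Series.apply, or to wrap the call in a lambda. The sample DataFrame and the simplified cleaner below are hypothetical stand-ins for the real data and function.

```python
import re

import pandas as pd

def text_cleaner(text, num):
    # Hypothetical, simplified stand-in for the question's cleaner
    s = re.sub(r"[^a-z]", " ", text.lower())
    tokens = s.split()
    if num == 0:
        tokens = [w for w in tokens if w not in {"the", "and", "of"}]
    return " ".join(w for w in tokens if len(w) > 1)

df = pd.DataFrame({"C1": ["The Top of the Box", "Red and BLUE paint"]})

# Pass the function itself; apply calls text_cleaner(value, 0) for every row
df["C1"] = df["C1"].apply(text_cleaner, args=(0,))

# Equivalent alternative with a lambda:
# df["C1"] = df["C1"].apply(lambda t: text_cleaner(t, 0))

print(df["C1"].tolist())
```

This writes the cleaned strings straight into C1 with the original index, so no intermediate list is needed.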

The following answered my question:

df['C1'] = pd.DataFrame(X)
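One caveat worth checking: assigning a Series or single-column DataFrame to a column aligns on the index, not on position. pd.DataFrame(X) gets a fresh RangeIndex (0, 1, 2, ...), so the assignment only lines up row by row when df also has a default RangeIndex of the same length; after filtering or sorting, rows can silently become NaN. The sketch below, using hypothetical data, shows the pitfall and two position-safe alternatives: assign the list directly, or build a Series that carries df's own index.

```python
import pandas as pd

X = ["top box", "red blue paint"]  # cleaned strings, in row order

# A DataFrame whose index is not 0..n-1 (e.g. after filtering or sorting)
df = pd.DataFrame({"C1": ["The Top of the Box", "Red and BLUE paint"]}, index=[10, 11])

# Index-aligned: labels 10 and 11 do not exist in pd.DataFrame(X)'s index 0..1,
# so both rows become NaN
misaligned = df.copy()
misaligned["C1"] = pd.DataFrame(X)

# Position-based: a plain list of the same length is assigned in order
df_list = df.copy()
df_list["C1"] = X

# Or keep using a Series, but give it df's own index
df_series = df.copy()
df_series["C1"] = pd.Series(X, index=df.index)

print(misaligned["C1"].tolist(), df_list["C1"].tolist(), df_series["C1"].tolist())
```

Assigning the list directly raises a ValueError if its length differs from the number of rows, which is usually preferable to silent misalignment.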