Removing HTML tags from a Python DataFrame


python html pandas nlp


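I have a CSV file that contains HTML tags. I tried to iterate over the DataFrame with the following function to strip the tags, and I get "TypeError: expected string or buffer". Any help with this error would be appreciated.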

import re

def clean_html(raw_html):
    for index, row in raw_html.iterrows():
        cleanr = re.compile('<.*?>')
        cleantext = re.sub(cleanr, '', raw_html)
        return cleantext


You are passing the entire raw_html variable (the DataFrame) to re.sub, which expects a string. Pass in the row's data instead:

cleantext = re.sub(cleanr, '', row['a1'])
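
Note that the function in the question also returns inside the loop, so even with the fix above it would stop after the first row. A minimal sketch of one way to clean a whole column is shown here. It assumes the text lives in a column named 'a1' (the name used in this answer), and the `column` parameter and the sample DataFrame are made up for illustration. A regular expression like this only handles simple markup; it is not a full HTML parser.

```python
import re

import pandas as pd

TAG_RE = re.compile(r"<.*?>")  # same non-greedy pattern as in the question


def clean_html(df, column):
    """Return a copy of df with HTML tags stripped from one text column."""
    cleaned = df.copy()
    # Apply the substitution to each cell (a string), not to the DataFrame itself.
    # Non-string cells (for example missing values) are passed through unchanged.
    cleaned[column] = cleaned[column].map(
        lambda value: TAG_RE.sub("", value) if isinstance(value, str) else value
    )
    return cleaned


# Hypothetical sample data standing in for the CSV contents.
df = pd.DataFrame({"a1": ["<p>Hello <b>world</b></p>", "no tags here", None]})
result = clean_html(df, "a1")
print(result["a1"].tolist())
```

Leaving non-string cells untouched avoids the same TypeError when the CSV has empty cells, since re.sub would otherwise be handed a missing value instead of a string.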