Delete string elements in a Python DataFrame according to element length


python dataframe nlp


I have a Python DataFrame with 13 columns and 60,000 rows. One column, named "Text" (dtype object), contains fairly long text cells:

    Text    ID  AI  BI  GH  JB  EQ  HE  EN  MA  WE  WR
2585    obstetric gynaecologicaladmissions owing abor...    2585    0   0   0   0   0   1   0   0   0   0
507     graphic illustration process flow help organiz...   507     0   0   0   0   0   0   0   0   1   0

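In some rows, words are glued together (as in the first row above: "gynaecologicaladmissions"). To get rid of this problem, I want to remove every such case across the whole dataset. My idea is to delete, for each row of the "Text" column, all words longer than 13 characters.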
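Before deleting anything, it may be worth checking which tokens a 13-character threshold would actually catch, since some legitimate words are longer than that (for instance "gynaecological" has 14 letters). The sketch below is an illustrative assumption, not part of the original question: the three rows are hypothetical, and `MAX_LEN` is just a name for the threshold. It splits every cell on whitespace, uses `Series.explode` to get one word per row, and counts the words longer than the threshold.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real 60,000-row column
res = pd.DataFrame({"Text": [
    "obstetric gynaecologicaladmissions owing",
    "graphic illustration process flow help",
    "gynaecological illustrationprocess flow",
]})

MAX_LEN = 13  # threshold from the question: longer words would be dropped

# One row per word: split each cell on whitespace, then explode the lists
words = res["Text"].str.split().explode()

# Words the threshold would remove, with how often each one occurs
too_long = words[words.str.len() > MAX_LEN].value_counts()
print(too_long)
```

Here the result would flag the glued tokens as intended, but also the genuine word "gynaecological", so the threshold may need tuning on the real data.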

I tried this line of code:

res.loc[res['Text'].str.len() < 13]


But the result only contained two empty rows.

How can I fix this?
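The attempt above does not filter words at all: `.str.len()` returns the length of each whole cell, so the mask keeps only rows whose entire text is shorter than 13 characters, which long texts never are. A minimal sketch of that behaviour, assuming two rows borrowed from the answer's example frame (the real cells are truncated in the question):

```python
import pandas as pd

# Two example rows taken from the answer's sample data
res = pd.DataFrame({"Text": [
    "obstetric gynaecologicaladmissions owing",
    "graphic illustration process flow help",
]})

# .str.len() measures the WHOLE cell, not the individual words in it
cell_lengths = res["Text"].str.len()
print(cell_lengths.tolist())  # [40, 38]

# So this boolean mask keeps only rows whose full text is under 13 characters
short_rows = res.loc[res["Text"].str.len() < 13]
print(len(short_rows))  # 0
```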

Let's take this DataFrame as an example:

df

    text
0   obstetric gynaecologicaladmissions owing
1   graphic illustration process flow help
2   process flow help
3   illustrationprocess flow

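Since you have to check the length of each word, you need to split every string on a delimiter (a space in this case) and loop over the resulting list, keeping only the words whose length is <= 13.

Thank you for your answer. I would also like to keep the other words in rows where a word longer than 13 characters is found. For example, row 0 should become "obstetric owing".

Updated answer: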

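def func(words):
    # Keep only the words of 13 characters or fewer, then rejoin them with spaces
    kept = []
    for word in words:
        if len(word) <= 13:
            kept.append(word)
    return " ".join(kept)

# .str.split() turns each cell into a list of words, which func then filters
df['text'] = df['text'].str.split().apply(func)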
df
    
     text
0   obstetric owing
1   graphic illustration process flow help
2   process flow help
3   flow
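As a hedged alternative to the loop above (not the answerer's code), the same filtering can be sketched with pandas' vectorised string methods. It assumes words are whitespace-delimited and hard-codes the regex bound `{14,}` (13 + 1); the frame mirrors the answer's example, plus one missing cell added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"text": [
    "obstetric gynaecologicaladmissions owing",
    "graphic illustration process flow help",
    "process flow help",
    "illustrationprocess flow",
    None,  # hypothetical missing cell, to show NaN-safe behaviour
]})

df["text"] = (
    df["text"]
    # delete every whitespace-delimited token of 14+ characters (i.e. > 13)
    .str.replace(r"\S{14,}", "", regex=True)
    # collapse the double spaces left behind, then trim the ends
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
print(df)
```

One practical difference: `.str.split().apply(func)` raises a `TypeError` on a NaN cell (a float is not iterable), whereas the `.str` methods pass missing values through, which may matter on a 60,000-row column.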