Cleaning non-ASCII data in Python

Tags: python, dataframe, ascii, data-cleaning, nsregularexpression

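I have a function that cleans the text in a DataFrame based on ASCII codes. If I have data in another language, such as Russian or Chinese, how can I keep it from being removed?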

import re
import unicodedata

def clean_text(text, pattern="[^ a-zA-Z0-9]"):
    # Decompose accented characters, then drop every non-ASCII character
    cleaned_text = unicodedata.normalize('NFD', text).encode('ascii', 'ignore')
    cleaned_text = re.sub(pattern, " ", cleaned_text.decode("utf-8"), flags=re.UNICODE)
    # Lowercase and collapse runs of whitespace
    cleaned_text = u' '.join(cleaned_text.lower().strip().split())
    return cleaned_text
Original df

index  name
0      水柳仙
1      Dean,Martín
2      Doris Day
When I apply the function, I get

index  name
0     
1      dean martin
2      doris day
I want to get

index  name
0      水柳仙
1      dean martin
2      doris day
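
For context, the empty row appears to come from the `.encode('ascii', 'ignore')` step, which silently discards every character that has no ASCII form after NFD decomposition. A minimal check using the sample names from the tables above (the helper name `ascii_only` is hypothetical and only isolates that one step):

```python
import unicodedata

def ascii_only(text):
    # Same first step as clean_text: decompose, then drop all non-ASCII code points.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# CJK characters have no ASCII form, so nothing survives this step.
print(repr(ascii_only("水柳仙")))
# The accented letter decomposes to a base letter plus a combining accent;
# only the combining accent is dropped.
print(repr(ascii_only("Dean,Martín")))
```

So the Chinese name is already gone before `re.sub` runs, and the default pattern `[^ a-zA-Z0-9]` would remove it anyway.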

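So, other than deleting the non-ASCII characters, what do you want to do with them?

I want to clean out special characters, but because I have data in other languages it deletes that too, and I don't want it to delete data from other languages.

If you don't want to remove non-ASCII characters, why are you using re.sub to remove them? What does "cleaning special characters" mean, other than deleting them? What should happen to them?

You want to keep Chinese characters, but you also say you want to remove accents? But accented characters are valid in names in non-English languages!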
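
One possible reading of the desired output is: keep letters and digits from any script, strip accents, and turn punctuation into spaces. A minimal sketch under that assumption follows; the function name `clean_text_keep_scripts` is hypothetical, and this is not necessarily what the asker ended up using. It relies on `\w` being Unicode-aware for `str` patterns in Python 3, and on accents being combining marks (Unicode category `Mn`) after NFD decomposition.

```python
import re
import unicodedata

def clean_text_keep_scripts(text):
    # Decompose so that accents become separate combining marks (category "Mn").
    decomposed = unicodedata.normalize("NFD", text)
    # Drop only the combining marks; base letters from every script are kept.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left (for example, Hangul syllables).
    stripped = unicodedata.normalize("NFC", stripped)
    # Replace punctuation, symbols and underscores with a space.
    # For str patterns in Python 3, \w matches letters and digits of any script.
    cleaned = re.sub(r"[^\w\s]|_", " ", stripped)
    # Lowercase and collapse runs of whitespace.
    return " ".join(cleaned.lower().split())

for name in ["水柳仙", "Dean,Martín", "Doris Day"]:
    print(clean_text_keep_scripts(name))
```

With a DataFrame like the one in the question, this could be applied as `df["name"] = df["name"].apply(clean_text_keep_scripts)`. As the last comment points out, stripping accents is lossy: removing combining marks also changes letters such as the Russian "й" (which becomes "и") and removes vowel signs in some Indic scripts, so the accent-stripping lines may need to be dropped or limited to Latin text depending on the data.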