Python: how to keep only ASCII and discard non-ASCII, nbsp, etc. when doing json.dump


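I read a CSV file with the csv reader and then convert it to a JSON file via a dictionary.
While doing so, I want the keys to contain only letters and digits, with no non-ASCII characters or whitespace. I tried this: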

import csv
import json

with open('/file', 'rb') as file_Read:
    reader = csv.reader(file_Read)
    lis = []
    di = {}
    for r in reader:
        # some_val is a placeholder for the value stored under each key
        di = {r[0].strip(): [some_val]}
        lis.append(di)

with open('/file1', 'wb') as file_Dumped:
    list_to_be_written = json.dumps(lis)
    file_Dumped.write(list_to_be_written)


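When I read the file back, the keys in the output contain sequences such as \xa0\xa0\xa0\xa0\xa0. For example:

{"name\xa0\xa0\xa0\xa0": [9]}

If I call json.dumps(lis, ensure_ascii=False), I see whitespace around the keys instead. For example:

{"name    ": [9]}

How can I completely remove everything except letters and digits?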

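If the whitespace is only at the ends of the line, you can use .strip(). If double spaces also appear between the ASCII characters, you can remove them with something like this: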

my_string.replace('  ', '').strip()
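As a rough sketch of the whitespace cleanup described above, assuming Python 3 (where strings are Unicode and `str.strip()` / `str.split()` also treat the no-break space `\xa0` as whitespace). The helper name `normalize_spaces` is made up for illustration and is not part of the original answer.

```python
def normalize_spaces(text):
    """Trim both ends and collapse every whitespace run to one space."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which in Python 3 includes the no-break space U+00A0.
    return " ".join(text.split())


print(repr(normalize_spaces("  first\xa0\xa0 name \xa0")))  # 'first name'
print(repr("name\xa0\xa0".strip()))                         # 'name'
```

Unlike `replace('  ', '')`, this keeps a single space between words instead of joining them together, which may or may not be what you want.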

To remove non-ASCII characters, try the following:

my_string = 'name  \xa0\xa0\xa0\xa0'
my_string.encode('ascii', 'ignore').strip()
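In Python 3 the same idea needs a `.decode()` step, because `encode` returns bytes rather than text. The following is a minimal sketch under that assumption; the helper names `ascii_only` and `letters_and_digits` and the sample rows are hypothetical. It cleans the dictionary keys before `json.dumps` is called, since that is where the `\xa0` characters showed up in the question.

```python
import json
import re


def ascii_only(text):
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")


def letters_and_digits(text):
    """Keep only ASCII letters and digits; drop everything else."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


# Made-up rows shaped like the list built in the question.
rows = [{"name\xa0\xa0\xa0\xa0": [9]}, {" caf\xe9 1 ": [3]}]
cleaned = [{letters_and_digits(k): v for k, v in row.items()} for row in rows]

print(ascii_only("name  \xa0\xa0").strip())  # name
print(json.dumps(cleaned))                   # [{"name": [9]}, {"caf1": [3]}]
```

Note that `letters_and_digits` also removes ordinary spaces and punctuation, so use `ascii_only` alone if those should survive.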


You can try the following approach:


import json

import pandas as pd

# Read the CSV file using pandas
df = pd.read_csv("YourInputCSVFile")

# Convert every column to str so the character filter can be applied
df = df.astype(str)

# Replace every character outside printable ASCII (codes 32-126) with a space
for column in df:
    df[column] = df[column].apply(
        lambda x: ''.join(" " if ord(i) < 32 or ord(i) > 126 else i for i in x)
    )

# Convert the DataFrame to a dictionary for JSON conversion
df_dict = df.to_dict()

# Save the dictionary contents to a JSON file
with open('data.json', 'w') as fp:
    json.dump(df_dict, fp)
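If pandas is not available, roughly the same pipeline can be sketched with the standard library alone. This assumes Python 3.7+ (for `str.isascii`); the helper `keep_ascii_alnum`, the in-memory sample data, and the choice of the second column as the value are all illustrative assumptions, not part of the original answer. Unlike the pandas version, it cleans the keys and removes unwanted characters instead of replacing them with spaces.

```python
import csv
import io
import json


def keep_ascii_alnum(text):
    """Keep only characters that are ASCII letters or digits."""
    return "".join(ch for ch in text if ch.isascii() and ch.isalnum())


# In-memory stand-in for the CSV file (made-up sample data).
csv_text = "name\xa0\xa0\xa0\xa0,9\n\ufeffage ,42\n"

records = []
for row in csv.reader(io.StringIO(csv_text)):
    # First column becomes the cleaned key, second column the value.
    records.append({keep_ascii_alnum(row[0]): [int(row[1])]})

# Write the JSON to an in-memory buffer instead of a real file.
out = io.StringIO()
json.dump(records, out)
print(out.getvalue())  # [{"name": [9]}, {"age": [42]}]
```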


Looks like a duplicate of an existing question.

import string

printable = set(string.printable)

''.join(filter(lambda x: x in printable, list_to_be_written))
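The `string.printable` filter from the comment above could be wrapped up as follows. This is only a sketch, assuming Python 3; the function name `keep_printable` is made up here. Keep in mind that `string.printable` includes the space, tab, and newline characters, so ordinary whitespace survives the filter and may still need a `.strip()`.

```python
import string

# All ASCII characters that string.printable considers printable,
# including space, tab, newline and other ASCII whitespace.
PRINTABLE = set(string.printable)


def keep_printable(text):
    """Drop every character that is not in string.printable."""
    return "".join(ch for ch in text if ch in PRINTABLE)


print(repr(keep_printable("name\xa0\xa0 1")))  # 'name 1'
```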

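@HarishKumar That was very helpful, sir. I added strip() and it gave me the result I wanted.

Thanks for the reply, sir. I have already removed the leading/trailing whitespace (first line of the for loop). Consider this: s = '\xef\xbb\xbf name1'. If you type print s in Python IDLE, the output is name1. If you type s, the output is '\xef\xbb\xbf name1'. How do I remove that '\xef\xbb\xbf'?

Please try the following:

my_string = 'name\xa0\xa0\xa0\xa0'
my_string.encode('ascii', 'ignore').strip()

It gives this error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 5: ordinal not in range(128)

Please see the solution to a similar question here.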
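The error above is typical of Python 2, where calling `.encode()` on a byte string first triggers an implicit ASCII decode. A hedged sketch of how both problems (the `\xef\xbb\xbf` byte order mark and the stray `0xa0` bytes) might be handled in Python 3 is shown below; the sample byte strings are made up, and the first one is assumed to be UTF-8.

```python
# UTF-8 bytes that start with a byte order mark and end with a no-break space.
raw = b"\xef\xbb\xbf name1\xc2\xa0"

# "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
text = raw.decode("utf-8-sig")
print(repr(text.strip()))  # 'name1'

# Bytes in an unknown 8-bit encoding: ignore everything that is not ASCII.
legacy = b"name \xa0\xa0"
print(repr(legacy.decode("ascii", "ignore").strip()))  # 'name'
```

When reading from a file, passing `encoding="utf-8-sig"` to `open()` has the same BOM-stripping effect.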