Handling special characters from Excel to CSV with Python
Hello, I am having trouble handling special characters when converting an Excel worksheet to CSV with Python. When I use
    else:
        # Encode strings into UTF-8 format to preserve content of cell
        row_values.append(cell.value.encode("UTF-8").strip())
the special character comes out as 'a'.
When I use
    else:
        # Encode strings into ISO-8859-1 format to preserve content of cell
        row_values.append(cell.value.encode("iso-8859-1").strip())
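the special character comes out as '�', the diamond-shaped symbol.
I believe this is related to encoding, but I am not sure which encoding to use. The characters come from an Excel worksheet that is being converted to CSV.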
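The two symptoms above are typical of bytes being written in one encoding and read back in another. The following is a minimal sketch of that effect, not a diagnosis of this particular file: the helper name `misread` and the sample string are made up for illustration, and the code uses Python 3 syntax rather than the 2.6.9 mentioned in the question.

```python
def misread(text, written_as, read_as):
    """Encode text with one codec, then decode the bytes with another.

    Hypothetical helper: it imitates a program that writes a file in one
    encoding and a viewer that opens it assuming a different one.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


sample = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# UTF-8 bytes opened as ISO-8859-1: the accented letter becomes two characters.
utf8_seen_as_latin1 = misread(sample, "utf-8", "iso-8859-1")

# ISO-8859-1 bytes opened as UTF-8: the lone byte is invalid UTF-8, so the
# decoder substitutes U+FFFD, which most viewers draw as a diamond.
latin1_seen_as_utf8 = misread(sample, "iso-8859-1", "utf-8")
```

If that is what is happening here, the encoding chosen when writing matters less than making sure the program that opens the CSV reads it with the same one.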
This is the code I am using:
def convert_to_csv(excel_file, input_dir, output_dir):
    """Convert an excel file to a CSV file by removing irrelevant data"""
    try:
        sheet = read_excel(excel_file)
    except UnicodeDecodeError:
        print 'File %s is possibly corrupt. Please check again.' % (excel_file)
        sys.exit(1)
    row_num = sheet.get_highest_row()  # Number of rows
    col_num = sheet.get_highest_column()  # Number of columns
    all_rows = []
    # Loop through rows and columns
    for row in range(row_num):
        row_values = []
        for column in range(col_num):
            # Get cell element
            cell = sheet.cell(row=row, column=column)
            # Ignore empty cells
            if cell.value is not None:
                if type(cell.value) == int or type(cell.value) == float:
                    # String encoding not applicable for integers and floating point numbers
                    row_values.append(cell.value)
                else:
                    # Encode strings into ISO-8859-1 format to preserve content of cell
                    row_values.append(cell.value.encode("iso-8859-1").strip())
            else:
                row_values.append('')
        # Append rows only having more than three values each
        if len(set(row_values)-{''}) > 3:
            # print row_values
            all_rows.append(row_values)
    # Saving the data to a csv extension with the same name as the given excel file
    output_path = os.path.join(output_dir, excel_file.split('.')[0] + '.csv')
    with open(output_path, 'wb') as f:
        writer = csv.writer(f, delimiter=";", quoting=csv.QUOTE_ALL)
        writer.writerows(all_rows[1:])
I am using Python 2.6.9.
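For comparison, on Python 3 the `csv` module works with text rather than bytes, so the encoding can be chosen once when the file is opened instead of per cell. The sketch below assumes Python 3 and assumes the CSV will be opened in Excel, which uses a leading byte order mark to recognise UTF-8; the function name `write_rows_utf8` is hypothetical and not part of the code above.

```python
import csv


def write_rows_utf8(rows, output_path):
    """Write rows to a semicolon-delimited CSV encoded as UTF-8 with a BOM.

    Hypothetical helper for illustration; cell values are passed as plain
    text or numbers, with no per-cell .encode() call.
    """
    # newline="" lets the csv module control line endings itself.
    # "utf-8-sig" writes a byte order mark before the first row.
    with open(output_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f, delimiter=";", quoting=csv.QUOTE_ALL)
        writer.writerows(rows)
```

With this approach the accented characters are kept in the output rather than replaced or removed.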
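I was wondering whether we could apply a regular expression before writing to the CSV.
Can the characters be handled that way?
Thanks in advance.
We fixed it with: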
    else:
        # Strip every non-ASCII character so only plain ASCII reaches the CSV
        row_values.append(
            re.sub(r'[^\x00-\x7f]', r'', cell.value).strip())
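Note that this fix deletes the special characters rather than converting them. As a hedged illustration, the sketch below puts the same regular expression next to an alternative that was not part of the original fix: decomposing accented letters with `unicodedata.normalize` so the base letter survives. Both function names are hypothetical, and the code uses Python 3 syntax.

```python
import re
import unicodedata

# Same character class as the fix above: anything outside 7-bit ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def strip_non_ascii(value):
    """Delete every non-ASCII character, as the regex fix does."""
    return NON_ASCII.sub("", value).strip()


def fold_to_ascii(value):
    """Alternative sketch: keep base letters and drop only the accents.

    NFKD normalisation splits an accented letter into the plain letter
    plus a combining mark; encoding to ASCII with "ignore" then discards
    the mark and any character that has no ASCII decomposition.
    """
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii").strip()
```

For example, `strip_non_ascii(" café ")` gives `"caf"`, while `fold_to_ascii(" café ")` gives `"cafe"`. Either way information is lost, so this only suits cases where the consumer of the CSV genuinely needs ASCII.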