Python 3.x: Converting non-ASCII characters with Python 3
I would like to ask a question about converting text characters to binary numbers in Python. I wrote a program that converts all ASCII characters and some Turkish characters to binary numbers. The code of the converter program is as follows:
while True:
    ASCII_characters_dict = {chr(i): "0" + bin(ord(chr(i)))[2:] for i in range(128)}
    for i in ASCII_characters_dict:
        if len(ASCII_characters_dict[i]) == 7:
            ASCII_characters_dict[i] = "0" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 6:
            ASCII_characters_dict[i] = "00" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 5:
            ASCII_characters_dict[i] = "000" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 4:
            ASCII_characters_dict[i] = "0000" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 3:
            ASCII_characters_dict[i] = "00000" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 2:
            ASCII_characters_dict[i] = "000000" + ASCII_characters_dict[i]
    Turkish_characters = "çÇöÖüÜ"
    Turkish_characters_dict = {i: bin(ord(i))[2:] for i in Turkish_characters}
    Dictionary = ASCII_characters_dict.copy()
    Dictionary.update(Turkish_characters_dict)
    başlık = "WELCOME TO THE CONVERTOR"
    süs="-"*80
    print("\n{}\n\n{}\n".format(süs, başlık.center(80," ")))
    seçenekler = "1. To convert text to binary, press '1'.\n2. To convert binary to text, press '2'.\n3. To exit the program, press '3'."
    print("{0}\n\n{1}\n\n{0}\n".format(süs, seçenekler))
    while True:
        seçim = input("Select:")
        print("\n{}\n".format(süs))
        if seçim=="1":
            break
        elif seçim=="2":
            break
        elif seçim == "3":
            quit()
        else:
            print("Warning: Please select one of the given numbers.\n")
    while seçim == "1":
        altbaşlık = "Convert Text to Binary"
        print("{}\n\n{}\n".format(altbaşlık.center(80," "), süs))
        text_1 = input("Text:")
        text_2 = ""
        for i in text_1:
            for j in Dictionary:
                if i == j:
                    text_2 += Dictionary[j]
        with open("Text_To_Binary.txt","a") as dosya:
            dosya.write("\n{0}\n\nBinary: {1}\n\n{0}\n".format(süs, text_2))
        print("\n{0}\n\nBinary: {1}\n\n{0}".format(süs, text_2))
        message = "1. To continue converting text to binary, press '1'.\n\n2. To return the main page, press '2'.\n\n3. To exit the program, press '3'."
        print("\n{}\n\n{}".format(message,süs))
        while True:
            yeni_seçim_1 = input("\nSelect:")
            print("\n{}\n".format(süs))
            if yeni_seçim_1 == "1":
                break
            elif yeni_seçim_1 == "2":
                break
            elif yeni_seçim_1 == "3":
                quit()
            else:
                print("Warning: Please select one of the given numbers.")
        if yeni_seçim_1 == "1":
            continue
        elif yeni_seçim_1 == "2":
            break
    while seçim == "2":
        altbaşlık = "Convert Binary to Text"
        print("{}\n\n{}\n".format(altbaşlık.center(80," "), süs))
        text_1 = input("Binary:")
        text_2 = ""
        list_1 = []
        if " " in text_1:
            list_1 = text_1.split()
        elif " " not in text_1:
            for i in range(len(text_1)):
                if i % 8 == 0:
                    text_2 += " "
                text_2 += text_1[i]
            list_1 = text_2[1:].split(" ")
        text_3 = ""
        for i in list_1:
            for j in Dictionary:
                if i == Dictionary[j]:
                    text_3 += j
        with open("Binary_To_Text.txt","a") as dosya:
            dosya.write("\n{0}\n\nText: {1}\n\n{0}\n".format(süs, text_3))
        print("\n{0}\n\nText: {1}\n\n{0}\n".format(süs, text_3))
        message = "1. To continue converting binary to text, press '1'.\n\n2. To return the main page, press '2'.\n\n3. To exit the program, press '3'."
        print("{}\n\n{}".format(message,süs))
        while True:
            yeni_seçim = input("\nSelect:")
            print("\n{}\n".format(süs))
            if yeni_seçim == "1":
                break
            elif yeni_seçim == "2":
                break
            elif yeni_seçim == "3":
                quit()
            else:
                print("Warning: Please select one of the given numbers.")
        if yeni_seçim == "2":
            break
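The converter correctly converts the characters "çÇöÖüÜ" to binary numbers. I removed the characters "şŞĞıİ" from the list of Turkish characters because the program cannot convert them correctly.

According to one source, the binary number of the character "ş" is "001001100011001101010011000100111011".
When I copy this number and paste it into the "binary to text" part of the program, the output is "&#351;" (in the post I put spaces between the characters, because otherwise it is displayed as ş).
When I enter chr(351), the output is the character "ş".
The binary number of the character "ş" is bin(351), which equals "0b101011111". However, when I enter these digits into the converter, the program returns an empty result.
The same problem occurs with the characters "ŞĞıİ". The characters "çÇöÖüÜ", however, are converted without any problem.

According to another source, the binary number of the character "ş" is "01011111". But those digits belong to the character "_".
The same problem occurs with the characters "ŞĞıİ".

So my question is: why can the characters "çÇöÖüÜ" be converted correctly while the characters "şŞĞıİ" cannot? Is there any solution other than handling the "şŞĞıİ" characters as special cases after the input step?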
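Thanks in advance.

Let me share what I learned from this example.

First, ASCII is a 7-bit code: every ASCII character has a code from 0 to 127, so 7 binary digits are enough to represent it. Each character is normally stored in an 8-bit byte, so the leading (most significant) bit is 0. Historically that eighth bit was sometimes used as a parity bit.

For example, the binary number of the character a is "1100001", but it is usually displayed as "01100001". A parity bit is an error-detection check: it records whether the number of 1 bits in the value is odd or even.

The reason for adding a parity bit is that the data may be corrupted while it is being sent to another computer. The value of a is 97. Writing a leading 0 in front of "1100001" does not change that value; the number still represents 97.

So the character a is a number defined by ASCII. Every ASCII character fits in 8 bits, that is, one byte. Non-ASCII characters need more than one byte in UTF-8; a character such as ç takes 16 bits. Let us see why.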
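Before that, the ASCII claims above can be checked directly. This is only a minimal sketch; the helper names to_8_bits and even_parity_bit are made up for illustration and are not part of the original program, and even parity is assumed as the convention.

```python
def to_8_bits(ch):
    """Return the 8-digit binary string for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character: {!r}".format(ch))
    return format(code, "08b")


def even_parity_bit(bits):
    """Return "0" if the number of 1 bits is already even, otherwise "1"."""
    return str(bits.count("1") % 2)


print(to_8_bits("a"))              # 01100001
print(ord("a").bit_length())       # 7
print(even_parity_bit("1100001"))  # 1, because "1100001" contains three 1 bits
```

With that established, start again from the character a: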
number=ord("a")
#number=97
string=chr(number)
#string="a"
The code above defines a string that contains only the character a. When that character is encoded with UTF-8, the result is as follows:
number=ord("a")
#number=97
string=chr(number).encode(encoding="utf-8")
#string=b'a'
len(string)
#1
If the b prefix is followed by \x escapes, the bytes are being shown in hexadecimal. Suppose the character is ç:
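The snippet for this step does not appear in the text. A reconstruction, assuming it mirrors the example for the character a (the names length and hex_value are added here for illustration), might look like this:

```python
number = ord("ç")
# number = 231
string = chr(number).encode(encoding="utf-8")
# string = b'\xc3\xa7'
length = len(string)
# length = 2
hex_value = string.hex()
# hex_value = 'c3a7'
```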
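The resulting value, b'\xc3\xa7', looks a little strange at first, but it is simply the two bytes c3 and a7 written in hexadecimal, so the value is c3a7. The encoded character a has a length of 1, which means it takes 8 bits, or 1 byte. The encoded character ç, however, has a length of 2, which means it takes 16 bits, or 2 bytes.
Let's look at the binary number of the character ç: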
number=int("c3a7",16)
#number=50087
binary_number_of_character_ç=bin(50087)[2:]
#binary_number_of_character_ç="1100001110100111"
len(binary_number_of_character_ç)
#16
So the binary number of ç is 1100001110100111, which is usually displayed as two bytes: 11000011 10100111.
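The same bit pattern can be produced for any character without building hexadecimal strings by hand. A minimal sketch, assuming the goal is one 8-digit group per UTF-8 byte (text_to_binary is a hypothetical helper, not a function from the program in this thread):

```python
def text_to_binary(text):
    """Encode text as UTF-8 and return one 8-digit binary group per byte."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(text_to_binary("a"))  # 01100001
print(text_to_binary("ç"))  # 11000011 10100111
print(text_to_binary("ş"))  # 11000101 10011111
```

Because the encoder decides how many bytes each character needs, this approach does not need a separate list of Turkish characters.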
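If we restructure the whole program according to the information above, the code can look like this, and non-ASCII characters are displayed without errors: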
#-----------------------------IMPORTING Fore and init FUNCTIONS FROM COLORAMA MODULE------------------------------------
from colorama import Fore,init
init(autoreset=True)
#--------------------------------------------DICTIONARY FUNCTION--------------------------------------------------------
def dictionary():
    # ASCII characters: one byte each, left-padded with zeros to 8 binary digits
    ascii_dictionary = {chr(i): bin(i)[2:] for i in range(128)}
    for i in ascii_dictionary:
        if len(ascii_dictionary[i]) < 8:
            count = 8 - len(ascii_dictionary[i])
            ascii_dictionary[i] = "".zfill(count) + ascii_dictionary[i]
    # Code points 128-511: two UTF-8 bytes each, stored as two 8-digit groups separated by a space
    non_ascii_dictionary = {chr(i): bin(int(bytes(chr(i).encode(encoding="utf-8")).hex(), 16))[2:10] + " " +
                            bin(int(bytes(chr(i).encode(encoding="utf-8")).hex(), 16))[10:18] for i in range(128, 512)}
    dictionary = ascii_dictionary.copy()
    dictionary.update(non_ascii_dictionary)
    return dictionary
#--------------------------------------------CONVERTER FUNCTIONS--------------------------------------------------------
def convert_text_to_binary():
    text_1 = input("Write A Text:")
    print(Fore.LIGHTBLUE_EX + "\n{}".format("-" * 80))
    return_value = dictionary()
    list_1 = [return_value[j] for i in text_1 for j in return_value if i == j]
    text_2 = " ".join(list_1)
    print(Fore.RED + "\nOutput: " + Fore.GREEN + "{}\n\n".format(text_2) + Fore.LIGHTBLUE_EX + "{}\n".format("-" * 80))
    with open("Text.to_Binary.txt", "a", encoding="utf-8") as file:
        file.write("{0}\nInput: {2}\n\nOutput: {1}\n{0}".format("\n"+"-"*80+"\n", text_2, text_1))
def convert_binary_to_text():
    text_1 = input("Write Binary Numbers:")
    print(Fore.LIGHTBLUE_EX + "\n{}".format("-" * 80))
    list_1 = text_1.split()
    # Groups that start with "1" belong to two-byte characters; join them in pairs
    list_2 = [i for i in list_1 if i.startswith("1")]
    count = 0
    list_3 = []
    while count < len(list_2):
        list_3.append(" ".join(list_2[count:count + 2]))
        count += 2
    list_4 = []
    count = 0
    for i in list_1:
        if i.startswith("0"):
            list_4.append(i)
        elif i.startswith("1"):
            # Removing the first byte shifts the list, so the loop skips the second byte of the pair
            list_1.remove(i)
            list_4.append(list_3[count])
            count += 1
    text_2 = ""
    return_value = dictionary()
    for i in list_4:
        for j in return_value:
            if i == return_value[j]:
                text_2 += j
    print(Fore.RED + "\nOutput: " + Fore.GREEN + "{}\n\n".format(text_2) + Fore.LIGHTBLUE_EX + "{}\n".format("-" * 80))
    with open("Binary_to_Text.txt", "a", encoding="utf-8") as file:
        file.write("{0}\nInput: {2}\n\nOutput: {1}\n{0}".format("\n"+"-"*80+"\n", text_2, text_1))
#------------------------------------------STYLING WITH TEXT CLASS------------------------------------------------------
class text():
    def __init__(self,name,style=Fore.LIGHTBLUE_EX+"-"*80):
        self.name=name
        self.style=style
    def title(self):
        print("\n{0}\n\n{1}\n\n{0}\n".format(self.style, str(self.name).center(80, " ")))
    def paragraph(self):
        print("{0}\n\n{1}\n".format(self.name, self.style))
#-----------------------------------------------TEXT INSTANCES----------------------------------------------------------
head = text(Fore.RED + "WELCOME TO THE CONVERTER")
sub_head_1 = text(Fore.RED + "CONVERT TEXT TO BINARY")
sub_head_2 = text(Fore.RED + "CONVERT BINARY TO TEXT")
head_options = text(Fore.RED + "1. " + Fore.GREEN + "To convert text to binary, press '1'.\n\n" +
                    Fore.RED + "2. " + Fore.GREEN + "To convert binary to text, press '2'.\n\n" +
                    Fore.RED + "3. " + Fore.GREEN + "To exit the program, press '3'.")
sub_head_options = text(Fore.RED + "1. " + Fore.GREEN + "To continue converting, press '1'\n\n" +
                        Fore.RED + "2. " + Fore.GREEN + "To return the main page, press '2'.\n\n" +
                        Fore.RED + "3. " + Fore.GREEN + "To exit the program, press '3'.")
#-----------------------------------------------CHOICE FUNCTION---------------------------------------------------------
def choice():
    # Keep asking until the user enters 1 or 2; 3 exits the program
    while True:
        select = input("Select:")
        if select == "1":
            return select
        elif select == "2":
            return select
        elif select == "3":
            quit()
        else:
            print(Fore.RED+"\nWarning: "+Fore.GREEN+"Please select one of the given numbers.\n")
#---------------------------------------BUNDLING PROGRAM PARTS IN FUNCTION----------------------------------------------
def main_program():
    while True:
        head.title()
        head_options.paragraph()
        select = choice()
        while select == "1":
            sub_head_1.title()
            convert_text_to_binary()
            sub_head_options.paragraph()
            new_select = choice()
            if new_select == "2":
                break
        while select == "2":
            sub_head_2.title()
            convert_binary_to_text()
            sub_head_options.paragraph()
            new_select = choice()
            if new_select == "2":
                break
main_program()
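The dictionary above covers only code points below 512. As an alternative, the binary-to-text direction could lean on Python's own UTF-8 decoder instead of a lookup table. The following is a sketch under that assumption, not the method used in the program above; binary_to_text is a hypothetical name.

```python
def binary_to_text(bits):
    """Decode a string of 0s and 1s (spaces optional) as UTF-8 text."""
    digits = "".join(bits.split())
    if len(digits) % 8 != 0:
        raise ValueError("expected a multiple of 8 binary digits")
    if set(digits) - {"0", "1"}:
        raise ValueError("expected only the digits 0 and 1")
    # Cut the digits into 8-digit groups and turn each group into one byte
    data = bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))
    return data.decode("utf-8")


print(binary_to_text("11000101 10011111"))         # ş
print(binary_to_text("0110000111000011 10100111"))  # aç
```

If the bytes do not form valid UTF-8, decode raises UnicodeDecodeError, which a menu-driven program would probably want to catch and report to the user.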
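This is all a matter of character sets and encodings. ASCII has no accented characters; Latin-1 has çÇöÖüÜ. In general you can use Unicode encoded as UTF-8, in which you can combine any scripts: Turkish, Arabic and so on. However, UTF-8 is a multi-byte encoding. Further: &#351; and similar strings are HTML numeric entities.

Hi, where in the program should I change the character encoding in order to make this change?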
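The Latin-1 remark explains the difference the question asks about: çÇöÖüÜ have code points below 256, so bin(ord(...)) gives exactly 8 digits for them, while ş and its neighbours need 9. A small sketch to check this (fits_in_latin_1 is a hypothetical helper introduced only for this example):

```python
import html


def fits_in_latin_1(ch):
    """Return True if the character can be encoded in Latin-1 (one byte)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in "çÇöÖüÜşŞğĞıİ":
    # code point, binary digits it needs, Latin-1 support, UTF-8 bytes
    print(ch, ord(ch), ord(ch).bit_length(), fits_in_latin_1(ch), ch.encode("utf-8"))

# An HTML numeric entity is the decimal code point wrapped in &# and ;
print(html.unescape("&#351;"))  # ş
```

In a program like the converter, the encoding is chosen wherever text is turned into bytes or back: the argument of str.encode and bytes.decode, and the encoding parameter of open.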