Python 3.x: Converting non-ASCII characters with Python 3


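I'd like to ask a question about converting text characters to binary numbers in Python. I wrote a program that converts all ASCII characters and some Turkish characters to binary numbers. Here is the code of the converter: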

while True:
    ASCII_characters_dict = {chr(i): "0" + bin(ord(chr(i)))[2:] for i in range(128)}
    for i in ASCII_characters_dict:
        if len(ASCII_characters_dict[i]) == 7:
            ASCII_characters_dict[i] = "0" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 6:
            ASCII_characters_dict[i] = "00" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 5:
            ASCII_characters_dict[i] = "000" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 4:
            ASCII_characters_dict[i] = "0000" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 3:
            ASCII_characters_dict[i] = "00000" + ASCII_characters_dict[i]
        elif len(ASCII_characters_dict[i]) == 2:
            ASCII_characters_dict[i] = "000000" + ASCII_characters_dict[i]
    Turkish_characters = "çÇöÖüÜ"
    Turkish_characters_dict = {i: bin(ord(i))[2:] for i in Turkish_characters}
    Dictionary = ASCII_characters_dict.copy()
    Dictionary.update(Turkish_characters_dict)
    başlık = "WELCOME TO THE CONVERTOR"
    süs="-"*80
    print("\n{}\n\n{}\n".format(süs, başlık.center(80," ")))
    seçenekler = "1. To convert text to binary, press '1'.\n2. To convert binary to text, press '2'.\n3. To exit the program, press '3'."
    print("{0}\n\n{1}\n\n{0}\n".format(süs, seçenekler))
    while True:
        seçim = input("Select:")
        print("\n{}\n".format(süs))
        if seçim=="1":
            break
        elif seçim=="2":
            break
        elif seçim == "3":
            quit()
        else:
            print("Warning: Please select one of the given numbers.\n")
    while seçim == "1":
        altbaşlık = "Convert Text to Binary"
        print("{}\n\n{}\n".format(altbaşlık.center(80," "), süs))
        text_1 = input("Text:")
        text_2 = ""
        for i in text_1:
            for j in Dictionary:
                if i == j:
                    text_2 += Dictionary[j]
        with open("Text_To_Binary.txt","a") as dosya:
            dosya.write("\n{0}\n\nBinary: {1}\n\n{0}\n".format(süs, text_2))
        print("\n{0}\n\nBinary: {1}\n\n{0}".format(süs, text_2))
        message = "1. To continue converting text to binary, press '1'.\n\n2. To return to the main page, press '2'.\n\n3. To exit the program, press '3'."
        print("\n{}\n\n{}".format(message,süs))
        while True:
            yeni_seçim_1 = input("\nSelect:")
            print("\n{}\n".format(süs))
            if yeni_seçim_1 == "1":
                break
            elif yeni_seçim_1 == "2":
                break
            elif yeni_seçim_1 == "3":
                quit()
            else:
                print("Warning: Please select one of the given numbers.")
        if yeni_seçim_1 == "1":
            continue
        elif yeni_seçim_1 == "2":
            break
    while seçim == "2":
        altbaşlık = "Convert Binary to Text"
        print("{}\n\n{}\n".format(altbaşlık.center(80," "), süs))
        text_1 = input("Binary:")
        text_2 = ""
        list_1 = []
        if " " in text_1:
            list_1 = text_1.split()
        elif " " not in text_1:
            for i in range(len(text_1)):
                if i % 8 == 0:
                    text_2 += " "
                text_2 += text_1[i]
            list_1 = text_2[1:].split(" ")
        text_3 = ""
        for i in list_1:
            for j in Dictionary:
                if i == Dictionary[j]:
                    text_3 += j
        with open("Binary_To_Text.txt","a") as dosya:
            dosya.write("\n{0}\n\nText: {1}\n\n{0}\n".format(süs, text_3))
        print("\n{0}\n\nText: {1}\n\n{0}\n".format(süs, text_3))
        message = "1. To continue converting binary to text, press '1'.\n\n2. To return to the main page, press '2'.\n\n3. To exit the program, press '3'."
        print("{}\n\n{}".format(message,süs))
        while True:
            yeni_seçim = input("\nSelect:")
            print("\n{}\n".format(süs))
            if yeni_seçim == "1":
                break
            elif yeni_seçim == "2":
                break
            elif yeni_seçim == "3":
                quit()
            else:
                print("Warning: Please select one of the given numbers.")
        if yeni_seçim == "2":
            break
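The converter correctly converts the characters "çÇöÖüÜ" to binary numbers. I removed the characters "şŞĞıİ" from the Turkish character list because the program could not convert them correctly.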

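According to one source, the binary number of the character "ş" is "001001100011001101010011000100111011". When I copy this number and enter it into the "binary to text" part of the program, the output is "&#351;". When I enter chr(351), the output is the character "ş".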
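The output "&#351;" is an HTML numeric character reference: "&#", the decimal code point, then ";". A minimal sketch using the standard library's html.unescape confirms that it stands for chr(351). It also prints the 8-bit ASCII codes of the six characters of the reference itself; the assumption here is that the source's number encodes this reference rather than the character ş, which would explain why the program returns it literally.

```python
import html

# "&#351;" is "&#" + decimal code point + ";", an HTML numeric character reference
reference = "&#351;"
print(html.unescape(reference))  # ş
print(chr(351))                  # ş

# The reference itself is six ASCII characters, 8 bits each
bits = " ".join(format(ord(c), "08b") for c in reference)
print(bits)  # 00100110 00100011 00110011 00110101 00110001 00111011
```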

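The binary number of the character "ş" is bin(351), which equals "0b101011111". However, when I enter these digits into the converter, the program outputs nothing.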

The same problem occurs with the characters "ŞĞıİ". The characters "çÇöÖüÜ", however, convert without any problem.
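The difference can be checked directly. The six letters that work have code points between 128 and 255, so bin() gives exactly 8 digits, which survives the 8-character splitting in the "binary to text" branch. ş, Ş, Ğ, ı and İ have code points above 255 and need 9 digits, so the split produces pieces that match nothing in Dictionary. The helper split_every_8 is hypothetical; it only mirrors the question's splitting loop.

```python
def split_every_8(bits):
    # Cut a string of binary digits into 8-character chunks,
    # like the loop in the question's "binary to text" branch
    return [bits[i:i + 8] for i in range(0, len(bits), 8)]


for ch in "çÇöÖüÜşŞĞıİ":
    code_point = ord(ch)
    bits = bin(code_point)[2:]  # what the question stores for each Turkish letter
    print(ch, code_point, bits, len(bits), split_every_8(bits))
```

For ç this prints `ç 231 11100111 8 ['11100111']`, while for ş it prints `ş 351 101011111 9 ['10101111', '1']`. Neither "10101111" nor "1" is a value in the question's Dictionary, so the output is empty.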

According to another source, the binary number of the character "ş" is "01011111". But these digits belong to the character "_".

The same problem occurs with the characters "ŞĞıİ".
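The digits 01011111 are exactly the lowest 8 bits of 351. A minimal sketch, assuming that source simply dropped the ninth bit, shows that what remains is 95, the code point of "_":

```python
code_point = ord("ş")          # 351, which needs 9 bits: 101011111
low_byte = code_point & 0xFF   # keep only the lowest 8 bits
print(format(low_byte, "08b"), low_byte, chr(low_byte))  # 01011111 95 _
```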

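So my question is: why can the characters "çÇöÖüÜ" be converted correctly while "şŞĞıİ" cannot? Is there any solution other than checking for the "şŞĞıİ" characters after the input step and handling them separately?
Thanks in advance.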

Let me share what I learned from this example.

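First, ASCII is a 7-bit code: every ASCII character is a number from 0 to 127, so 7 binary digits are enough. When an ASCII character is stored in an 8-bit byte, the extra, most significant bit is 0. In older transmission schemes this eighth bit was sometimes used as a parity bit.

For example, the binary number of the character a is 1100001 (97), which is usually written with a leading zero as 01100001. A parity bit records whether the number of 1 bits is odd or even; it says nothing about whether the number itself is odd or even.

A parity bit was added because data sent to another computer can be corrupted in transit: a single flipped bit changes the count of 1 bits, so the receiver can detect the error. A leading 0 does not change the value, so 01100001 still represents 97.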
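A minimal sketch of how an even-parity bit would be computed for a. Choosing even parity and placing the parity bit in the leftmost position are assumptions for illustration; real protocols differ.

```python
def even_parity_bit(value):
    # Even parity: the extra bit makes the total number of 1 bits even
    return bin(value).count("1") % 2


code = ord("a")                      # 97 -> 1100001, three 1 bits
parity = even_parity_bit(code)       # 1, because three is odd
with_parity = (parity << 7) | code   # parity bit in front of the 7 data bits
print(format(code, "07b"))           # 1100001
print(format(code, "08b"))           # 01100001 (plain 8-bit storage)
print(format(with_parity, "08b"))    # 11100001 (with an even-parity bit)
```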

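So the character a is simply a number defined by ASCII. In UTF-8, every ASCII character is stored in one byte (8 bits), while non-ASCII characters take two to four bytes; every Turkish letter in this question takes two bytes (16 bits). Let's see why.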

number=ord("a")
#number=97
string=chr(number)
#string="a"
The code above defines a string containing only the character a. Now let's encode it with UTF-8:

number=ord("a")
#number=97
string=chr(number).encode(encoding="utf-8")
#string=b'a'
len(string)
#1
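The b prefix means the result is a bytes object, not a str. Any byte that is not a printable ASCII character is shown as \x followed by two hexadecimal digits. Suppose the character is ç: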
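For comparison with the a example above, here are the same steps for ç, written as a small script with print calls instead of REPL-style comments:

```python
number = ord("ç")                              # 231
string = chr(number).encode(encoding="utf-8")  # a bytes object, not a str
print(string)                                  # b'\xc3\xa7'
print(len(string))                             # 2
print(string.hex())                            # c3a7
```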

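The encoded value, b'\xc3\xa7', looks a bit strange, but its real value is inside it: the two bytes c3 and a7, which together form the hexadecimal number c3a7. The encoded a has length 1, meaning 8 bits, or 1 byte. The encoded ç has length 2, meaning 16 bits, or 2 bytes.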
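The same len() check extends to characters beyond Turkish. UTF-8 uses 1 byte for U+0000 to U+007F, 2 bytes for U+0080 to U+07FF, 3 bytes for U+0800 to U+FFFF and 4 bytes above that. The sample characters below are chosen only to cover each range:

```python
# UTF-8 length (in bytes) for characters from different code point ranges
samples = ["a", "ç", "ş", "€", "\U0001F600"]  # the last one is an emoji, U+1F600
lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths[ch] = len(encoded)
    print(f"U+{ord(ch):04X}", encoded, len(encoded))
```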

Let's look at the binary number of the character ç:

number=int("c3a7",16)
#number=50087
binary_number_of_character_ç=bin(number)[2:]  # [2:] strips the "0b" prefix
#binary_number_of_character_ç='1100001110100111'
len(binary_number_of_character_ç)
#16
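So the UTF-8 encoding of ç in binary is 1100001110100111, usually written as two bytes: 11000011 10100111. Note that this differs from the code point of ç, ord("ç") = 231 = 11100111, which is what the question's program stored.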
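The relationship between the two bit patterns can be made explicit. A two-byte UTF-8 sequence has the form 110xxxxx 10xxxxxx, and the x bits, read together, are the code point. A minimal sketch for ç:

```python
ch = "ç"
first, second = ch.encode("utf-8")                  # iterating bytes gives ints: 0xC3, 0xA7
print(format(first, "08b"), format(second, "08b"))  # 11000011 10100111
print(format(ord(ch), "08b"))                       # 11100111 (the code point, 231)

# Strip the 110 and 10 marker bits and join the remaining 5 + 6 bits
payload = ((first & 0b00011111) << 6) | (second & 0b00111111)
print(payload, chr(payload))                        # 231 ç
```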

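Restructuring the whole program based on the information above, the code can look like the following. It handles non-ASCII characters (here, code points 128 to 511, which include all the Turkish letters) without errors: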

#-----------------------------IMPORTING Fore and init FUNCTIONS FROM COLORAMA MODULE------------------------------------

from colorama import Fore,init
init(autoreset=True)

#--------------------------------------------DICTIONARY FUNCTION--------------------------------------------------------

def dictionary():
    # ASCII characters (0-127): one UTF-8 byte, written as 8 binary digits with leading zeros
    ascii_dictionary = {chr(i): format(i, "08b") for i in range(128)}
    # Code points 128-511 encode to exactly two UTF-8 bytes; each byte is written as 8 digits
    non_ascii_dictionary = {chr(i): " ".join(format(byte, "08b") for byte in chr(i).encode(encoding="utf-8"))
                            for i in range(128, 512)}
    dictionary = ascii_dictionary.copy()
    dictionary.update(non_ascii_dictionary)
    return dictionary

#--------------------------------------------CONVERTER FUNCTIONS--------------------------------------------------------

def convert_text_to_binary():
    text_1 = input("Write A Text:")
    print(Fore.LIGHTBLUE_EX + "\n{}".format("-" * 80))
    return_value = dictionary()
    list_1 = [return_value[j] for i in text_1 for j in return_value if i == j]
    text_2 = " ".join(list_1)
    print(Fore.RED + "\nOutput: " + Fore.GREEN + "{}\n\n".format(text_2) + Fore.LIGHTBLUE_EX + "{}\n".format("-" * 80))
    with open("Text_to_Binary.txt", "a", encoding="utf-8") as file:
        file.write("{0}\nInput: {2}\n\nOutput: {1}\n{0}".format("\n"+"-"*80+"\n", text_2, text_1))


def convert_binary_to_text():
    text_1 = input("Write Binary Numbers:")
    print(Fore.LIGHTBLUE_EX + "\n{}".format("-" * 80))
    list_1 = text_1.split()
    list_2 = [i for i in list_1 if i.startswith("1")]
    count = 0
    list_3 = []
    while count < len(list_2):
        list_3.append(" ".join(list_2[count:count + 2]))
        count += 2
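    list_4 = []
    count = 0
    for i in list_1:
        if i.startswith("0"):
            # 0xxxxxxx: a one-byte (ASCII) character
            list_4.append(i)
        elif i.startswith("11"):
            # 110xxxxx: lead byte of a two-byte character; use the pair built in list_3
            list_4.append(list_3[count])
            count += 1
        # 10xxxxxx: continuation bytes are already paired in list_3, so they are skipped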
    text_2 = ""
    return_value = dictionary()
    for i in list_4:
        for j in return_value:
            if i == return_value[j]:
                text_2 += j
    print(Fore.RED + "\nOutput: " + Fore.GREEN + "{}\n\n".format(text_2) + Fore.LIGHTBLUE_EX + "{}\n".format("-" * 80))
    with open("Binary_to_Text.txt", "a", encoding="utf-8") as file:
        file.write("{0}\nInput: {2}\n\nOutput: {1}\n{0}".format("\n"+"-"*80+"\n", text_2, text_1))

#------------------------------------------STYLING WITH TEXT CLASS------------------------------------------------------

class text():
    def __init__(self,name,style=Fore.LIGHTBLUE_EX+"-"*80):
        self.name=name
        self.style=style
    def title(self):
        print("\n{0}\n\n{1}\n\n{0}\n".format(self.style, str(self.name).center(80, " ")))
    def paragraph(self):
        print("{0}\n\n{1}\n".format(self.name, self.style))

#-----------------------------------------------TEXT INSTANCES----------------------------------------------------------

head = text(Fore.RED + "WELCOME TO THE CONVERTER")
sub_head_1 = text(Fore.RED + "CONVERT TEXT TO BINARY")
sub_head_2 = text(Fore.RED + "CONVERT BINARY TO TEXT")
head_options = text(Fore.RED + "1. " + Fore.GREEN + "To convert text to binary, press '1'.\n\n" + 
                    Fore.RED + "2. " + Fore.GREEN + "To convert binary to text, press '2'.\n\n" + 
                    Fore.RED + "3. " + Fore.GREEN + "To exit the program, press '3'.")
sub_head_options = text(Fore.RED + "1. " + Fore.GREEN + "To continue converting, press '1'.\n\n" +
                        Fore.RED + "2. " + Fore.GREEN + "To return to the main page, press '2'.\n\n" +
                        Fore.RED + "3. " + Fore.GREEN + "To exit the program, press '3'.")

#-----------------------------------------------CHOICE FUNCTION---------------------------------------------------------

def choice():
    while True:
        select = input("Select:")
        if select == "1":
            return select
        elif select == "2":
            return select
        elif select == "3":
            quit()
        else:
            print(Fore.RED+"\nWarning: "+Fore.GREEN+"Please select one of the given numbers.\n")

#---------------------------------------BUNDLING PROGRAM PARTS IN FUNCTION----------------------------------------------

def main_program():
    while True:
        head.title()
        head_options.paragraph()
        select = choice()
        while select == "1":
            sub_head_1.title()
            convert_text_to_binary()
            sub_head_options.paragraph()
            new_select = choice()
            if new_select == "2":
                break
        while select == "2":
            sub_head_2.title()
            convert_binary_to_text()
            sub_head_options.paragraph()
            new_select = choice()
            if new_select == "2":
                break

main_program()
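Because Python's str.encode and bytes.decode already implement UTF-8, a lookup dictionary is not strictly necessary. Below is a minimal alternative sketch, not the approach used in the answer above. The function names text_to_binary and binary_to_text are hypothetical. It handles every Unicode character, including şŞĞıİ, and accepts binary input with or without spaces:

```python
def text_to_binary(text, encoding="utf-8"):
    # Encode the text to bytes, then write every byte as 8 binary digits
    return " ".join(format(byte, "08b") for byte in text.encode(encoding))


def binary_to_text(binary, encoding="utf-8"):
    # Accept space-separated bytes or one long run of digits
    digits = "".join(binary.split())
    if len(digits) % 8 != 0:
        raise ValueError("the number of binary digits must be a multiple of 8")
    data = bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8
    return data.decode(encoding)


sample = "şŞĞıİ çÇöÖüÜ"
encoded = text_to_binary(sample)
print(encoded)
print(binary_to_text(encoded))  # şŞĞıİ çÇöÖüÜ
```

Since every byte is a fixed 8 digits, the spaces are optional on input, which removes the need for the lead-byte/continuation-byte pairing in convert_binary_to_text. Characters that need three or four bytes also round-trip.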

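All of this is about character sets and encodings. ASCII has no accented characters; Latin-1 contains çÇöÖüÜ. In general, you can encode Unicode with UTF-8, which lets you mix any scripts: Turkish, Arabic, and so on. However, UTF-8 is a multi-byte encoding. Furthermore, sequences like &#351; are HTML numeric character references.

Hi, to make these specific changes in the program, where should I change the character encoding?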
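To illustrate the comment about Latin-1: it contains ç but not ş, so encoding ş with it fails. ISO-8859-9 (Latin-5, the Turkish variant of Latin-1) is not mentioned in the comment and is included here only for comparison; it stores both letters in one byte. UTF-8 encodes both, using two bytes each.

```python
results = {}
for ch in "çş":
    for encoding in ("latin-1", "iso-8859-9", "utf-8"):
        try:
            results[ch, encoding] = ch.encode(encoding)
        except UnicodeEncodeError:
            results[ch, encoding] = None  # the character does not exist in this encoding
        print(ch, encoding, results[ch, encoding])
```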