How do I change bytes in a file in Python?

python unicode binary

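I'm writing an encryption program, and I need to open a file in binary mode to access non-ASCII and non-printable characters. I need to check whether each character in the file is a letter, a digit, a symbol, or a non-printable character. That means I have to check the bytes one by one (once they are decoded as ASCII) against the following characters: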

{^9,dzEV=Q4ciT+/s};fnq3BFh% #2!k7>YSU<GyD\I]|OC_e.W0M~ua-jR5lv1wA`@8t*xr'K"[P)&b:g$p(mX6Ho?JNZL

To open a file in binary mode, use open("filena.me", "rb"). I have never used this call myself, but it should give you what you need.

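Python has two main string types: bytestrings (sequences of bytes), which represent binary data, and Unicode strings (sequences of Unicode code points), which represent human-readable text. Converting one into the other is simple (☯): encode a Unicode string to get bytes, and decode bytes to get a Unicode string.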
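As a small illustration of that conversion, here is a sketch of a round trip in Python 3. The sample text and the choice of UTF-8 are assumptions made for the example; use whatever encoding your data is actually in.

```python
# Round trip between a Unicode string and a bytestring (Python 3).
text = "naïve ☯"                  # sample text, chosen only for illustration
data = text.encode("utf-8")       # str -> bytes
restored = data.decode("utf-8")   # bytes -> str

# Non-ASCII characters take more than one byte in UTF-8,
# so the byte count is larger than the character count.
print(len(text), len(data))       # 7 10
print(restored == text)           # True
```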

If you open a file in binary mode, e.g. 'rb', then file.read() returns a bytestring (the bytes type).
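To see this in practice, the sketch below writes a few bytes to a scratch file and reads them back in binary mode. The scratch file and its contents are made up for the example.

```python
import os
import tempfile

# Create a scratch file holding ASCII, non-printable, and non-ASCII bytes.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"A1\x00\xff\n")
    path = tmp.name

try:
    with open(path, "rb") as file:
        data = file.read()   # bytes, with no decoding and no newline translation
finally:
    os.remove(path)

print(type(data))   # <class 'bytes'>
print(list(data))   # iterating over bytes yields integers: [65, 49, 0, 255, 10]
```

Because iterating over a bytes object yields integers in Python 3, a single byte is compared against numbers or tested with `in`, not compared against one-character strings.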

There are several ways to classify bytes:

String methods, such as bytes.isdigit().
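A short sketch of the string-method approach follows. The sample data is invented; note that the bytes methods only recognise ASCII characters, and that a single byte has to be sliced (not indexed) to stay a bytes object.

```python
# bytes methods work on whole bytestrings and only know about ASCII.
print(b"123".isdigit())      # True
print(b"abc123".isalnum())   # True
print(b"\xff".isalpha())     # False: not an ASCII letter

# Indexing gives an int, so slice to test one byte at a time.
data = b"a1 \xff"
digit_flags = [data[i:i + 1].isdigit() for i in range(len(data))]
print(digit_flags)           # [False, True, False, False]
```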

String constants, such as string.printable:

>>> import string
>>> b'!' in string.printable.encode()
True

Regular expressions, such as \d:

>>> import re
>>> bool(re.match(br'\d+$', b'123'))
True

Classification functions in the curses.ascii module, such as curses.ascii.isprint().
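Putting these ideas together for the question's four categories, one possible sketch uses only the string constants. The function name classify_byte, the category names, and the decision to count a space as a symbol are assumptions made for this example, not part of the original answer.

```python
import string

# Sets of integer byte values for each category (ASCII only).
LETTERS = frozenset(string.ascii_letters.encode())
DIGITS = frozenset(string.digits.encode())
# Assumption: treat the space character as a symbol, since the question's
# character list contains a space.
SYMBOLS = frozenset(string.punctuation.encode() + b" ")

def classify_byte(b):
    """Return a coarse category name for one byte value (0-255)."""
    if b in LETTERS:
        return "letter"
    if b in DIGITS:
        return "digit"
    if b in SYMBOLS:
        return "symbol"
    # Control characters, other whitespace, and all bytes >= 0x80 land here.
    return "non-printable"

categories = [classify_byte(b) for b in b"a7! \x00\xe9"]
print(categories)
```

Set lookups keep the per-byte check cheap, which matters when every byte of a large file is inspected.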

bytearray is a mutable sequence of bytes. Unlike a bytestring, it can be changed in place, e.g. to lowercase every 3rd byte that is uppercase:

>>> import string
>>> a = bytearray(b'ABCDEF_')
>>> uppercase = string.ascii_uppercase.encode()
>>> a[::3] = [b | 0b0100000 if b in uppercase else b 
...           for b in a[::3]]
>>> a
bytearray(b'aBCdEF_')

Note: b'ad' are now lowercase, but b'_' stays the same.

To modify a binary file in place, you can use the mmap module, e.g. to lowercase the 4th column in every other line of 'file':

#!/usr/bin/env python3
import mmap
import string

uppercase = string.ascii_uppercase.encode()
ncolumn = 3 # select 4th column
with open('file', 'r+b') as file, \
     mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
    while True:
        mm.readline()   # ignore every other line
        pos = mm.tell() # remember current position
        if not mm.readline(): # EOF
            break
        if mm[pos + ncolumn] in uppercase:
            mm[pos + ncolumn] |= 0b0100000 # lowercase

Note: the Python 2 and Python 3 APIs differ in this case. The code uses Python 3.

Note on the input and output: the 4th column becomes lowercase on the 2nd and 4th lines.
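To try the mmap approach without touching a real file, the following sketch wraps the same loop in a function and runs it on a scratch file. The function name lowercase_column_every_other_line and the guard for short lines are additions for this example and are not part of the original answer.

```python
import mmap
import os
import string
import tempfile

UPPERCASE = string.ascii_uppercase.encode()

def lowercase_column_every_other_line(path, ncolumn=3):
    """Lowercase byte `ncolumn` of lines 2, 4, 6, ... in place (hypothetical helper)."""
    with open(path, "r+b") as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        while True:
            mm.readline()        # skip lines 1, 3, 5, ...
            pos = mm.tell()      # start of the line to modify
            line = mm.readline()
            if not line:         # EOF
                break
            # Added guard: skip lines too short to have this column.
            if ncolumn < len(line.rstrip(b"\r\n")) and mm[pos + ncolumn] in UPPERCASE:
                mm[pos + ncolumn] |= 0b0100000   # set bit 5: ASCII upper -> lower

# Scratch file with the same sample lines as the example input.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"ABCDE1\nFGHIJ\nABCDE\nFGHI")
    path = tmp.name

try:
    lowercase_column_every_other_line(path)
    with open(path, "rb") as f:
        result = f.read()
finally:
    os.remove(path)

print(result)   # b'ABCDE1\nFGHiJ\nABCDE\nFGHi'
```

The file size never changes here, which is what makes mmap a good fit: bytes are overwritten in place and nothing has to be shifted.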

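In general, to change a file: read from the file, write the modifications to a temporary file, and on success move the temporary file into the place of the original: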

#!/usr/bin/env python3
import os
import string
from tempfile import NamedTemporaryFile

caesar_shift = 3
filename = 'file'

def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(plaintext.maketrans(alphabet, shifted_alphabet))

dest_dir = os.path.dirname(filename)
chunksize = 1 << 15
with open(filename, 'rb') as file, \
     NamedTemporaryFile('wb', dir=dest_dir, delete=False) as tmp_file:
    while True: # encrypt
        chunk = file.read(chunksize)
        if not chunk: # EOF
            break
        tmp_file.write(caesar_bytes(chunk, caesar_shift))
os.replace(tmp_file.name, filename)

To convert the output back, set caesar_shift = -3.
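The round trip can be checked in memory before running it on a file. This sketch reuses the caesar_bytes function from the script above (with the parameter renamed to data); the sample bytes are chosen only for illustration.

```python
import string

def caesar_bytes(data, shift, alphabet=string.ascii_lowercase.encode()):
    """Shift every byte found in `alphabet` by `shift` positions; leave others alone."""
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return data.translate(bytes.maketrans(alphabet, shifted_alphabet))

plaintext = b"abc\ndef\nABC\nDEF\n"
ciphertext = caesar_bytes(plaintext, 3)
print(ciphertext)                                  # b'def\nghi\nABC\nDEF\n'
print(caesar_bytes(ciphertext, -3) == plaintext)   # True
print(caesar_bytes(b"xyz", 3))                     # b'abc' (wraps around)
```

Because the translation maps each byte independently, the result does not depend on where the chunk boundaries fall when the file is processed in pieces.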

Python 2 or Python 3? I ask because bytes objects are different from native strings in Python 3.

Thanks a lot, it really helped me. :D

Example using curses.ascii.isprint():

>>> from curses import ascii
>>> bytearray(filter(ascii.isprint, b'123'))
bytearray(b'123')

Sample input for the mmap example:

ABCDE1
FGHIJ
ABCDE
FGHI

Output:

ABCDE1
FGHiJ
ABCDE
FGHi

Sample input for the Caesar shift example:

abc
def
ABC
DEF

Output:

def
ghi
ABC
DEF