Python: how to find a byte sequence in a file?


python


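I have a binary file in which I need to change certain bits.

The byte address of those bits is given relative to a certain byte sequence (an ASCII string).

However, I have to admit that after searching Google in many places, I couldn't find a way to get the index of the "ABC" string.

Of course I could write a function that does this with a loop, but I can't believe there isn't a one-liner (OK, or even two...) that does it.

How can this be done?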
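A minimal sketch of the one-liner the question asks for, assuming the whole file fits in memory: opening the file in binary mode yields a bytes object, and bytes.find() locates a byte sequence directly (bytes.index() does the same but raises ValueError instead of returning -1). The helper name find_in_file is hypothetical, and the demo writes the 14-byte sample shown in the answer below to a temporary file.

```python
import os
import tempfile

def find_in_file(path: str, needle: bytes) -> int:
    """Return the offset of the first occurrence of `needle` in the file, or -1."""
    with open(path, "rb") as f:
        # Reading in "rb" mode gives a bytes object, and bytes.find() searches
        # for a byte sequence directly; no per-byte loop is needed.
        return f.read().find(needle)

# Demo on a throwaway copy of the 14-byte sample file (hypothetical setup).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes.fromhex("ffeedd414243ccbbaa589988770a"))

offset = find_in_file(path, b"ABC")   # 3
missing = find_in_file(path, b"XYZ")  # -1
os.remove(path)
print(offset, missing)
```

Stripped of the helper, the search is the single expression open(path, "rb").read().find(b"ABC"); the with block in the sketch just closes the file promptly.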

Not sure this is the most Pythonic way, but it works. Given this file:

$ cat so.bin    
���ABC̻�X��w
$ hexdump so.bin
0000000 eeff 41dd 4342 bbcc 58aa 8899 0a77     
000000e
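The hexdump above is easy to misread: by default hexdump prints 16-bit little-endian words, so eeff stands for the bytes ff ee. A small sketch that rebuilds the sample in file order and checks the offsets the scripts below rely on (the write to so.bin is left commented out so the snippet has no side effects):

```python
# hexdump's default format prints 16-bit little-endian words, so "eeff" is
# the byte pair ff ee. Written out in file order, the sample is these 14 bytes:
sample = bytes.fromhex("ff ee dd 41 42 43 cc bb aa 58 99 88 77 0a")

assert len(sample) == 0x0E        # matches hexdump's final offset, 000000e
abc_at = sample.index(b"ABC")     # 3
target = sample[abc_at + 3 + 3]   # 0x58 ("X"), the fourth byte after "ABC"

# Uncomment to recreate so.bin:
# with open("so.bin", "wb") as f:
#     f.write(sample)
print(abc_at, hex(target))        # 3 0x58
```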

EDIT: new solution starts here

import string

# Byte values of the ASCII letters a-z and A-Z (a set for fast lookups).
char_ints = {ord(c) for c in string.ascii_letters}

with open("so.out.bin", "wb") as fo:
    with open("so.bin", "rb") as fi:

        # Read bytes but only keep letters; every other byte becomes a space,
        # so string offsets still equal file offsets.
        chars = []
        for b in fi.read():
            if b in char_ints:
                chars.append(chr(b))
            else:
                chars.append(" ")

        # Search for 'ABC' in the read letters (raises ValueError if absent).
        pos = "".join(chars).index("ABC")

        # We now know the position of the interesting byte.
        pos_x = pos + len("ABC") + 3  # known offset

        # Now copy all bytes from the input to the output, ...
        fi.seek(0)
        for i, b in enumerate(fi.read()):
            # ... but replace the interesting byte.
            if i == pos_x:
                fo.write(b"Y")
            else:
                fo.write(bytes([b]))

EDIT: new solution ends here
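The letter-filtering step is one way to make str.index() usable, but it is not strictly necessary: bytes and bytearray have their own index() and find(). A sketch of the same copy-and-patch under that assumption, with the hypothetical helper copy_with_patch and the answer's values (needle ABC, known offset 3, replacement byte Y) as defaults. Editing a bytearray also sidesteps the immutable-sequence concern raised in the comments, although the whole file is still read into memory.

```python
import os
import tempfile

def copy_with_patch(src: str, dst: str, needle: bytes = b"ABC",
                    offset: int = 3, new: bytes = b"Y") -> int:
    """Copy src to dst, overwriting len(new) bytes `offset` bytes after `needle`.

    Returns the offset that was patched.
    """
    with open(src, "rb") as fi:
        data = bytearray(fi.read())        # mutable copy of the whole file
    start = data.index(needle)             # raises ValueError if needle is absent
    pos = start + len(needle) + offset     # same arithmetic as pos_x above
    if pos + len(new) > len(data):
        raise ValueError("patch would run past the end of the file")
    data[pos:pos + len(new)] = new         # same length, so nothing shifts
    with open(dst, "wb") as fo:
        fo.write(data)
    return pos

# Demo in a temporary directory, using the 14-byte sample file.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "so.bin")
    dst = os.path.join(d, "so.out.bin")
    with open(src, "wb") as f:
        f.write(bytes.fromhex("ffeedd414243ccbbaa589988770a"))
    patched_at = copy_with_patch(src, dst)
    with open(dst, "rb") as f:
        patched = f.read()

print(patched_at, patched[patched_at:patched_at + 1])  # 9 b'Y'
```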

I want the byte four positions after ABC. Keeping a little state lets you locate ABC, skip the offset, and print the byte of interest:

foundA = False
foundB = False
foundC = False
offsetAfterC = 3  # bytes to skip after the "C"
lengthAfterC = 1  # number of bytes to print

with open("so.bin", "rb") as f:
    pos = 0
    for b in f.read():
        pos = pos + 1
        if b == 0x41:
            # "A" always (re)starts a candidate match.
            foundA, foundB = True, False
        elif foundA and not foundB and b == 0x42:
            # "B" directly after "A".
            foundB = True
        elif foundA and foundB and b == 0x43:
            # "C" directly after "AB": pos is now the offset just past "ABC".
            foundC = True
            break
        else:
            # Any other byte breaks the sequence.
            foundA, foundB = False, False

    if not foundC:
        raise SystemExit("ABC not found in so.bin")

    # Skip offsetAfterC bytes after "ABC", then print the bytes of interest.
    f.seek(pos + offsetAfterC)
    for b in f.read(lengthAfterC):
        print(hex(b))
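The state-machine idea generalises to files too large to read at once: carry just enough state between fixed-size reads (the last len(needle) - 1 bytes) so that a match spanning two reads is not missed. This chunked search is a swapped-in technique rather than part of the answer; the name find_in_stream and the 64 KiB default chunk size are assumptions.

```python
import io

def find_in_stream(stream, needle: bytes, chunk_size: int = 64 * 1024) -> int:
    """Return the absolute offset of the first `needle` in a binary stream, or -1.

    Only one chunk is held in memory at a time. The last len(needle) - 1 bytes
    of each chunk are carried over, so a match split across two chunks is
    still found.
    """
    if not needle:
        return 0
    keep = len(needle) - 1
    tail = b""
    consumed = 0  # bytes read from the stream before the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        hit = window.find(needle)
        if hit != -1:
            # The window starts len(tail) bytes before the current chunk.
            return consumed - len(tail) + hit
        consumed += len(chunk)
        tail = window[-keep:] if keep else b""

# Demo: a tiny chunk size makes "ABC" straddle a chunk boundary.
sample = bytes.fromhex("ffeedd414243ccbbaa589988770a")
offset = find_in_stream(io.BytesIO(sample), b"ABC", chunk_size=2)  # 3
# For a real file: with open("so.bin", "rb") as f: find_in_stream(f, b"ABC")
print(offset)
```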

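Your "as efficient as possible" requirement is not well defined. If speed were the only concern, why use Python rather than C or assembly?

@timgeb, I removed that constraint; it isn't the main issue here. But if you insist on an answer: it's a build script, so it has to be a script rather than compiled code, and there are many other files to change, while making sure the build doesn't become too slow. Basically I just want to avoid immutable sequences; I want to modify the data in place.

No worries. The problem with "as efficient as possible" is that you may get unpythonic answers that sacrifice a lot of readability to save nanoseconds. Instead, try to describe the required level of efficiency more precisely.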
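For the in-place requirement, one option not mentioned in the thread is to memory-map the file: mmap objects support find() and slice assignment, so the byte can be overwritten without copying or rewriting the file. A sketch, assuming the patch never changes the file length; patch_in_place is a hypothetical name.

```python
import mmap
import os
import tempfile

def patch_in_place(path: str, needle: bytes, offset: int, new: bytes) -> int:
    """Overwrite len(new) bytes, `offset` bytes past the end of the first `needle`.

    The file is memory-mapped and modified in place. Returns the patched offset.
    """
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        start = mm.find(needle)
        if start == -1:
            raise ValueError(f"{needle!r} not found in {path}")
        pos = start + len(needle) + offset
        if pos + len(new) > len(mm):
            raise ValueError("patch would run past the end of the file")
        mm[pos:pos + len(new)] = new  # slice assignment must keep the length
        mm.flush()                    # push the change to the file on disk
        return pos

# Demo on a temporary copy of the 14-byte sample file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes.fromhex("ffeedd414243ccbbaa589988770a"))

patched_at = patch_in_place(path, b"ABC", 3, b"Y")  # 9
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(patched_at, result[patched_at:patched_at + 1])  # 9 b'Y'
```

Note that mmap cannot map an empty file, so a zero-length input raises ValueError from mmap.mmap itself.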

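What is the intended purpose of "ABC".encode("hex")? That is a Python 2 method, dating from around 2012... Anyway: since it converts ABC into a string of hex digits, are you sure that text should appear somewhere in the binary file? Or am I misunderstanding its purpose?

@usr2564301, yes, that's exactly the problem: it doesn't appear as a string but as a byte sequence, meaning that somewhere in the file there is the sequence [0x41, 0x42, 0x43]. I don't know how to (1) generate that sequence from a string, and (2) locate that byte sequence in the file contents. I can solve (1) with abc = [ord(letter) for letter in abc], but (2) still fails.

Thanks :-), but I was hoping for something closer to a one-liner... I have a feeling this is overkill and there is a simpler solution.

@Tar I added a new solution. It's not a one-liner, but it's better.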
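Both sub-problems from that comment have short answers in Python 3. A sketch using the values from the thread; find_all is a hypothetical helper for the case where the sequence may occur more than once:

```python
abc = "ABC"

# (1) Three equivalent ways to get the byte sequence 41 42 43:
from_ints = bytes([ord(letter) for letter in abc])  # the ord() list, wrapped in bytes()
from_str = abc.encode("ascii")                      # the usual str -> bytes conversion
from_hex = bytes.fromhex("414243")                  # parse hex digits
assert from_ints == from_str == from_hex == b"ABC"
assert b"ABC".hex() == "414243"  # Python 3 counterpart of "ABC".encode("hex")

# (2) Locate it. A plain list of ints cannot be searched this way, but
# bytes and bytearray support find(), index() and the `in` operator.
data = bytes.fromhex("ffeedd414243ccbbaa589988770a")

def find_all(haystack: bytes, needle: bytes):
    """Yield the offset of every (possibly overlapping) occurrence of needle."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        start = haystack.find(needle, start + 1)

print(from_str in data)                     # True
print(data.find(from_str))                  # 3
print(list(find_all(data + data, b"ABC")))  # [3, 17]
```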

Output of the state-machine script on so.bin:

0x58