Python implementation of Open XML document protection (documentProtection class)

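I am trying to implement the Open XML documentProtection hash protection of an MS Word (2019) document in Python, in order to test the hashing algorithm. So I created a Word document and protected it against editing with this password: johnjohn. Then, opening the document as ZIP/XML, I see the following in the documentProtection section: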

<w:documentProtection w:edit="readOnly" w:enforcement="1" w:cryptProviderType="rsaAES" w:cryptAlgorithmClass="hash" w:cryptAlgorithmType="typeAny" w:cryptAlgorithmSid="14" w:cryptSpinCount="100000" w:hash="pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==" w:salt="pH1TDVHSfGBxkd3Q88UNhQ==" /> 
But when I run

print(gethash('johnjohn',bsalt))
I get the following hash, which does not equal the target hash:

G47RT4/+JdE6pnrP6MqUKa3JyL8abeYSCX+E4+9J+6shiZqImBJ8M6bb+IMKEdvKd6+9dVnQ3oeOsgQz/aCdcQ==
Is my implementation wrong somewhere, or do you think the low-level hash function implementations (Python vs. Open XML) differ in some way?
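To compare a computed hash against the stored one, the attributes of the documentProtection element shown earlier have to be read and base64-decoded first. The following is only a minimal sketch: it assumes the element is a direct child of w:settings in word/settings.xml, that it uses the attribute names from the snippet in the question (w:cryptAlgorithmSid, w:cryptSpinCount, w:hash, w:salt), and that the standard WordprocessingML namespace applies. The helper names read_settings and read_protection are made up for illustration.

```python
import base64
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace bound to the "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_settings(docx_path):
    """Return the raw bytes of word/settings.xml from a .docx (a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/settings.xml")


def read_protection(settings_xml):
    """Extract the hash parameters from w:documentProtection, or None if absent."""
    root = ET.fromstring(settings_xml)
    element = root.find(f"{{{W_NS}}}documentProtection")
    if element is None:
        return None

    def attr(name):
        # Attributes are namespace-qualified, e.g. w:salt.
        return element.get(f"{{{W_NS}}}{name}")

    return {
        "algorithm_sid": int(attr("cryptAlgorithmSid")),
        "spin_count": int(attr("cryptSpinCount")),
        "salt": base64.b64decode(attr("salt")),
        "hash": base64.b64decode(attr("hash")),
    }
```

For the document in the question this yields a 16-byte salt and a 64-byte hash, which is the digest size of SHA-512.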

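UPDATE: I realized that Word uses a legacy algorithm to pre-process the password (for compatibility with older versions). Part 4 of the specification (Transitional Migration Features, #14.8.1 "Legacy Password Hash Algorithm") describes this algorithm in detail. So I managed to write a script that reproduces the official ECMA example: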

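def strtobytes(s, trunc=15):
    b = s.encode('utf-16-le')
    # remove the BOM if present
    if b[:2] == b'\xff\xfe':
        b = b[2:]
    pwdlen = min(trunc, len(b) // 2)
    if pwdlen < 1:
        return None
    # one byte per character: the low byte, or the high byte if the low byte is zero
    return bytes([b[i] or b[i + 1] for i in range(0, pwdlen * 2, 2)])

def process_pwd(pwd):
    # 1. Prepare the password string (truncate, convert to bytes)
    pw = strtobytes(pwd) if isinstance(pwd, str) else pwd[:15]
    if not pw:
        return None
    pwdlen = len(pw)
    # 2. High-order word calculation
    # (InitialCodeArray and EncryptionMatrix are the lookup tables from the ECMA spec, not shown here)
    HW = InitialCodeArray[pwdlen - 1]
    for i in range(pwdlen):
        r = 15 - pwdlen + i
        for ibit in range(7):
            if pw[i] & (0x0001 << ibit):
                HW ^= EncryptionMatrix[r][ibit]
    # 3. Low-order word calculation
    LW = 0
    for i in reversed(range(pwdlen)):
        LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pw[i]
    LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pwdlen ^ 0xCE4B
    # 4. Combine the two words in reversed byte order
    return bytes([LW & 0xff, LW >> 8, HW & 0xff, HW >> 8])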
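So when I run process_pwd('Example'), I get what ECMA says (0x7EEDCE64). The hash function was also modified (as I found on a forum, the initial salt+password hash should not be counted as part of the main iteration loop):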

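import base64
import hashlib
import struct

def gethash(data, salt=None, alg='sha512', iters=100000, base64result=True, returnstring=True):
    def hashit(what, alg='sha512'):
        return getattr(hashlib, alg)(what)
    # encode the password with the legacy algorithm if a string is given
    if isinstance(data, str):
        data = process_pwd(data)
        if data is None:
            print('WRONG PASSWORD STRING!')
            return None
    # prepend the salt (if provided)
    if salt is not None:
        if isinstance(salt, str):
            salt = process_pwd(salt)
            if salt is None:
                print('WRONG SALT STRING!')
                return None
        ghash = salt + data
    else:
        ghash = data
    # initial (salted) hash
    ghash = hashit(ghash, alg).digest()
    # hash iteratively for 'iters' rounds
    for i in range(iters):
        # next hash = hash(previous hash + 4-byte round number, little-endian)
        # ECMA-376-1:2016 17.15.1.29 (p. 1020)
        ghash = hashit(ghash + struct.pack('<I', i), alg).digest()
    # the source is cut off inside the loop above; this ending is inferred from the signature
    if base64result:
        ghash = base64.b64encode(ghash)
    if returnstring:
        ghash = ghash.decode() if base64result else ghash.hex()
    return ghash

Had to look into this today too, and managed to reverse-engineer it.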

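In short, the steps are as follows:

  • Truncate the password to 15 characters (it is unclear whether this means ASCII or UTF-8; some documentation refers to a "Unicode password", but all the examples seem to be ASCII-based). My implementation simply truncates the bytes after UTF-8 conversion, which leaves the ASCII set unchanged.
  • Grab a high-order word from the magic list based on the password length. If the password length is 0, it is just two zero bytes.
  • For each byte in the password, grab the bits from the encryption matrix based on the byte's position (note that the last character always corresponds to the last row, so the first part of the matrix may go unused if the password is shorter than 15 bytes). For the first through seventh bit, if the bit is set, XOR the matching matrix entry into the current value of the high-order word. Repeat this for every character.
  • Grab a low-order word (2 bytes) and initialize it to zero. Starting with the last character in the password and working backwards, perform this operation for each character:
    low-order-word = (((low-order-word >> 14) & 0x0001) | ((low-order-word << 1) & 0x7FFF)) ^ character
  • Then perform the operation one final time, using the password length and the constant 0xCE4B:
    low-order-word = (((low-order-word >> 14) & 0x0001) | ((low-order-word << 1) & 0x7FFF)) ^ password-length ^ 0xCE4B

Thanks a lot! This looks like just the thing. I now need to test it in Python :)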
    
    /// <summary>
    /// Class that generates hashes suitable for use with OpenXML Wordprocessing ML documents with the documentProtection element.
    /// </summary>
    public class WordprocessingMLDocumentProtectionHashGenerator
    {
        private static readonly byte[][] HighOrderWords = new byte[][]
        {
            new byte[] { 0xE1, 0xF0 },
            new byte[] { 0x1D, 0x0F },
            new byte[] { 0xCC, 0x9C },
            new byte[] { 0x84, 0xC0 },
            new byte[] { 0x11, 0x0C },
            new byte[] { 0x0E, 0x10 },
            new byte[] { 0xF1, 0xCE },
            new byte[] { 0x31, 0x3E },
            new byte[] { 0x18, 0x72 },
            new byte[] { 0xE1, 0x39 },
            new byte[] { 0xD4, 0x0F },
            new byte[] { 0x84, 0xF9 },
            new byte[] { 0x28, 0x0C },
            new byte[] { 0xA9, 0x6A },
            new byte[] { 0x4E, 0xC3 }
        };
    
        private static readonly byte[,,] EncryptionMatrix = new byte[,,]
        {
            { { 0xAE, 0xFC }, { 0x4D, 0xD9 }, { 0x9B, 0xB2 }, { 0x27, 0x45 }, { 0x4E, 0x8A }, { 0x9D, 0x14 }, { 0x2A, 0x09 } },
            { { 0x7B, 0x61 }, { 0xF6, 0xC2 }, { 0xFD, 0xA5 }, { 0xEB, 0x6B }, { 0xC6, 0xF7 }, { 0x9D, 0xCF }, { 0x2B, 0xBF } },
            { { 0x45, 0x63 }, { 0x8A, 0xC6 }, { 0x05, 0xAD }, { 0x0B, 0x5A }, { 0x16, 0xB4 }, { 0x2D, 0x68 }, { 0x5A, 0xD0 } },
            { { 0x03, 0x75 }, { 0x06, 0xEA }, { 0x0D, 0xD4 }, { 0x1B, 0xA8 }, { 0x37, 0x50 }, { 0x6E, 0xA0 }, { 0xDD, 0x40 } },
            { { 0xD8, 0x49 }, { 0xA0, 0xB3 }, { 0x51, 0x47 }, { 0xA2, 0x8E }, { 0x55, 0x3D }, { 0xAA, 0x7A }, { 0x44, 0xD5 } },
            { { 0x6F, 0x45 }, { 0xDE, 0x8A }, { 0xAD, 0x35 }, { 0x4A, 0x4B }, { 0x94, 0x96 }, { 0x39, 0x0D }, { 0x72, 0x1A } },
            { { 0xEB, 0x23 }, { 0xC6, 0x67 }, { 0x9C, 0xEF }, { 0x29, 0xFF }, { 0x53, 0xFE }, { 0xA7, 0xFC }, { 0x5F, 0xD9 } },
            { { 0x47, 0xD3 }, { 0x8F, 0xA6 }, { 0x0F, 0x6D }, { 0x1E, 0xDA }, { 0x3D, 0xB4 }, { 0x7B, 0x68 }, { 0xF6, 0xD0 } },
            { { 0xB8, 0x61 }, { 0x60, 0xE3 }, { 0xC1, 0xC6 }, { 0x93, 0xAD }, { 0x37, 0x7B }, { 0x6E, 0xF6 }, { 0xDD, 0xEC } },
            { { 0x45, 0xA0 }, { 0x8B, 0x40 }, { 0x06, 0xA1 }, { 0x0D, 0x42 }, { 0x1A, 0x84 }, { 0x35, 0x08 }, { 0x6A, 0x10 } },
            { { 0xAA, 0x51 }, { 0x44, 0x83 }, { 0x89, 0x06 }, { 0x02, 0x2D }, { 0x04, 0x5A }, { 0x08, 0xB4 }, { 0x11, 0x68 } },
            { { 0x76, 0xB4 }, { 0xED, 0x68 }, { 0xCA, 0xF1 }, { 0x85, 0xC3 }, { 0x1B, 0xA7 }, { 0x37, 0x4E }, { 0x6E, 0x9C } },
            { { 0x37, 0x30 }, { 0x6E, 0x60 }, { 0xDC, 0xC0 }, { 0xA9, 0xA1 }, { 0x43, 0x63 }, { 0x86, 0xC6 }, { 0x1D, 0xAD } },
            { { 0x33, 0x31 }, { 0x66, 0x62 }, { 0xCC, 0xC4 }, { 0x89, 0xA9 }, { 0x03, 0x73 }, { 0x06, 0xE6 }, { 0x0D, 0xCC } },
            { { 0x10, 0x21 }, { 0x20, 0x42 }, { 0x40, 0x84 }, { 0x81, 0x08 }, { 0x12, 0x31 }, { 0x24, 0x62 }, { 0x48, 0xC4 } }
        };
    
        /// <summary>
        /// Generates a base-64 string according to the Wordprocessing ML Document DocumentProtection security algorithm.
        /// </summary>
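        /// <param name="password">The user-supplied password; only the first 15 bytes of its UTF-8 encoding are used.</param>
        /// <param name="salt">The decoded salt bytes to prepend to the key, or null for no salt.</param>
        /// <param name="iterations">The spin count: the number of hash iterations run after the initial salted hash.</param>
        /// <param name="hashAlgorithmName">The hash algorithm to use, e.g. SHA-512.</param>
        /// <returns>The base-64 encoded hash, comparable to the w:hash attribute.</returns>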
        public string GenerateHash(string password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithmName)
        {
            if (password == null)
            {
                throw new ArgumentNullException(nameof(password));
            }
    
            // Algorithm given in ECMA-376, 1st Edition, December 2006
            // https://www.ecma-international.org/wp-content/uploads/ecma-376_first_edition_december_2006.zip
            // Alternatively: https://c-rex.net/projects/samples/ooxml/e1/Part4/OOXML_P4_DOCX_documentProtection_topic_ID0EJVTX.html
            byte[] passwordBytes = Encoding.UTF8.GetBytes(password);
            passwordBytes = passwordBytes.Take(15).ToArray();
            int passwordLength = passwordBytes.Length;
    
            // If the password length is 0, the key is 0.
            byte[] highOrderWord = new byte[] { 0x00, 0x00 };
            if (passwordLength > 0)
            {
                highOrderWord = HighOrderWords[passwordLength - 1].ToArray();
            }
            for (int i = 0; i < passwordLength; i++)
            {
                byte passwordByte = passwordBytes[i];
                int encryptionMatrixIndex = i + (EncryptionMatrix.GetLength(0) - passwordLength);
    
                BitArray bitArray = passwordByte.ToBitArray();
    
                for (int j = 0; j < EncryptionMatrix.GetLength(1); j++)
                {
                    bool isSet = bitArray[j];
    
                    if (isSet)
                    {
                        for (int k = 0; k < EncryptionMatrix.GetLength(2); k++)
                        {
                            highOrderWord[k] = (byte)(highOrderWord[k] ^ EncryptionMatrix[encryptionMatrixIndex, j, k]);
                        }
                    }
                }
            }
    
            byte[] lowOrderWord = new byte[] { 0x00, 0x00 };
            BitSequence lowOrderBitSequence = lowOrderWord.ToBitSequence();
            BitSequence bitSequence1 = new byte[] { 0x00, 0x01 }.ToBitSequence();
            BitSequence bitSequence7FFF = new byte[] { 0x7F, 0xFF }.ToBitSequence();
    
            for (int i = passwordLength - 1; i >= 0; i--)
            {
                byte passwordByte = passwordBytes[i];
                lowOrderBitSequence = (((lowOrderBitSequence >> 14) & bitSequence1) | ((lowOrderBitSequence << 1) & bitSequence7FFF)) ^ new byte[] { 0x00, passwordByte }.ToBitSequence();
            }
    
            lowOrderBitSequence = (((lowOrderBitSequence >> 14) & bitSequence1) | ((lowOrderBitSequence << 1) & bitSequence7FFF)) ^ new byte[] { 0x00, (byte)passwordLength }.ToBitSequence() ^ new byte[] { 0xCE, 0x4B }.ToBitSequence();
            lowOrderWord = lowOrderBitSequence.ToByteArray();
    
            byte[] key = highOrderWord.Concat(lowOrderWord).ToArray();
            key = key.Reverse().ToArray();
    
            // https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oe376/fb220a2f-88d4-488c-a9b7-e094756b6699
            // In Word, an additional third stage is added to the process of hashing and storing a user supplied password.  In this third stage, the reversed byte order legacy hash from the second stage shall be converted to Unicode hex string representation [Example: If the single byte string 7EEDCE64 is converted to Unicode hex string it will be represented in memory as the following byte stream: 37 00 45 00 45 00 44 00 43 00 45 00 36 00 34 00. end example], and that value shall be hashed as defined by the attribute values.
            key = Encoding.Unicode.GetBytes(BitConverter.ToString(key).Replace("-", string.Empty));
    
            HashAlgorithm hashAlgorithm = hashAlgorithmName.Create();
    
            byte[] computedHash = key;
    
            if (salt != null)
            {
                computedHash = salt.Concat(key).ToArray();
            }
    
            // Word requires that the initial hash of the password with the salt not be considered in the count.
            computedHash = hashAlgorithm.ComputeHash(computedHash);
    
            for (int i = 0; i < iterations; i++)
            {
                // ISO/IEC 29500-1 Fourth Edition, 2016-11-01
                // 17.15.1.29 - spinCount
                // Specifies the number of times the hashing function shall be iteratively run (runs using each iteration's result plus a 4 byte value (0-based, little endian) containing the number of the iteration as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute.
                byte[] iterationBytes = BitConverter.GetBytes(i);
                computedHash = computedHash.Concat(iterationBytes).ToArray();
                computedHash = hashAlgorithm.ComputeHash(computedHash);
            }
    
            return Convert.ToBase64String(computedHash);
        }
    }
    
    [TestClass]
    [TestCategory("WordprocessingMLDocumentProtectionHashGenerator")]
    public class WordprocessingMLDocumentProtectionHashGeneratorTests
    {
        [TestMethod]
        public void GeneratesKnownHashes()
        {
            WordprocessingMLDocumentProtectionHashGenerator wordprocessingMLDocumentProtectionHashGenerator = new WordprocessingMLDocumentProtectionHashGenerator();
    
            Assert.AreEqual("sstT7oPzpUQTchSUE6WbidCrZv1c8k+/5D1Pm+weZt7QoaeSnBEg/cZFg2W+1eohg1mgXGXLci1CWbnbHDYsXQ==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Example", Convert.FromBase64String("KPr2WqWFihenPDtAmpqUtw=="), 100000, HashAlgorithmName.SHA512));
            Assert.AreEqual("uBuZhlyVTOQtRwQuOGjY7GU3FnJbe1VFKvN+j9u27HSbthOY+n1/daU/WCkqV40fG6HxX+pxgR+Ow4ZvAE7aZg==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 100000, HashAlgorithmName.SHA512));
            Assert.AreEqual("mkGbBri0a1icL1nJKTQL7PyLUY2Uei2wyMHC0Y6s1+DOMYvPWdB6cy0Npao15O0+yqtyZW4hAP0+dcdyrEk7qg==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 0, HashAlgorithmName.SHA512));
            Assert.AreEqual("qdPI8cSBM/21Mr29mfFrR6l7hIn8oLKKT1nTDXHsAQA=", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Testerman", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 100000, HashAlgorithmName.SHA256));
            Assert.AreEqual("d5FZvHnQhm6Mzqy6cYE7ZbniYXA/8qJxkAze0sFcNirWYhaLpScmSsfBHptuEmuBreLuNjyV5IjdUoOFWM9mbQ==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", null, 100000, HashAlgorithmName.SHA512));
            Assert.AreEqual("pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("johnjohn", Convert.FromBase64String("pH1TDVHSfGBxkd3Q88UNhQ=="), 100000, HashAlgorithmName.SHA512));
        }
    }
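For anyone who wants to stay in Python, as the question does, the C# above can be transcribed roughly as follows. Treat it as a sketch under a few assumptions rather than a reference implementation: the password is truncated after UTF-8 encoding (as in the C# code), the two tables are copied from HighOrderWords and EncryptionMatrix above with each byte pair merged into one 16-bit integer, and the names legacy_key and word_protection_hash are made up for illustration.

```python
import base64
import hashlib
import struct

# Initial high-order words, indexed by (password length - 1).
HIGH_ORDER_WORDS = [
    0xE1F0, 0x1D0F, 0xCC9C, 0x84C0, 0x110C, 0x0E10, 0xF1CE, 0x313E,
    0x1872, 0xE139, 0xD40F, 0x84F9, 0x280C, 0xA96A, 0x4EC3,
]

# One row per character position (last character -> last row), one entry per bit 0..6.
ENCRYPTION_MATRIX = [
    [0xAEFC, 0x4DD9, 0x9BB2, 0x2745, 0x4E8A, 0x9D14, 0x2A09],
    [0x7B61, 0xF6C2, 0xFDA5, 0xEB6B, 0xC6F7, 0x9DCF, 0x2BBF],
    [0x4563, 0x8AC6, 0x05AD, 0x0B5A, 0x16B4, 0x2D68, 0x5AD0],
    [0x0375, 0x06EA, 0x0DD4, 0x1BA8, 0x3750, 0x6EA0, 0xDD40],
    [0xD849, 0xA0B3, 0x5147, 0xA28E, 0x553D, 0xAA7A, 0x44D5],
    [0x6F45, 0xDE8A, 0xAD35, 0x4A4B, 0x9496, 0x390D, 0x721A],
    [0xEB23, 0xC667, 0x9CEF, 0x29FF, 0x53FE, 0xA7FC, 0x5FD9],
    [0x47D3, 0x8FA6, 0x0F6D, 0x1EDA, 0x3DB4, 0x7B68, 0xF6D0],
    [0xB861, 0x60E3, 0xC1C6, 0x93AD, 0x377B, 0x6EF6, 0xDDEC],
    [0x45A0, 0x8B40, 0x06A1, 0x0D42, 0x1A84, 0x3508, 0x6A10],
    [0xAA51, 0x4483, 0x8906, 0x022D, 0x045A, 0x08B4, 0x1168],
    [0x76B4, 0xED68, 0xCAF1, 0x85C3, 0x1BA7, 0x374E, 0x6E9C],
    [0x3730, 0x6E60, 0xDCC0, 0xA9A1, 0x4363, 0x86C6, 0x1DAD],
    [0x3331, 0x6662, 0xCCC4, 0x89A9, 0x0373, 0x06E6, 0x0DCC],
    [0x1021, 0x2042, 0x4084, 0x8108, 0x1231, 0x2462, 0x48C4],
]


def _rotate15(word):
    """Rotate a 15-bit value left by one bit."""
    return ((word >> 14) & 0x0001) | ((word << 1) & 0x7FFF)


def legacy_key(password):
    """Legacy 4-byte key in reversed byte order (low word first, each word little-endian)."""
    pw = password.encode("utf-8")[:15]  # assumption: truncate after UTF-8 encoding
    n = len(pw)
    high = HIGH_ORDER_WORDS[n - 1] if n else 0
    for i, byte in enumerate(pw):
        row = ENCRYPTION_MATRIX[15 - n + i]
        for bit in range(7):
            if byte & (1 << bit):
                high ^= row[bit]
    low = 0
    for byte in reversed(pw):
        low = _rotate15(low) ^ byte
    low = _rotate15(low) ^ n ^ 0xCE4B
    return struct.pack("<HH", low, high)


def word_protection_hash(password, salt, spin_count=100000, alg="sha512"):
    """Base64 hash as stored by Word: salted hash of the hex-string key, then spin_count rounds."""
    # Word's extra stage: the key as an upper-case hex string, encoded as UTF-16LE.
    key = legacy_key(password).hex().upper().encode("utf-16-le")
    digest = hashlib.new(alg, (salt or b"") + key).digest()
    for i in range(spin_count):
        # Each round hashes the previous digest plus the 0-based round number (4 bytes, LE).
        digest = hashlib.new(alg, digest + struct.pack("<I", i)).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(legacy_key("Example").hex().upper())  # 7EEDCE64, the ECMA example
    salt = base64.b64decode("pH1TDVHSfGBxkd3Q88UNhQ==")
    print(word_protection_hash("johnjohn", salt))
```

legacy_key("Example") gives the bytes 7E ED CE 64, the reversed key from the ECMA example. If the transcription is faithful, the second call should reproduce the w:hash value from the question, just as the last assertion in the C# test does.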