How to use UTF-16 in Python ctypes?


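I have a foreign C library that uses UTF-16 in its API: as function arguments, return values, and structure members.

On Windows, ctypes.c_wchar_p works, but on OS X ctypes uses UCS-4 (32-bit characters) for c_wchar, and I can't find a way to support UTF-16.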
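
As a quick check of the platform difference described above, the width of `ctypes.c_wchar` can be printed directly. This is only a minimal sketch; the 2-byte and 4-byte figures in the comments are the usual `wchar_t` widths on Windows and on most Unix-like systems, which is an assumption on my part rather than something stated in the question.

```python
import ctypes

# Width of the platform's wchar_t as seen by ctypes.
# Typically 2 on Windows (UTF-16 code units) and 4 on OS X and Linux.
wchar_size = ctypes.sizeof(ctypes.c_wchar)
print("sizeof(c_wchar) =", wchar_size)

# c_wchar_p only lines up with a UTF-16 API when wchar_t is 2 bytes wide.
wchar_is_utf16_sized = (wchar_size == 2)
print("c_wchar_p usable for UTF-16:", wchar_is_utf16_sized)
```

If the second line prints `False`, `c_wchar_p` cannot be handed to a UTF-16 API as is, which is the situation the question describes.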

Here is my research so far:

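Subclassing _SimpleCData:

It allows transparent conversion between UTF-16 and Python strings.
It can be used as a C structure member.
But it does not handle strings passed as arguments; its `from_param()` method is never called (why?):
```
func('str', b'W\x00B\x00\x00\x00')  # passed without conversion
```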


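Using my own type with a `from_param()` method:

Pros: it can be initialized with a constructor, and it encodes on the fly when a string is passed to a function.
Cons: it cannot be used as a function return type or as a structure member.

Here it is:
```
ustr = myutf16('hello')
func(ustr)
func('hello')   # calls myutf16.from_param('hello')
```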
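
On the "why?" above: ctypes calls a type's `from_param()` only for parameters whose type is listed in the function's `argtypes`; without `argtypes`, `str` and `bytes` arguments go through the default conversions. Whether that is what happened in the question is an assumption, since its `argtypes` setup is not shown. A minimal Python 3 sketch, using a hypothetical `TracingUtf16` type and a Python callback standing in for the foreign C function:

```python
import ctypes

class TracingUtf16(ctypes.c_char_p):
    """Hypothetical parameter type that records every from_param() call."""
    seen = []

    @classmethod
    def from_param(cls, obj):
        cls.seen.append(obj)
        if isinstance(obj, str):
            # Encode on the fly, which is what the question is after.
            obj = obj.encode('utf-16le') + b'\x00'
        return super().from_param(obj)

# The prototype plays the role of func.argtypes = [TracingUtf16].
prototype = ctypes.CFUNCTYPE(ctypes.c_int, TracingUtf16)

@prototype
def func(arg):
    # Stand-in for the C function; it ignores its argument.
    return 0

func('str')
print(TracingUtf16.seen)  # shows what from_param() received
```

With a real library the equivalent step would be assigning the type to the function's `argtypes` before calling it.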

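You can override `from_param` in a `c_char_p` subclass to encode a `unicode` string as UTF-16. You can add a `_check_retval_` method to decode a UTF-16 result as a `unicode` string. For a struct field, you can use a descriptor class that handles setting and getting the attribute. Make the field a private `_name` of type `c_char_p`, and set the descriptor as the public `name`. For example: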
```
import sys
import ctypes

if sys.version_info[0] > 2:
    unicode = str  # Python 3 has no separate unicode type


def decode_utf16_from_address(address, byteorder='little',
                              c_char=ctypes.c_char):
    """Decode the NUL-terminated UTF-16 string stored at address.

    Returns None for a NULL pointer.
    """
    if not address:
        return None
    if byteorder not in ('little', 'big'):
        raise ValueError("byteorder must be either 'little' or 'big'")
    chars = []
    while True:
        # Read one 16-bit code unit as two bytes; stop at the 16-bit NUL.
        c1 = c_char.from_address(address).value
        c2 = c_char.from_address(address + 1).value
        if c1 == b'\x00' and c2 == b'\x00':
            break
        chars += [c1, c2]
        address += 2
    if byteorder == 'little':
        return b''.join(chars).decode('utf-16le')
    return b''.join(chars).decode('utf-16be')


class c_utf16le_p(ctypes.c_char_p):
    """char * that encodes/decodes UTF-16LE text on the Python side."""

    def __init__(self, value=None):
        super(c_utf16le_p, self).__init__()
        if value is not None:
            self.value = value

    @property
    def value(self,
              c_void_p=ctypes.c_void_p):
        # Reinterpret the stored pointer as an integer address.
        addr = c_void_p.from_buffer(self).value
        return decode_utf16_from_address(addr, 'little')

    @value.setter
    def value(self, value,
              c_char_p=ctypes.c_char_p):
        value = value.encode('utf-16le') + b'\x00'
        c_char_p.value.__set__(self, value)

    @classmethod
    def from_param(cls, obj):
        # Called for each argument when this type appears in argtypes.
        if isinstance(obj, unicode):
            obj = obj.encode('utf-16le') + b'\x00'
        return super(c_utf16le_p, cls).from_param(obj)

    @classmethod
    def _check_retval_(cls, result):
        # Called on function results; hand back the decoded string.
        return result.value


class UTF16LEField(object):
    """Descriptor exposing a private c_char_p field as UTF-16LE text."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, cls,
                c_void_p=ctypes.c_void_p,
                addressof=ctypes.addressof):
        if obj is None:
            return self  # accessed on the class, not on an instance
        field_addr = addressof(obj) + getattr(cls, self.name).offset
        addr = c_void_p.from_address(field_addr).value
        return decode_utf16_from_address(addr, 'little')

    def __set__(self, obj, value):
        value = value.encode('utf-16le') + b'\x00'
        setattr(obj, self.name, value)
```
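
The `+ b'\x00'` used in the setter and in `from_param` appends only one NUL byte, while a UTF-16 string needs a two-byte terminator. My reading (an assumption, since the answer does not spell it out) is that the second byte is the NUL that always ends a `char *` buffer. The sketch below shows the same arithmetic with `ctypes.create_string_buffer`, which is documented to make the buffer one byte longer than a bytes initializer:

```python
import ctypes

text = 'WB'
encoded = text.encode('utf-16le') + b'\x00'  # 4 payload bytes + 1 NUL

# create_string_buffer() adds one trailing NUL of its own, so the last
# two bytes of the buffer form a 16-bit NUL terminator.
buf = ctypes.create_string_buffer(encoded)
print(len(encoded), ctypes.sizeof(buf))  # 5 6
print(buf.raw)                           # b'W\x00B\x00\x00\x00'
```

The resulting bytes match the `b'W\x00B\x00\x00\x00'` literal from the question.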

Example:
```
if __name__ == '__main__':
    class Test(ctypes.Structure):
        _fields_ = (('x', ctypes.c_int),
                    ('y', ctypes.c_void_p),
                    ('_string', ctypes.c_char_p))
        string = UTF16LEField('_string')

    print('test 1: structure field')
    t = Test()
    t.string = u'eggs and spam'
    print(t.string)

    print('test 2: parameter and result')
    result = None

    @ctypes.CFUNCTYPE(c_utf16le_p, c_utf16le_p)
    def testfun(string):
        global result
        print('parameter: %s' % string.value)
        # callbacks leak memory except for simple return
        # values such as an integer address, so return the
        # address of a global variable.
        result = c_utf16le_p(string.value + u' and eggs')
        return ctypes.c_void_p.from_buffer(result).value

    print('result: %s' % testfun(u'spam'))
```

Output:
```
test 1: structure field
eggs and spam
test 2: parameter and result
parameter: spam
result: spam and eggs
```
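
Is it a must to use `ctypes` rather than `unicode` and `codecs`?

It is highly desirable. Of course I can encode and decode UTF-16 manually, but then I would need to write a lot of wrappers, one for every function call.

I think that, from a safety and readability standpoint, using only `unicode` objects or UTF-8 strings internally and converting only when calling other systems and libraries is much better in the long run than the alternatives. I would not pass other kinds of strings around the system unless encoding/decoding overhead strictly required it. Mixing many different string types makes code very hard to work with.

If you only use Python 2, you can call `ctypes.set_conversion_mode('utf-16le', 'strict')`, which lets you pass a `unicode` string by converting it to a temporary UTF-16 buffer. Similarly, it lets you assign `unicode` to a `c_char_p` struct field. But I don't recommend this approach: it isn't reflected in the getfunc behavior, which still treats the value as a null-terminated `char *`, and it doesn't work in Python 3.

@eryksun Thanks for mentioning descriptors, it sounds interesting.

It's great! The only question is whether I can access `str` methods directly on `c_utf16le_p` without using `.value`?

You can add methods that call the `unicode` (3.x `str`) methods on `self.value`, and add a `__unicode__` (3.x `__str__`) special method for printing, etc. You'll need to account for the value being `None` when it's a `NULL` pointer. But as a function result it already returns a `unicode` string, so I don't know how useful adding these methods would be.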
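
The suggestion in the last comment could look roughly like this. It is a Python 3 only sketch under my own assumptions: `Utf16Text` and `_decode_utf16le` are hypothetical names, the class is a cut-down stand-in for `c_utf16le_p`, and printing a `NULL` pointer as an empty string is just one possible choice.

```python
import ctypes

def _decode_utf16le(address):
    """Decode a NUL-terminated UTF-16LE string; None for a NULL pointer."""
    if not address:
        return None
    data = bytearray()
    while True:
        unit = ctypes.string_at(address + len(data), 2)
        if unit == b'\x00\x00':
            return data.decode('utf-16le')
        data += unit

class Utf16Text(ctypes.c_char_p):
    """Hypothetical variant of c_utf16le_p with str-like conveniences."""

    def __init__(self, text=None):
        super().__init__()
        if text is not None:
            raw = text.encode('utf-16le') + b'\x00'
            ctypes.c_char_p.value.__set__(self, raw)

    @property
    def value(self):
        address = ctypes.c_void_p.from_buffer(self).value
        return _decode_utf16le(address)

    def __str__(self):
        # Assumption: a NULL pointer prints as the empty string.
        text = self.value
        return '' if text is None else text

    def __getattr__(self, name):
        # Only reached when normal lookup fails; forward public names
        # such as upper() or startswith() to the decoded str.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(str(self), name)

s = Utf16Text('spam')
print(str(s), s.upper(), s.startswith('sp'))
print(repr(str(Utf16Text())))  # NULL pointer case
```

Forwarding through `__getattr__` keeps the class short, but every forwarded call decodes the buffer again, so for heavy use it is probably simpler to take `.value` once and work with the plain string.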