In Python, how can one tell if a module comes from a C extension?

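What is the correct or most robust way to tell, from within Python, whether an imported module comes from a C extension rather than a pure-Python module? This is useful, for example, when a Python package has a module with both a pure-Python implementation and a C implementation, and you want to be able to tell at runtime which one is being used.

One idea is to check the module's file extension, but I am not sure which file extensions should be checked, or whether this approach is necessarily the most reliable.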

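First, I don't think this is useful at all. It is very common for a module to be a pure-Python wrapper around a C extension module or, in some cases, a pure-Python wrapper around a C extension module if one is available and around a pure-Python implementation if not.

Some popular third-party examples: numpy is pure Python, even though everything important is implemented in C; bintrees is pure Python, even though its classes may all be implemented in either C or Python, depending on how you build it; and so on.

The same is true of most of the stdlib from 3.2 onward. For example, if you just import pickle, the implementation classes will be built in C in CPython (what you used to get from cPickle in 2.7), while they will be pure-Python versions in PyPy, but either way pickle itself is pure Python.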
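The wrapper pattern can be observed directly from the interpreter. The following is a minimal sketch, assuming an ordinary CPython 3 installation; on other implementations, or in frozen applications, the printed values may differ.

```python
import pickle

# The pickle module object itself normally comes from a Python source file
# (it may have no usable path in a frozen or zipped application).
print(getattr(pickle, "__file__", None))

# The class it exposes may nevertheless be defined elsewhere: on CPython this
# is normally the C accelerator module _pickle; where that is unavailable,
# it is the pure-Python class defined in pickle itself.
print(pickle.Pickler.__module__)
```

Asking whether pickle "is a C extension" therefore has no single useful answer: the module is pure Python, while the classes doing the work may not be.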


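But if you really do want to do this, you actually need to distinguish three things:

  • Built-in modules, like sys
  • C extension modules, like 2.x's cPickle
  • Pure-Python modules, like 2.x's pickle

And that assumes you only care about CPython; if your code runs in, say, Jython or IronPython, the implementation could be JVM or .NET code rather than native code.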

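You can't distinguish these perfectly based on __file__, for a number of reasons:

  • Built-in modules don't have __file__ at all. (This is documented in a few places, e.g. the table in the inspect docs.) Note that if you are using something like py2app or cx_freeze, what counts as "built-in" may differ from a standalone installation.
  • A pure-Python module may have a .pyc/.pyo file without a .py file in a distributed app.
  • A module in a package installed as a single-file egg (common with easy_install, less so with pip) will have a blank or useless __file__.
  • If you build a binary distribution, there is a good chance your whole library will be packed into a zip file, causing the same problem as single-file eggs.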
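The first point in the list above is easy to check. This is a small sketch, assuming CPython, with json chosen arbitrarily as an example of a pure-Python stdlib package.

```python
import json
import sys

for module in (sys, json):
    # getattr with a default avoids AttributeError for built-in modules,
    # which carry no __file__ attribute at all.
    print(module.__name__, getattr(module, "__file__", None))
```

Any code that inspects __file__ needs this kind of guard before it looks at the extension.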

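In 3.1+, the import process has been massively cleaned up, mostly rewritten in Python, and mostly exposed to the Python layer.

So you can use the importlib module to see the chain of loaders used to load a module, and ultimately you will get to BuiltinImporter (built-ins), ExtensionFileLoader (.so/.pyd/etc.), SourceFileLoader (.py), or SourcelessFileLoader (.pyc/.pyo).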
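As a rough sketch of that idea, the loader recorded on a module's spec can be compared against the classes in importlib.machinery. The helper names below are hypothetical, and the "frozen" category is an addition not mentioned above; it is included because some CPython builds freeze parts of the stdlib.

```python
import importlib.machinery as machinery
import json
import sys


def _is_loader(loader, cls):
    # BuiltinImporter and FrozenImporter are used as classes rather than
    # instances, so accept both the class itself and instances of it.
    return loader is cls or isinstance(loader, cls)


def loader_kind(module):
    """Hypothetical helper: name the kind of loader that produced `module`."""
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None) or getattr(module, "__loader__", None)
    kinds = (
        (machinery.BuiltinImporter, "built-in"),
        (machinery.FrozenImporter, "frozen"),
        (machinery.ExtensionFileLoader, "extension"),
        (machinery.SourceFileLoader, "source"),
        (machinery.SourcelessFileLoader, "sourceless"),
    )
    for cls, name in kinds:
        if _is_loader(loader, cls):
            return name
    return "unknown"  # e.g. zipimport or a custom loader


for module in (sys, json):
    print(module.__name__, loader_kind(module))
```

Modules loaded from eggs or zip files fall into the "unknown" branch here, which is the limitation the surrounding text describes.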

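You can also see the suffixes assigned to each of the four, on the current target platform, as constants in importlib.machinery. So you could check any(pathname.endswith(suffix) for suffix in importlib.machinery.EXTENSION_SUFFIXES), but that won't actually help in, e.g., the egg/zip case unless you have already traveled up the chain anyway.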
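The suffix check can be written out as follows. This is only a sketch: has_extension_suffix is a hypothetical name, and the printed lists are platform-specific.

```python
import importlib.machinery as machinery

# Suffix constants for the current platform and interpreter.
print(machinery.EXTENSION_SUFFIXES)
print(machinery.SOURCE_SUFFIXES)
print(machinery.BYTECODE_SUFFIXES)


def has_extension_suffix(pathname):
    """True if `pathname` ends with one of this platform's extension suffixes."""
    return any(pathname.endswith(suffix)
               for suffix in machinery.EXTENSION_SUFFIXES)
```

Using endswith rather than comparing the result of os.path.splitext also matches the longer tagged suffixes that some platforms list alongside the plain one.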


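The best heuristics anyone has come up with for this are the ones implemented in the inspect module, so the best thing to do is to use that.

The best choice will be one or more of getsource, getsourcefile, and getfile; which one is best depends on which heuristics you want.

A built-in module will raise a TypeError for any of them.

An extension module ought to return an empty string for getsourcefile. This seems to work in all the 2.5-3.4 versions I have, but I don't have 2.4 around. (The inspect documentation describes the return value in this case as None, so test for a false value rather than for an empty string specifically.) For getsource, at least in some versions, it returns the actual bytes of the .so file, even though it should return an empty string or raise an IOError. (In 3.x, you will almost certainly get a UnicodeError or SyntaxError, but you probably don't want to rely on that...)

Pure-Python modules may return an empty string for getsourcefile if they are in an egg/zip/etc. They should always return a non-empty string for getsource if the source is available, even inside an egg/zip/etc., but if they are sourceless bytecode (.pyc/etc.) they will return an empty string or raise an IOError.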
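The behaviour described above can be probed with a few lines. This is a sketch only, assuming CPython 3; describe is a hypothetical helper, and it treats any false value from getsourcefile the same way.

```python
import inspect
import json
import sys


def describe(module):
    """Hypothetical helper: summarise what inspect can say about `module`."""
    try:
        path = inspect.getfile(module)
    except TypeError:
        # Raised for built-in modules, which have no file at all.
        return "built-in (no file)"
    source = inspect.getsourcefile(module)
    if source:
        return "source file: " + source
    # Falsy result: an extension module, sourceless bytecode, or a module
    # whose source cannot be located (for example inside some archives).
    return "no source file identified for " + path


for module in (sys, json):
    print(module.__name__, "->", describe(module))
```

A false result from getsourcefile does not by itself prove the module is a C extension, which is why the answer recommends testing on the exact setups you care about.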

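Your best bet is to experiment with the versions you care about, on the platforms you care about, in the distributions/setups you care about.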

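tl;dr

See the "In Search of Perfection" subsection below for the well-tested answer.

As a pragmatic counterpoint to the subtlety involved in portably identifying C extensions, Stack Overflow Productions™ presents... an actual answer.

The ability to reliably distinguish C extensions from non-C extensions is very useful; without it, the Python community would be impoverished. Real-world use cases include:

  • Application freezing, which converts one cross-platform Python codebase into multiple platform-specific executables. Identifying C extensions is critical to robust freezing: if a module imported by the codebase being frozen is a C extension, all external shared libraries transitively linked to by that C extension must be frozen with that codebase as well.
  • Application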
    import inspect, os
    from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
    from types import ModuleType
    
    def is_c_extension(module: ModuleType) -> bool:
        '''
        `True` only if the passed module is a C extension implemented as a
        dynamically linked shared library specific to the current platform.
    
        Parameters
        ----------
        module : ModuleType
            Previously imported module object to be tested.
    
        Returns
        ----------
        bool
            `True` only if this module is a C extension.
        '''
        assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)
    
        # If this module was loaded by a PEP 302-compliant CPython-specific loader
        # loading only C extensions, this module is a C extension.
        if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
            return True
    
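        # Else, fallback to filetype matching heuristics.
        #
        # Absolute path of the file defining this module. Built-in modules such
        # as sys have no file, in which case inspect.getfile() raises TypeError;
        # such modules are not dynamically linked shared libraries.
        try:
            module_filename = inspect.getfile(module)
        except TypeError:
            return False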
    
        # "."-prefixed filetype of this path if any or the empty string otherwise.
        module_filetype = os.path.splitext(module_filename)[1]
    
        # This module is only a C extension if this path's filetype is that of a
        # C extension specific to the current platform.
        return module_filetype in EXTENSION_SUFFIXES
    
    >>> import os
    >>> import importlib.machinery as im
    >>> import _elementtree as et
    >>> import numpy.core.multiarray as ma
    >>> for module in (os, im, et, ma):
    ...     print('Is "{}" a C extension? {}'.format(
    ...         module.__name__, is_c_extension(module)))
    Is "os" a C extension? False
    Is "importlib.machinery" a C extension? False
    Is "_elementtree" a C extension? True
    Is "numpy.core.multiarray" a C extension? True
    
    import inspect
    import os

    def is_c(module):

        # A package has __path__; scan its directory tree for shared objects.
        # Note that only the '.so' suffix is checked here.
        try:
            for path, subdirs, files in os.walk(module.__path__[0]):
                for f in files:
                    if f.split('.')[-1] == 'so':
                        return True
            return False

        # A plain module (e.g. os) or a built-in (e.g. sys) has no __path__.
        except AttributeError:
            try:
                path = inspect.getfile(module)
            except TypeError:
                # built-in module: there is no file to examine
                return False
            return path.split('.')[-1] == 'so'
    
    is_c(os), is_c(im), is_c(et), is_c_extension(ma), is_c(numpy)
    # (False, False, True, True, True)
    
    from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
    import inspect
    import logging
    import os
    import os.path
    import pkgutil
    from types import ModuleType
    from typing import List
    
    log = logging.getLogger(__name__)
    
    
    def is_builtin_module(module: ModuleType) -> bool:
        """
        Is this module a built-in module, like ``sys``?
        Method is as per :func:`inspect.getfile`.
        """
        return not hasattr(module, "__file__")
    
    
    def is_module_a_package(module: ModuleType) -> bool:
        assert inspect.ismodule(module)
        return os.path.basename(inspect.getfile(module)) == "__init__.py"
    
    
    def is_c_extension(module: ModuleType) -> bool:
        """
        Modified from
        https://stackoverflow.com/questions/20339053/in-python-how-can-one-tell-if-a-module-comes-from-a-c-extension.
    
        ``True`` only if the passed module is a C extension implemented as a
        dynamically linked shared library specific to the current platform.
    
        Args:
            module: Previously imported module object to be tested.
    
        Returns:
            bool: ``True`` only if this module is a C extension.
    
        Examples:
    
        .. code-block:: python
    
            from cardinal_pythonlib.modules import is_c_extension
    
            import os
            import _elementtree as et
            import numpy
            import numpy.core.multiarray as numpy_multiarray
    
            is_c_extension(os)  # False
            is_c_extension(numpy)  # False
            is_c_extension(et)  # False on my system (Python 3.5.6). True in the original example.
            is_c_extension(numpy_multiarray)  # True
    
        """  # noqa
        assert inspect.ismodule(module), f'"{module}" not a module.'
    
        # If this module was loaded by a PEP 302-compliant CPython-specific loader
        # loading only C extensions, this module is a C extension.
        if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
            return True
    
        # If it's built-in, it's not a C extension.
        if is_builtin_module(module):
            return False
    
        # Else, fallback to filetype matching heuristics.
        #
        # Absolute path of the file defining this module.
        module_filename = inspect.getfile(module)
    
        # "."-prefixed filetype of this path if any or the empty string otherwise.
        module_filetype = os.path.splitext(module_filename)[1]
    
        # This module is only a C extension if this path's filetype is that of a
        # C extension specific to the current platform.
        return module_filetype in EXTENSION_SUFFIXES
    
    
    def contains_c_extension(module: ModuleType,
                             import_all_submodules: bool = True,
                             include_external_imports: bool = False,
                             seen: List[ModuleType] = None,
                             verbose: bool = False) -> bool:
        """
        Extends :func:`is_c_extension` by asking: is this module, or any of its
        submodules, a C extension?
    
        Args:
            module: Previously imported module object to be tested.
            import_all_submodules: explicitly import all submodules of this module?
            include_external_imports: check modules in other packages that this
                module imports?
            seen: used internally for recursion (to deal with recursive modules);
                should be ``None`` when called by users
            verbose: show working via log?
    
        Returns:
            bool: ``True`` only if this module or one of its submodules is a C
            extension.
    
        Examples:
    
        .. code-block:: python
    
            import logging
    
            import _elementtree as et
            import os
    
            import arrow
            import alembic
            import django
            import numpy
            import numpy.core.multiarray as numpy_multiarray
    
            log = logging.getLogger(__name__)
            logging.basicConfig(level=logging.DEBUG)  # be verbose
    
            contains_c_extension(os)  # False
            contains_c_extension(et)  # False
    
            contains_c_extension(numpy)  # True -- different from is_c_extension()
            contains_c_extension(numpy_multiarray)  # True
    
            contains_c_extension(arrow)  # False
    
            contains_c_extension(alembic)  # False
            contains_c_extension(alembic, include_external_imports=True)  # True
            # ... this example shows that Alembic imports hashlib, which can import
            #     _hashlib, which is a C extension; however, that doesn't stop us (for
            #     example) installing Alembic on a machine with no C compiler
    
            contains_c_extension(django)
    
        """  # noqa
        assert inspect.ismodule(module), f'"{module}" not a module.'
    
        if seen is None:  # only true for the top-level call
            seen = []  # type: List[ModuleType]
        if module in seen:  # modules can "contain" themselves
            # already inspected; avoid infinite loops
            return False
        seen.append(module)
    
        # Check the thing we were asked about
        is_c_ext = is_c_extension(module)
        if verbose:
            log.info(f"Is module {module!r} a C extension? {is_c_ext}")
        if is_c_ext:
            return True
        if is_builtin_module(module):
            # built-in, therefore we stop searching it
            return False
    
        # Now check any children, in a couple of ways
    
        top_level_module = seen[0]
        top_path = os.path.dirname(top_level_module.__file__)
    
        # Recurse using dir(). This picks up modules that are automatically
        # imported by our top-level model. But it won't pick up all submodules;
        # try e.g. for django.
        for candidate_name in dir(module):
            candidate = getattr(module, candidate_name)
            # noinspection PyBroadException
            try:
                if not inspect.ismodule(candidate):
                    # not a module
                    continue
            except Exception:
                # e.g. a Django module that won't import until we configure its
                # settings
                log.error(f"Failed to test ismodule() status of {candidate!r}")
                continue
            if is_builtin_module(candidate):
                # built-in, therefore we stop searching it
                continue
    
            candidate_fname = getattr(candidate, "__file__")
            if not include_external_imports:
                if os.path.commonpath([top_path, candidate_fname]) != top_path:
                    if verbose:
                        log.debug(f"Skipping, not within the top-level module's "
                                  f"directory: {candidate!r}")
                    continue
            # Recurse:
            if contains_c_extension(
                    module=candidate,
                    import_all_submodules=False,  # only done at the top level, below  # noqa
                    include_external_imports=include_external_imports,
                    seen=seen):
                return True
    
        if import_all_submodules:
            if not is_module_a_package(module):
                if verbose:
                    log.debug(f"Top-level module is not a package: {module!r}")
                return False
    
            # Otherwise, for things like Django, we need to recurse in a different
            # way to scan everything.
            # See https://stackoverflow.com/questions/3365740/how-to-import-all-submodules.  # noqa
            log.debug(f"Walking path: {top_path!r}")
            try:
                for loader, module_name, is_pkg in pkgutil.walk_packages([top_path]):  # noqa
                    if not is_pkg:
                        log.debug(f"Skipping, not a package: {module_name!r}")
                        continue
                    log.debug(f"Manually importing: {module_name!r}")
                    # noinspection PyBroadException
                    try:
                        candidate = loader.find_module(module_name)\
                            .load_module(module_name)  # noqa
                    except Exception:
                        # e.g. Alembic "autogenerate" gives: "ValueError: attempted
                        # relative import beyond top-level package"; or Django
                        # "django.core.exceptions.ImproperlyConfigured"
                        log.error(f"Package failed to import: {module_name!r}")
                        continue
                    if contains_c_extension(
                            module=candidate,
                            import_all_submodules=False,  # only done at the top level  # noqa
                            include_external_imports=include_external_imports,
                            seen=seen):
                        return True
            except Exception:
                log.error("Unable to walk packages further; no C extensions "
                          "detected so far!")
                raise
    
        return False
    
    
    # noinspection PyUnresolvedReferences,PyTypeChecker
    def test() -> None:
        import _elementtree as et
    
        import arrow
        import alembic
        import django
        import django.conf
        import numpy
        import numpy.core.multiarray as numpy_multiarray
    
        log.info(f"contains_c_extension(os): "
                 f"{contains_c_extension(os)}")  # False
        log.info(f"contains_c_extension(et): "
                 f"{contains_c_extension(et)}")  # False
    
        log.info(f"is_c_extension(numpy): "
                 f"{is_c_extension(numpy)}")  # False
        log.info(f"contains_c_extension(numpy): "
                 f"{contains_c_extension(numpy)}")  # True
        log.info(f"contains_c_extension(numpy_multiarray): "
                 f"{contains_c_extension(numpy_multiarray)}")  # True  # noqa
    
        log.info(f"contains_c_extension(arrow): "
                 f"{contains_c_extension(arrow)}")  # False
    
        log.info(f"contains_c_extension(alembic): "
                 f"{contains_c_extension(alembic)}")  # False
        log.info(f"contains_c_extension(alembic, include_external_imports=True): "
                 f"{contains_c_extension(alembic, include_external_imports=True)}")  # True  # noqa
        # ... this example shows that Alembic imports hashlib, which can import
        #     _hashlib, which is a C extension; however, that doesn't stop us (for
        #     example) installing Alembic on a machine with no C compiler
    
        django.conf.settings.configure()
        log.info(f"contains_c_extension(django): "
                 f"{contains_c_extension(django)}")  # False
    
    
    if __name__ == '__main__':
        logging.basicConfig(level=logging.INFO)  # be verbose
        test()
    
    import inspect, os
    import importlib
    from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
    from types import ModuleType
    
    # function from Cecil Curry's answer:
    
    def is_c_extension(module: ModuleType) -> bool:
        '''
        `True` only if the passed module is a C extension implemented as a
        dynamically linked shared library specific to the current platform.
    
        Parameters
        ----------
        module : ModuleType
            Previously imported module object to be tested.
    
        Returns
        ----------
        bool
            `True` only if this module is a C extension.
        '''
        assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)
    
        # If this module was loaded by a PEP 302-compliant CPython-specific loader
        # loading only C extensions, this module is a C extension.
        if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
            return True
    
        # Else, fallback to filetype matching heuristics.
        #
        # Absolute path of the file defining this module.
        module_filename = inspect.getfile(module)
    
        # "."-prefixed filetype of this path if any or the empty string otherwise.
        module_filetype = os.path.splitext(module_filename)[1]
    
        # This module is only a C extension if this path's filetype is that of a
        # C extension specific to the current platform.
        return module_filetype in EXTENSION_SUFFIXES
    
    
    import re

    with open('requirements.txt') as f:
        for line in f:
            line = line.strip()
            # skip blank lines and comments
            if not line or line.startswith('#'):
                continue
            # super lazy pip name to library name conversion
            # there is probably a better way to do this.
            name = re.split(r'[=<>!~;\[\s]', line, maxsplit=1)[0]
            lib = name.replace("python-", "").replace("-", "_").lower()
            try:
                mod = importlib.import_module(lib)
                print(f"is {lib} a c extension? : {is_c_extension(mod)}")
            except Exception:
                print(f"could not check {lib}, perhaps the name for imports is different?")