In Python, how can one tell if a module comes from a C extension?
What is the correct or most robust way to tell, from Python, whether an imported module comes from a C extension as opposed to a pure Python module? This is useful, for example, if a Python package has a module with both a pure Python implementation and a C implementation, and you want to be able to tell at runtime which one is being used.
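One idea is to examine the file extension of the module's __file__, but I'm not sure which file extensions should be checked, or whether this approach is necessarily the most reliable.

First, I don't think this is at all useful. It's very common for modules to be pure-Python wrappers around a C extension module or, in some cases, pure-Python wrappers around a C extension module if it's available, falling back to a pure-Python implementation if not.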
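For some popular third-party examples: numpy is pure Python, even though everything important is implemented in C; bintrees is pure Python, even though its classes may all be implemented either in C or in Python, depending on how you build it; and so on.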
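And this is true of most of the stdlib from 3.2 on. For example, if you just import pickle, the implementation classes will be built in C in CPython (what you used to get from cPickle in 2.7), while they'll be pure-Python versions in PyPy. Either way, pickle itself is pure Python.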
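The wrapper pattern described above can be observed by comparing where the pickle module lives with where its Pickler class is defined. This is only a minimal sketch, assuming a standard CPython 3 installation; the helper name describe_pickle is made up for illustration.

```python
import pickle


def describe_pickle():
    """Report where the pickle wrapper and its Pickler class come from."""
    return {
        # File that defines the wrapper module itself (normally a .py file).
        "wrapper_file": getattr(pickle, "__file__", None),
        # Module that actually defines Pickler: "_pickle" when the C
        # accelerator is available, "pickle" when the pure-Python
        # fallback is in use.
        "pickler_defined_in": pickle.Pickler.__module__,
    }


if __name__ == "__main__":
    for key, value in describe_pickle().items():
        print(f"{key}: {value}")
```

The point is that checking the file of the pickle module alone would tell you nothing about which implementation you are really using.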
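But if you do want to do this, you actually need to distinguish three things:
- Built-in modules, like sys.
- C extension modules, like 2.x's cPickle.
- Pure Python modules, like 2.x's pickle.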
And that's assuming you only care about CPython; if your code runs in, say, Jython or IronPython, the implementation could be JVM or .NET rather than native code.
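For the first of those three categories, CPython exposes the names of the modules compiled into the interpreter as sys.builtin_module_names. A minimal sketch, assuming CPython; the helper name is_compiled_in is hypothetical, and which modules are compiled in varies from build to build.

```python
import sys


def is_compiled_in(module_name: str) -> bool:
    """True if the named module is compiled into this interpreter binary."""
    return module_name in sys.builtin_module_names


if __name__ == "__main__":
    # Results for modules like math differ between builds, so none are
    # shown here; run it on the interpreter you care about.
    for name in ("sys", "json", "math"):
        print(f"{name}: compiled in? {is_compiled_in(name)}")
```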
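You can't distinguish these perfectly based on __file__, for a number of reasons:
- Built-in modules don't have a __file__ at all. (This is documented in a few places, e.g., the table in the inspect docs.) Note that if you're using something like py2app or cx_freeze, what counts as "built-in" may be different from a standalone installation.
- A pure-Python module may have a .pyc/.pyo file without having a .py file in a distributed app.
- A module in a package installed as a single-file egg (which is common with easy_install, less so with pip) will have either a blank or a useless __file__.
- If you build a binary distribution, there's a good chance your whole library will be packed in a zip file, causing the same problem as single-file eggs.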
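The first point in the list above is easy to check for yourself. A small sketch, assuming CPython 3; file_or_none is a hypothetical helper name.

```python
import json
import sys


def file_or_none(module):
    """Return module.__file__, or None when the attribute is missing."""
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # Built-in modules such as sys have no __file__ attribute at all.
    print("sys :", file_or_none(sys))
    # A module loaded from disk normally reports a path, but inside an egg
    # or zip that path may not name a real file on the filesystem.
    print("json:", file_or_none(json))
```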
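In 3.1+, the import process has been massively cleaned up, mostly rewritten in Python, and mostly exposed to the Python layer.
So, you can use the importlib module to see the chain of loaders used to load a module, and ultimately you'll get to BuiltinImporter (builtins), ExtensionFileLoader (.so/.pyd/etc.), SourceFileLoader (.py), or SourcelessFileLoader (.pyc/.pyo).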
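As a rough illustration of that loader check, here is a sketch that labels a module by the loader recorded on its import spec. The function name classify_by_loader and the label strings are made up for this example. FrozenImporter is an extra case not listed above, included on the assumption that some CPython builds freeze parts of the standard library. Modules loaded from a zip or egg will come back as "unknown" here.

```python
import json
import sys
from importlib.machinery import (
    BuiltinImporter,
    ExtensionFileLoader,
    FrozenImporter,
    SourceFileLoader,
    SourcelessFileLoader,
)

# Checked in order; the labels are illustrative, not an official vocabulary.
_LOADER_KINDS = (
    (BuiltinImporter, "built-in"),
    (FrozenImporter, "frozen"),
    (ExtensionFileLoader, "extension"),
    (SourceFileLoader, "source"),
    (SourcelessFileLoader, "sourceless"),
)


def classify_by_loader(module) -> str:
    """Label a module by the loader recorded on its import spec."""
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None) or getattr(module, "__loader__", None)
    for loader_type, label in _LOADER_KINDS:
        # BuiltinImporter and FrozenImporter are used as classes rather than
        # instances, so test identity as well as isinstance().
        if loader is loader_type or isinstance(loader, loader_type):
            return label
    return "unknown"


if __name__ == "__main__":
    for mod in (sys, json):
        print(mod.__name__, "->", classify_by_loader(mod))
```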
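You can also see the suffixes assigned to each of the four, on the current target platform, as constants in importlib.machinery. So, you could check any(pathname.endswith(suffix) for suffix in importlib.machinery.EXTENSION_SUFFIXES), but that won't actually help in, e.g., the egg/zip case unless you've already traveled up the chain anyway.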
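Written out as a function, that suffix check might look like the sketch below. The name has_extension_suffix is hypothetical, and the contents of EXTENSION_SUFFIXES depend on the platform and interpreter build.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def has_extension_suffix(pathname: str) -> bool:
    """True if pathname ends with one of this platform's extension suffixes."""
    return any(pathname.endswith(suffix) for suffix in EXTENSION_SUFFIXES)


if __name__ == "__main__":
    print("Extension suffixes on this platform:", EXTENSION_SUFFIXES)
    print("spam.py ->", has_extension_suffix("spam.py"))
```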
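The best heuristics anyone has come up with for this are the ones implemented in the inspect module, so the best thing to do is to use that.
The best choice will be one or more of getsource, getsourcefile, and getfile; which is best depends on which heuristics you want.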
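A built-in module will raise a TypeError for any of them.
An extension module ought to return None for getsourcefile. This seems to work in all the 2.5-3.4 versions I have, but I don't have 2.4 around. For getsource, at least in some versions, it returns the actual bytes of the .so file, even though it should be returning an empty string or raising an IOError. (In 3.x, you will almost certainly get a UnicodeError or SyntaxError, but you probably don't want to rely on that...)
A pure Python module may return None for getsourcefile if it is in an egg/zip/etc. It should always return a non-empty string for getsource if source is available, even inside an egg/zip/etc., but if it is sourceless bytecode (.pyc/etc.) it will return an empty string or raise an IOError.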
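The best bet is to experiment with the versions you care about, on the platforms you care about, in the distributions/setups you care about.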
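For that kind of experimenting, a small probe such as the following may help. It is only a sketch: probe is a hypothetical helper, and getsource is left out on purpose because its output can be large and its failure modes vary between versions.

```python
import inspect
import json
import sys


def probe(module):
    """Record what getfile() and getsourcefile() report for a module."""
    results = {}
    for func in (inspect.getfile, inspect.getsourcefile):
        try:
            results[func.__name__] = func(module)
        except TypeError as exc:
            # Raised for built-in modules, which have no file at all.
            results[func.__name__] = f"TypeError: {exc}"
    return results


if __name__ == "__main__":
    for mod in (sys, json):
        print(mod.__name__, probe(mod))
```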
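tl;dr
See the "In Search of Perfection" subsection below for the well-tested answer.
As a pragmatic counterpoint to the analysis of the subtleties involved in portably identifying C extensions, Stack Overflow Productions™ presents... an actual answer.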
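The capacity to reliably differentiate C extensions from non-C extensions is incredibly useful, and the Python community would be impoverished without it. Real-world use cases include:
- Application freezing, converting one cross-platform Python codebase into multiple platform-specific executables. This is the standard example here. Identifying C extensions is critical to robust freezing. If a module imported by the codebase being frozen is a C extension, all external shared libraries transitively linked to by that C extension must be frozen with that codebase as well. (Shameful confession: I apologize.)
- Application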
import inspect, os
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType

def is_c_extension(module: ModuleType) -> bool:
    '''
    `True` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Parameters
    ----------
    module : ModuleType
        Previously imported module object to be tested.

    Returns
    ----------
    bool
        `True` only if this module is a C extension.
    '''
    assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module. inspect.getfile() raises
    # TypeError for built-in modules such as sys, which have no file and are
    # therefore not shared-library C extensions.
    try:
        module_filename = inspect.getfile(module)
    except TypeError:
        return False

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES
>>> import os
>>> import importlib.machinery as im
>>> import _elementtree as et
>>> import numpy.core.multiarray as ma
>>> for module in (os, im, et, ma):
...     print('Is "{}" a C extension? {}'.format(
...         module.__name__, is_c_extension(module)))
Is "os" a C extension? False
Is "importlib.machinery" a C extension? False
Is "_elementtree" a C extension? True
Is "numpy.core.multiarray" a C extension? True
def is_c(module):
    # if module is part of the main python library (e.g. os), it won't have a path
    try:
        # Packages: walk the package directory looking for any .so file.
        for path, subdirs, files in os.walk(module.__path__[0]):
            for f in files:
                ftype = f.split('.')[-1]
                if ftype == 'so':
                    return True
        # No .so file anywhere in the package.
        return False
    except AttributeError:
        # Plain modules have no __path__: check the module's own file instead.
        path = inspect.getfile(module)
        suffix = path.split('.')[-1]
        return suffix == 'so'

is_c(os), is_c(im), is_c(et), is_c_extension(ma), is_c(numpy)
# (False, False, True, True, True)
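The hard-coded 'so' suffix in the function above only matches Unix-style shared libraries. A more portable variant could lean on EXTENSION_SUFFIXES instead. This is a sketch under that assumption, and dir_has_extension_files is a hypothetical name, not part of the answer above.

```python
import os
from importlib.machinery import EXTENSION_SUFFIXES


def dir_has_extension_files(directory: str) -> bool:
    """True if any file under directory ends with an extension-module suffix."""
    suffixes = tuple(EXTENSION_SUFFIXES)
    for _path, _subdirs, files in os.walk(directory):
        if any(name.endswith(suffixes) for name in files):
            return True
    return False


if __name__ == "__main__":
    import json

    # For a package, pass its first __path__ entry, as is_c() does.
    print("json:", dir_has_extension_files(json.__path__[0]))
```

Like the original, this only looks at file names, so any shared library shipped inside a package counts, whether or not Python can import it.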
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
import inspect
import logging
import os
import os.path
import pkgutil
from types import ModuleType
from typing import List

log = logging.getLogger(__name__)


def is_builtin_module(module: ModuleType) -> bool:
    """
    Is this module a built-in module, like ``sys``?
    Method is as per :func:`inspect.getfile`.
    """
    return not hasattr(module, "__file__")


def is_module_a_package(module: ModuleType) -> bool:
    assert inspect.ismodule(module)
    return os.path.basename(inspect.getfile(module)) == "__init__.py"

def is_c_extension(module: ModuleType) -> bool:
    """
    Modified from
    https://stackoverflow.com/questions/20339053/in-python-how-can-one-tell-if-a-module-comes-from-a-c-extension.

    ``True`` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Args:
        module: Previously imported module object to be tested.

    Returns:
        bool: ``True`` only if this module is a C extension.

    Examples:

    .. code-block:: python

        from cardinal_pythonlib.modules import is_c_extension

        import os
        import _elementtree as et
        import numpy
        import numpy.core.multiarray as numpy_multiarray

        is_c_extension(os)  # False
        is_c_extension(numpy)  # False
        is_c_extension(et)  # False on my system (Python 3.5.6). True in the original example.
        is_c_extension(numpy_multiarray)  # True

    """  # noqa
    assert inspect.ismodule(module), f'"{module}" not a module.'

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # If it's built-in, it's not a C extension.
    if is_builtin_module(module):
        return False

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES

def contains_c_extension(module: ModuleType,
                         import_all_submodules: bool = True,
                         include_external_imports: bool = False,
                         seen: List[ModuleType] = None,
                         verbose: bool = False) -> bool:
    """
    Extends :func:`is_c_extension` by asking: is this module, or any of its
    submodules, a C extension?

    Args:
        module: Previously imported module object to be tested.
        import_all_submodules: explicitly import all submodules of this module?
        include_external_imports: check modules in other packages that this
            module imports?
        seen: used internally for recursion (to deal with recursive modules);
            should be ``None`` when called by users
        verbose: show working via log?

    Returns:
        bool: ``True`` only if this module or one of its submodules is a C
        extension.

    Examples:

    .. code-block:: python

        import logging

        import _elementtree as et
        import os

        import arrow
        import alembic
        import django
        import numpy
        import numpy.core.multiarray as numpy_multiarray

        log = logging.getLogger(__name__)
        logging.basicConfig(level=logging.DEBUG)  # be verbose

        contains_c_extension(os)  # False
        contains_c_extension(et)  # False

        contains_c_extension(numpy)  # True -- different from is_c_extension()
        contains_c_extension(numpy_multiarray)  # True

        contains_c_extension(arrow)  # False

        contains_c_extension(alembic)  # False
        contains_c_extension(alembic, include_external_imports=True)  # True
        # ... this example shows that Alembic imports hashlib, which can import
        #     _hashlib, which is a C extension; however, that doesn't stop us (for
        #     example) installing Alembic on a machine with no C compiler

        contains_c_extension(django)

    """  # noqa
    assert inspect.ismodule(module), f'"{module}" not a module.'

    if seen is None:  # only true for the top-level call
        seen = []  # type: List[ModuleType]
    if module in seen:  # modules can "contain" themselves
        # already inspected; avoid infinite loops
        return False
    seen.append(module)

    # Check the thing we were asked about
    is_c_ext = is_c_extension(module)
    if verbose:
        log.info(f"Is module {module!r} a C extension? {is_c_ext}")
    if is_c_ext:
        return True
    if is_builtin_module(module):
        # built-in, therefore we stop searching it
        return False

    # Now check any children, in a couple of ways

    top_level_module = seen[0]
    top_path = os.path.dirname(top_level_module.__file__)

    # Recurse using dir(). This picks up modules that are automatically
    # imported by our top-level model. But it won't pick up all submodules;
    # try e.g. for django.
    for candidate_name in dir(module):
        candidate = getattr(module, candidate_name)
        # noinspection PyBroadException
        try:
            if not inspect.ismodule(candidate):
                # not a module
                continue
        except Exception:
            # e.g. a Django module that won't import until we configure its
            # settings
            log.error(f"Failed to test ismodule() status of {candidate!r}")
            continue
        if is_builtin_module(candidate):
            # built-in, therefore we stop searching it
            continue

        candidate_fname = getattr(candidate, "__file__")
        if not include_external_imports:
            if os.path.commonpath([top_path, candidate_fname]) != top_path:
                if verbose:
                    log.debug(f"Skipping, not within the top-level module's "
                              f"directory: {candidate!r}")
                continue
        # Recurse:
        if contains_c_extension(
                module=candidate,
                import_all_submodules=False,  # only done at the top level, below  # noqa
                include_external_imports=include_external_imports,
                seen=seen):
            return True

    if import_all_submodules:
        if not is_module_a_package(module):
            if verbose:
                log.debug(f"Top-level module is not a package: {module!r}")
            return False

        # Otherwise, for things like Django, we need to recurse in a different
        # way to scan everything.
        # See https://stackoverflow.com/questions/3365740/how-to-import-all-submodules.  # noqa
        log.debug(f"Walking path: {top_path!r}")
        try:
            for loader, module_name, is_pkg in pkgutil.walk_packages([top_path]):  # noqa
                if not is_pkg:
                    log.debug(f"Skipping, not a package: {module_name!r}")
                    continue
                log.debug(f"Manually importing: {module_name!r}")
                # noinspection PyBroadException
                try:
                    # NOTE: find_module() was removed from importlib's finders
                    # in Python 3.12; there this call fails and is logged by
                    # the handler below.
                    candidate = loader.find_module(module_name)\
                        .load_module(module_name)  # noqa
                except Exception:
                    # e.g. Alembic "autogenerate" gives: "ValueError: attempted
                    # relative import beyond top-level package"; or Django
                    # "django.core.exceptions.ImproperlyConfigured"
                    log.error(f"Package failed to import: {module_name!r}")
                    continue
                if contains_c_extension(
                        module=candidate,
                        import_all_submodules=False,  # only done at the top level  # noqa
                        include_external_imports=include_external_imports,
                        seen=seen):
                    return True
        except Exception:
            log.error("Unable to walk packages further; no C extensions "
                      "detected so far!")
            raise

    return False

# noinspection PyUnresolvedReferences,PyTypeChecker
def test() -> None:
    import _elementtree as et

    import arrow
    import alembic
    import django
    import django.conf
    import numpy
    import numpy.core.multiarray as numpy_multiarray

    log.info(f"contains_c_extension(os): "
             f"{contains_c_extension(os)}")  # False
    log.info(f"contains_c_extension(et): "
             f"{contains_c_extension(et)}")  # False

    log.info(f"is_c_extension(numpy): "
             f"{is_c_extension(numpy)}")  # False
    log.info(f"contains_c_extension(numpy): "
             f"{contains_c_extension(numpy)}")  # True
    log.info(f"contains_c_extension(numpy_multiarray): "
             f"{contains_c_extension(numpy_multiarray)}")  # True  # noqa

    log.info(f"contains_c_extension(arrow): "
             f"{contains_c_extension(arrow)}")  # False

    log.info(f"contains_c_extension(alembic): "
             f"{contains_c_extension(alembic)}")  # False
    log.info(f"contains_c_extension(alembic, include_external_imports=True): "
             f"{contains_c_extension(alembic, include_external_imports=True)}")  # True  # noqa
    # ... this example shows that Alembic imports hashlib, which can import
    #     _hashlib, which is a C extension; however, that doesn't stop us (for
    #     example) installing Alembic on a machine with no C compiler

    django.conf.settings.configure()
    log.info(f"contains_c_extension(django): "
             f"{contains_c_extension(django)}")  # False


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # be verbose
    test()

import inspect, os
import importlib
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType

# function from Cecil Curry's answer:
def is_c_extension(module: ModuleType) -> bool:
    '''
    `True` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Parameters
    ----------
    module : ModuleType
        Previously imported module object to be tested.

    Returns
    ----------
    bool
        `True` only if this module is a C extension.
    '''
    assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module. inspect.getfile() raises
    # TypeError for built-in modules such as sys, which have no file and are
    # therefore not shared-library C extensions.
    try:
        module_filename = inspect.getfile(module)
    except TypeError:
        return False

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES

with open('requirements.txt') as f:
    lines = f.readlines()

for line in lines:
    # strip the trailing newline; skip blank lines and comments
    line = line.strip()
    if not line or line.startswith("#"):
        continue
    # super lazy pip name to library name conversion
    # there is probably a better way to do this.
    lib = line.split("=")[0].replace("python-", "").replace("-", "_").lower()
    try:
        mod = importlib.import_module(lib)
        print(f"is {lib} a c extension? : {is_c_extension(mod)}")
    except Exception:
        print(f"could not check {lib}, perhaps the name for imports is different?")