Python: forwarding docstrings to a function I'm writing

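I want to write a function with the following signature:

def split_csv(file, sep=";", output_path=".", nrows=None, chunksize=None, low_memory=True, usecols=None):

As you can see, several of its parameters are the same as those of pd.read_csv. What I would like to do is forward the docstrings for these parameters from read_csv to my own function, without having to copy and paste them.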

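Edit: as far as I know, there is no ready-made solution, so perhaps it makes sense to build one. My idea is that a call like

some_new_library.get_docs(for_function=pandas.read_csv, for_parameters=['sep', 'nrows'])

would output:

{'sep': 'doc as found in docstring',
 'nrows': 'doc as found in docstring', ...}

and I would then simply insert the dictionary's values into my own function's docstring.

Cheers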
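The helper imagined above could be sketched roughly as follows. This is only an illustration, not an existing library: get_param_docs and the stand-in function read_table_like are hypothetical names, and the parser assumes a numpydoc-style "Parameters" section in which every entry starts with an unindented "name : type" line followed by indented description lines. Real docstrings, including the one on pandas.read_csv, may need more careful handling.

```python
import inspect
import re


def get_param_docs(func, params):
    """Return {name: doc block} for the requested parameters of ``func``.

    Hypothetical helper: assumes numpydoc-style entries of the form
    ``name : type`` followed by indented description lines.
    """
    doc = inspect.getdoc(func) or ""  # getdoc() removes the common indentation
    found = {}
    current = None
    for line in doc.splitlines():
        header = re.match(r"^(\w+) :", line)
        if header:
            # An unindented "name : type" line starts a new parameter entry.
            current = header.group(1) if header.group(1) in params else None
            if current:
                found[current] = [line]
        elif current and (line.startswith(" ") or not line.strip()):
            # Indented or blank lines belong to the entry being collected.
            found[current].append(line)
        else:
            # Any other unindented line (e.g. a section title) ends the entry.
            current = None
    return {name: "\n".join(lines).rstrip() for name, lines in found.items()}


def read_table_like(path, sep=",", nrows=None):
    """Stand-in for a documented library function (hypothetical).

    Parameters
    ----------
    path : str
        File to read.
    sep : str, default ','
        Delimiter to use.
    nrows : int, default None
        Number of rows of file to read.

    Returns
    -------
    list
    """
    return []


docs = get_param_docs(read_table_like, ["sep", "nrows"])
print(docs["sep"])
```

The returned dictionary has the shape described in the question, so its values can be joined into the docstring of the new function.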

You can parse the docstring with a regular expression and attach the entries for the matching parameters to your function:

import re

import pandas as pd

# The capturing group makes re.split keep each "name :" header in the result
pat = re.compile(r'([\w_+]+ :)')

splitted = pat.split(pd.read_csv.__doc__)

# Compare the parsed docstring against your function's arguments and only extract the required docstrings
docstrings = '\n'.join([''.join(splitted[i: i+2]) for i, s in enumerate(splitted) if s.rstrip(" :") in split_csv.__code__.co_varnames])

split_csv.__doc__ = docstrings

help(split_csv)

# Help on function split_csv in module __main__:
# 
# split_csv(file, sep=';', output_path='.', nrows=None, chunksize=None, low_memory=True, usecols=None)
#   sep : str, default ','
#       Delimiter to use. If sep is None, the C engine cannot automatically detect
#       the separator, but the Python parsing engine can, meaning the latter will
#       be used and automatically detect the separator by Python's builtin sniffer
#       tool, ``csv.Sniffer``. In addition, separators longer than 1 character and
#       different from ``'\s+'`` will be interpreted as regular expressions and
#       will also force the use of the Python parsing engine. Note that regex
#       delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``
#   
#   usecols : list-like or callable, default None
#       Return a subset of the columns. If list-like, all elements must either
#       be positional (i.e. integer indices into the document columns) or strings
#       that correspond to column names provided either by the user in `names` or
#       inferred from the document header row(s). For example, a valid list-like
#       `usecols` parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element
#       order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.
#       To instantiate a DataFrame from ``data`` with element order preserved use
#       ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
#       in ``['foo', 'bar']`` order or
#       ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]``
#       for ``['bar', 'foo']`` order.
#   
#       If callable, the callable function will be evaluated against the column
#       names, returning names where the callable function evaluates to True. An
#       example of a valid callable argument would be ``lambda x: x.upper() in
#       ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
#       parsing time and lower memory usage.
#   
#   nrows : int, default None
#       Number of rows of file to read. Useful for reading pieces of large files
#   
#   chunksize : int, default None
#       Return TextFileReader object for iteration.
#       See the `IO Tools docs
#       <http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking>`_
#       for more information on ``iterator`` and ``chunksize``.
#   
#   low_memory : boolean, default True
#       Internally process the file in chunks, resulting in lower memory use
#       while parsing, but possibly mixed type inference.  To ensure no mixed
#       types either set False, or specify the type with the `dtype` parameter.
#       Note that the entire file is read into a single DataFrame regardless,
#       use the `chunksize` or `iterator` parameter to return the data in chunks.
#       (Only valid with C parser)


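Of course, this relies on your function using exactly the same parameter names as the function you are copying from. As you can see, you need to add the docstrings for the non-matching parameters yourself (e.g. file, output_path).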
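One possible way to deal with the non-matching parameters is to write documentation only for those and borrow the rest by name. The sketch below is an assumption-laden illustration rather than an established recipe: forward_param_docs, the own argument and the stand-in read_table_like are hypothetical names, the body and summary line of split_csv are placeholders, and the regular expression expects numpydoc-style "name : type" entries. It uses inspect.signature instead of __code__.co_varnames, because co_varnames also lists local variables.

```python
import inspect
import re

# A "name : type" header line plus the indented or blank lines that follow it.
ENTRY = re.compile(r"^(\w+) :.*\n(?:(?:[ \t]+.*)?\n)*", re.MULTILINE)


def forward_param_docs(source, own=None):
    """Hypothetical decorator: build a Parameters section for the wrapped function.

    Entries in ``own`` take precedence; the remaining parameters are looked
    up by name in the numpydoc-style docstring of ``source``.
    """
    own = own or {}
    text = (inspect.getdoc(source) or "") + "\n"
    borrowed = {m.group(1): m.group(0).rstrip() for m in ENTRY.finditer(text)}

    def decorate(func):
        entries = []
        # inspect.signature yields only real parameters, in declaration order.
        for name in inspect.signature(func).parameters:
            if name in own:
                entries.append(f"{name} : {own[name]}")
            elif name in borrowed:
                entries.append(borrowed[name])
        summary = inspect.getdoc(func) or ""
        func.__doc__ = "\n".join([summary, "", "Parameters", "----------", *entries])
        return func

    return decorate


def read_table_like(path, sep=",", nrows=None):
    """Stand-in for a documented library function (hypothetical).

    Parameters
    ----------
    path : str
        File to read.
    sep : str, default ','
        Delimiter to use.
    nrows : int, default None
        Number of rows of file to read.

    Returns
    -------
    list
    """
    return []


@forward_param_docs(
    read_table_like,
    own={
        "file": "str\n    Path of the CSV file to split.",
        "output_path": "str, default '.'\n    Directory for the output files.",
    },
)
def split_csv(file, sep=";", output_path=".", nrows=None):
    """Split a CSV file (placeholder summary and body)."""
    rows_written = 0  # a local variable; it is not part of the signature
    return rows_written


print(split_csv.__doc__)
```

Note that the borrowed text is copied verbatim, so it still states the source function's defaults (here ',' for sep), which may differ from the defaults of the new function.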

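No, the function I'm writing is completely different, but it uses read_csv parameters. I left out the full code for readability. Basically, I want the documentation of the pandas.read_csv parameters to be available for some of my own function's parameters.

@aws_apprentice Agreed. The parameter information can be parsed and passed into the function, but that may take more work than simply copying and pasting the actual docstring.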