Is there a way to force a Python function defined in a class to return something of a specific data type (instead of returning nothing)?

Tags: python, python-3.x, pandas


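I know Python is dynamically typed and has no keywords for declaring a return type, such as void or int in Java and C. I also know that type hints can tell users what type to expect back from a function.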
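
To make the point about type hints concrete, here is a minimal sketch showing that a return annotation is only metadata and is not checked when the function runs. The function count_columns is a hypothetical example, not part of the project described in this question.

```python
from typing import get_type_hints


def count_columns(headers: list) -> int:
    """Annotated to return int, but the body forgets the return statement."""
    len(headers)  # bug: the result is computed but never returned


# Python runs the call without complaint and hands back None.
result = count_columns(["a", "b"])

# The annotation is still available for tools to inspect.
hints = get_type_hints(count_columns)
print(result, hints["return"])
```

A static checker such as mypy would be expected to flag the missing return, but the interpreter itself does not.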

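I am trying to implement a Python class that reads a config file (say, a JSON file) that dictates which data transformation methods should be applied to a pandas dataframe. The config file looks something like this: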

[
  {
    "input_folder_path": "./input/budget/",
    "input_file_name_or_pattern": "Global Budget Roll-up_9.16.19.xlsx",
    "sheet_name_of_excel_file": "Budget Roll-Up",
    "output_folder_path": "./output/budget/",
    "output_file_name_prefix": "transformed_budget_",

    "__comment__": "(Optional) File with Python class that houses data transformation functions, which will be imported and used in the transform process. If not provided, then the code will use default class in the 'transform_function.py' file.",
    "transform_functions_file": "./transform_functions/budget_transform_functions.py",

    "row_number_of_column_headers": 0,
    "row_number_where_data_starts": 1,
    "number_of_rows_to_skip_from_the_bottom_of_the_file": 0,

    "__comment__": "(Required) List of the functions and their parameters.",
    "__comment__": "These functions must be defined either in transform_functions.py or individual transformation file such as .\\transform_function\\budget_transform_functions.py",
    "functions_to_apply": [
      {
        "__function_comment__": "Drop empty columns in Budget roll up Excel file. No parameters required.",
        "function_name": "drop_unnamed_columns"
      },
      {
        "__function_comment__": "By the time we run this function, there should be only 13 columns total remaining in the raw data frame.",
        "function_name": "assert_number_of_columns_equals",
        "function_args": [13]
      },
      {
        "__function_comment__": "Map raw channel names 'Ecommerce' and 'ecommerce' to 'E-Commerce'.",
        "transform_function_name": "standardize_to_ecommerce",
        "transform_function_args": [["Ecommerce", "ecommerce"]]
      }
    ]
  }
]
In main.py, I have something like this:

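import argparse
import os

import pandas as pd

# Project-local modules referenced below.
import transform_errors
import transform_utils

if __name__ == '__main__':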
    # 1. Process arguments passed into the program
    parser = argparse.ArgumentParser(description=transform_utils.DESC,
                                     formatter_class = argparse.RawTextHelpFormatter,
                                     usage=argparse.SUPPRESS)
    parser.add_argument('-c', required=True, type=str,
                        help=transform_utils.HELP)
    args = parser.parse_args()

    # 2. Load JSON configuration file
    if (not args.c) or (not os.path.exists(args.c)):
        raise transform_errors.ConfigFileError()

    # 3. Iterate through each transform procedure in config file
    for config in transform_utils.load_config(args.c):
        output_file_prefix = transform_utils.get_output_file_path_with_name_prefix(config)
        custom_transform_funcs_module = transform_utils.load_custom_functions(config)

        row_idx_where_data_starts = transform_utils.get_row_index_where_data_starts(config)
        footer_rows_to_skip = transform_utils.get_number_of_rows_to_skip_from_bottom(config)

        for input_file in transform_utils.get_input_files(config):
            print("Processing file:", input_file)
            col_headers_from_input_file = transform_utils.get_raw_column_headers(input_file, config)

            if transform_utils.is_excel(input_file):
                sheet = transform_utils.get_sheet(config)
                print("Skipping this many rows (including header row) from the top of the file:", row_idx_where_data_starts)
                cur_df = pd.read_excel(input_file,
                                       sheet_name=sheet,
                                       skiprows=row_idx_where_data_starts,
                                       skipfooter=footer_rows_to_skip,
                                       header=None,
                                       names=col_headers_from_input_file
                                       )
                custom_funcs_instance = custom_transform_funcs_module.TaskSpecificTransformFunctions()

                for func_and_params in transform_utils.get_functions_to_apply(config):
                    print("=>Invoking transform function:", func_and_params)
                    func_args = transform_utils.get_transform_function_args(func_and_params)
                    func_kwargs = transform_utils.get_transform_function_kwargs(func_and_params)
                    cur_df = getattr(custom_funcs_instance,
                                     transform_utils.get_transform_function_name(
                                         func_and_params))(cur_df, *func_args, **func_kwargs)
In budget_transform_functions.py, I have:

class TaskSpecificTransformFunctions(TransformFunctions):
    def drop_unnamed_columns(self, df):
        """
        Drop columns that have 'Unnamed' as column header, which is a usual
        occurrence for some Excel/CSV raw data files with empty but hidden columns.
        Args:
            df: Raw dataframe to transform.

        Returns:
            Dataframe whose 'Unnamed' columns are dropped.
        """
        return df.loc[:, ~df.columns.str.contains(r'Unnamed')]

    def assert_number_of_columns_equals(self, df, num_of_cols_expected):
        """
        Assert that the total number of columns in the dataframe
        is equal to num_of_cols_expected (int).

        Args:
            df: Raw dataframe to transform.
            num_of_cols_expected: Number of columns expected (int).

        Returns:
            The original dataframe is returned if the assertion is successful.

        Raises:
            ColumnCountError: If the number of columns found
            does not equal the expected count.
        """
        if df.shape[1] != num_of_cols_expected:
            raise transform_errors.ColumnCountError(
                ' '.join(["Expected column count of:", str(num_of_cols_expected),
                          "but found:", str(df.shape[1]), "in the current dataframe."])
            )
        else:
            print("Successfully checked that the current dataframe has:", num_of_cols_expected, "columns.")

        return df
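As you can see, I need future implementers of budget_transform_functions.py to know that the functions in TaskSpecificTransformFunctions must always return a pandas dataframe. I know that in Java you can create an interface, and whoever implements it must abide by the return type of every method in that interface. I am wondering whether Python has a similar construct (or a workaround that achieves something similar).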


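I hope this lengthy question makes sense, and that someone with more Python experience than me can teach me something about this. Thank you very much in advance for your answers and suggestions!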

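One way to check a function's return type, at least at run time, is to wrap the function in another function that checks the return value. To do this automatically for subclasses, there is __init_subclass__. It can be used as in the code listing further down, which still needs polishing and handling of special cases.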


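Comments:

This is what type hints are for, although they don't enforce anything unless you configure your IDE to treat type warnings as errors. I would just write:
def assert_number_of_columns_equals(self, df, num_of_cols_expected) -> pd.DataFrame:

You can also create an "interface" in Python, but it won't enforce types either. The only ways to actually enforce types are to do manual checks at run time or to use a tool that does static checks for you (such as a good IDE). Run-time checks can be automated by implementing __init_subclass__ in the base class and wrapping all subclass functions in a function that checks the return type (a kind of automatic decoration).

Nvm, I lied. I thought Protocols might come to the rescue here, but I was wrong.

@user1330974 Python's abstract base classes and Python 3.8's Protocols are as close as you can get to interfaces, but they do no type checking at run time. They only check that a class defines the required methods.

I've tested your suggested solution and it works! But I have one follow-up question. What if I want to implement functions in the TransformFunctions class itself and have them follow the same rule as its subclasses (that is, return a pandas dataframe)? Is there a way to do that without creating another parent class for TransformFunctions? Thanks in advance for your answer!

@user1330974 Basically, you can factor the relevant part of __init_subclass__ out into its own function and call it on the TransformFunctions class. Slightly cleaner, but more work, would be a custom metaclass that handles the wrapping for all classes.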
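
To illustrate the comment above about abstract base classes, here is a minimal sketch of what an ABC does and does not enforce. The classes Transform, MissingApply, and ReturnsNothing are hypothetical names used only for this example.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    """Hypothetical interface: every transform must define apply()."""

    @abstractmethod
    def apply(self, rows: list) -> list:
        ...


class MissingApply(Transform):
    """Does not override apply(), so it cannot be instantiated."""


class ReturnsNothing(Transform):
    def apply(self, rows: list) -> list:
        rows.sort()  # bug: sorts in place, so the method returns None


# Enforced: instantiating a class with a missing abstract method fails.
try:
    MissingApply()
    missing_method_error = None
except TypeError as exc:
    missing_method_error = exc

# Not enforced: the declared return type is ignored at run time.
wrong_return = ReturnsNothing().apply([3, 1, 2])
print(type(missing_method_error).__name__, wrong_return)
```

So an ABC gets close to a Java interface for method presence, but a wrong return value still has to be caught by a run-time wrapper or a static checker.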
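import functools

import pandas as pd


def wrapCheck(f):
    """Wrap f so that a non-DataFrame return value raises TypeError."""
    @functools.wraps(f)  # keep the wrapped function's name and docstring
    def checkedCall(*args, **kwargs):
        r = f(*args, **kwargs)
        if not isinstance(r, pd.DataFrame):
            raise TypeError(f"Bad return value of {f.__name__}: {r!r}")

        return r

    return checkedCall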


class TransformFunctions:

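    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap every callable defined directly on the subclass; iterate over a
        # copy because setattr() modifies the class namespace.
        for k, v in list(cls.__dict__.items()):
            if callable(v):
                setattr(cls, k, wrapCheck(v))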



class TryTransform(TransformFunctions):

    def createDf(self):
        return pd.DataFrame(data={"a":[1,2,3], "b":[4,5,6]})


    def noDf(self, a, b):
        return a + b


tt = TryTransform()

print(tt.createDf())   # Works
print(tt.noDf(2, 2))   # Fails with exception
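
The follow-up suggestion in the comments (factor the wrapping out of __init_subclass__ and call it on TransformFunctions itself) could look roughly like the sketch below. It is one possible reading of that suggestion, not the answerer's code. The names wrap_check, wrap_public_methods, Table, passthrough, broken, and BudgetTransforms are hypothetical, and Table stands in for pd.DataFrame so the sketch runs without pandas.

```python
import functools
import inspect


class Table:
    """Hypothetical stand-in for pd.DataFrame."""


def wrap_check(func, expected_type):
    """Return a wrapper that raises TypeError unless func returns expected_type."""
    @functools.wraps(func)
    def checked_call(*args, **kwargs):
        result = func(*args, **kwargs)
        if not isinstance(result, expected_type):
            raise TypeError(
                f"{func.__name__} returned {type(result).__name__}, "
                f"expected {expected_type.__name__}"
            )
        return result
    return checked_call


def wrap_public_methods(cls, expected_type):
    """Wrap plain functions defined directly on cls, skipping underscore names."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("_"):
            setattr(cls, name, wrap_check(attr, expected_type))
    return cls


class TransformFunctions:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        wrap_public_methods(cls, Table)  # subclasses are wrapped automatically

    def passthrough(self, table):
        return table

    def broken(self, table):
        return None  # violates the rule on purpose


# __init_subclass__ does not run for the base class itself, so wrap it by hand.
wrap_public_methods(TransformFunctions, Table)


class BudgetTransforms(TransformFunctions):
    def also_broken(self, table):
        return "not a table"
```

With this in place, TransformFunctions().broken(Table()) and BudgetTransforms().also_broken(Table()) both raise TypeError, while passthrough returns its argument unchanged. Skipping names that start with an underscore is a design choice for this sketch, so that helpers and dunder methods are not forced to return a table.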