Python: adding multiple columns to a dataframe using np.where clauses


python pandas numpy


I am trying to add multiple columns to a dataframe using numpy.where() in my ETL logic.

Here is my df:

This is what I want my df to look like:

The code:

import numpy as np
import pandas as pd

current_time = pd.Timestamp.utcnow().strftime('%Y-%m-%d %H:%M:%S')

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [
                np.where(
                    # When old hash code is available and new hash code is not available. 0 -- N
                    (
                            df['new_hash'].isna()
                            &
                            ~df['old_hash'].isna()
                    ) |
                    # When hash codes are available and matched. 3.1 -- 'N'
                    (
                            ~df['new_hash'].isna()
                            &
                            ~df['old_hash'].isna()
                            &
                            ~(df['new_hash'].ne(df['old_hash']))
                    ),
                    ['N', df['cr_date'], df['up_date']],
                    np.where(
                        # When new hash code is available and old hash code is not available. 1 -- Y
                        (
                                ~df['new_hash'].isna()
                                &
                                df['old_hash'].isna()
                        ),
                        ['Y', current_time, current_time],
                        np.where(
                            # When hash codes are available and do not match. 3.2 -- 'Y'
                            (
                                    ~df['new_hash'].isna()
                                    &
                                    ~df['old_hash'].isna()
                                    &
                                    df['new_hash'].ne(df['old_hash'])
                            ),
                            ['Y', df['cr_date'], current_time],
                            ['N', df['cr_date'], df['up_date']]
                        )
                    )
                )
            ],
            index=df.index,
            columns=['is_changed', 'cr_date_new', 'up_date_new']
        )
    ],
    axis=1
)

I also tried the code above using df.join() instead of pd.concat(). It still gives me the ValueError shown below.

I can add one column at a time, for example:

df['is_changed'] = (
    np.where(
        # When old hash code is available and new hash code is not available. 0 -- N
        (
                df['new_hash'].isna()
                &
                ~df['old_hash'].isna()
        ) |
        # When hash codes are available and matched. 3.1 -- 'N'
        (
                ~df['new_hash'].isna()
                &
                ~df['old_hash'].isna()
                &
                ~(df['new_hash'].ne(df['old_hash']))
        ),
        'N',
        np.where(
            # When new hash code is available and old hash code is not available. 1 -- Y
            (
                    ~df['new_hash'].isna()
                    &
                    df['old_hash'].isna()
            ),
            'Y',
            np.where(
                # When hash codes are available and do not match. 3.2 -- 'Y'
                (
                        ~df['new_hash'].isna()
                        &
                        ~df['old_hash'].isna()
                        &
                        df['new_hash'].ne(df['old_hash'])
                ),
                'Y',
                'N'
            )
        )
    )
)
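The nested conditions in the single-column version can also be flattened with np.select, which takes a list of boolean masks plus a matching list of values and, for each row, uses the first mask that is true. This is a sketch of an alternative, not the asker's code: the sample hashes are made up and only the column names come from the question. Rows where neither hash is present fall through to default, matching the innermost 'N' branch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "new_hash": [None, "a", "b", "c", None],
    "old_hash": ["x", "a", None, "z", None],
})

new_na = df["new_hash"].isna()
old_na = df["old_hash"].isna()
both = ~new_na & ~old_na
same = df["new_hash"].eq(df["old_hash"])

# One mask per case from the question's comments, checked in order.
conditions = [
    new_na & ~old_na,   # 0: only the old hash exists      -> 'N'
    both & same,        # 3.1: both exist and match         -> 'N'
    ~new_na & old_na,   # 1: only the new hash exists       -> 'Y'
    both & ~same,       # 3.2: both exist and do not match  -> 'Y'
]
choices = ["N", "N", "Y", "Y"]

# Rows matching no mask (both hashes missing) get the default.
df["is_changed"] = np.select(conditions, choices, default="N")
print(df["is_changed"].tolist())
```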

But with multiple columns I get this error:

ValueError: operands could not be broadcast together with shapes (66,) (3,) (3,)

What is wrong with the way I am adding multiple columns? Can someone help me?

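In np.where(cond, A, B), Python first evaluates cond, A, and B, and then passes them to the where function. where then broadcasts the inputs against each other and performs element-wise selection. You seem to have 3 nested where calls. My guess is that the error occurs in the innermost one, since it is evaluated first. (I wouldn't have to guess if you had provided the error traceback.)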
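The cond part is the first parenthesized logical expression.
A is the 3-element list, and B is the next list.
Assuming there are 66 rows, cond will have shape (66,).
np.array(['Y', df['cr_date'], current_time]) is probably a (3,)-shaped array of object dtype, because the inputs mix strings with a Series.
That explains the 3 shapes in the error message: (66,) (3,) (3,).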
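A minimal reproduction of that shape mismatch, assuming a 66-row boolean mask. Plain strings stand in for the list elements here, because recent NumPy releases may refuse to turn a list mixing a Series and strings into a (3,) object array at all; the broadcasting failure between (66,) and (3,) is the same either way.

```python
import numpy as np

cond = np.zeros(66, dtype=bool)                     # stands in for the (66,) mask
A = ["Y", "2020-01-01", "2021-03-04 05:06:07"]      # 3-element list -> shape (3,)
B = ["N", "2020-01-01", "2020-06-01"]               # 3-element list -> shape (3,)

msg = None
try:
    np.where(cond, A, B)                            # (66,) vs (3,) cannot broadcast
except ValueError as exc:
    msg = str(exc)
print(msg)
```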

If you are trying to set just one column at a time, the expression would be np.where(cond, 'Y', 'N'), or np.where(cond, Series1, Series2).
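Applied to the question, that means one np.where per output column. This sketch assumes string dates, a fixed current_time in place of pd.Timestamp.utcnow() so the output is reproducible, and made-up sample rows; the branch logic follows the question's comments (case 1: only the new hash exists; case 3.2: both exist and differ). Every other row keeps its original dates and gets 'N'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "new_hash": [None, "a", "b", "c", None],
    "old_hash": ["x", "a", None, "z", None],
    "cr_date": ["2020-01-01"] * 5,
    "up_date": ["2020-06-01"] * 5,
})
# Fixed for reproducibility; the question uses pd.Timestamp.utcnow().strftime(...)
current_time = "2021-03-04 05:06:07"

new_na = df["new_hash"].isna()
old_na = df["old_hash"].isna()
only_new = ~new_na & old_na                                       # case 1
differ = ~new_na & ~old_na & df["new_hash"].ne(df["old_hash"])    # case 3.2
changed = only_new | differ

# Each call pairs a (n,) mask with scalars or length-n Series,
# so broadcasting succeeds.
df["is_changed"] = np.where(changed, "Y", "N")
df["cr_date_new"] = np.where(only_new, current_time, df["cr_date"])
df["up_date_new"] = np.where(changed, current_time, df["up_date"])
print(df[["is_changed", "cr_date_new", "up_date_new"]])
```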

If you don't understand what I (or the error) mean by broadcasting, you may need to learn more about numpy (which pandas is built on).
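To make the broadcasting point concrete: a (66,) mask only lines up with operands of length 66, but reshaping it to (n, 1) lets it broadcast against (n, 3) blocks, so a single np.where can choose three values per row. This is a sketch of that alternative, not something the answer proposes; the mask and data are hypothetical, and only a two-way choice (the question's case 3.2 versus "no change") is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data and mask (n = 3 rows).
df = pd.DataFrame({
    "cr_date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "up_date": ["2020-06-01", "2020-06-02", "2020-06-03"],
})
current_time = "2021-03-04 05:06:07"        # fixed stand-in timestamp string
changed = np.array([False, True, False])    # shape (n,)

n = len(df)
cr = df["cr_date"].to_numpy(dtype=object)
up = df["up_date"].to_numpy(dtype=object)

# (n, 3) blocks: one row per DataFrame row, one column per output column.
yes_block = np.column_stack(
    [np.full(n, "Y", dtype=object), cr, np.full(n, current_time, dtype=object)]
)
no_block = np.column_stack([np.full(n, "N", dtype=object), cr, up])

# changed[:, None] has shape (n, 1), which broadcasts against (n, 3).
picked = np.where(changed[:, None], yes_block, no_block)
print(picked.shape)  # (3, 3)

new_cols = pd.DataFrame(picked, index=df.index,
                        columns=["is_changed", "cr_date_new", "up_date_new"])
out = df.join(new_cols)
print(out)
```

Building the blocks with dtype=object keeps strings and any non-string values (such as timestamps) side by side instead of letting NumPy coerce them to one common type.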
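I can barely read this code. It is far from regular Python formatting, and I can't tell that it is a concatenation. You haven't (I don't think) shown df, so I don't know what concat is doing. This isn't a simple problem, and I strongly suggest you change the formatting style.
@roganjosh Hope my changes make it understandable now.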