How do I create a new column by comparing two columns in the same DataFrame using Python?

python python-3.x pandas jupyter-notebook


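My DataFrame, df, has two columns, col_1 and col_2, and I want to add a third column, new_col, based on their values.

If col_1 is 'EDU' and col_2 is 'facebook' or 'google', new_col should hold the same string, i.e. facebook or google. If col_2 is 'google_usa' or 'tabula', new_col should be 'gusa'. For any other string in col_2, new_col should be 'others'.

If col_1 is 'IAR' and col_2 is 'facebook', new_col should be facebook. For any other string in col_2, new_col should be 'others'.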

Expected output:
col_1   col_2     new_col
EDU   facebook    facebook
EDU   google      google
EDU   google_usa  gusa
EDU   tabula      gusa
EDU   xyz         others
EDU   abc         others
IAR   facebook    facebook
IAR   google      others

I tried the code below, but it did not work. Please help me with this.
Thanks in advance.
if df['col_1'].str.contains('EDU').any():

        df['new_col'] = ['facebook' if 'facebook' in x else
                            'google' if 'google' == x else
                            'gcusa_tb' if 'taboola' in x else
                            'gcusa_tb' if 'google_cusa' in x else
                            'Others' for x in df['col_2']]

I would use a few numpy commands:
import numpy as np

df['new_col'] = 'others'  # default for every row
# EDU rows whose col_2 is facebook or google keep the col_2 value
keep = np.logical_and(df.col_1 == 'EDU', np.isin(df.col_2, ['facebook', 'google']))
df.loc[keep, 'new_col'] = df.loc[keep, 'col_2']
# EDU rows whose col_2 is google_usa or tabula become 'gusa'
df.loc[np.logical_and(df.col_1 == 'EDU', np.isin(df.col_2, ['google_usa', 'tabula'])), 'new_col'] = 'gusa'

Also, your request does not fully match your expected output; I hope I interpreted it correctly. My code will output:
    col_1   col_2   new_col
0   EDU facebook    facebook
1   EDU google      google
2   EDU google_usa  gusa
3   EDU tabula      gusa
4   EDU xyz         others
5   EDU abc         others
6   IAR facebook    others
7   IAR google      others
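
The mask-based approach above can be tried end to end with the sketch below. It is only an illustration: the sample frame is rebuilt from the question's expected output, it uses pandas' own `isin` and `&` instead of the numpy calls, and the last rule is an assumption that the IAR/facebook case from the question is wanted (the answer's code leaves that row as 'others').

```python
import pandas as pd

# Sample data rebuilt from the question's expected output
df = pd.DataFrame({
    "col_1": ["EDU"] * 6 + ["IAR"] * 2,
    "col_2": ["facebook", "google", "google_usa", "tabula",
              "xyz", "abc", "facebook", "google"],
})

edu = df["col_1"] == "EDU"
iar = df["col_1"] == "IAR"

df["new_col"] = "others"  # default for rows that no rule matches

# EDU rows with facebook/google keep their col_2 value
keep = edu & df["col_2"].isin(["facebook", "google"])
df.loc[keep, "new_col"] = df.loc[keep, "col_2"]

# EDU rows with google_usa/tabula become 'gusa'
df.loc[edu & df["col_2"].isin(["google_usa", "tabula"]), "new_col"] = "gusa"

# Extra rule (assumption): IAR rows with facebook keep 'facebook'
df.loc[iar & (df["col_2"] == "facebook"), "new_col"] = "facebook"

print(df)
```

With the extra rule in place, row 6 (IAR, facebook) gets 'facebook' rather than 'others', which matches the expected output in the question.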

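I believe this is the easiest way to understand how the code works, so that you can apply it to more cases than this example. It is fairly intuitive, and you can add more logic as you go.
1) First, we create a function
2) Then we apply that function to each row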
def new_col(row):
    # row is a single DataFrame row; this frame names its columns col1 and col2
    if row['col1'] == 'EDU' and row['col2'] == 'facebook':
        return 'facebook'
    if row['col1'] == 'EDU' and row['col2'] == 'google':
        return 'google'
    # this rule does not check col1, so it applies to every row
    if row['col2'] == 'google_usa' or row['col2'] == 'tabula':
        return 'gusa'
    if row['col1'] == 'IAR' and row['col2'] == 'facebook':
        return 'facebook'
    return 'others'

# axis=1 passes each row (not each column) to new_col
df['new_col'] = df.apply(new_col, axis=1)

Output (my col1 and col2 are swapped; never mind, it is easier for me to read that way):
         col2 col1   new_col
0    facebook  EDU  facebook
1      google  EDU    google
2  google_usa  EDU      gusa
3      tabula  EDU      gusa
4         xyz  EDU    others
5         abc  EDU    others
6    facebook  IAR  facebook
7      google  IAR    others

Just posting this for anyone else who stumbles across this thread: this example works well. Nested np.where calls, though, are always hard for other people to follow. The output and the efficiency are great, but readability can suffer.

@MattR Also for posterity: this question is entirely about nested if/then/else. If readability is a priority, you can wrap np.where in a nicer function.
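
As one way to act on the readability comment, the rules can be listed as condition/value pairs and handed to `np.select`, which avoids nesting `np.where` calls. This is a minimal sketch, not code from the thread: the helper name `label_rows` is hypothetical, the column names col_1 and col_2 follow the question, and the sample frame is rebuilt from the expected output.

```python
import numpy as np
import pandas as pd


def label_rows(frame):
    """Return new_col values for frame (hypothetical helper for illustration)."""
    col_1, col_2 = frame["col_1"], frame["col_2"]
    # Conditions are checked in order; the first one that matches wins
    conditions = [
        (col_1 == "EDU") & col_2.isin(["facebook", "google"]),
        (col_1 == "EDU") & col_2.isin(["google_usa", "tabula"]),
        (col_1 == "IAR") & (col_2 == "facebook"),
    ]
    # One choice per condition: keep col_2, or use a fixed label
    choices = [col_2.to_numpy(dtype=object), "gusa", "facebook"]
    return np.select(conditions, choices, default="others")


# Sample data rebuilt from the question's expected output
df = pd.DataFrame({
    "col_1": ["EDU"] * 6 + ["IAR"] * 2,
    "col_2": ["facebook", "google", "google_usa", "tabula",
              "xyz", "abc", "facebook", "google"],
})
df["new_col"] = label_rows(df)
print(df)
```

Keeping each rule on its own line makes it easy to add or reorder cases, while the work stays vectorized rather than row by row as with `apply`.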