Python: iteratively assigning values to columns based on another column

python pandas loops if-statement

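I have a column named "label" in a Pandas DataFrame that contains multiple string values (for example: "label1", "label2", "label3", ...).

I collect all the unique values into a list and then create a new column for each one: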

unique_labels = df['label'].unique()

for i in unique_labels: # create new single label variable holders
    df[str(i)] = 0

Now I have

label    label1    label2 .... label23
label1     0         0            0
label23    0         0            0

I want to set the corresponding value in the new single-label columns based on 'label', like this:

label    label1    label2 .... label23
label1     1         0            0
label23    0         0            1

Here is my code:

def single_label(df):
    for i in range(len(unique_labels)):
        if df['label'] == str(unique_labels[i]):
            df[unique_labels[i]] == 1


df = df.applymap(single_label)

I get this error:

TypeError: ("'int' object is not subscriptable", 'occurred at index Unnamed: 0')
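
A note on the error, offered as a hedged sketch rather than the definitive fix: `DataFrame.applymap` calls the function once per cell, so inside `single_label` the name `df` is bound to a single scalar (presumably an integer from the `Unnamed: 0` column), and indexing it with `['label']` raises the `TypeError`. In addition, `==` compares rather than assigns. If the loop is to be kept, one possible repair is to compare the whole column at once. The sample data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({'label': ['label1', 'label1', 'label23', 'label3', 'label11']})

unique_labels = df['label'].unique()

for lab in unique_labels:
    # Compare the entire column to this label, then store the result as 0/1.
    df[str(lab)] = (df['label'] == lab).astype(int)

print(df)
```

This keeps the original structure (one new column per unique label) but replaces the per-cell function with a single vectorized comparison per label.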

IIUC, you can use pd.get_dummies after dropping duplicates, which will be faster than iterating and gives cleaner code:

df.drop_duplicates().join(pd.get_dummies(df.drop_duplicates()))

     label  label_label1  label_label11  label_label23  label_label3
0   label1             1              0              0             0
2  label23             0              0              1             0
3   label3             0              0              0             1
4  label11             0              1              0             0

You can use the prefix and prefix_sep parameters to get rid of the label prefix and the underscore:

df.drop_duplicates().join(pd.get_dummies(df.drop_duplicates(),
                                         prefix='', prefix_sep=''))

     label  label1  label11  label23  label3
0   label1       1        0        0       0
2  label23       0        0        1       0
3   label3       0        0        0       1
4  label11       0        1        0       0
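
To make the naming rule concrete, here is a small sketch comparing the default column names with the ones produced by an empty `prefix` and `prefix_sep`. The frame is a hypothetical label-only sample, and `dtype=int` is an addition not present in the answer: newer pandas releases return boolean indicator columns by default, so it is passed here to get 0/1 values as in the output above.

```python
import pandas as pd

# Hypothetical label-only frame matching the example in the answer.
df = pd.DataFrame({'label': ['label1', 'label1', 'label23', 'label3', 'label11']})

# Default naming: each new column is "<source column>_<value>".
default_names = pd.get_dummies(df, dtype=int)

# Empty prefix and separator: each new column is named after the value alone.
bare_names = pd.get_dummies(df, prefix='', prefix_sep='', dtype=int)

print(list(default_names.columns))
print(list(bare_names.columns))
```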

Edit: with a second column, i.e.:

>>> df
     label second_column
0   label1             a
1   label1             b
2  label23             c
3   label3             d
4  label11             e

Just call pd.get_dummies on the label column only:

df.drop_duplicates('label').join(pd.get_dummies(df['label'].drop_duplicates(),
                                         prefix='', prefix_sep=''))

     label second_column  label1  label11  label23  label3
0   label1             a       1        0        0       0
2  label23             c       0        0        1       0
3   label3             d       0        0        0       1
4  label11             e       0        1        0       0

However, dropping duplicates removes rows, which I assume is not what you want (unless I'm mistaken). In that case, just omit the drop_duplicates call:

df.join(pd.get_dummies(df['label'], prefix='', prefix_sep=''))

     label second_column  label1  label11  label23  label3
0   label1             a       1        0        0       0
1   label1             b       1        0        0       0
2  label23             c       0        0        1       0
3   label3             d       0        0        0       1
4  label11             e       0        1        0       0
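
The last variant can be run end to end as the following sketch, using the two-column frame shown in the edit. As an assumption beyond the answer, `dtype=int` is passed so that the indicators come out as 0/1 on pandas versions that default to booleans.

```python
import pandas as pd

df = pd.DataFrame({
    'label': ['label1', 'label1', 'label23', 'label3', 'label11'],
    'second_column': ['a', 'b', 'c', 'd', 'e'],
})

# Encode only the label column; the other columns are left untouched.
dummies = pd.get_dummies(df['label'], prefix='', prefix_sep='', dtype=int)

# join aligns on the index, so every original row keeps its own indicators.
result = df.join(dummies)
print(result)
```

Because only `df['label']` is passed to `pd.get_dummies`, any number of additional columns in the real data are carried through unchanged.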

Thanks, but could you show how to specify the 'label' column (since my actual data contains multiple columns)? I tried df['label'], but it didn't work.