Python: concatenate column names based on binary values in the columns
Currently, I have a DataFrame that looks like this:
date A B C
02/19/2020 0 0 0
02/20/2020 0 0 0
02/21/2020 1 1 1
02/22/2020 0 1 0
02/23/2020 0 1 1
02/24/2020 0 0 1
02/25/2020 1 0 1
02/26/2020 1 0 0
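The binary columns contain integers, and the "date" column holds datetime objects. I want to create a new categorical column based on the binary columns, like this: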
date A B C new
02/19/2020 0 0 0 "None"
02/20/2020 0 0 0 "None"
02/21/2020 1 1 1 A+B+C
02/22/2020 0 1 0 B
02/23/2020 0 1 1 B+C
02/24/2020 0 0 1 C
02/25/2020 1 0 1 A+C
02/26/2020 1 0 0 A
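How can I achieve this?
Use DataFrame.dot to multiply the binary columns by the column names: select every column except the first by position with iloc, append the separator to each of those column names, and strip the trailing separator from the result by indexing with str[:-1]: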
df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '+').str[:-1]
# rows with no active column produce an empty string; replace it with None
df.loc[df['new'].eq(''), 'new'] = None
print(df)
date A B C new
0 02/19/2020 0 0 0 None
1 02/20/2020 0 0 0 None
2 02/21/2020 1 1 1 A+B+C
3 02/22/2020 0 1 0 B
4 02/23/2020 0 1 1 B+C
5 02/24/2020 0 0 1 C
6 02/25/2020 1 0 1 A+C
7 02/26/2020 1 0 0 A
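The dot product yields labels because Python defines int * str as string repetition: a 0 flag contributes an empty string, and a 1 flag contributes the column name plus its separator. The following is a minimal pure-Python sketch of what happens for a single row. The helper name label_row is hypothetical and used only for illustration, and the sketch assumes the flags are strictly 0 or 1 (a flag of 2 would repeat the name).

```python
def label_row(flags, names):
    """Mimic one row of the dot product: concatenate flag * (name + '+')."""
    # 0 * 'A+' is '', 1 * 'A+' is 'A+'; joining the pieces plays the role of the sum.
    joined = "".join(flag * (name + "+") for flag, name in zip(flags, names))
    # Drop the trailing '+' left by the last active column ('' stays '').
    return joined[:-1]


print(label_row([1, 0, 1], ["A", "B", "C"]))        # A+C
print(repr(label_row([0, 0, 0], ["A", "B", "C"])))  # ''
```

This also shows why rows with no active column come out as empty strings and need the extra replacement step.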
If possible, use NaN instead of None:
import numpy as np

df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '+').str[:-1].replace('', np.nan)
print(df)
date A B C new
0 02/19/2020 0 0 0 NaN
1 02/20/2020 0 0 0 NaN
2 02/21/2020 1 1 1 A+B+C
3 02/22/2020 0 1 0 B
4 02/23/2020 0 1 1 B+C
5 02/24/2020 0 0 1 C
6 02/25/2020 1 0 1 A+C
7 02/26/2020 1 0 0 A
Alternatively, if possible, turn the first column into a DatetimeIndex with set_index:
df1 = df.set_index('date')
df1['new'] = df1.dot(df1.columns + '+').str[:-1]
df1.loc[df1['new'].eq(''), 'new'] = None
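The question asks for a categorical column that uses the literal string "None" for rows with no active flag. A self-contained sketch of that variant, rebuilt from the sample data in the question, might look like the following. The name flag_cols and the choice of Series.mask for filling the empty labels are assumptions made for this illustration, not part of the answer above.

```python
import pandas as pd

# Sample data from the question; 'date' is parsed into datetime values.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["02/19/2020", "02/20/2020", "02/21/2020", "02/22/2020",
         "02/23/2020", "02/24/2020", "02/25/2020", "02/26/2020"],
        format="%m/%d/%Y",
    ),
    "A": [0, 0, 1, 0, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 1, 0, 0, 0],
    "C": [0, 0, 1, 0, 1, 1, 1, 0],
})

flag_cols = ["A", "B", "C"]  # hypothetical name for the binary columns

# Dot product of the 0/1 flags with 'A+', 'B+', 'C+', then drop the trailing '+'.
labels = df[flag_cols].dot(pd.Index(flag_cols) + "+").str[:-1]

# Replace empty labels with the string "None" and store the result as a categorical.
df["new"] = labels.mask(labels.eq(""), "None").astype("category")
print(df)
```

Selecting the flag columns by name rather than by position keeps the sketch working even if further non-binary columns are added to the frame.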
You can iterate over the DataFrame to compute the value of the new column for each row, then add the column. Here is a basic example:
new_column = []
for i, row in df.iterrows():
    row_val = None
    # append each active column name, separated by "+"
    if row["A"]:
        if row_val:
            row_val += "+A"
        else:
            row_val = "A"
    if row["B"]:
        if row_val:
            row_val += "+B"
        else:
            row_val = "B"
    if row["C"]:
        if row_val:
            row_val += "+C"
        else:
            row_val = "C"
    # no active column: use the string "None"
    if row_val is None:
        row_val = "None"
    new_column.append(row_val)
df["new_column_name"] = new_column
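The loop above hard-codes the columns A, B and C. A possible generalization, sketched here under the assumption that the flag columns are passed in as a list, builds each label with str.join instead. The function name join_active_columns and its parameters sep and empty are hypothetical, introduced only for this example.

```python
import pandas as pd


def join_active_columns(frame, flag_cols, sep="+", empty="None"):
    """Return one label per row listing the columns whose flag is truthy."""
    labels = []
    # itertuples with name=None yields plain tuples, one per row.
    for values in frame[flag_cols].itertuples(index=False, name=None):
        active = [col for col, flag in zip(flag_cols, values) if flag]
        labels.append(sep.join(active) if active else empty)
    return labels


df = pd.DataFrame({"A": [0, 1, 1], "B": [0, 1, 0], "C": [0, 1, 1]})
df["new_column_name"] = join_active_columns(df, ["A", "B", "C"])
print(df["new_column_name"].tolist())  # ['None', 'A+B+C', 'A+C']
```

This is still a row-by-row loop, so on large frames the dot-product approach is likely to be faster. The loop version is easier to adapt when the labels need custom logic.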