Python: fill a new dataframe column based on an array condition
I have a dataframe:
import numpy as np
import pandas as pd
arr = np.array([['a', 0, 1.2,12.5,3], ['a',1, 4,5.,6.885],
['a', 2, 2.3,3.133,4.3], ['a', 3, 5.678,6.,7.34556],
['a', 4, 6.5,7,8.1344], ['b',0, 10.7,11.4,12.1332],
['b',1, 14.,15,16.0155], ['b',2, 17.3,18.,9.11],
['b', 3, 22.2, 33.233, 1.2323],
['c', 0, 1.1, 2.2, 3.3],
['c', 1, 2.2, 3.43, 54.5],
['d', 0 , 2.2, 2.2, 3.],
['d',1, 3.4, 4., 5.6],
['d', 2, 3.3, 4, 5.]])
df = pd.DataFrame(arr, columns=['name', 'id', 'x', 'y', 'z'])
df['id'] = pd.to_numeric(df['id'])
df['x'] = pd.to_numeric(df['x'])
df['y'] = pd.to_numeric(df['y'])
df['z'] = pd.to_numeric(df['z'])
df
name id x y z
0 a 0 1.2 12.5 3
1 a 1 4 5.0 6.885
2 a 2 2.3 3.133 4.3
3 a 3 5.678 6.0 7.34556
4 a 4 6.5 7 8.1344
5 b 0 10.7 11.4 12.1332
6 b 1 14.0 15 16.0155
7 b 2 17.3 18.0 9.11
8 b 3 22.2 33.233 1.2323
9 c 0 1.1 2.2 3.3
10 c 1 2.2 3.43 54.5
11 d 0 2.2 2.2 3.0
12 d 1 3.4 4.0 5.6
13 d 2 3.3 4 5.0
I also have an array with the same number of rows:
the_array = np.array([['a', 82.365],
['a', 82.365],
['a', 82.365],
['a', 82.365],
['b', 136.879],
['b', 136.879],
['b', 136.879],
['b', 136.879],
[None, None],
[None, None],
[None, None],
[None, None],
[None, None],
[None, None]], dtype=object)
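Now I want to create a new column in df and fill it with the values from the array, based on the name column.
For every row in df whose name matches a name in the array, I want the value that the array holds for that name.
The result I want: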
name id x y z new_col
0 a 0 1.200 12.500 3.00000 82.365
1 a 1 4.000 5.000 6.88500 82.365
2 a 2 2.300 3.133 4.30000 82.365
3 a 3 5.678 6.000 7.34556 82.365
4 a 4 6.500 7.000 8.13440 82.365
5 b 0 10.700 11.400 12.13320 136.879
6 b 1 14.000 15.000 16.01550 136.879
7 b 2 17.300 18.000 9.11000 136.879
8 b 3 22.200 33.233 1.23230 136.879
9 c 0 1.100 2.200 3.30000 None
10 c 1 2.200 3.430 54.50000 None
11 d 0 2.200 2.200 3.00000 None
12 d 1 3.400 4.000 5.60000 None
13 d 2 3.300 4.000 5.00000 None
I tried:
df['new_col'] = np.where(df['name'] == the_array[:, 0], the_array[:, 1], the_array[:, 1])
But I got:
name id x y z new_col
0 a 0 1.200 12.500 3.00000 82.365
1 a 1 4.000 5.000 6.88500 82.365
2 a 2 2.300 3.133 4.30000 82.365
3 a 3 5.678 6.000 7.34556 82.365
4 a 4 6.500 7.000 8.13440 136.879
5 b 0 10.700 11.400 12.13320 136.879
6 b 1 14.000 15.000 16.01550 136.879
7 b 2 17.300 18.000 9.11000 136.879
8 b 3 22.200 33.233 1.23230 None
9 c 0 1.100 2.200 3.30000 None
10 c 1 2.200 3.430 54.50000 None
11 d 0 2.200 2.200 3.00000 None
12 d 1 3.400 4.000 5.60000 None
13 d 2 3.300 4.000 5.00000 None
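You can do this with a dict lookup. the_array has the same size as df, but its rows are not aligned with the rows of df. It appears to represent a name -> value mapping for a set of unique names, so it should be represented as a dict rather than an array. That dict is easy to build with a dict comprehension that iterates over the rows of the array: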
the_map = {k: v for k, v in the_array if k}
df['new_col'] = df['name'].map(the_map)
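The two lines above can be tried on their own with the following minimal sketch. The data here is a shortened, assumed stand-in for the question's df and the_array, not the original values in full. Note that names missing from the dict come out as NaN after map, not as None.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data (assumed for illustration)
df = pd.DataFrame({"name": ["a", "a", "b", "c"],
                   "x": [1.2, 4.0, 10.7, 1.1]})
the_array = np.array([["a", 82.365],
                      ["b", 136.879],
                      ["b", 136.879],
                      [None, None]], dtype=object)

# Build the name -> value dict; `if k` skips rows whose name is None
the_map = {k: v for k, v in the_array if k}

# Look up each row's name in the dict; unmatched names become NaN
df["new_col"] = df["name"].map(the_map)
print(df)
```

Repeated names in the array simply overwrite the same dict key, so the duplicates are harmless as long as each name always carries the same value.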
Thinking about what the data means and how best to represent it is a good way to write elegant code and, in this case, to find the solution.
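To see concretely what "not aligned" means for the np.where attempt in the question, here is a small sketch on assumed toy data. The comparison is done position by position, and because both branches of np.where are the same expression, the result is just the array's second column in its original order, whatever the condition says.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (assumed): names plays the role of df['name']
names = pd.Series(["a", "a", "b"], dtype=object)
arr = np.array([["a", 1.0],
                ["b", 2.0],
                [None, None]], dtype=object)

# Element i of names is compared only with element i of arr[:, 0]
cond = (names == arr[:, 0]).to_numpy()

# Both branches are identical, so cond has no effect on the result
result = np.where(cond, arr[:, 1], arr[:, 1])

print(cond)
print(result)
```

This matches the output in the question, where row 4 (name a) received the value that sits at position 4 of the array (the one for b).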