Python: how do I extract specific items from a DataFrame column and use them as labels for the remaining items?
I have a DataFrame with only one column. I want to extract certain items and turn them into a separate column that labels the remaining items. This is a bit hard to explain, so for example, if I have:
import pandas as pd
df = pd.DataFrame({'Fruits': ['Apple', 'Gala', 'Fuji', 'Grannysmith', 'Honeycrisp', 'Golden', 'pink', 'Orange', 'blood orange', 'Mandrin', 'Tangerine', 'Clementine', 'Banana', 'baby', 'manzano', 'burro']})
Fruits
0 Apple
1 Gala
2 Fuji
3 Grannysmith
4 Honeycrisp
5 Golden
6 pink
7 Orange
8 blood orange
9 Mandrin
10 Tangerine
11 Clementine
12 Banana
13 baby
14 manzano
15 burro
But I want to convert it to:
Fruits Types
0 Apple Gala
1 Apple Fuji
2 Apple Grannysmith
3 Apple Honeycrisp
4 Apple Golden
5 Apple pink
6 Orange blood orange
7 Orange Mandrin
8 Orange Tangerine
9 Orange Clementine
10 Banana baby
11 Banana manzano
12 Banana burro
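How can I convert the first DataFrame into the second? I am stuck on this, especially when there are many kinds of fruit, each with its own types.

First define the fruits in a list. Then create a new column that repeats each fruit: Series.where with Series.isin leaves missing values on the non-fruit rows, and ffill forward-fills them. Next, remove the rows where both columns hold the same value using boolean indexing, and finally set the new column names: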
L = ['Apple','Orange','Banana']
df['a'] = df['Fruits'].where(df['Fruits'].isin(L)).ffill()
df = df.loc[df['a'] != df['Fruits'], ['a','Fruits']]
df.columns = ['Fruits','Types']
print(df)
Fruits Types
1 Apple Gala
2 Apple Fuji
3 Apple Grannysmith
4 Apple Honeycrisp
5 Apple Golden
6 Apple pink
8 Orange blood orange
9 Orange Mandrin
10 Orange Tangerine
11 Orange Clementine
13 Banana baby
14 Banana manzano
15 Banana burro
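The steps above can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: the function name label_rows and its parameters are hypothetical, and reset_index(drop=True) is added only so that the index runs from 0 to 12 as in the desired output.

```python
import pandas as pd


def label_rows(df, labels, col="Fruits", new_col="Types"):
    """Hypothetical helper: turn header rows of `col` into a label column."""
    # Keep the value only on header rows (NaN elsewhere), then forward-fill
    group = df[col].where(df[col].isin(labels)).ffill()
    # Drop the header rows themselves and any rows before the first header
    keep = group.notna() & (group != df[col])
    out = pd.DataFrame({col: group[keep], new_col: df.loc[keep, col]})
    # Renumber the rows from 0, as in the desired output
    return out.reset_index(drop=True)


df = pd.DataFrame({'Fruits': ['Apple', 'Gala', 'Fuji', 'Grannysmith',
                              'Honeycrisp', 'Golden', 'pink', 'Orange',
                              'blood orange', 'Mandrin', 'Tangerine',
                              'Clementine', 'Banana', 'baby', 'manzano',
                              'burro']})
result = label_rows(df, ['Apple', 'Orange', 'Banana'])
print(result)
```

Unlike the snippet above, this version does not overwrite df, so the original single-column frame stays available.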
I would build a dictionary mapping with some standard logic and then use it with map:
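fruit_classes = ['Apple', 'Orange', 'Banana']
last_class = None
fruit_map = {}
for fruit in df.Fruits:
    if fruit in fruit_classes:
        # A class name starts a new group
        last_class = fruit
    elif last_class is not None:
        # Any other item belongs to the most recent class
        fruit_map[fruit] = last_class
# Class rows are not keys in fruit_map, so they map to NaN and are dropped
df.assign(Types=df.Fruits, Fruits=df.Fruits.map(fruit_map)).dropna()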
Fruits Types
1 Apple Gala
2 Apple Fuji
3 Apple Grannysmith
4 Apple Honeycrisp
5 Apple Golden
6 Apple pink
8 Orange blood orange
9 Orange Mandrin
10 Orange Tangerine
11 Orange Clementine
13 Banana baby
14 Banana manzano
15 Banana burro
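The dictionary approach assumes that each type name appears under only one class. If a name repeated, the later class would overwrite the earlier one in fruit_map. The following is a hedged sketch of a position-based variant that avoids this. The sample data, where 'baby' appears under both Apple and Banana, is invented purely for illustration.

```python
import pandas as pd

fruit_classes = ['Apple', 'Orange', 'Banana']
# Illustrative data only: 'baby' is deliberately listed under two classes
df = pd.DataFrame({'Fruits': ['Apple', 'Gala', 'baby', 'Banana', 'baby']})

labels = []        # one label per row, matched by position rather than by name
last_class = None
for fruit in df.Fruits:
    if fruit in fruit_classes:
        last_class = fruit
        labels.append(None)        # class rows get no label and are dropped below
    else:
        labels.append(last_class)  # type rows take the most recent class

result = df.assign(Types=df.Fruits, Fruits=labels).dropna()
print(result)
```

Because the labels are stored per row, the two 'baby' rows keep their own classes instead of sharing a single dictionary entry.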
Or with a comprehension:
fruit_classes = ['Apple', 'Orange', 'Banana']
pd.DataFrame(
    [[x, None][::(x in fruit_classes) * 2 - 1] for x in df.Fruits],
    columns=['Fruits', 'Types']
).assign(Fruits=lambda d: d.Fruits.ffill()).dropna()
Fruits Types
1 Apple Gala
2 Apple Fuji
3 Apple Grannysmith
4 Apple Honeycrisp
5 Apple Golden
6 Apple pink
8 Orange blood orange
9 Orange Mandrin
10 Orange Tangerine
11 Orange Clementine
13 Banana baby
14 Banana manzano
15 Banana burro
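The slice in the comprehension is compact but easy to misread. A step of 1 keeps [x, None] for a class name, while a step of -1 reverses it to [None, x] for a type name. The sketch below spells this out on a shortened sample; the helper name split_row is hypothetical.

```python
import pandas as pd

fruit_classes = ['Apple', 'Orange', 'Banana']


def split_row(x):
    """Hypothetical helper: same result as [x, None][::(x in fruit_classes) * 2 - 1]."""
    # Class names go in the first slot, type names in the second
    return [x, None] if x in fruit_classes else [None, x]


df = pd.DataFrame({'Fruits': ['Apple', 'Gala', 'Fuji', 'Orange', 'Mandrin']})
rows = [split_row(x) for x in df.Fruits]
result = (
    pd.DataFrame(rows, columns=['Fruits', 'Types'])
    .assign(Fruits=lambda d: d.Fruits.ffill())  # carry each class name downward
    .dropna()                                   # drop class rows, whose Types is missing
)
print(result)
```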