Python pandas: how to reshape a DataFrame by moving specific cells into a new column
I need to reshape a DataFrame before I can move on to the next stage. I have the following DataFrame:
+-------------------+-----+--------+
| name | age | gender |
+-------------------+-----+--------+
| country India | | |
| Ali | 13 | male |
| Abu | 12 | male |
| Acik | 13 | male |
| country indonesia | | |
| natasha | 15 | female |
| jenny | 43 | female |
| eric | 23 | male |
| country singapore | | |
| max | 23 | male |
| jason | 32 | male |
| jack | 45 | male |
+-------------------+-----+--------+
I want it to look like this:
+---------+-----+--------+-----------+
| name | age | gender | country |
+---------+-----+--------+-----------+
| Ali | 13 | male | india |
| Abu | 12 | male | india |
| Acik | 13 | male | india |
| natasha | 15 | female | indonesia |
| jenny | 43 | female | indonesia |
| eric | 23 | male | indonesia |
| max | 23 | male | singapore |
| jason | 32 | male | singapore |
| jack | 45 | male | singapore |
+---------+-----+--------+-----------+
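I don't think pivot/transpose will help me here. If not, what do I need to do?
You can use str.extract to pull out the country names, and then use the same result as a mask to keep only the valid (non-header) rows: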
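# Country name on the "country ..." rows, NaN on every other row
countries = df['name'].str.extract(r'^country (.+)')[0]
# Carry each country name down to the rows beneath it
df['country'] = countries.ffill()
# Keep only the rows that are not country headers
df = df[countries.isna()]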
Output:
       name   age  gender    country
1       Ali  13.0    male      India
2       Abu  12.0    male      India
3      Acik  13.0    male      India
5   natasha  15.0  female  indonesia
6     jenny  43.0  female  indonesia
7      eric  23.0    male  indonesia
9       max  23.0    male  singapore
10    jason  32.0    male  singapore
11     jack  45.0    male  singapore
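The output above differs slightly from the requested table: the age column is float, "India" keeps its capital letter, and the original index has gaps. A minimal self-contained sketch of one way to tidy that up follows. It assumes the blank cells in the question are NaN; the frame is rebuilt by hand here, and the name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "name": ["country India", "Ali", "Abu", "Acik",
             "country indonesia", "natasha", "jenny", "eric",
             "country singapore", "max", "jason", "jack"],
    "age": [np.nan, 13, 12, 13, np.nan, 15, 43, 23, np.nan, 23, 32, 45],
    "gender": [np.nan, "male", "male", "male",
               np.nan, "female", "female", "male",
               np.nan, "male", "male", "male"],
})

# Country name on header rows, NaN elsewhere.
countries = df["name"].str.extract(r"^country (.+)", expand=False)

result = (
    df.assign(country=countries.ffill().str.lower())  # lowercase to match the requested table
      .loc[countries.isna()]                          # drop the header rows
      .astype({"age": "int64"})                       # safe once the NaN rows are gone
      .reset_index(drop=True)                         # renumber 0..8
)
print(result)
```

The cast to int64 only works after the header rows are removed, because a NaN cannot be stored in a plain integer column.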
Alternative solution:
(df.assign(country=df["name"].str.extract("^country (.+)", expand=False).ffill())
.dropna()
)
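One caveat with the dropna() variant: called without arguments, it drops every row that has a missing value in any column, so it only works while the header rows are the only incomplete ones. The sketch below uses hypothetical data (a person with no recorded gender, which is not in the original question) to show the difference, and filters on the "country " prefix instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same layout, but "Abu" has no recorded gender.
df = pd.DataFrame({
    "name": ["country India", "Ali", "Abu", "country singapore", "max"],
    "age": [np.nan, 13, 12, np.nan, 23],
    "gender": [np.nan, "male", np.nan, np.nan, "male"],
})

with_country = df.assign(
    country=df["name"].str.extract(r"^country (.+)", expand=False).ffill()
)

# Plain dropna() removes every row with any missing value, so "Abu" is lost too.
dropped_any = with_country.dropna()

# Filtering on the header marker removes only the "country ..." rows.
is_header = with_country["name"].str.startswith("country ")
kept = with_country[~is_header]

print(dropped_any["name"].tolist())
print(kept["name"].tolist())
```

If the data is known to be complete apart from the header rows, the shorter dropna() form is fine.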