Python: create a unique Id that enumerates distinct row values when reading multiple files
I want to create a unique id for each row based on the game name and year. The column that matters most is Name. I have multiple files, each of which looks like this:
Name Year Level
Pikachu 2007 30
Pikachu 2007 20
Raichu 2007 20
Mew 2007 35
This is the result I want:
Game Name Year Level Id
Pokemon Pikachu 2007 30 1
Pokemon Pikachu 2007 20 1
Pokemon Raichu 2007 20 2
Pokemon Mew 2007 35 3
Pokemon Pikachu 2008 50 1
Pokemon Pikachu 2008 40 1
Pokemon Raichu 2008 55 2
Pokemon Mewtwo 2008 55 3
Pokemon Squirtle 2008 60 1
Pokemon Pidgey 2008 45 2
Pokemon Pidgey 2008 52 2
Pokemon Ekans 2008 51 3
I tried this:
for file in files:
    df = pd.read_csv(file,header=0)
    df['Game'] = 'Pokemon'
    for i, p in enumerate(df['Pokemon'].unique(), 1):
        df.loc[i-1,'id'] = i
        df.loc[i-1, 'Pokemon'] = p
    df['Id'] = df['Id'].astype('int')
What is the question? Are you sure the Id column in the desired result you posted is correct? If so, how do you determine the Id for each row?
That is the Id I want, because the Ids start over in each file.
I might try df['Id'] = df.groupby(['Name'], sort=False).ngroup() + 1 in case the id needs to be computed from multiple columns.
I think you need factorize on each DataFrame, collect the results in a list, and finally join them into one big DataFrame with concat.
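import pandas as pd

# files is assumed to be a list of CSV paths defined elsewhere
out = []
for file in files:
    df = pd.read_csv(file, header=0)
    df['Game'] = 'Pokemon'
    # factorize numbers distinct names from 0 in order of first appearance
    df['id'] = pd.factorize(df['Name'])[0] + 1
    out.append(df)

# join the per-file frames; ids restart at 1 in every file
df = pd.concat(out, ignore_index=True)
print(df)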
Name Year Level Game id
0 Pikachu 2007 30 Pokemon 1
1 Pikachu 2007 20 Pokemon 1
2 Raichu 2007 20 Pokemon 2
3 Mew 2007 35 Pokemon 3
4 Pikachu 2008 50 Pokemon 1
5 Pikachu 2008 40 Pokemon 1
6 Raichu 2008 55 Pokemon 2
7 Mew 2008 55 Pokemon 3
8 Squirtle 2008 50 Pokemon 1
9 Pidgey 2008 40 Pokemon 2
10 Pidgey 2008 55 Pokemon 2
11 Ekans 2008 55 Pokemon 3
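The groupby/ngroup idea from the comments can be sketched in the same per-file loop. This is only a sketch under some assumptions: the two io.StringIO objects are hypothetical in-memory stand-ins for the real CSV files, and grouping on ['Name', 'Year'] is just an illustration of an id that depends on more than one column.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the CSV files (assumption: comma-separated)
file_2007 = io.StringIO(
    "Name,Year,Level\nPikachu,2007,30\nPikachu,2007,20\nRaichu,2007,20\nMew,2007,35\n"
)
file_2008 = io.StringIO(
    "Name,Year,Level\nPikachu,2008,50\nPikachu,2008,40\nRaichu,2008,55\nMewtwo,2008,55\n"
)

out = []
for file in (file_2007, file_2008):
    df = pd.read_csv(file, header=0)
    df['Game'] = 'Pokemon'
    # ngroup() numbers the groups from 0; sort=False keeps order of first appearance
    df['Id'] = df.groupby(['Name', 'Year'], sort=False).ngroup() + 1
    out.append(df)

# Ids restart at 1 for each file because ngroup() runs before the concat
result = pd.concat(out, ignore_index=True)
print(result[['Game', 'Name', 'Year', 'Level', 'Id']])
```

With this sample data the Id column comes out as 1, 1, 2, 3 for the first file and 1, 1, 2, 3 again for the second. For a single key column this should give the same numbering as factorize; the list of keys is where the two approaches differ.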