Python3: Constructing variables by parsing a dataframe

python pandas dataframe

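I have the following dataframe with the columns id, start, end and name:

id  start  end  name
A   7      340  string1
B   12     113  string2
B   139    287  string3
B   301    348  string4
B   379    434  string5
C   41     73   string6
C   105    159  string7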

I am reading this into Python 3 with pandas:
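The reading code itself is not shown, so the following is only a sketch of one way to load such a table. It assumes whitespace-separated text with a header row; the `raw` string and the use of `io.StringIO` stand in for the real input file, whose name and format are unknown. The rows are the sample rows that the answer prints.

```python
import io

import pandas as pd

# Assumption: the input is whitespace-separated text with a header row.
# The rows are copied from the dataframe printed in the answer.
raw = """\
id start end name
A 7 340 string1
B 12 113 string2
B 139 287 string3
B 301 348 string4
B 379 434 string5
C 41 73 string6
C 105 159 string7
"""

# io.StringIO lets read_csv treat the string like an open file.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```

With a real file, the `io.StringIO(raw)` argument would be replaced by the file path.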

Now I need to parse df and extract the start, end and name for each id into a list of the following form:

mylist = [GraphicFeature(start=XXX, end=YYY, color="#ffffff", label="ZZZ")]

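Here XXX is the start, YYY is the end, and ZZZ is the name, so the list has as many items as there are rows for that id. GraphicFeature is just the name of a member of a module.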

I thought of looping over the dataframe like this:

uniq_val = list(df["id"].unique())
for i in uniq_val:
    extracted = df.loc[df["id"] == i]

But how do I build mylist? Once the list is built, some other print commands will follow.

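So my expected output inside the loop is:

For id A:

mylist = [GraphicFeature(start=7, end=340, color="#ffffff", label="string1")]

For id B:

mylist = [GraphicFeature(start=12, end=113, color="#ffffff", label="string2"), GraphicFeature(start=139, end=287, color="#ffffff", label="string3"), GraphicFeature(start=301, end=348, color="#ffffff", label="string4"), GraphicFeature(start=379, end=434, color="#ffffff", label="string5")]

For id C:

mylist = [GraphicFeature(start=41, end=73, color="#ffffff", label="string6"), GraphicFeature(start=105, end=159, color="#ffffff", label="string7")]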
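The loop sketched in the question can be completed directly by building one list per id inside it. This is a minimal sketch, not the asker's final code: the `GraphicFeature` dataclass below is a hypothetical stand-in for the real class so that the snippet runs on its own, and `results` is an illustrative name.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class GraphicFeature:
    """Hypothetical stand-in for the real GraphicFeature class."""
    start: int
    end: int
    color: str
    label: str


# Sample data as printed in the answer.
df = pd.DataFrame({
    "id": ["A", "B", "B", "B", "B", "C", "C"],
    "start": [7, 12, 139, 301, 379, 41, 105],
    "end": [340, 113, 287, 348, 434, 73, 159],
    "name": ["string1", "string2", "string3", "string4",
             "string5", "string6", "string7"],
})

results = {}
for i in df["id"].unique():
    extracted = df.loc[df["id"] == i]
    # Build one GraphicFeature per row belonging to the current id.
    mylist = [
        GraphicFeature(start=row.start, end=row.end,
                       color="#ffffff", label=row.name)
        for row in extracted.itertuples(index=False)
    ]
    results[i] = mylist
    # Any further per-id print commands would go here.
    print(i, len(mylist))
```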

One approach is:

mylists = df.groupby('id').apply(lambda group: group.apply(lambda row: GraphicFeature(start=row['start'], end=row['end'], color='#ffffff', label=row['name']), axis=1).tolist())

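Note that pandas operations tend to compose very neatly if you take a functional programming approach: we want to turn each row into a GraphicFeature and, in turn, each group of rows sharing an id into a list of GraphicFeatures. The one-liner above can therefore also be expanded to: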

def row_to_graphic_feature(row):
    return GraphicFeature(start=row['start'], end=row['end'], color='#ffffff', label=row['name'])

def id_group_to_list(group):
    return group.apply(row_to_graphic_feature, axis=1).tolist()

mylists = df.groupby('id').apply(id_group_to_list)

Using your sample data:

In [38]: df
Out[38]:
  id  start  end     name
0  A      7  340  string1
1  B     12  113  string2
2  B    139  287  string3
3  B    301  348  string4
4  B    379  434  string5
5  C     41   73  string6
6  C    105  159  string7

In [39]: mylists = df.groupby('id').apply(id_group_to_list)

In [40]: mylists['A']
Out[40]: [GraphicFeature(start=7, end=340, color='#ffffff', label='string1')]

In [41]: mylists['B']
Out[41]:
[GraphicFeature(start=12, end=113, color='#ffffff', label='string2'),
 GraphicFeature(start=139, end=287, color='#ffffff', label='string3'),
 GraphicFeature(start=301, end=348, color='#ffffff', label='string4'),
 GraphicFeature(start=379, end=434, color='#ffffff', label='string5')]

In [42]: mylists['C']
Out[42]:
[GraphicFeature(start=41, end=73, color='#ffffff', label='string6'),
 GraphicFeature(start=105, end=159, color='#ffffff', label='string7')]
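Since the question mentions further print commands per id, the resulting Series can simply be iterated. The sketch below repeats the two helpers with a hypothetical stand-in `GraphicFeature` dataclass so that it runs on its own. It also selects the `start`, `end` and `name` columns before calling `apply`; that is an addition made here because newer pandas versions may warn when `apply` operates on the grouping column, and it is not part of the original answer.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class GraphicFeature:
    """Hypothetical stand-in for the real GraphicFeature class."""
    start: int
    end: int
    color: str
    label: str


df = pd.DataFrame({
    "id": ["A", "B", "B", "B", "B", "C", "C"],
    "start": [7, 12, 139, 301, 379, 41, 105],
    "end": [340, 113, 287, 348, 434, 73, 159],
    "name": ["string1", "string2", "string3", "string4",
             "string5", "string6", "string7"],
})


def row_to_graphic_feature(row):
    return GraphicFeature(start=row["start"], end=row["end"],
                          color="#ffffff", label=row["name"])


def id_group_to_list(group):
    # axis=1 makes apply call the function once per row.
    return group.apply(row_to_graphic_feature, axis=1).tolist()


# Selecting the needed columns keeps the grouping column out of each group.
mylists = df.groupby("id")[["start", "end", "name"]].apply(id_group_to_list)

# mylists is a Series indexed by id; each value is a list of features.
for feature_id, features in mylists.items():
    print(feature_id, [f.label for f in features])
```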

Using a for loop (as a list comprehension):

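# Columns are indexed with brackets because attribute access such as
# y.name can be shadowed by DataFrame attributes.
l = [
    [GraphicFeature(start=s, end=e, color="#ffffff", label=n)
     for s, e, n in zip(y["start"], y["end"], y["name"])]
    for _, y in df.groupby("id")
]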
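The nested list above drops the id that each inner list belongs to. If the id is needed later, a dict comprehension over the same groups keeps it as the key. This is a sketch under the same assumptions as before: `GraphicFeature` is a hypothetical stand-in dataclass, and `features_by_id` is an illustrative name.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class GraphicFeature:
    """Hypothetical stand-in for the real GraphicFeature class."""
    start: int
    end: int
    color: str
    label: str


df = pd.DataFrame({
    "id": ["A", "B", "B", "B", "B", "C", "C"],
    "start": [7, 12, 139, 301, 379, 41, 105],
    "end": [340, 113, 287, 348, 434, 73, 159],
    "name": ["string1", "string2", "string3", "string4",
             "string5", "string6", "string7"],
})

# Iterating a groupby yields (key, sub-dataframe) pairs, sorted by key.
features_by_id = {
    group_id: [
        GraphicFeature(start=s, end=e, color="#ffffff", label=n)
        for s, e, n in zip(group["start"], group["end"], group["name"])
    ]
    for group_id, group in df.groupby("id")
}

print(features_by_id["C"])
```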

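This writes a variable for every row, not for every unique id.

What is the axis=1 in the GraphicFeature initializer? That doesn't seem to be part of their data structure.

@Wen Ben: No worries; I think the downvote was because the first version of the answer was broken.

@fuglede Let me return the favor :-)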
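On the `axis=1` question: in the answer's code as shown, `axis=1` is an argument to `DataFrame.apply`, not to `GraphicFeature`. A small sketch of the difference between the two axes, using two of the sample rows (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"start": [41, 105], "end": [73, 159]})

# axis=1: the function is called once per row and receives that row
# as a Series indexed by column name.
row_spans = df.apply(lambda row: row["end"] - row["start"], axis=1)

# axis=0 (the default): the function is called once per column.
column_max = df.apply(lambda col: col.max())

print(row_spans.tolist())   # [32, 54]
print(column_max.tolist())  # [105, 159]
```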
