Python 3.x: Creating a nested dict from a DataFrame

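I have a pandas DataFrame from which I want to extract information and build a nested dictionary for downstream use. However, I'm not yet very comfortable with pandas and could use some help.

My DataFrame looks like this: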

     Sequence  A_start  A_stop  B_start  B_stop
0  sequence_1        1      25       26     100
1  sequence_2        1      31       32     201
2  sequence_3        1      27       28     231
3  sequence_4        1      39       40     191
I want to write it into a dictionary that has this form:

d = {'Sequence': {('A_start', 'A_stop'): [{'repeat_region':{'rpt_type':'long_terminal_repeat', 'note':"5'LTR"}}], ('B_start', 'B_stop'): [{'misc_feature':{'gene': 'Gag', 'note': 'deletion of start codon'}}]}}
Once generated, it would look like this:

{'sequence_1': {('1', '25'): [{'repeat_region':{'rpt_type':'long_terminal_repeat', 'note':"5'LTR"}}], ('26', '100'): [{'misc_feature':{'gene': 'Gag', 'note': 'deletion of start codon'}}]},
'sequence_2': {('1', '31'): [{'repeat_region':{'rpt_type':'long_terminal_repeat', 'note':"5'LTR"}}], ('32', '201'): [{'misc_feature':{'gene': 'Gag', 'note': 'deletion of start codon'}}]}, ...}
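I thought a list comprehension might be a simple way to handle this, but it may end up looking overly complicated. This is what I have so far, and it obviously doesn't work yet. I'm not sure whether something other than iteritems() or groupby() can be used to build the structure of the entries in the dict. Any help would be appreciated.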

LTR_sub_features = [{'repeat_region':{'rpt_type':'long_terminal_repeat', 'note':"5'LTR"}}]
gag_sub_features = [{'misc_feature':{'gene': 'Gag', 'note': 'deletion of start codon'}}]

ltr_gag_dict = {
Sequence: {(A_start,A_end): LTR_sub_features, (B_start,B_end):gag_sub_features} 
for Sequence, A_start, A_end, B_start, B_end in ltr_gag_df.groupby('Sequence')}
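You can use iterrows() to update a dictionary as you go:
iterrows() yields a tuple for each row, where the first element (i.e. row[0]) is the row's index and the second element is a pd.Series object holding all the values in that row.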

<input>
            A_start A_end   B_start     B_end
sequence_1  0.1     0.025   0.030303    0.001
sequence_2  0.2     0.050   0.060606    0.002
sequence_3  0.3     0.075   0.090909    0.003
sequence_4  0.4     0.100   0.121212    0.004

A_value = 'some value'
B_value = 'other value'
d = dict()


for row in df.iterrows():  
    d[row[0]] = {(row[1]['A_start'], row[1]['A_end']): A_value, (row[1]['B_start'], row[1]['B_end']): B_value}

<output>
{'sequence_1': {(0.10000000000000001, 0.025000000000000001): 'some value', (0.030303030303030304, 0.001): 'other value'}}
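
Applied to the DataFrame from the question, the same iterrows() approach might look like the sketch below. It assumes the DataFrame is rebuilt by hand from the table in the question, with Sequence as an ordinary column rather than the index; the helper name build_nested_dict is hypothetical. The coordinates are cast with int() so the tuple keys are plain Python integers.

```python
import pandas as pd

# Rebuilt by hand from the table in the question
# (an assumption about how the data is actually loaded).
ltr_gag_df = pd.DataFrame({
    "Sequence": ["sequence_1", "sequence_2", "sequence_3", "sequence_4"],
    "A_start": [1, 1, 1, 1],
    "A_stop": [25, 31, 27, 39],
    "B_start": [26, 32, 28, 40],
    "B_stop": [100, 201, 231, 191],
})

LTR_sub_features = [{'repeat_region': {'rpt_type': 'long_terminal_repeat', 'note': "5'LTR"}}]
gag_sub_features = [{'misc_feature': {'gene': 'Gag', 'note': 'deletion of start codon'}}]


def build_nested_dict(df):
    """Hypothetical helper: map each Sequence to {(start, stop): features}."""
    result = {}
    # iterrows() yields (index, Series); the index is not needed here.
    for _, row in df.iterrows():
        result[row["Sequence"]] = {
            (int(row["A_start"]), int(row["A_stop"])): LTR_sub_features,
            (int(row["B_start"]), int(row["B_stop"])): gag_sub_features,
        }
    return result


ltr_gag_dict = build_nested_dict(ltr_gag_df)
print(ltr_gag_dict["sequence_1"])
```

The desired output in the question shows the coordinates as strings; swapping int() for str() would give that form. Note also that every entry shares the same two feature lists, so copy them if they need to be modified per sequence.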


Try pandas.DataFrame.to_dict
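
As a rough sketch of that suggestion: DataFrame.to_dict(orient="index") returns a dict keyed by the index, where each value is a dict mapping column name to value. It does not produce tuple keys on its own, so a second step is still needed to reach the shape asked for in the question. The two-row DataFrame and the placeholder values below are assumptions for illustration, using the column names from the question.

```python
import pandas as pd

# Small illustrative DataFrame with the sequence names as the index (assumed).
df = pd.DataFrame(
    {"A_start": [1, 1], "A_stop": [25, 31], "B_start": [26, 32], "B_stop": [100, 201]},
    index=["sequence_1", "sequence_2"],
)

A_value = "some value"
B_value = "other value"

# Step 1: {index: {column: value}}
records = df.to_dict(orient="index")

# Step 2: regroup each row's columns into (start, stop) tuple keys.
d = {
    seq: {
        (cols["A_start"], cols["A_stop"]): A_value,
        (cols["B_start"], cols["B_stop"]): B_value,
    }
    for seq, cols in records.items()
}
print(d)
```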