Parsing a set of dictionaries into single rows in pandas (Python)


python pandas dataframe


Hi, I have a DataFrame similar to the example below:

information         record
name                apple
size                {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}
country             America
partiesrelated      [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]

I would like to convert this df into another df like this:

information                  record
name                         apple
size_weight_gram             300
size_weight_oz               10.5
size_description_height      10
size_description_width       15 
country                      America
partiesrelated_nameOfFarmer  John Smith
partiesrelated_farmerID      A0001

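In this case, each dictionary is parsed into single rows, where the row label is built from the nested keys (for example size_weight_gram) and the row contains the corresponding value.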

Code for the df:

import pandas as pd

df = pd.DataFrame({'information': ['name', 'size', 'country', 'partiesrelated'],
                   'record': ['apple', {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}, 'America', [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]]})
df = df.set_index('information')

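IIUC, you can define a recursive function that unnests sequences and dicts until you have a list of key, value pairs that is both valid input for the pd.DataFrame constructor and formatted the way you describe.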

Take a look at this solution:

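import collections.abc
import itertools

import pandas as pd

# Flatten one level of nesting: [[a, b], [c]] -> [a, b, c]
ch = lambda ite: list(itertools.chain.from_iterable(ite))

def isseq(obj):
    # Strings are sequences too, but here they count as scalar values
    if isinstance(obj, str): return False
    return isinstance(obj, collections.abc.Sequence)

def unnest(k, v):
    # Lists: unnest every element under the same key
    if isseq(v): return ch([unnest(k, v_) for v_ in v])
    # Dicts: extend the key with each sub-key, joined by "_"
    if isinstance(v, dict): return ch([unnest("_".join([k, k_]), v_) for k_, v_ in v.items()])
    # Scalars: a (key, value) pair, which ch() flattens into key, value, key, value, ...
    return k, v

def pairwise(i):
    # Regroup the flat list [k1, v1, k2, v2, ...] into [(k1, v1), (k2, v2), ...]
    _a = iter(i)
    return list(zip(_a, _a))

d = df.reset_index()  # turn the 'information' index back into a column
a = ch([(unnest(k, v)) for k, v in zip(d['information'], d['record'])])
pd.DataFrame(pairwise(a))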

    0                                 1
0   name                              apple
1   size_weight_gram                  300
2   size_weight_oz                    10.5
3   size_description_height           10
4   size_description_width            15
5   country                           America
6   partiesrelated_nameOfFarmer       John Smith
7   partiesrelated_farmerID           A0001
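
The same recursive idea can also be written as a generator that yields (label, value) pairs directly, which avoids regrouping a flat list afterwards. The following is only a minimal sketch of that variant, not the code from the answer above: the names flatten, records and rows are made up for illustration, and the input is assumed to be a plain dict rather than a DataFrame.

```python
from collections.abc import Mapping, Sequence


def flatten(key, value, sep="_"):
    """Yield (label, scalar) pairs for one record, recursing into dicts and lists."""
    if isinstance(value, Mapping):
        # Dicts: extend the label with each sub-key
        for sub_key, sub_value in value.items():
            yield from flatten(f"{key}{sep}{sub_key}", sub_value, sep)
    elif isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        # Lists/tuples: every element keeps the parent label
        for item in value:
            yield from flatten(key, item, sep)
    else:
        # Scalars (including strings): emit one row
        yield key, value


# Hypothetical input mirroring the question's data as a plain dict
records = {
    "name": "apple",
    "size": {"weight": {"gram": 300, "oz": 10.5},
             "description": {"height": 10, "width": 15}},
    "country": "America",
    "partiesrelated": [{"nameOfFarmer": "John Smith"}, {"farmerID": "A0001"}],
}

rows = [pair for k, v in records.items() for pair in flatten(k, v)]
for label, value in rows:
    print(label, value)
```

Passing rows to pd.DataFrame(rows, columns=['information', 'record']).set_index('information') should then give the layout asked for in the question.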

Because the solution is recursive, the algorithm unnests to any depth. For example:

d={
  'information': [
    'row1',
    'row2',
    'row3',
    'row4'
  ],
  'record': [
    'val1',
    {
      'val2': {
        'a': 300,
        'b': [
          {
            "b1": 10.5
          },
          {
            "b2": 2
          }
        ]
      },
      'val3': {
        'a': 10,
        'b': 15
      }
    },
    'val4',
    [
      {
        'val5': [
          {
            'a': {
              'c': [
                {
                  'd': {
                    'e': [
                      {
                        'f': 1
                      },
                      {
                        'g': 3
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      },
      {
        'b': 'bar'
      }
    ]
  ]
}

Applying the final two lines of the solution (the ones that build a and the DataFrame) to this d gives:



    0                    1
0   row1                 val1
1   row2_val2_a          300
2   row2_val2_b_b1       10.5
3   row2_val2_b_b2       2
4   row2_val3_a          10
5   row2_val3_b          15
6   row3                 val4
7   row4_val5_a_c_d_e_f  1
8   row4_val5_a_c_d_e_g  3
9   row4_b               bar
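
One property of this approach is worth noting: list elements inherit the parent label unchanged, so two list items that share a key end up with the same label. The sketch below is a hypothetical variant, not part of the answer above. The names flatten_indexed and index_lists and the second farmer ID are invented for illustration. It optionally adds the list position to the label to keep such rows apart.

```python
from collections.abc import Mapping, Sequence


def flatten_indexed(key, value, sep="_", index_lists=False):
    """Yield (label, scalar) pairs; optionally include list positions in labels."""
    if isinstance(value, Mapping):
        for sub_key, sub_value in value.items():
            yield from flatten_indexed(f"{key}{sep}{sub_key}", sub_value, sep, index_lists)
    elif isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        for position, item in enumerate(value):
            # With index_lists=True the element position becomes part of the label
            child = f"{key}{sep}{position}" if index_lists else key
            yield from flatten_indexed(child, item, sep, index_lists)
    else:
        yield key, value


# Made-up record in which both list items use the same key
record = [{"farmerID": "A0001"}, {"farmerID": "A0002"}]

plain = list(flatten_indexed("partiesrelated", record))
indexed = list(flatten_indexed("partiesrelated", record, index_lists=True))

print(plain)    # both rows share the label partiesrelated_farmerID
print(indexed)  # labels become partiesrelated_0_farmerID and partiesrelated_1_farmerID
```

Whether duplicate labels matter depends on how the result is used: they are harmless in a plain two-column frame, but they produce a non-unique index once the label column is set as the index.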

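Possible duplicate.

No, @Joost, this is a different question. The post you refer to asks about iterating over rows. Mine is about parsing new rows out of an original single row, where the new labels are derived from the dictionary keys and values.

You are right, but your initial wording was rather vague and did not focus enough on your specific problem, so it looked like a much simpler question. It would also be appreciated if you showed what you have already tried. Anyway, you have edited your post, so your question is clearer now.

Thanks for your answer, it does exactly what I wanted :)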