Using Python's pandas to process AWS DynamoDB data


python python-2.7 pandas amazon-dynamodb


I use boto3 with Python 2.7 to fetch data from a DynamoDB table, and pandas to group and sort the data.

Unfortunately, the DynamoDB data format is a bit tricky: every value is wrapped in a type descriptor, like this:

data = [{
    u'permaname': {
        u'S': u'facebook'
    },
    u'uuid': {
        u'S': u'4b873085-c995-4ce4-9325-cfc70fcd4040'
    },
    u'tags': {
        u'L': []
    },
    u'type': {
        u'S': u'xxxxxx'
    },
    u'createdOn': {
        u'N': u'1502099627'
    },
    u'source': {
        u'S': u'xxxxxxx'
    },
    u'data': {
        u'NULL': True
    },
    u'crawler': {
        u'S': u'xxxxxxx'
    }
}, {
    u'permaname': {
        u'S': u'facebook'
    },
    u'uuid': {
        u'S': u'25381aef-a7db-4b79-b599-89fd060fcf73'
    },
    u'tags': {
        u'L': []
    },
    u'type': {
        u'S': u'xxxxxxx'
    },
    u'createdOn': {
        u'N': u'1502096901'
    },
    u'source': {
        u'S': u'xxxxxxx'
    },
    u'data': {
        u'NULL': True
    },
    u'crawler': {
        u'S': u'xxxxxxx'
    }
}]

To group and sort, I need to build a pandas object, but I don't know how.

This is what I tried:

obj = pandas.DataFrame(data)
print list(obj.sort_values(['createdOn'],ascending=False).groupby('source'))
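Why this attempt fails can be seen on a trimmed copy of the data. This is a sketch assuming Python 3 and a recent pandas: `pandas.DataFrame(data)` does create one column per attribute, but each cell still holds the one-key type-descriptor dict (for example `{'N': '1502099627'}`), and dicts are unhashable, so they cannot serve as `groupby` keys.

```python
import pandas as pd

# Two items in DynamoDB's low-level format, trimmed to the relevant attributes
data = [
    {"source": {"S": "xxxxxxx"}, "createdOn": {"N": "1502099627"}},
    {"source": {"S": "xxxxxxx"}, "createdOn": {"N": "1502096901"}},
]

obj = pd.DataFrame(data)

# Each cell is still the type-descriptor dict, not the plain value
print(obj.loc[0, "createdOn"])  # {'N': '1502099627'}

try:
    obj.groupby("source").size()
except TypeError as exc:
    # dicts are unhashable, so pandas cannot use them as group keys
    print("groupby failed:", exc)
```

So the type descriptors have to be removed before the DataFrame is built.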

If I print the column names of obj like this:

print list(obj)

I get:

[u'crawler', u'createdOn', u'data', u'permaname', u'source', u'tags', u'type', u'uuid']

Does anyone know how to create a DataFrame object from DynamoDB data?

To convert DynamoDB JSON to regular JSON, you can use the following library:
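If you would rather not add a dependency, the conversion is small enough to write by hand. The following is a minimal sketch, not any library's code: the helper names `from_dynamodb` and `item_to_dict` are hypothetical, and only the type codes S, N, BOOL, NULL, L and M are handled (binary and set types raise an error). boto3 itself also ships `boto3.dynamodb.types.TypeDeserializer`, whose `deserialize()` method covers every type and returns numbers as `Decimal`.

```python
from decimal import Decimal


def from_dynamodb(value):
    """Convert one typed attribute value, e.g. {'S': 'abc'}, to plain Python."""
    (type_code, payload), = value.items()  # each attribute has exactly one key
    if type_code == "S":
        return payload
    if type_code == "N":
        # Numbers arrive as strings; whole numbers become int, others stay Decimal
        number = Decimal(payload)
        return int(number) if number == number.to_integral_value() else number
    if type_code == "BOOL":
        return payload
    if type_code == "NULL":
        return None
    if type_code == "L":
        return [from_dynamodb(v) for v in payload]
    if type_code == "M":
        return {k: from_dynamodb(v) for k, v in payload.items()}
    raise ValueError("unsupported DynamoDB type: %r" % type_code)


def item_to_dict(item):
    """Convert a whole item (attribute name -> typed value) to a plain dict."""
    return {name: from_dynamodb(value) for name, value in item.items()}


example = {"createdOn": {"N": "1502099627"}, "tags": {"L": []},
           "data": {"NULL": True}, "source": {"S": "xxxxxxx"}}
print(item_to_dict(example))
```

With the question's `data`, `pandas.DataFrame([item_to_dict(item) for item in data])` then has ordinary columns that can be sorted and grouped. The sketch uses only the standard library, so it should also run on Python 2.7.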

I'll try to answer using Python 3.

data = [{
    'permaname': {
        'S': 'facebook'
    },
    'uuid': {
        'S': '4b873085-c995-4ce4-9325-cfc70fcd4040'
    },
    'tags': {
        'L': []
    },
    'type': {
        'S': 'xxxxxx'
    },
    'createdOn': {
        'N': '1502099627'
    },
    'source': {
        'S': 'xxxxxxx'
    },
    'data': {
        'NULL': True
    },
    'crawler': {
        'S': 'xxxxxxx'
    }
}, {
    'permaname': {
        'S': 'facebook'
    },
    'uuid': {
        'S': '25381aef-a7db-4b79-b599-89fd060fcf73'
    },
    'tags': {
        'L': []
    },
    'type': {
        'S': 'xxxxxxx'
    },
    'createdOn': {
        'N': '1502096901'
    },
    'source': {
        'S': 'xxxxxxx'
    },
    'data': {
        'NULL': True
    },
    'crawler': {
        'S': 'xxxxxxx'
    }
}]

As mentioned above, use dynamodb_json:

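import pandas as pd
from dynamodb_json import json_util as json  # note: shadows the standard-library json module

# Strip the DynamoDB type descriptors, then build the DataFrame from plain dicts
obj = pd.DataFrame(json.loads(data))
print(obj)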

Output:

   crawler   createdOn  data  permaname   source  tags     type                                  uuid
0  xxxxxxx  1502099627  None   facebook  xxxxxxx    []   xxxxxx  4b873085-c995-4ce4-9325-cfc70fcd4040
1  xxxxxxx  1502096901  None   facebook  xxxxxxx    []  xxxxxxx  25381aef-a7db-4b79-b599-89fd060fcf73

Group by source (I used max() to aggregate the results):

Output:

         crawler   createdOn  data  permaname  tags     type                                  uuid
source
xxxxxxx  xxxxxxx  1502099627   NaN   facebook    []  xxxxxxx  4b873085-c995-4ce4-9325-cfc70fcd4040
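The grouping code itself is not shown in this answer. A minimal sketch of what it might look like, assuming the items have already been converted to plain values (the two rows below are trimmed copies of the question's data, and the variable names are illustrative): sort newest first, then aggregate per `source`.

```python
import pandas as pd

# Rows after conversion from DynamoDB format (values copied from the question)
rows = [
    {"source": "xxxxxxx", "type": "xxxxxx", "createdOn": 1502099627,
     "uuid": "4b873085-c995-4ce4-9325-cfc70fcd4040"},
    {"source": "xxxxxxx", "type": "xxxxxxx", "createdOn": 1502096901,
     "uuid": "25381aef-a7db-4b79-b599-89fd060fcf73"},
]
obj = pd.DataFrame(rows)
newest_first = obj.sort_values("createdOn", ascending=False)

# One row per source holding the maximum of each column, taken column by column
latest = newest_first.groupby("source").max()
print(latest)

# Alternative: keep the newest complete row per source
newest_rows = newest_first.groupby("source").head(1)
print(newest_rows)
```

Note that `max()` works column by column, so one output row can mix values from different items: in the output above, `createdOn` comes from the first item but `type` (`xxxxxxx`) comes from the second. If the goal is the newest complete record per source, `head(1)` after sorting keeps whole rows.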

Print the list of columns:

print(list(obj))

Output:

['crawler', 'createdOn', 'data', 'permaname', 'source', 'tags', 'type', 'uuid']

I hope this helps.

You can use pandas.json_normalize.
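A minimal sketch of that approach, assuming pandas 1.0 or later (older versions expose the same function as `pandas.io.json.json_normalize`): `json_normalize` flattens each type descriptor into a dotted column name such as `createdOn.N`, so the suffix has to be stripped and numbers converted by hand. The two items are trimmed copies of the question's data.

```python
import pandas as pd

# Two trimmed items in DynamoDB's low-level format
data = [
    {"source": {"S": "xxxxxxx"}, "createdOn": {"N": "1502099627"}, "data": {"NULL": True}},
    {"source": {"S": "xxxxxxx"}, "createdOn": {"N": "1502096901"}, "data": {"NULL": True}},
]

# Nested dicts become dotted column names
flat = pd.json_normalize(data)
print(sorted(flat.columns))  # ['createdOn.N', 'data.NULL', 'source.S']

# Drop the type suffix to get back the attribute names
flat.columns = [name.split(".")[0] for name in flat.columns]

# Type information is gone, so numbers are still strings: convert explicitly
flat["createdOn"] = pd.to_numeric(flat["createdOn"])
print(flat.sort_values("createdOn", ascending=False))
```

Two caveats of this approach: a `NULL` attribute becomes the flag value `True` rather than a missing value, and an attribute whose type differs between items (say `NULL` in one item and `S` in another) produces two columns, `data.NULL` and `data.S`, that collide once the suffix is stripped. A real deserializer avoids both.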
