Python: can NumPy split data using a specific column name?

python numpy

How can I tell numpy which column to use when splitting a dataset?

I am trying to split a dataset that has the following format. This is dataitems:

{
    "tweet_id": "1234456",
    "tweet": "hello world",
    "labels": {
        "item1": 2,
        "item2": 1
    }
},
{
    "tweet_id": "567890976",
    "tweet": "testing",
    "labels": {
        "item1": 2,
        "item2": 1,
        "item3": 1,
        "item4": 1
    }
}

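What works at the moment is pulling only the tweet_id values into a list and splitting that list, but I would like to know whether numpy.split() can split this JSON file directly.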

This just raises an error:

OrderedDict([('tweet_id', '1234456'), ('tweet', "hello world""), ('labels', Counter({'item1': 2, 'item2': 1}))])],
      dtype=object) is not JSON serializable

Thanks.

pandas can convert JSON data into a DataFrame object, which behaves much like a table. That may be worth considering instead of numpy:

In [1]: # pandas >= 1.0 exposes this at top level (older: pandas.io.json)
   ...: from pandas import json_normalize
   ...: 
   ...: raw = [{"tweet_id": "1234456",
   ...:         "tweet": "hello world",
   ...:         "labels": {
   ...:             "item1": 2,
   ...:             "item2": 1
   ...:         }},
   ...:        {"tweet_id": "567890976",
   ...:         "tweet": "testing",
   ...:         "labels": {
   ...:             "item1": 2,
   ...:             "item2": 1,
   ...:             "item3": 1,
   ...:             "item4": 1
   ...:         }
   ...:         }]
   ...: 
   ...: df = json_normalize(raw)

In [2]: df
Out[2]: 
   labels.item1  labels.item2  labels.item3  labels.item4        tweet  \
0             2             1           NaN           NaN  hello world   
1             2             1           1.0           1.0      testing   

    tweet_id  
0    1234456  
1  567890976
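
Once the records are in a DataFrame, a row-wise split could look roughly like the sketch below. It is not part of the original answer: the 50/50 ratio, the random_state value, and the names train and test are assumptions chosen for illustration.

```python
import pandas as pd

raw = [
    {"tweet_id": "1234456", "tweet": "hello world",
     "labels": {"item1": 2, "item2": 1}},
    {"tweet_id": "567890976", "tweet": "testing",
     "labels": {"item1": 2, "item2": 1, "item3": 1, "item4": 1}},
]

# Flatten the nested "labels" dict into labels.item1, labels.item2, ... columns.
df = pd.json_normalize(raw)  # top-level function, needs pandas >= 1.0

# Shuffle the rows, then cut at an illustrative 50% mark (assumed ratio).
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
cut = int(len(shuffled) * 0.5)
train, test = shuffled.iloc[:cut], shuffled.iloc[cut:]
```

Because every row keeps its tweet_id, tweet, and label columns, both parts stay complete and nothing has to be matched up again afterwards.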

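I found that I could not do all of this on the same DataFrame. What I did instead was extract only the tweet_ids into a DataFrame, split them, and then match the labels in the original dataset against the split tweet_ids.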
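
The workaround described in this comment might be sketched as follows. The commenter did not post code, so this is only one possible reading: the two-way split, the use of np.array_split, and the names records, by_id, train, and test are all assumptions.

```python
import numpy as np

records = [
    {"tweet_id": "1234456", "tweet": "hello world",
     "labels": {"item1": 2, "item2": 1}},
    {"tweet_id": "567890976", "tweet": "testing",
     "labels": {"item1": 2, "item2": 1, "item3": 1, "item4": 1}},
]

# Step 1: pull out only the ids; a flat array of strings is something numpy can split.
tweet_ids = np.array([r["tweet_id"] for r in records])

# Step 2: split the ids into two parts (array_split tolerates uneven sizes).
train_ids, test_ids = np.array_split(tweet_ids, 2)

# Step 3: match the full records back to each group of ids.
by_id = {r["tweet_id"]: r for r in records}
train = [by_id[str(i)] for i in train_ids]
test = [by_id[str(i)] for i in test_ids]
```

The nested labels never enter a numpy array here, which sidesteps the object-dtype array seen in the error message.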

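What is dataitems?

np.split splits a numpy array into a list of arrays. I don't see any array here. At best, the first code block might be a list of dictionaries. A list can be split into sublists with slice indexing.

Sorry, dataitems is the file that contains the list above.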
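
To illustrate the point about slicing versus np.split, here is a minimal sketch. It assumes the dataitems file holds a JSON list of the records shown in the question; an inline string stands in for the file so the example runs on its own, and the half-way cut point is arbitrary.

```python
import json
import numpy as np

# Stand-in for the contents of the dataitems file (assumed to be a JSON list).
text = """
[
  {"tweet_id": "1234456", "tweet": "hello world",
   "labels": {"item1": 2, "item2": 1}},
  {"tweet_id": "567890976", "tweet": "testing",
   "labels": {"item1": 2, "item2": 1, "item3": 1, "item4": 1}}
]
"""
dataitems = json.loads(text)  # a plain Python list of dicts

# Option 1: slice indexing splits the list directly, no numpy involved.
half = len(dataitems) // 2
first, second = dataitems[:half], dataitems[half:]

# Option 2: np.split needs an array, so split an array of positions
# and use those positions to pick records out of the list.
positions = np.arange(len(dataitems))
left_idx, right_idx = np.split(positions, [half])
left = [dataitems[i] for i in left_idx]
right = [dataitems[i] for i in right_idx]
```

Both options yield the same two sublists; the second only becomes useful when the positions are shuffled or otherwise manipulated with numpy first.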
