Python: How do I convert multiple nested values into categorical variables using Pandas?


I'm working with the Yelp dataset. For businesses, this is the first line of JSON from yelp_academic_dataset_business.json. Subsequent lines match this schema:

{
  "business_id":"0DI8Dt2PJp07XkVvIElIcQ",
  "name":"Innovative Vapors",
  "neighborhood":"",
  "address":"227 E Baseline Rd, Ste J2",
  "city":"Tempe",
  "state":"AZ",
  "postal_code":"85283",
  "latitude":33.3782141,
  "longitude":-111.936102,
  "stars":4.5,
  "review_count":17,
  "is_open":0,
  "attributes":[
    "BikeParking: True",
    "BusinessAcceptsBitcoin: False",
    "BusinessAcceptsCreditCards: True",
    "BusinessParking: {
      'garage': False,
      'street': False,
      'validated': False,
      'lot': True,
      'valet': False
    }",
    "DogsAllowed: False",
    "RestaurantsPriceRange2: 2",
    "WheelchairAccessible: True"
  ],
  "categories": [
    "Tobacco Shops",
    "Nightlife",
    "Vape Shops",
    "Shopping"
  ],
  "hours":[
    "Monday 11:0-21:0",
    "Tuesday 11:0-21:0",
    "Wednesday 11:0-21:0",
    "Thursday 11:0-21:0",
    "Friday 11:0-22:0",
    "Saturday 10:0-22:0",
    "Sunday 11:0-18:0"
  ],
  "type":"business"
}
I tried parsing the JSON to CSV and importing the CSV with pd.read_csv, and I get the following DataFrame:

+---+-----------------------------------------------------------------+
|idx|                     attributes                                  |
+---+-----------------------------------------------------------------+
| 0 | BikeParking: True, BusinessAcceptsBitcoin: False,               |
|   | BusinessAcceptsCreditCards: True, ,DogsAllowed: False,          |
|   | RestaurantsPriceRange2: 2, WheelchairAccessible: True,          |
|   | BusinessParking: {'garage': False,                              |
|   |                   'street': False,                              |
|   |                   'validated': False,                           |
|   |                   'lot': True,                                  |
|   |                   'valet': False}                               |
+---+-----------------------------------------------------------------+
But what I really want is:

+----+-----------------------------------+-----------------------------------+
| id | attributes_BusinessParking_garage | attributes_BusinessParking_lot    |
+----+-----------------------------------+-----------------------------------+
|  0 |                  0                |                1                  |
+----+-----------------------------------+-----------------------------------+
I know there's pd.get_dummies, but because each cell is treated as a string, I don't get nice flat categorical columns.


Note: for simplicity, I haven't shown the other columns in the example.

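Have you tried using a mapping function to split out the attributes?

You may need to initialize the new columns to empty strings, or to whatever data type you need, and then do the following: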

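def split_attributes(row):
    # Assumes the first column already holds a dict, not a string.
    for k, v in row.iloc[0].items():
        row[k] = v
    return row  # apply() needs the modified row back

df = df.apply(split_attributes, axis=1)  # axis=1: one call per row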
Edit


Based on your updated question: have you tried using pd.read_json?
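A rough sketch of what that could look like, assuming the file holds one JSON object per line (as the question describes), in which case `lines=True` is the relevant option. The two records below are made up for illustration and are read from an `io.StringIO` buffer; a real run would pass the path to the Yelp file instead.

```python
import io

import pandas as pd

# Hypothetical two-line sample shaped like the Yelp business file
# (one JSON object per line). Replace with the real file path in practice.
sample = io.StringIO(
    '{"business_id": "a1", "attributes": ["BikeParking: True", "DogsAllowed: False"]}\n'
    '{"business_id": "b2", "attributes": ["BikeParking: False"]}\n'
)

# lines=True tells pandas to parse newline-delimited JSON.
df = pd.read_json(sample, lines=True)

# Each "attributes" cell is now a Python list of "key: value" strings,
# not one long string as it was after the CSV round trip.
print(type(df.loc[0, "attributes"]))
print(df)
```

Reading the JSON directly skips the CSV step, so the list structure of `attributes` survives. The individual entries are still strings and need further parsing.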

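In this case pandas doesn't treat the JSON as a dictionary. It's stored as a string. I also have to handle the nested structure.

Why not convert the string to a dictionary? Are you familiar with the ast module? I realize it's not the most time-efficient solution, but if it gets you where you want to go...

Is this how the data comes in, or is this after your import? Please show the original data source and the import.

This is how the data comes in. Forgive me, but Yelp doesn't hand out pandas DataFrames.

Does it originate from JSON? CSV? XML?

Oh. It's distributed as a JSON file, which I parsed to CSV.

Please post a sample of the raw JSON, because pandas has I/O methods to import these files directly, instead of what is used in the linked code.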
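The `ast` suggestion from the comments could be sketched roughly as follows. It assumes each `attributes` cell is a list of `"Key: value"` strings, as in the sample record. The helper name `flatten_attributes` and the `attributes_` column prefix are illustrative choices, not something from the thread.

```python
import ast

import pandas as pd


def flatten_attributes(items, prefix="attributes"):
    """Turn ["Key: value", ...] strings into one flat dict of column -> value."""
    flat = {}
    for item in items:
        key, _, raw = item.partition(": ")
        try:
            # "True" -> True, "2" -> 2, "{'lot': True}" -> dict
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # leave text that is not a Python literal as a string
        if isinstance(value, dict):
            # Nested dict: one column per inner key.
            for sub_key, sub_value in value.items():
                flat[f"{prefix}_{key}_{sub_key}"] = sub_value
        else:
            flat[f"{prefix}_{key}"] = value
    return flat


# Illustrative row modelled on the sample record in the question.
df = pd.DataFrame({
    "attributes": [[
        "BikeParking: True",
        "BusinessParking: {'garage': False, 'lot': True}",
        "RestaurantsPriceRange2: 2",
    ]]
})

flat = pd.DataFrame([flatten_attributes(a) for a in df["attributes"]])

# Convert boolean columns to 0/1 indicator columns.
bool_cols = flat.select_dtypes(include="bool").columns
flat = flat.astype({col: int for col in bool_cols})
print(flat)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for strings taken from a data file. Columns that end up holding non-boolean categories, such as a price range, could then be passed to `pd.get_dummies`.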