Python: splitting a column to extract selected text


I have this DataFrame:

import pandas as pd

df = [{"username": "last",
"time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"5\",\"topicCount\":\"3\",\"totalCount\":80},{\"postCount\":\"20\",\"topicCount\":\"11\",\"name\":\"Marketplace\",\"url\",\"totalCount\":31},{\"postCount\":\"26\",\"topicCount\":\"1\",\"name\":\"Atari 5200\",\"url\",\"totalCount\":27},{\"postCount\":\"9\",\"topicCount\":0,\"name\":\"Atari 8\",\"url\"\"totalCount\":9}"
},
{"username": "truk",
 "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":362},{\"postCount\":\"333\",\"topicCount\":\"22\",\"name\":\"Hardware\",\"url\",\"totalCount\":355},{\"postCount\":\"194\",\"topicCount\":\"8\",\"name\":\"Marketplace\",\"url\",\"totalCount\":202}"
}]
df = pd.DataFrame(df)
df
I ran the following code:

df_h0 = df.copy()
df_h0['hour']='00:00' 
df_h0['totalCount']=df.time_data.str.split('"00:00","postCount":"').str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)
df_h0.head()
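But what I actually need is the number after "totalCount". I don't know how to get it, because there are other "totalCount" entries in each string, and the one I need is the one after "00:00".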

This is the expected output:

       hour    totalCount   username
0     00:00       80         last
1     00:00       362        truk

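In your position, I would look into where these strings that try to imitate a JSON representation come from, and make sure the corresponding dictionaries really can't be retrieved or extracted directly. But if that isn't possible, you can use the Series.str.extract function: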

In [230]: df_h0['totalCount'] = df['time_data'].str.extract(r'totalCount\":(\d+)', expand=False)

In [231]: df_h0
Out[231]: 
  username   hour totalCount
0     last  00:00         80
1     truk  00:00        362
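str.extract returns the first match in each string, and in this data the first totalCount happens to belong to the 00:00 record. If that ordering is not guaranteed, a minimal sketch could anchor the pattern on "hour":"00:00" and stay inside the same record by never crossing a closing brace. This assumes the 00:00 record contains no nested objects; the two rows are shortened copies of the question's data.

```python
import pandas as pd

# Two rows shaped like the question's data (shortened; the dangling "url" key is kept).
df = pd.DataFrame({
    "username": ["last", "truk"],
    "time_data": [
        '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},'
        '{"postCount":"20","topicCount":"11","name":"Marketplace","url","totalCount":31}',
        '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},'
        '{"postCount":"333","topicCount":"22","name":"Hardware","url","totalCount":355}',
    ],
})

# Match "hour":"00:00", then any characters except "}" (so we stay inside the
# same record), then capture the digits that follow "totalCount":
pattern = r'"hour":"00:00"[^}]*"totalCount":(\d+)'

out = df[["username"]].copy()
out["hour"] = "00:00"
# expand=False returns a Series; astype(int) turns the captured text into numbers
out["totalCount"] = df["time_data"].str.extract(pattern, expand=False).astype(int)
print(out)
```

If some row has no 00:00 record, extract yields NaN there and astype(int) will raise, so drop or fill those rows first.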
Try this:

df_h0 = df.copy()
df_h0['hour']='00:00' 
df_h0['totalCount']=df.time_data.str.split('"totalCount":').str[1].str.split("}").str[0]
df_h0 = df_h0.drop("time_data", axis=1)
df_h0
Output:

  username   hour totalCount
0     last  00:00         80
1     truk  00:00        362
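The split approach leaves totalCount as strings such as "80", since the values are substrings of the original text. If you need real numbers (for sorting or summing), a minimal sketch converts them with pd.to_numeric; the rows are shortened, hypothetical versions of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "username": ["last", "truk"],
    "time_data": [
        '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},{"postCount":"20"}',
        '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},{"postCount":"333"}',
    ],
})

df_h0 = df.copy()
df_h0["hour"] = "00:00"
# Text between the first '"totalCount":' and the next "}" is still a string, e.g. "80"
raw = df_h0["time_data"].str.split('"totalCount":').str[1].str.split("}").str[0]
# errors="coerce" turns anything unparsable into NaN instead of raising
df_h0["totalCount"] = pd.to_numeric(raw, errors="coerce")
# drop() returns a new DataFrame, so the result must be assigned back
df_h0 = df_h0.drop(columns="time_data")
print(df_h0.dtypes)
```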

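All of these records seem to contain a dangling key, url, with no value. Is that intended? This looks a lot like a JSON string, except that url has no value in the records. If that really is the case, you will probably have to use regular expressions; otherwise, use json_normalize.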
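If the url problem can be fixed at the source, the strings become real JSON and no string slicing is needed. A sketch under that assumption: the rows below are a hypothetical, repaired version of the question's data (url is given the made-up value "placeholder", and each string closes its list and object), parsed with json.loads and flattened with pd.json_normalize.

```python
import json

import pandas as pd

# Hypothetical well-formed data: "url" has a placeholder value and the
# "hours" list and outer object are closed. The real strings would need this fix first.
rows = [
    {"username": "last",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},'
                  '{"postCount":"20","topicCount":"11","name":"Marketplace","url":"placeholder","totalCount":31}]}'},
    {"username": "truk",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},'
                  '{"postCount":"333","topicCount":"22","name":"Hardware","url":"placeholder","totalCount":355}]}'},
]
df = pd.DataFrame(rows)

# Parse each JSON string and attach the username to the parsed dict
parsed = [{"username": u, **json.loads(s)} for u, s in zip(df["username"], df["time_data"])]

# One output row per element of "hours"; "username" is carried along as metadata.
# Records without an "hour" key get NaN in that column.
flat = pd.json_normalize(parsed, record_path="hours", meta=["username"])

# Keep only the 00:00 records, matching the expected output in the question
result = (
    flat.loc[flat["hour"] == "00:00", ["hour", "totalCount", "username"]]
    .reset_index(drop=True)
)
print(result)
```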