AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas (Python)


I am working with a JSON file and run the following code on it to get the DataFrame shown below:

import pandas as pd

topics = df.set_index('username').popular_board_data.str.extractall(r'name":"([^,]*)')
total = df.set_index('username').popular_board_data.str.extractall(r'totalCount\":([^,}]*)')

data = []
for username in df.username.unique():
    # zip() pairs each topic with its count, so every row stores a tuple
    for topic in zip(topics[0][username], total[0][username]):
        data.append([username, topic])

df_topic = pd.DataFrame(data, columns='username,topic'.split(','))

    username        topic
0     lukl    (Hardware", 80)
1     lukl    (Marketplace", 31)
2     lukl    (Atari 5200", 27)
3     lukl    (Atari 8-Bit Computers", 9)
4     lukl    (Modern Gaming", 3)
Now I need to split the information in the "topic" column into two separate columns.

This is the expected result:

    username        topic          _topic       _total
0     lukl    (Hardware", 80)      Hardware     80
1     lukl    (Marketplace", 31)   Marketplace  31
2     lukl    (Atari 5200", 27)    Atari 5200   27
3     lukl    (Atari 8", 9)        Atari 8      9
4     lukl    (Modern", 3)         Modern       3
I tried to do it with this code:

df_top = df_topic.copy()
df_top['_topic'] = df_topic['topic'].str.split('(').str[1].str.split('",').str[0]
df_top['_total'] = df_topic['topic'].str.split('",').str[1].str.split(')').str[0]
df_top
But I get an error:


AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
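The message points at the cell values rather than at the method chain itself: the `.str` string methods expect text in every cell. A quick check of what the `topic` column holds might look like the sketch below. The sample frame is hypothetical and only mimics the output shown above, where each cell comes from `zip`.

```python
import pandas as pd

# Hypothetical stand-in for df_topic: zip() in the loop yields tuples, not strings
df_topic = pd.DataFrame({
    "username": ["lukl", "lukl"],
    "topic": [('Hardware"', "80"), ('Marketplace"', "31")],
})

# Count the Python type of every cell in the column
cell_types = df_topic["topic"].map(type).value_counts()
print(cell_types)
```

If the count is reported for `tuple` rather than `str`, string splitting is the wrong tool for this column.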

I think the column contains tuples, so you only need the DataFrame constructor:

df_topic[['_topic', '_total']]=pd.DataFrame(df_topic['topic'].values.tolist(), 
                                index=df_topic.index)
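For a self-contained version of this idea, here is a minimal sketch. The sample rows are made up to resemble the question's output. Stripping the trailing quote and converting the count to a number are extra steps assumed here and are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: each `topic` cell is a (name, count) tuple of strings
df_topic = pd.DataFrame({
    "username": ["lukl"] * 3,
    "topic": [('Hardware"', "80"), ('Marketplace"', "31"), ('Atari 5200"', "27")],
})

# Expand the tuples into two columns, keeping the original row index
parts = pd.DataFrame(df_topic["topic"].tolist(),
                     index=df_topic.index,
                     columns=["_topic", "_total"])

# Assumed clean-up: drop the stray closing quote and make the count numeric
df_topic["_topic"] = parts["_topic"].str.rstrip('"')
df_topic["_total"] = pd.to_numeric(parts["_total"])
print(df_topic)
```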
A better solution would be to work directly with the data from your previous answer:
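As a rough sketch of what avoiding the tuples at the source could look like, the two `extractall` results can be placed side by side as separate columns. The JSON layout of `popular_board_data` below is an assumption inferred from the regexes in the question, not taken from the real data. The approach also assumes every entry has both a `name` and a `totalCount`, so the match numbers line up.

```python
import pandas as pd

# Hypothetical raw data; the JSON layout is assumed from the regexes in the question
df = pd.DataFrame({
    "username": ["lukl"],
    "popular_board_data": [
        '[{"name":"Hardware","totalCount":80},{"name":"Atari 5200","totalCount":27}]'
    ],
})

raw = df.set_index("username")["popular_board_data"]

# extractall returns one row per match, indexed by (username, match)
topics = raw.str.extractall(r'"name":"([^"]*)"')[0].rename("_topic")
totals = raw.str.extractall(r'"totalCount":(\d+)')[0].rename("_total")

# Align both results on the shared index, then flatten it back into columns
df_topic = (
    pd.concat([topics, totals], axis=1)
    .reset_index(level="match", drop=True)
    .reset_index()
)
df_topic["_total"] = pd.to_numeric(df_topic["_total"])
print(df_topic)
```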



I treat topic as a string and convert it to a string first if it is not one:

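import pandas as pd

df = pd.DataFrame(data={"username": ['luk1', 'luk1', 'luk1'],
                        'topic': ['(Hardware, 80)', '(Marketplace, 31)', '(Atari 5200, 27)']})
# str(x) guards against non-string cells; [1:] drops "(" and [:-1] drops ")"
df['_topic'] = df['topic'].apply(lambda x: str(x).split(",")[0][1:])
df['_total'] = df['topic'].apply(lambda x: str(x).split(",")[1][:-1].strip())

  username              topic       _topic _total
0     luk1     (Hardware, 80)     Hardware     80
1     luk1  (Marketplace, 31)  Marketplace     31
2     luk1   (Atari 5200, 27)   Atari 5200     27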
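A vectorized variant of the same split, shown only as a sketch: it assumes each cell looks like `(name, count)` and splits on the last comma, so a topic name that itself contains a comma stays in one piece. The third sample row is invented to show that case.

```python
import pandas as pd

# Hypothetical sample; the last row has a comma inside the topic name
df = pd.DataFrame({
    "username": ["luk1", "luk1", "luk1"],
    "topic": ["(Hardware, 80)", "(Atari 5200, 27)", "(Arts, Crafts, 12)"],
})

# Drop the surrounding parentheses, then split once from the right
parts = (
    df["topic"].astype(str)
    .str.strip("()")
    .str.rsplit(",", n=1, expand=True)
)
df["_topic"] = parts[0].str.strip()
df["_total"] = pd.to_numeric(parts[1].str.strip())
print(df)
```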


You can use the following regular expressions:

df['_topic'] = df['topic'].str.extract(r'([a-zA-Z]+)')
df['_total'] = df['topic'].str.extract(r'(\d+)')

  username                        topic       _topic _total
0     lukl              (Hardware", 80)     Hardware     80
1     lukl           (Marketplace", 31)  Marketplace     31
2     lukl            (Atari 5200", 27)        Atari   5200
3     lukl  (Atari 8-Bit Computers", 9)        Atari      8
4     lukl          (Modern Gaming", 3)       Modern      3
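As the output shows, `[a-zA-Z]+` stops at the first space and `\d+` picks up the first digits it finds, so "Atari 5200" ends up split across both columns. One possible adjustment, sketched under the assumption that every cell is a string of the form `(name", count)`, is a single pattern with two named groups:

```python
import pandas as pd

# Hypothetical sample in the same string format as the output above
df = pd.DataFrame({
    "username": ["lukl"] * 3,
    "topic": ['(Hardware", 80)', '(Atari 5200", 27)', '(Atari 8-Bit Computers", 9)'],
})

# Group 1: everything up to the optional quote before the comma
# Group 2: the digits just before the closing parenthesis
pattern = r'\((?P<_topic>.*?)"?,\s*(?P<_total>\d+)\)$'

# Named groups become the column names of the extracted frame
extracted = df["topic"].str.extract(pattern)
df = df.join(extracted)
df["_total"] = pd.to_numeric(df["_total"])
print(df)
```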

Could you add the output of print(df.head()) to the question? It seems there should be a better solution here.