AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas (Python)
I am working with a JSON file and run this code on it to get the following DataFrame:
import pandas as pd
topics = df.set_index('username').popular_board_data.str.extractall(r'name":"([^,]*)')
total = df.set_index('username').popular_board_data.str.extractall(r'totalCount\":([^,}]*)')
data = []
for username in df.username.unique():
    for topic in zip(topics[0][username], total[0][username]):
        data.append([username, topic])
df_topic = pd.DataFrame(data, columns='username,topic'.split(','))
  username  topic
0 lukl      (Hardware", 80)
1 lukl      (Marketplace", 31)
2 lukl      (Atari 5200", 27)
3 lukl      (Atari 8-Bit Computers", 9)
4 lukl      (Modern Gaming", 3)
Now I need to split the information in the "topic" column into two separate columns.
This is the expected result:
  username  topic               _topic       _total
0 lukl      (Hardware", 80)     Hardware     80
1 lukl      (Marketplace", 31)  Marketplace  31
2 lukl      (Atari 5200", 27)   Atari 5200   27
3 lukl      (Atari 8", 9)       Atari 8      9
4 lukl      (Modern", 3)        Modern       3
I tried to do this with the following code:
df_top = df_topic.copy()
df_top['_topic'] = df_topic['topic'].str.split('(').str[1].str.split('",').str[0]
df_top['_total'] = df_topic['topic'].str.split('",').str[1].str.split(')').str[0]
df_top
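But I got an error:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
I think the column contains tuples, so you can simply use the DataFrame constructor: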
df_topic[['_topic', '_total']]=pd.DataFrame(df_topic['topic'].values.tolist(),
index=df_topic.index)
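The constructor approach above can be tried on a small stand-in for the question's data. This is a minimal sketch, assuming each `topic` value is a 2-tuple of strings (as `zip` over the two `extractall` results would produce) and that the name still carries the trailing quote left by the first regex. The `element_types` check and the cleanup steps are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Stand-in for the question's df_topic: each topic is a (name, count) tuple
# of strings, with the stray trailing quote still attached to the name.
df_topic = pd.DataFrame({
    "username": ["lukl", "lukl", "lukl"],
    "topic": [('Hardware"', "80"), ('Marketplace"', "31"), ('Atari 5200"', "27")],
})

# Check what the column actually holds before reaching for .str methods.
element_types = df_topic["topic"].map(type).unique().tolist()

# Expand the tuples into two columns, keeping the original row index.
parts = pd.DataFrame(df_topic["topic"].tolist(),
                     index=df_topic.index,
                     columns=["_topic", "_total"])

# Optional cleanup: drop the stray quote and make the count numeric.
parts["_topic"] = parts["_topic"].str.rstrip('"')
parts["_total"] = parts["_total"].astype(int)

df_top = df_topic.join(parts)
print(df_top)
```

If `element_types` reports `tuple` rather than `str`, string methods such as `.str.split` have nothing to split, which is a likely source of the error in the chained call.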
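A better solution would be to use the data from your previous answer and:
I treat the topic as a string; if it is not a string, it is converted to one.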
df = pd.DataFrame(data={"username":['luk1','luk1','luk1'],
'topic':[ '(Hardware, 80)','(Marketplace, 31)', '(Atari 5200, 27)']})
df['_topic'] = df['topic'].apply(lambda x:str(x).split(",")[0][1:])
df['_total'] = df['topic'].apply(lambda x:str(x).split(",")[1][:-1])
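One caveat with converting via `str(x)`: for a tuple such as `('Hardware"', '80')` the result is the tuple's repr, so the slices keep the surrounding quote characters. A hedged variant that branches on the element type is sketched below; `split_topic` is a hypothetical helper name, and the mixed sample data is invented for illustration.

```python
import pandas as pd

def split_topic(value):
    """Hypothetical helper: return (name, total) from a tuple or a '(name, total)' string."""
    if isinstance(value, tuple):
        name, total = value
    else:
        # Split on the last comma so a comma inside the name would survive.
        name, total = str(value).strip("()").rsplit(",", 1)
    # Trim whitespace and any stray double quote, then make the count an int.
    return str(name).strip().strip('"'), int(str(total).strip())

# Invented sample mixing plain strings and a tuple in the same column.
df = pd.DataFrame({
    "username": ["luk1", "luk1", "luk1"],
    "topic": ["(Hardware, 80)", ('Marketplace"', "31"), "(Atari 5200, 27)"],
})

parts = pd.DataFrame(df["topic"].map(split_topic).tolist(),
                     index=df.index,
                     columns=["_topic", "_total"])
df = df.join(parts)
print(df)
```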
You can use the following regular expressions:
df['_topic'] = df['topic'].str.extract(r'([a-zA-Z]+)')
df['_total'] = df['topic'].str.extract(r'(\d+)')
  username  topic                        _topic       _total
0 lukl      (Hardware", 80)              Hardware     80
1 lukl      (Marketplace", 31)           Marketplace  31
2 lukl      (Atari 5200", 27)            Atari        5200
3 lukl      (Atari 8-Bit Computers", 9)  Atari        8
4 lukl      (Modern Gaming", 3)          Modern       3
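As the output shows, `[a-zA-Z]+` stops at the first space or digit and `\d+` takes the first run of digits, so multi-word topics come out truncated. A possible refinement is sketched below, assuming the `topic` column really holds strings of the form `(Name", 80)`. The pattern and the named groups are illustrative, not from the original answer.

```python
import pandas as pd

# Assumption: topic values are strings shaped like (Name", count).
df = pd.DataFrame({
    "username": ["lukl"] * 3,
    "topic": ['(Hardware", 80)', '(Atari 8-Bit Computers", 9)', '(Modern Gaming", 3)'],
})

# Capture everything up to the closing quote as the name,
# then the digits before the closing parenthesis as the total.
pattern = r'\((?P<_topic>[^"]+)",\s*(?P<_total>\d+)\)'

extracted = df["topic"].str.extract(pattern)
extracted["_total"] = extracted["_total"].astype(int)

df = df.join(extracted)
print(df)
```

Named groups give the resulting columns their names directly, so no separate rename step is needed.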
Could you add the output of print(df.head()) to the question? It seems there should be a better solution here.
Output of the str(x).split(",") approach:
  username  topic              _topic       _total
0 luk1      (Hardware, 80)     Hardware      80
1 luk1      (Marketplace, 31)  Marketplace   31
2 luk1      (Atari 5200, 27)   Atari 5200    27
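Another option, in the spirit of the comment asking for the raw data, is to avoid building tuples at all and extract the two columns directly from the source text. This is only a sketch: the layout of `popular_board_data` below is a guess at what the JSON might look like, and the tightened regexes are assumptions rather than the asker's actual patterns.

```python
import pandas as pd

# Hypothetical raw data; the real popular_board_data layout is not shown in the question.
df = pd.DataFrame({
    "username": ["lukl"],
    "popular_board_data": [
        '[{"name":"Hardware","totalCount":80},{"name":"Marketplace","totalCount":31}]'
    ],
})

raw = df.set_index("username")["popular_board_data"]

# Stop the name at the closing quote so no stray quote is captured.
names = raw.str.extractall(r'name":"([^"]*)"')[0]
totals = raw.str.extractall(r'totalCount":(\d+)')[0]

# Both results share the (username, match) index, so they align row by row.
# This assumes every board entry has both a name and a totalCount.
df_topic = (
    pd.concat([names.rename("_topic"), totals.astype(int).rename("_total")], axis=1)
    .reset_index(level="match", drop=True)
    .reset_index()
)
print(df_topic)
```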