Converting a nested DataFrame with sorted unique values to a nested dictionary in Python (python, pandas, dictionary, dataframe, nested)


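I am trying to take a nested DataFrame and convert it into a nested dictionary.

This is my original DataFrame, with the following unique values:

Input:
df.head(5)

Output: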

    reviewerName                                  title    reviewerRatings
0        Charles       Harry Potter Book Seven News:...                3.0
1      Katherine       Harry Potter Boxed Set, Books...                5.0
2           Lora       Harry Potter and the Sorcerer...                5.0
3           Cait       Harry Potter and the Half-Blo...                5.0
4          Diane       Harry Potter and the Order of...                5.0
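Input:
len(df['reviewerName'].unique())

Output:
66130

Since each of the 66130 unique values occurs multiple times (e.g. "Charles" appears 3 times), I used the 66130 unique "reviewerName" values as the keys of a new nested DataFrame, and then used "title" and "reviewerRatings" as a second layer of key: value pairs in the same nested DataFrame.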

Input:
df = df.set_index(['reviewerName', 'title']).sort_index()

Output:

                                                       reviewerRatings
    reviewerName                               title
         Charles    Harry Potter Book Seven News:...               3.0
                    Harry Potter and the Half-Blo...               3.5
                    Harry Potter and the Order of...               4.0
       Katherine    Harry Potter Boxed Set, Books...               5.0
                    Harry Potter and the Half-Blo...               2.5
                    Harry Potter and the Order of...               5.0
...
230898 rows x 1 columns
As a follow-up, I tried to convert the nested DataFrame into a nested dictionary.

The column index of the new nested DataFrame above shows "reviewerRatings" in row 1 (column 3) and "reviewerName" and "title" in row 2 (columns 1 and 2). When I run the df.to_dict() method below, the output has the form {reviewerRatingsIndexName: {(reviewerName, title): reviewerRatings}}.

Input:
df.to_dict()

Output:

{'reviewerRatings': 
 {
  ('Charles', 'Harry Potter Book Seven News:...'): 3.0, 
  ('Charles', 'Harry Potter and the Half-Blo...'): 3.5, 
  ('Charles', 'Harry Potter and the Order of...'): 4.0,   
  ('Katherine', 'Harry Potter Boxed Set, Books...'): 5.0, 
  ('Katherine', 'Harry Potter and the Half-Blo...'): 2.5, 
  ('Katherine', 'Harry Potter and the Order of...'): 5.0,
 ...}
}
However, the output I want has the form {reviewerName: {title: reviewerRating}}, which is exactly how I sorted the nested DataFrame:

{'Charles': 
 {'Harry Potter Book Seven News:...': 3.0, 
  'Harry Potter and the Half-Blo...': 3.5, 
  'Harry Potter and the Order of...': 4.0},   
 'Katherine':
 {'Harry Potter Boxed Set, Books...': 5.0, 
  'Harry Potter and the Half-Blo...': 2.5, 
  'Harry Potter and the Order of...': 5.0},
...}
Is there a way to manipulate the nested DataFrame or the nested dictionary so that running the df.to_dict() method produces {reviewerName: {title: reviewerRating}}?

Thanks!

Use groupby with a lambda function to build a dictionary for each reviewerName, then convert the resulting Series to a dictionary with to_dict.
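
The code that accompanied this answer did not survive on the page. Judging by the jez function in the benchmark further down, it was probably close to the following sketch; the three-row sample frame here is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame({
    'reviewerName': ['Charles', 'Katherine', 'Charles'],
    'title': ['Book A', 'Book B', 'Book C'],
    'reviewerRatings': [3.0, 5.0, 3.5],
})

# For each reviewer, x.values is an array of (title, rating) rows,
# which dict() turns into {title: rating}. The outer to_dict() then
# maps each reviewerName to its inner dictionary.
res = (df.groupby('reviewerName')[['title', 'reviewerRatings']]
         .apply(lambda x: dict(x.values))
         .to_dict())

print(res)
```

The column selection uses a list (double brackets) so that only title and reviewerRatings reach the lambda.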



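There are a couple of ways. You can use groupby with to_dict, or iterate over the rows with collections.defaultdict. Notably, the latter is not necessarily less efficient.

groupby + to_dict

Construct a Series from each groupby object and convert it to a dictionary, giving a Series of dictionary values. Finally, convert that to a dictionary of dictionaries via another to_dict call.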

res = df.groupby('reviewerName')\
        .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
        .to_dict()
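
Note that the snippet above expects reviewerName and title to be ordinary columns. If the frame has already been indexed with set_index as in the question, one possible alternative (a sketch on made-up data, not part of the original answer) is to group on the first index level and drop that level from each group:

```python
import pandas as pd

# Hypothetical sample data, indexed the same way as in the question.
df = pd.DataFrame({
    'reviewerName': ['Charles', 'Katherine', 'Charles', 'Katherine'],
    'title': ['Book B', 'Book A', 'Book A', 'Book C'],
    'reviewerRatings': [3.5, 5.0, 3.0, 2.5],
})
nested = df.set_index(['reviewerName', 'title']).sort_index()

# Group the ratings Series by the outer index level. Each group is a
# Series indexed by (reviewerName, title); dropping the outer level
# leaves title -> rating, which to_dict() converts directly.
res = {
    name: group.droplevel('reviewerName').to_dict()
    for name, group in nested['reviewerRatings'].groupby(level='reviewerName')
}

print(res)
```

Calling nested.reset_index() first and then reusing the groupby snippet unchanged should work as well.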

collections.defaultdict

Define a defaultdict of dict objects and iterate over the DataFrame row by row.

from collections import defaultdict

res = defaultdict(dict)
for row in df.itertuples(index=False):
    res[row.reviewerName][row.title] = row.reviewerRatings

The resulting defaultdict does not need to be converted back to a regular dict, because defaultdict is a subclass of dict.
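
A small sketch of what the subclass relationship means in practice, using a few made-up rows instead of a DataFrame. One difference worth knowing: looking up a missing key on a defaultdict silently inserts an empty entry, so converting with dict() can still be useful before handing the result to other code.

```python
from collections import defaultdict

# Hypothetical (reviewerName, title, rating) rows.
rows = [
    ('Charles', 'Book A', 3.0),
    ('Charles', 'Book B', 3.5),
    ('Katherine', 'Book A', 5.0),
]

res = defaultdict(dict)
for name, title, rating in rows:
    res[name][title] = rating

# defaultdict is a dict subclass, so it works wherever a dict is expected.
is_dict = isinstance(res, dict)

# Optional: take a plain-dict snapshot of the outer mapping.
plain = dict(res)

# Reading a missing key from the defaultdict creates it as a side effect;
# the plain snapshot taken earlier is unaffected.
_ = res['Nobody']
added_to_defaultdict = 'Nobody' in res
added_to_plain = 'Nobody' in plain

print(is_dict, added_to_defaultdict, added_to_plain)
```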

Performance benchmarking

Benchmarking is set-up and data dependent. You should test with your own data to see what works best.

# Python 3.6.5, Pandas 0.19.2

from collections import defaultdict
from random import sample

import numpy as np
import pandas as pd

# construct sample dataframe
np.random.seed(0)
n = 10**4  # number of rows
names = np.random.choice(['Charles', 'Lora', 'Katherine', 'Matthew',
                          'Mark', 'Luke', 'John'], n)
books = [f'Book_{i}' for i in sample(range(10**5), n)]
ratings = np.random.randint(0, 6, n)

df = pd.DataFrame({'reviewerName': names, 'title': books, 'reviewerRatings': ratings})

def jez(df):
    # select columns with a list; tuple-style selection fails on newer pandas
    return df.groupby('reviewerName')[['title', 'reviewerRatings']]\
             .apply(lambda x: dict(x.values))\
             .to_dict()

def jpp1(df):
    return df.groupby('reviewerName')\
             .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
             .to_dict()

def jpp2(df):
    dd = defaultdict(dict)
    for row in df.itertuples(index=False):
        dd[row.reviewerName][row.title] = row.reviewerRatings
    return dd

%timeit jez(df)   # 33.5 ms per loop
%timeit jpp1(df)  # 17 ms per loop
%timeit jpp2(df)  # 21.1 ms per loop
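
%timeit is an IPython magic and is not available in a plain Python script. A rough equivalent using the standard timeit module might look like the sketch below; the function names and the smaller sample frame are illustrative, and the timings it prints will differ from the figures above.

```python
import timeit
from collections import defaultdict

import numpy as np
import pandas as pd

# Small hypothetical frame; titles are unique so both methods agree exactly.
rng = np.random.RandomState(0)
n = 1000
df = pd.DataFrame({
    'reviewerName': rng.choice(['Charles', 'Lora', 'Katherine'], n),
    'title': [f'Book_{i}' for i in range(n)],
    'reviewerRatings': rng.randint(0, 6, n),
})

def with_groupby(df):
    # groupby + to_dict, selecting only the columns the lambda needs
    return (df.groupby('reviewerName')[['title', 'reviewerRatings']]
              .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())
              .to_dict())

def with_defaultdict(df):
    # row-wise iteration into a defaultdict of dicts
    dd = defaultdict(dict)
    for row in df.itertuples(index=False):
        dd[row.reviewerName][row.title] = row.reviewerRatings
    return dd

# Sanity check before timing: both approaches should build the same mapping.
same_result = with_groupby(df) == with_defaultdict(df)

repeats = 20
for func in (with_groupby, with_defaultdict):
    seconds = timeit.timeit(lambda: func(df), number=repeats) / repeats
    print(f'{func.__name__}: {seconds * 1e3:.2f} ms per call')
```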