Converting a nested DataFrame with sorted unique values into a nested dictionary in Python


python pandas dictionary dataframe


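I'm trying to take a nested DataFrame and convert it into a nested dictionary.

Here is my original DataFrame, with the following unique values:

Input:

df.head(5)

Output: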

    reviewerName                                  title    reviewerRatings
0        Charles       Harry Potter Book Seven News:...                3.0
1      Katherine       Harry Potter Boxed Set, Books...                5.0
2           Lora       Harry Potter and the Sorcerer...                5.0
3           Cait       Harry Potter and the Half-Blo...                5.0
4          Diane       Harry Potter and the Order of...                5.0


Input:

len(df['reviewerName'].unique())

Output:

66130

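Since each of the 66130 unique values has multiple entries (e.g. "Charles" appears 3 times), I took the 66130 unique "reviewerName" values and assigned them as keys in a new nested DataFrame. I then used "title" and "reviewerRatings" to assign values as another key:value layer in the same nested DataFrame.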

Input:

df = df.set_index(['reviewerName', 'title']).sort_index()

Output:

                                                       reviewerRatings
    reviewerName                               title
         Charles    Harry Potter Book Seven News:...               3.0
                    Harry Potter and the Half-Blo...               3.5
                    Harry Potter and the Order of...               4.0
       Katherine    Harry Potter Boxed Set, Books...               5.0
                    Harry Potter and the Half-Blo...               2.5
                    Harry Potter and the Order of...               5.0
    ...
    230898 rows x 1 columns
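A minimal sketch of what the `set_index(...).sort_index()` step produces. It assumes a small hypothetical sample frame that uses the question's column names; the titles `Book A`, `Book B` and `Book C` are made up. The two key columns become the levels of a sorted MultiIndex, and `reviewerRatings` is left as the only column.

```python
import pandas as pd

# Hypothetical sample rows using the question's column names
df = pd.DataFrame({
    'reviewerName': ['Katherine', 'Charles', 'Charles', 'Katherine'],
    'title': ['Book B', 'Book A', 'Book C', 'Book A'],
    'reviewerRatings': [5.0, 3.0, 4.0, 2.5],
})

# Move the two key columns into the index, then sort by (reviewerName, title)
nested = df.set_index(['reviewerName', 'title']).sort_index()

print(list(nested.index.names))  # ['reviewerName', 'title']
print(list(nested.columns))      # ['reviewerRatings']
print(nested.index.tolist())
# [('Charles', 'Book A'), ('Charles', 'Book C'), ('Katherine', 'Book A'), ('Katherine', 'Book B')]
```

This is why the header shows two rows: `reviewerName` and `title` are index level names, not columns.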
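As a follow-up, I tried to convert the nested DataFrame into a nested dictionary.

In the new nested DataFrame above, the header shows "reviewerRatings" on the first row (column 3) and "reviewerName" and "title" on the second row (columns 1 and 2). When I run the df.to_dict() method below, the output has the shape
{reviewerRatingsIndexName: {(reviewerName, title): reviewerRatings}}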
Input:

df.to_dict()

Output:

{'reviewerRatings':
 {
  ('Charles', 'Harry Potter Book Seven News:...'): 3.0,
  ('Charles', 'Harry Potter and the Half-Blo...'): 3.5,
  ('Charles', 'Harry Potter and the Order of...'): 4.0,
  ('Katherine', 'Harry Potter Boxed Set, Books...'): 5.0,
  ('Katherine', 'Harry Potter and the Half-Blo...'): 2.5,
  ('Katherine', 'Harry Potter and the Order of...'): 5.0,
 ...}
}
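Given the tuple-keyed dictionary that `to_dict()` returns, the desired nesting can also be rebuilt in plain Python. Each `(reviewerName, title)` key is split into an outer key and an inner key. This is a sketch on hypothetical sample data (the book titles are made up), not code from either answer.

```python
import pandas as pd

# Hypothetical sample data, indexed the same way as in the question
df = pd.DataFrame({
    'reviewerName': ['Charles', 'Charles', 'Katherine'],
    'title': ['Book A', 'Book C', 'Book A'],
    'reviewerRatings': [3.0, 4.0, 2.5],
}).set_index(['reviewerName', 'title']).sort_index()

# to_dict() nests by column first, with (reviewerName, title) tuples as inner keys
flat = df.to_dict()['reviewerRatings']

# Regroup: the first tuple element becomes the outer key, the second the inner key
nested = {}
for (name, title), rating in flat.items():
    nested.setdefault(name, {})[title] = rating

print(nested)
```

The loop makes a single pass over the dictionary. `setdefault` creates the inner dict the first time a reviewer is seen.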
But for my desired output below, I'm hoping to get the output as
{reviewerName: {title: reviewerRating}}
which is exactly how I sorted the nested DataFrame.

{'Charles': {'Harry Potter Book Seven News:...': 3.0,
             'Harry Potter and the Half-Blo...': 3.5,
             'Harry Potter and the Order of...': 4.0},
 'Katherine': {'Harry Potter Boxed Set, Books...': 5.0,
               'Harry Potter and the Half-Blo...': 2.5,
               'Harry Potter and the Order of...': 5.0},
 ...}
Is there a way to manipulate the nested DataFrame or the nested dictionary so that running the df.to_dict() method gives
{reviewerName: {title: reviewerRating}}?

Thanks!
Use groupby with a lambda function to build a dictionary for each reviewerName, then convert the resulting Series with to_dict:
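The code for this answer did not survive extraction. The `jez` function in the benchmark below appears to implement it. Here is a minimal sketch of the same idea on hypothetical flat sample data (columns, not a MultiIndex). It selects the columns with a list, `[['title', 'reviewerRatings']]`, because the tuple-style selection `['title','reviewerRatings']` on a groupby object is rejected by recent pandas versions.

```python
import pandas as pd

# Hypothetical flat sample data (columns, not a MultiIndex)
df = pd.DataFrame({
    'reviewerName': ['Charles', 'Katherine', 'Charles', 'Katherine'],
    'title': ['Book A', 'Book B', 'Book C', 'Book A'],
    'reviewerRatings': [3.0, 5.0, 4.0, 2.5],
})

# For each reviewer, turn the (title, rating) rows into a {title: rating} dict.
# apply() returns a Series of dicts indexed by reviewerName; to_dict() unwraps it.
res = (df.groupby('reviewerName')[['title', 'reviewerRatings']]
         .apply(lambda g: dict(g.values))
         .to_dict())

print(res)
```

If your frame already has the `(reviewerName, title)` MultiIndex from `set_index`, call `df.reset_index()` first so both key columns are available again.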

There are two approaches. You can use groupby with to_dict, or iterate over rows with collections.defaultdict. Notably, the latter is not necessarily less efficient.

groupby + to_dict

Construct a Series from each groupby object and convert it to a dictionary, which gives a Series of dictionary values. Finally, convert that to a dictionary of dictionaries with another to_dict call:

    res = df.groupby('reviewerName')\
            .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
            .to_dict()

defaultdict

Define a defaultdict of dict objects and iterate over the DataFrame row by row:

    from collections import defaultdict

    res = defaultdict(dict)
    for row in df.itertuples(index=False):
        res[row.reviewerName][row.title] = row.reviewerRatings

The resulting defaultdict does not need to be converted back to a regular dict, since defaultdict is a subclass of dict.

Performance benchmarking

Benchmark results depend on your setup and your data. Test with your own data to see which approach works best.

    # Python 3.6.5, Pandas 0.19.2
    # %timeit is an IPython/Jupyter magic, not plain Python

    from collections import defaultdict
    from random import sample

    import numpy as np
    import pandas as pd

    # construct sample dataframe
    np.random.seed(0)
    n = 10**4  # number of rows
    names = np.random.choice(['Charles', 'Lora', 'Katherine', 'Matthew', 'Mark', 'Luke', 'John'], n)
    books = [f'Book_{i}' for i in sample(range(10**5), n)]
    ratings = np.random.randint(0, 6, n)

    df = pd.DataFrame({'reviewerName': names, 'title': books, 'reviewerRatings': ratings})

    def jez(df):
        # list selection [[...]]; the tuple-style ['title','reviewerRatings']
        # only works on old pandas versions
        return df.groupby('reviewerName')[['title', 'reviewerRatings']]\
                 .apply(lambda x: dict(x.values))\
                 .to_dict()

    def jpp1(df):
        return df.groupby('reviewerName')\
                 .apply(lambda x: x.set_index('title')['reviewerRatings'].to_dict())\
                 .to_dict()

    def jpp2(df):
        dd = defaultdict(dict)
        for row in df.itertuples(index=False):
            dd[row.reviewerName][row.title] = row.reviewerRatings
        return dd

    %timeit jez(df)   # 33.5 ms per loop
    %timeit jpp1(df)  # 17 ms per loop
    %timeit jpp2(df)  # 21.1 ms per loop
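Both answers start from the flat columns. The question's DataFrame already carries the `(reviewerName, title)` MultiIndex, so one further option is to group the ratings Series on its outer index level and drop that level inside each group. This is a sketch that neither answer contains, run on hypothetical sample data. The result is cross-checked against the `defaultdict` loop.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical flat sample data using the question's column names
flat = pd.DataFrame({
    'reviewerName': ['Charles', 'Katherine', 'Charles', 'Katherine'],
    'title': ['Book A', 'Book B', 'Book C', 'Book A'],
    'reviewerRatings': [3.0, 5.0, 4.0, 2.5],
})

# The question's layout: sorted (reviewerName, title) MultiIndex, one column
indexed = flat.set_index(['reviewerName', 'title']).sort_index()

# Group the ratings Series on its outer level. Each group still carries both
# index levels, so drop 'reviewerName' to leave the titles as inner keys.
ratings = indexed['reviewerRatings']
from_multiindex = {
    name: grp.droplevel('reviewerName').to_dict()
    for name, grp in ratings.groupby(level='reviewerName')
}

# Cross-check against the row-by-row defaultdict approach on the flat frame
dd = defaultdict(dict)
for row in flat.itertuples(index=False):
    dd[row.reviewerName][row.title] = row.reviewerRatings

print(from_multiindex == dd)  # True
```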