R: Formulating a 'weighted average' variable per client


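I'm not sure the title covers everything I'm trying to do here.

I am analysing a client database. Dataframe 1 has one row per unique client (identified by client id).

I then have another data frame that lists the assets the clients own. Here, however, each row represents a unique asset (identified by asset id). A client's id can therefore appear many times, which means I can't merge the two data frames without creating another variable.

I would like to create variables that represent the proportion of a client's investment in a given asset type, as well as their total assets.

Is there a simple way to do this? For example, grouping by clientid, then grouping by asset type and averaging?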

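I have recreated a scenario that tries to mimic the problem you are facing, to the best of my understanding of your situation. Hopefully it at least points you toward the answer you are looking for.

You can copy and paste the following code into your R console to run all the steps.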

library(dplyr)

# Create the client database, assuming 4 different asset classes and an asset value of 1 for each asset.
df <- data.frame(
  clientId      = c(1,1,2,3,3,3,4,4,4,5,5,6,6,7,8,9,9,10,10,10),
  AssetCategory = rep(c('a','b','c','d'), 5),
  AssetValue    = rep(1, 20)
)

# Calculate each client's total assets
totalAssetByClient <- df %>% group_by(clientId) %>% summarize(totalAssetByClient = sum(AssetValue))

# Append the totalAssetByClient variable to the data frame (client database) <- Answer to your FIRST question
df2 <- left_join(df, totalAssetByClient, by = "clientId")


# Then create an empty data frame to hold the AssetShareByClient table
AssetShareByClient <- data.frame(clientId = integer(), AssetCategory = character(), AssetShareByClient = double())

# Fill the AssetShareByClient table with a nested for loop
for (client in unique(df2$clientId))
{
for (asset in unique(df2$AssetCategory))
{
    df3 <- filter(df2, clientId == client, AssetCategory == asset)
    # Append a one-row data frame; rbind-ing c(client, asset, share) would coerce every value to character
    AssetShareByClient <- rbind(AssetShareByClient, data.frame(clientId = client, AssetCategory = asset, AssetShareByClient = sum(df3$AssetValue)/mean(df3$totalAssetByClient)))
}
}

# We now have a standalone table with a column showing the proportion of investment per asset category for each client <- Answer to your SECOND question
# When a client holds nothing in an asset category, the share shows as NaN. The shares of the categories a client does hold sum to 100%.
print(AssetShareByClient)
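
The question asks whether grouping by client and then by asset type could do the job, so here is a minimal sketch of that idea without a loop. It is an illustration under assumptions, not part of the original answer: the `assets` and `clients` tables, their values, and the `name` column are made up, and the sketch assumes dplyr 1.0.0 or later (for `.groups`) plus tidyr (for `pivot_wider`).

```r
library(dplyr)
library(tidyr)

# Hypothetical asset table: one row per asset
assets <- data.frame(
  clientId      = c(1, 1, 2, 3, 3, 3),
  AssetCategory = c("a", "b", "c", "a", "a", "d"),
  AssetValue    = c(10, 30, 5, 2, 2, 4)
)

# Hypothetical client table: one row per client
clients <- data.frame(clientId = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))

# Long format: one row per client and asset category the client holds
shares_long <- assets %>%
  group_by(clientId, AssetCategory) %>%
  summarise(categoryValue = sum(AssetValue), .groups = "drop") %>%
  group_by(clientId) %>%
  mutate(totalAssets = sum(categoryValue),
         share = categoryValue / totalAssets) %>%
  ungroup()

# Wide format: one row per client, one share column per category
# (categories a client does not hold are filled with 0 instead of NaN)
shares_wide <- shares_long %>%
  select(clientId, totalAssets, AssetCategory, share) %>%
  pivot_wider(names_from = AssetCategory, values_from = share,
              names_prefix = "share_", values_fill = list(share = 0))

# shares_wide has one row per client, so it joins one-to-one onto the client table
client_summary <- left_join(clients, shares_wide, by = "clientId")
print(client_summary)
```

Because the wide table has exactly one row per client, it can be joined onto the one-row-per-client data frame directly, which is the merge the question was stuck on.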

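We need data to illustrate this. Can you edit the question with the output of dput(head(df2, 30))? It will be easier to help you if you include a simple example input and the desired output that can be used to test and verify possible solutions.

Thanks, mate, this is exactly what I was looking for. Spot on. Cheers!!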