dplyr: applying a calculation across differently named columns
I am trying to compute the following formula:
Interest expense / (Total Debt(for all years)) / # number of years
The data looks like this:
                                GE2017    GE2016    GE2015    GE2014
Interest Expense              -2753000  -2026000  -1706000  -1579000
Long Term Debt               108575000 105080000 144659000 186596000
Short/Current Long Term Debt 134591000 136211000 197602000 261424000
Total_Debt                   243166000 241291000 342261000 448020000
                              GOOG2017  GOOG2016  GOOG2015  GOOG2014
Interest Expense               -109000   -124000   -104000   -101000
Long Term Debt                 3943000   3935000   1995000   2992000
Short/Current Long Term Debt   3969000   3935000   7648000   8015000
Total_Debt                     7912000   7870000   9643000  11007000
                              NVDA2018  NVDA2017  NVDA2016  NVDA2015
Interest Expense                -61000    -58000    -47000    -46000
Long Term Debt                 1985000   1985000      7000   1384000
Short/Current Long Term Debt   2000000   2791000   1434000   1398000
Total_Debt                     3985000   4776000   1441000   2782000
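To experiment with the code in this thread, the printed table can be rebuilt as a data frame. This is only a sketch: it assumes the original object is a single data frame called cost_of_debt with the line items as row names and one column per firm-year, which is what the later t() and rownames_to_column() calls suggest but the post does not show.

```r
# Reconstructed from the printed table; the layout (line items as row names,
# one column per firm-year) is an assumption, not shown in the post.
cost_of_debt <- data.frame(
  GE2017   = c(-2753000, 108575000, 134591000, 243166000),
  GE2016   = c(-2026000, 105080000, 136211000, 241291000),
  GE2015   = c(-1706000, 144659000, 197602000, 342261000),
  GE2014   = c(-1579000, 186596000, 261424000, 448020000),
  GOOG2017 = c(-109000, 3943000, 3969000, 7912000),
  GOOG2016 = c(-124000, 3935000, 3935000, 7870000),
  GOOG2015 = c(-104000, 1995000, 7648000, 9643000),
  GOOG2014 = c(-101000, 2992000, 8015000, 11007000),
  NVDA2018 = c(-61000, 1985000, 2000000, 3985000),
  NVDA2017 = c(-58000, 1985000, 2791000, 4776000),
  NVDA2016 = c(-47000, 7000, 1434000, 1441000),
  NVDA2015 = c(-46000, 1384000, 1398000, 2782000),
  row.names = c("Interest Expense", "Long Term Debt",
                "Short/Current Long Term Debt", "Total_Debt")
)

# Consistency check: Total_Debt is the sum of the two debt rows in every column.
totals_match <- all(
  unlist(cost_of_debt["Total_Debt", ]) ==
    unlist(cost_of_debt["Long Term Debt", ]) +
    unlist(cost_of_debt["Short/Current Long Term Debt", ])
)
```

Here totals_match comes out TRUE for the figures in the table, which is a quick way to catch a transcription slip.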
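That is, for GE, I am trying to divide the most recent year's interest expense, -2753000, by the average of GE's total debt over all four years. So:
-2753000 / mean(c(243166000, 241291000, 342261000, 448020000)) = -0.0086
However, when computing the mean I run into trouble with group_by(), because GE and the other companies have different column names (the years differ).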
cost_of_debt %>%
  t() %>%
  data.frame() %>%
  rownames_to_column('rn') %>%
  group_by(rn)
  # Calculation here
Second, if possible, I would like to do the same calculation as above, but using only the last two years for each company:
-2753000 / mean(c(243166000, 241291000)) = -0.01137
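As a sanity check on the two target numbers for GE, here is a minimal base R sketch. The variable names (interest_2017, total_debt_ge and so on) are made up for illustration; the figures are the ones from the table in the question. Note that the ratios are negative because interest expense is reported as a negative number.

```r
interest_2017 <- -2753000  # GE interest expense for the most recent year
# GE total debt, ordered 2017, 2016, 2015, 2014
total_debt_ge <- c(243166000, 241291000, 342261000, 448020000)

ratio_all   <- interest_2017 / mean(total_debt_ge)           # all four years
ratio_last2 <- interest_2017 / mean(head(total_debt_ge, 2))  # 2017 and 2016 only

round(c(all_years = ratio_all, last_two = ratio_last2), 4)
# all_years is -0.0086, last_two is -0.0114
```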
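Could a grepl function work here?
I have a vector called symbols.
For the first case, after turning the row names into a column (rownames_to_column from tibble), we separate that column into "firm" and "year" by splitting at the junction between the end of the firm name and the start of the year. We then group by "firm" and create a "New" column as the ratio of "Interest.Expense" to the mean of "Total_Debt". For the second case, we arrange by "year", take the mean of the last two "Total_Debt" values for each "firm", and divide "Interest.Expense" by it.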
library(dplyr)
library(tidyr)   # separate()
library(tibble)  # rownames_to_column()
cost_of_debt %>%
  t() %>%
  data.frame() %>%
  rownames_to_column('rn') %>%
  # split e.g. "GE2017" at the boundary between the letters and the digits
  separate(rn, into = c("firm", "year"),
           "(?<=[A-Z])(?=[0-9])", convert = TRUE) %>%
  group_by(firm) %>%
  mutate(New = Interest.Expense/mean(Total_Debt)) %>%
  arrange(firm, year) %>%
  mutate(NewLast = Interest.Expense/mean(tail(Total_Debt, 2)))
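To see what the firm/year split is meant to produce, here is a small base R sketch. It deliberately uses sub() with two simple patterns rather than the lookaround regex passed to separate(), so it is an equivalent illustration and not the answer's own call; the vector rn is a made-up sample of the transposed row names.

```r
# Sample of the row names produced by t() + rownames_to_column('rn')
rn <- c("GE2017", "GOOG2016", "NVDA2018")

firm <- sub("[0-9]+$", "", rn)              # drop the trailing digits
year <- as.integer(sub("^[A-Z]+", "", rn))  # drop the leading capital letters

split_names <- data.frame(rn, firm, year)
split_names
```

The convert = TRUE argument in the separate() call plays the role of as.integer() here: it turns the year strings into integers so that arrange() and tail() work on a numeric year.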
I think you need to clean the data first, so that it is easier to see what is an observation and what is a variable. Google "tidy data" :) Here is my solution. First I tidy the data; after that the calculation is simple and straightforward.
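library(tidyverse)
library(stringr)
# The dput() output that defines cost_of_debt is truncated in the source; only its tail survives:
# ), class = "data.frame")
# Clean and make the data tidy
cost_of_debt <- cost_of_debt %>%
  rownames_to_column(var = "indicator") %>%  # before as_tibble(), which drops row names in current tibble versions
  as_tibble() %>%
  mutate(indicator = str_replace_all(indicator, regex("\\s|\\/"), "_")) %>%
  gather(k, value, -indicator) %>%
  separate(k, into = c("company", "year"), -4) %>%  # split 4 characters from the end
  spread(indicator, value) %>%
  rename_all(tolower)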
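For the first case, you don't need the group_by, i.e. cost_of_debt %>% t() %>% data.frame() %>% rownames_to_column('rn') %>% mutate(New = Interest.Expense/Total_Debt)
Great! Thanks! I am only interested in the latest period for each company, so 2017 for GE and GOOG, and 2018 for NVDA. I will extract the latest period into a new "df". Thanks again!
@user113156 You can filter or just summarise.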
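The filter-or-summarise suggestion can be sketched as follows. This is an illustration under stated assumptions: the data frame long is hypothetical, holds only the last two years of GE and NVDA from the question, and uses the column names from the first answer (firm, year, Interest.Expense, Total_Debt).

```r
library(dplyr)

# Hypothetical long-format input: last two years only, two firms
long <- data.frame(
  firm = c("GE", "GE", "NVDA", "NVDA"),
  year = c(2016L, 2017L, 2017L, 2018L),
  Interest.Expense = c(-2026000, -2753000, -58000, -61000),
  Total_Debt = c(241291000, 243166000, 4776000, 3985000)
)

# Option 1: compute the ratio for every row, then keep each firm's latest year
latest_rows <- long %>%
  group_by(firm) %>%
  mutate(NewLast = Interest.Expense / mean(Total_Debt)) %>%
  filter(year == max(year)) %>%
  ungroup()

# Option 2: collapse straight to one row per firm
latest_summary <- long %>%
  group_by(firm) %>%
  summarise(latest_year = max(year),
            ratio = Interest.Expense[which.max(year)] / mean(Total_Debt))
```

Using max(year) per group avoids hard-coding 2017 or 2018, so firms with different fiscal years are handled the same way.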
The tidied data looks like this (first rows):
  company year  interest_expense long_term_debt short_current_long_term_debt total_debt
  <chr>   <chr>            <dbl>          <dbl>                        <dbl>      <dbl>
1 GE      2014          -1579000      186596000                    261424000  448020000
2 GE      2015          -1706000      144659000                    197602000  342261000
3 GE      2016          -2026000      105080000                    136211000  241291000
4 GE      2017          -2753000      108575000                    134591000  243166000
5 GOOG    2014           -101000        2992000                      8015000   11007000
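cost_of_debt <- cost_of_debt %>%
  arrange(company, year) %>%  # so tail() picks each company's two most recent years
  group_by(company) %>%
  mutate(int_over_totdept4 = interest_expense / mean(total_debt),
         # tail() instead of hard-coded 2017/2016, which would be wrong for NVDA (2018/2017)
         int_over_totdept2 = interest_expense / mean(tail(total_debt, 2)))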
  company year  interest_expense long_term_debt short_current_long_term_debt total_debt int_over_totdept4 int_over_totdept2
  <chr>   <chr>            <dbl>          <dbl>                        <dbl>      <dbl>             <dbl>             <dbl>
1 GE      2014          -1579000      186596000                    261424000  448020000          -0.00495          -0.00652
2 GE      2015          -1706000      144659000                    197602000  342261000          -0.00535          -0.00704
3 GE      2016          -2026000      105080000                    136211000  241291000          -0.00636          -0.00836
4 GE      2017          -2753000      108575000                    134591000  243166000          -0.00864           -0.0114
5 GOOG    2014           -101000        2992000                      8015000   11007000           -0.0111           -0.0128
# First question:
cost_of_debt %>% filter(company == "GE", year == "2017") %>% select(company, year, int_over_totdept4)
# Second question:
cost_of_debt %>% filter(year == "2017") %>% select(company, year, int_over_totdept2)