How to create a data.frame from a nested list with differing numbers of variables
I downloaded a JSON file of Nobel laureates and converted it to a list called "nobels". The structure of a couple of records is shown below.
str(nobels)
List of 1
$ laureates:List of 2
..$ :List of 13
.. ..$ id : chr "359"
.. ..$ firstname : chr "Axel Hugo Theodor"
.. ..$ surname : chr "Theorell"
.. ..$ born : chr "1903-07-06"
.. ..$ died : chr "1982-08-15"
.. ..$ bornCountry : chr "Sweden"
.. ..$ bornCountryCode: chr "SE"
.. ..$ bornCity : chr "Linköping"
.. ..$ diedCountry : chr "Sweden"
.. ..$ diedCountryCode: chr "SE"
.. ..$ diedCity : chr "Stockholm"
.. ..$ gender : chr "male"
.. ..$ prizes :List of 1
.. .. ..$ :List of 5
.. .. .. ..$ year : chr "1955"
.. .. .. ..$ category : chr "medicine"
.. .. .. ..$ share : chr "1"
.. .. .. ..$ motivation : chr "\"for his discoveries concerning the nature and mode of action of oxidation enzymes\""
.. .. .. ..$ affiliations:List of 1
.. .. .. .. ..$ :List of 3
.. .. .. .. .. ..$ name : chr "Karolinska Institutet, Nobel Medical Institute"
.. .. .. .. .. ..$ city : chr "Stockholm"
.. .. .. .. .. ..$ country: chr "Sweden"
..$ :List of 10
.. ..$ id : chr "774"
.. ..$ firstname : chr "Richard"
.. ..$ surname : chr "Axel"
.. ..$ born : chr "1946-07-02"
.. ..$ died : chr "0000-00-00"
.. ..$ bornCountry : chr "USA"
.. ..$ bornCountryCode: chr "US"
.. ..$ bornCity : chr "New York, NY"
.. ..$ gender : chr "male"
.. ..$ prizes :List of 1
.. .. ..$ :List of 5
.. .. .. ..$ year : chr "2004"
.. .. .. ..$ category : chr "medicine"
.. .. .. ..$ share : chr "2"
.. .. .. ..$ motivation : chr "\"for their discoveries of odorant receptors and the organization of the olfactory system\""
.. .. .. ..$ affiliations:List of 1
.. .. .. .. ..$ :List of 3
.. .. .. .. .. ..$ name : chr "Columbia University"
.. .. .. .. .. ..$ city : chr "New York, NY"
.. .. .. .. .. ..$ country: chr "USA"
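How should I convert this to a data.frame?
Although there are lists within the list, I would be happy to use just, say, year and category, and do not need the rest of the prizes entry.
A further problem is that not every record has the same number of variables. For example, the second record here has no diedCountry field.
TIA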
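Many apologies. I should not do this at night. The answers provided work well for my original question. However, when I run the full list, I get an error: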
Error in data.frame(year = "1931", category = "literature", share = "1", :
arguments imply differing number of rows: 1, 0
Below is the data that triggers it. It appears to be related to affiliations.
nobels <- list(structure(list(id = "359", firstname = "Axel Hugo Theodor",
surname = "Theorell", born = "1903-07-06", died = "1982-08-15",
bornCountry = "Sweden", bornCountryCode = "SE", bornCity = "Linköping",
diedCountry = "Sweden", diedCountryCode = "SE", diedCity = "Stockholm",
gender = "male", prizes = list(structure(list(year = "1955",
category = "medicine", share = "1", motivation = "\"for his discoveries concerning the nature and mode of action of oxidation enzymes\"",
affiliations = list(structure(list(name = "Karolinska Institutet, Nobel Medical Institute",
city = "Stockholm", country = "Sweden"), .Names = c("name",
"city", "country")))), .Names = c("year", "category",
"share", "motivation", "affiliations")))), .Names = c("id",
"firstname", "surname", "born", "died", "bornCountry", "bornCountryCode",
"bornCity", "diedCountry", "diedCountryCode", "diedCity", "gender",
"prizes")), structure(list(id = "604", firstname = "Erik Axel",
surname = "Karlfeldt", born = "1864-07-20", died = "1931-04-08",
bornCountry = "Sweden", bornCountryCode = "SE", bornCity = "Karlbo",
diedCountry = "Sweden", diedCountryCode = "SE", diedCity = "Stockholm",
gender = "male", prizes = list(structure(list(year = "1931",
category = "literature", share = "1", motivation = "\"The poetry of Erik Axel Karlfeldt\"",
affiliations = list(list())), .Names = c("year", "category",
"share", "motivation", "affiliations")))), .Names = c("id",
"firstname", "surname", "born", "died", "bornCountry", "bornCountryCode",
"bornCity", "diedCountry", "diedCountryCode", "diedCity", "gender",
"prizes")))
As you correctly identified, the problem is caused by affiliations, whose sublist is an empty list in the second record.
> str(nobels)
List of 2
$ :List of 13
..$ id : chr "359"
..$ firstname : chr "Axel Hugo Theodor"
..$ surname : chr "Theorell"
..$ born : chr "1903-07-06"
..$ died : chr "1982-08-15"
..$ bornCountry : chr "Sweden"
..$ bornCountryCode: chr "SE"
..$ bornCity : chr "Linköping"
..$ diedCountry : chr "Sweden"
..$ diedCountryCode: chr "SE"
..$ diedCity : chr "Stockholm"
..$ gender : chr "male"
..$ prizes :List of 1
.. ..$ :List of 5
.. .. ..$ year : chr "1955"
.. .. ..$ category : chr "medicine"
.. .. ..$ share : chr "1"
.. .. ..$ motivation : chr "\"for his discoveries concerning the nature and mode of action of oxidation enzymes\""
.. .. ..$ affiliations:List of 1
.. .. .. ..$ :List of 3
.. .. .. .. ..$ name : chr "Karolinska Institutet, Nobel Medical Institute"
.. .. .. .. ..$ city : chr "Stockholm"
.. .. .. .. ..$ country: chr "Sweden"
$ :List of 13
..$ id : chr "604"
..$ firstname : chr "Erik Axel"
..$ surname : chr "Karlfeldt"
..$ born : chr "1864-07-20"
..$ died : chr "1931-04-08"
..$ bornCountry : chr "Sweden"
..$ bornCountryCode: chr "SE"
..$ bornCity : chr "Karlbo"
..$ diedCountry : chr "Sweden"
..$ diedCountryCode: chr "SE"
..$ diedCity : chr "Stockholm"
..$ gender : chr "male"
..$ prizes :List of 1
.. ..$ :List of 5
.. .. ..$ year : chr "1931"
.. .. ..$ category : chr "literature"
.. .. ..$ share : chr "1"
.. .. ..$ motivation : chr "\"The poetry of Erik Axel Karlfeldt\""
.. .. ..$ affiliations:List of 1
.. .. .. ..$ : list() **<--problem here**
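To find the affected records before converting anything, the empty sublists can be detected directly. The following is only a sketch: `has_empty_affiliation` and `nobels_demo` are hypothetical names, and the two records are trimmed-down versions of the data above.

```r
# Hypothetical helper: TRUE when any prize of a laureate carries an
# empty affiliation entry (the list() flagged in the str() output).
has_empty_affiliation <- function(laureate) {
  any(vapply(laureate$prizes, function(prize) {
    any(lengths(prize$affiliations) == 0L)
  }, logical(1)))
}

# Two trimmed-down records shaped like the data in the question.
nobels_demo <- list(
  list(id = "359",
       prizes = list(list(year = "1955",
                          affiliations = list(list(
                            name = "Karolinska Institutet, Nobel Medical Institute",
                            city = "Stockholm", country = "Sweden"))))),
  list(id = "604",
       prizes = list(list(year = "1931",
                          affiliations = list(list()))))
)

empty_flags <- vapply(nobels_demo, has_empty_affiliation, logical(1))
which(empty_flags)  # positions of the records that need special handling
```

Running the same check over the full list shows how many laureates have no recorded affiliation.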
You could also use unnest from tidyr:
devtools::install_github("hadley/tidyr")
library(tidyr)
With the new dataset, this seems to work:
res1 <-unnest(lapply(nobels, function(x)
as.data.frame.list(rapply(x,unlist), stringsAsFactors=FALSE)))
str(res1)
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 19 variables:
# $ id : chr "359" "604"
#$ firstname : chr "Axel Hugo Theodor" "Erik Axel"
#$ surname : chr "Theorell" "Karlfeldt"
#$ born : chr "1903-07-06" "1864-07-20"
#$ died : chr "1982-08-15" "1931-04-08"
#$ bornCountry : chr "Sweden" "Sweden"
#$ bornCountryCode : chr "SE" "SE"
#$ bornCity : chr "Linköping" "Karlbo"
#$ diedCountry : chr "Sweden" "Sweden"
#$ diedCountryCode : chr "SE" "SE"
#$ diedCity : chr "Stockholm" "Stockholm"
#$ gender : chr "male" "male"
#$ prizes.year : chr "1955" "1931"
#$ prizes.category : chr "medicine" "literature"
#$ prizes.share : chr "1" "1"
#$ prizes.motivation : chr "\"for his discoveries concerning the nature and mode of action of oxidation enzymes\"" "\"The poetry of Erik Axel Karlfeldt\""
#$ prizes.affiliations.name : chr "Karolinska Institutet, Nobel Medical Institute" NA
#$ prizes.affiliations.city : chr "Stockholm" NA
#$ prizes.affiliations.country: chr "Sweden" NA
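If the development version of tidyr is not available, a similar NA-padded result can be sketched in base R. This is an alternative to the unnest call above, not the same method. It assumes one prize per laureate; `records_to_df` and `nobels_demo` are hypothetical names, and the demo records are shortened versions of the ones in this thread.

```r
# Flatten each record with unlist() (empty sublists contribute nothing),
# then align every record on the union of all field names, padding with NA.
records_to_df <- function(records) {
  flat <- lapply(records, unlist)
  all_cols <- unique(unlist(lapply(flat, names)))
  rows <- lapply(flat, function(v) {
    row <- v[all_cols]        # names absent from this record become NA
    names(row) <- all_cols
    row
  })
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

# Shortened demo records: the second has an empty affiliation,
# the third has no diedCountry field.
nobels_demo <- list(
  list(id = "359", surname = "Theorell", diedCountry = "Sweden",
       prizes = list(list(year = "1955", category = "medicine",
                          affiliations = list(list(
                            name = "Karolinska Institutet, Nobel Medical Institute",
                            city = "Stockholm", country = "Sweden"))))),
  list(id = "604", surname = "Karlfeldt", diedCountry = "Sweden",
       prizes = list(list(year = "1931", category = "literature",
                          affiliations = list(list())))),
  list(id = "774", surname = "Axel",
       prizes = list(list(year = "2004", category = "medicine",
                          affiliations = list(list(
                            name = "Columbia University",
                            city = "New York, NY", country = "USA")))))
)

nobel_df <- records_to_df(nobels_demo)
str(nobel_df)
```

Because the columns are matched by name, both a missing top-level field and an empty affiliation end up as NA instead of raising the "differing number of rows" error.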
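Please add a dput(head(yourlist)).
@pssguy It is better to show the dput of the dataset (as I did in my post) to get an exact representation of the data structure.
@akrun Sorry, I forgot about dput. Thanks for the suggestion.
@pssguy Please check my update.
Thanks, but I get an error in data.frame (arguments imply differing number of rows). This may be because some list items do not include all variables.
@pssguy In the other answer, I tried this on the data akrun shared and did not get any error.
@ujwal You are quite right. The error I got came from running the full list. I think I have identified it in the edit above.
@pssguy I have edited the answer above. Could you try it now?
Thanks. I ran through each element of the list, testing whether length(nobels[[i]]$prizes[[1]]$affiliations[[1]]) is zero and, if so, using your code. This created a new column (which I can drop afterwards) with the value "random data" or NA. It also ensured that NA was entered for the categories where the list had zero length. Now there is just the problem of sorting out multiple prizes for a single laureate (e.g. Marie Curie).
Hope you get to actually analyse the data!
Thanks for taking the time to correct my shortcomings and for introducing me to unnest in the tidyr package. As you can see from the above, I still have a problem with some of the data I reproduced above.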
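For the remaining issue of laureates with more than one prize, one option is to emit one row per prize rather than one row per person. The sketch below is only an illustration: `get_field`, `prizes_long`, and `nobels_demo` are hypothetical names, and the Curie record is written by hand for the example, not copied from the JSON file.

```r
# Return a field, or NA when the record does not have it.
get_field <- function(x, name) {
  if (is.null(x[[name]])) NA_character_ else x[[name]]
}

# One row per (laureate, prize) pair; the person fields repeat for each prize.
prizes_long <- function(records, person_fields = c("firstname", "surname")) {
  rows <- lapply(records, function(rec) {
    person <- lapply(person_fields, function(f) get_field(rec, f))
    names(person) <- person_fields
    per_prize <- lapply(rec$prizes, function(prize) {
      data.frame(person,
                 year = get_field(prize, "year"),
                 category = get_field(prize, "category"),
                 stringsAsFactors = FALSE)
    })
    do.call(rbind, per_prize)
  })
  do.call(rbind, rows)
}

# Hand-written demo: one laureate with two prizes, one with a single prize.
nobels_demo <- list(
  list(firstname = "Marie", surname = "Curie",
       prizes = list(list(year = "1903", category = "physics"),
                     list(year = "1911", category = "chemistry"))),
  list(firstname = "Axel Hugo Theodor", surname = "Theorell",
       prizes = list(list(year = "1955", category = "medicine")))
)

prize_df <- prizes_long(nobels_demo)
prize_df
```

Keeping the table in this long form avoids having to invent column names such as a second year or category for repeat winners.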