R: Using specific values for facets in ggplot2


Okay, I'm really stuck. I have a dataset that looks like this:

                  Species Latitude Longitude            Oiling Condition BirdCount      Date_ Oil_Cond       Date week.number
1         Northern Gannet 30.32860 -89.19810 Not Visibly Oiled      Live         1 2010-07-21        1 2010-07-21          30
2           Laughing Gull 30.23172 -88.32127 Not Visibly Oiled      Live         1 2010-05-05        1 2010-05-05          19
3         Northern Gannet 30.26677 -87.59248     Visibly Oiled      Live         1 2010-05-05        2 2010-05-05          19
4  American White Pelican 29.29649 -89.66432 Not Visibly Oiled      Live         1 2010-05-05        1 2010-05-05          19
5           Brown Pelican 29.88244 -88.87624     Visibly Oiled      Live         1 2010-05-08        2 2010-05-08          19
6           Brown Pelican 29.00290 -89.36961 Not Visibly Oiled      Live         1 2010-05-14        1 2010-05-14          20
7         Northern Gannet 30.33390 -85.56565           Unknown      Live         1 2010-05-17        6 2010-05-17          21
8             Common Loon 30.28177 -87.51028 Not Visibly Oiled      Live         1 2010-05-17        1 2010-05-17          21
9           Brown Pelican 30.41410 -88.24542     Visibly Oiled      Live         1 2010-05-18        2 2010-05-18          21
10        Northern Gannet 30.24063 -88.12451 Not Visibly Oiled      Live         1 2010-05-18        1 2010-05-18          21
I'm trying to get a faceted histogram that shows the variable Oil_Cond for the 5 most common bird species (there are over 100 unique species).

At first I tried to generate one facet per species, using the following code:

qplot(Oil_Cond, data = birds, facets = Species ~., geom = "histogram")
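But of course that leads to overload and doesn't work, because there would be over 100 facets. So I decided that I really only care about the top 5 species anyway, and I worked out which they are and how often they occur (Laughing Gull: 3036, Brown Pelican: 789, Northern Gannet: 546, Royal Tern: 321, Black Skimmer: 258). However, I don't know how to restrict the facets to just those species.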

Any help would be greatly appreciated.

Thanks :)


Amy

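The simplest approach here is probably just to plot a subset of the data. The only potential issue to watch for is whether the Species variable is stored as a factor rather than as a character string; droplevels() below assumes a factor. First create a subset: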

birdsSub <- subset(birds, Species %in% c('Laughing Gull','Brown Pelican',
                     'Northern Gannet','Royal Tern','Black Skimmer'))
birdsSub$Species <- droplevels(birdsSub$Species)
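
To illustrate what the subsetting and droplevels() steps do, and how the subset could then be plotted, here is a minimal sketch. It assumes a small made-up data frame called `toy` (not the real data) and uses ggplot() with facet_grid() in place of qplot(), which is deprecated in recent ggplot2 releases. The names `toy`, `keep` and `toySub` are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for the real 'birds' data frame
toy <- data.frame(
  Species  = factor(c("Laughing Gull", "Brown Pelican", "Common Loon",
                      "Laughing Gull", "Brown Pelican", "Laughing Gull")),
  Oil_Cond = c(1, 2, 1, 1, 2, 6)
)

# Species to keep (two here; five in the question)
keep <- c("Laughing Gull", "Brown Pelican")
toySub <- subset(toy, Species %in% keep)

# After subsetting, the factor still carries the unused level
levels_before <- nlevels(toySub$Species)

# droplevels() removes levels that no longer occur in the data
toySub$Species <- droplevels(toySub$Species)
levels_after <- nlevels(toySub$Species)

# One panel per remaining species, stacked vertically
p <- ggplot(toySub, aes(x = Oil_Cond)) +
  geom_histogram(binwidth = 1) +
  facet_grid(Species ~ .)
print(p)
```

With the real data, the same pattern would apply to `birdsSub` from the snippet above.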

You can solve this problem using the excellent plyr package:

# If you don't already have plyr installed, uncomment the next line:
# install.packages('plyr')
library(plyr)
library(ggplot2) # provides qplot(), used at the end

# First, find out how many of each species you have...

ns=ddply(birds,.(Species),summarise,n=length(Species))

# This will produce a table listing the number of each species you have 
# (in the column 'n'). Type 'ns' to see the table.
# We can then rank the species occurrence, to see how important the different 
# species are

ns$r = rank(-ns$n) # negative because 'rank' starts with the lowest number.

# have a look at the top 5 species:

subset(ns,r<=5)

# There are a couple of ways to proceed from here.  Either we could get the 
# top 5 species names from this 'ns' table:
# names=as.character(subset(ns,r<=5)$Species) 
# and use joran's method, or we could merge the ns table and the original 
# dataset (so that each species has an 'n' and 'r' attribute) and subset the 
# data by species number or rank.  I prefer the latter, as it allows you to 
# flexibly change the species number threshold. i.e.:

birds=merge(birds,ns,by='Species')

# We've now added 'n' and 'r' columns to the birds data, so we can select 
# our subset based on either of these columns:

birds.by.r=subset(birds,r<=5) # selects only the top 5 bird species
birds.by.n=subset(birds,n>100) # selects all species with over 100 occurrences

# Then just plot away!

qplot(Oil_Cond,data=birds.by.r,facets=Species~.,geom='histogram')

# or

qplot(Oil_Cond,data=birds.by.n,facets=Species~.,geom='histogram')
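
If you would rather not depend on plyr, the same "count, rank, subset" idea can be sketched in base R with table() and sort(). This is only an illustration on a made-up data frame; `toy`, `top_n_species`, `counts`, `top` and `toyTop` are hypothetical names, not part of the original data or answer.

```r
# Hypothetical toy data standing in for the real 'birds' data frame
toy <- data.frame(
  Species  = c(rep("Laughing Gull", 5), rep("Brown Pelican", 3),
               rep("Royal Tern", 2), "Common Loon"),
  Oil_Cond = c(1, 1, 2, 1, 6, 2, 2, 1, 1, 1, 1)
)

top_n_species <- 2  # hypothetical threshold; this would be 5 in the question

# table() counts rows per species; sort() puts the most frequent first
counts <- sort(table(toy$Species), decreasing = TRUE)

# Names of the most frequent species (never more than exist in the data)
top <- names(counts)[seq_len(min(top_n_species, length(counts)))]

# Keep only rows for those species
toyTop <- toy[toy$Species %in% top, ]

# Order the factor levels by frequency so facets appear most common first
toyTop$Species <- factor(toyTop$Species, levels = top)
```

The resulting `toyTop` can be passed to the plotting call in the same way as `birds.by.r` above. Changing `top_n_species` is the only edit needed to show more or fewer facets.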

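There is a similar question about reducing the number of lines on a density plot by selecting the factor levels with the highest counts; it is worth a look.

Great! Thank you so much, I'll keep trying!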