Two methods fail to subset a data set in R; requesting help
I am trying to use R (the open-source statistical scripting language) to generate a subset of some data. I tried two methods, and neither works: one returns a table with no data, and the other returns a table in which every cell is NA, although its dimensions appear to be correct.
I have tried to lay the code out clearly, in sections:
- First, I create a list of zip codes to subset the data by. The zip codes come from a data set I will be using. The list is called "zipCodesOfData".
- Next, I download the crime data that I want to subset, and cut it down to just the columns I need.
- The last part, Section Three, shows my attempts to filter the crime data by the zip codes using the %in% and filter methods.
####
#### Section zero: references and dependencies
####
# r's "choroplethr" library creator's blog for reference:
# http://www.arilamstein.com/blog/2015/06/25/learn-to-map-census-data-in-r/
# http://stackoverflow.com/questions/30787877/making-a-zip-code-choropleth-in-r-using-ggplot2-and-ggmap
#
# library(choroplethr)
# library(choroplethrMaps)
# library(ggplot2)
# # use the devtools package from CRAN to install choroplethrZip from github
# # install.packages("devtools")
# library(devtools)
# install_github('arilamstein/choroplethrZip')
# library(choroplethrZip)
# library(data.table)
#
####
#### Section one: the data set providing the zipcode we'll use to subset the crime set
####
austin2014_data_raw <- fread('https://data.austintexas.gov/resource/hcnj-rei3.csv')
names(austin2014_data_raw)
nrow(austin2014_data_raw)
## clean up: make any blank cells in column ZipCode say "NA" instead -> source: http://stackoverflow.com/questions/12763890/exclude-blank-and-na-in-r
austin2014_data_raw[austin2014_data_raw$ZipCode==""] <- NA
# keep only rows that do not have "NA"
austin2014_data <- na.omit(austin2014_data_raw)
nrow(austin2014_data) # now there's one less row.
# selecting the first column, which is ZipCode
zipCodesOfData <- austin2014_data[,1]
View(zipCodesOfData)
# Now we have the zipcodes we need: zipCodesOfData
####
#### Section two: Crime data
####
# Crime by zipcode: https://data.austintexas.gov/dataset/Annual-Crime-2014/7g8v-xxja
# (visualized: https://data.austintexas.gov/dataset/Annual-Crime-2014/8mst-ed5t )
# https://data.austintexas.gov/resource/<insertResourceNameHere>.csv w/ resource "7g8v-xxja"
austinCrime2014_data_raw <- fread('https://data.austintexas.gov/resource/7g8v-xxja.csv')
View(austinCrime2014_data_raw)
nrow(austinCrime2014_data_raw)
# First, let's remove the data we don't need
names(austinCrime2014_data_raw)
columnSelection_Crime <- c("GO Location Zip", "GO Highest Offense Desc", "Highest NIBRS/UCR Offense Description")
austinCrime2014_data_selected_columns <- subset(austinCrime2014_data_raw, select=columnSelection_Crime)
names(austinCrime2014_data_selected_columns)
nrow(austinCrime2014_data_selected_columns)
####
#### Section Three: The problem: I am unable to make subsets with the two following methods.
####
# Neither of these methods work:
# Attempt 1:
austinCrime2014_data_selected_columns <- austinCrime2014_data_selected_columns[austinCrime2014_data_selected_columns$`GO Location Zip` %in% zipCodesOfData , ]
View(austinCrime2014_data_selected_columns) # No data in the table
# Attempt 2:
# This initially told me an error:
# Then, I installed dplyr and the error went away.
library(dplyr)
# However, it still doesn't create anything-- just an empty set w/ headers
austinCrime2014_data_selected_zips <- filter(austinCrime2014_data_selected_columns, `GO Location Zip` %in% zipCodesOfData)
View(austinCrime2014_data_selected_zips)
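The post does not show which line produced the table full of NA cells, so the following is only a sketch of one common way that symptom arises in base R: indexing a data frame with a logical vector that is itself NA. The data frame `crime` and its values are made up for illustration and are not taken from the Austin data.

```r
# Toy data; the column names and values are hypothetical.
crime <- data.frame(
  zip     = c("78701", NA, "78704"),
  offense = c("THEFT", "BURGLARY", "ASSAULT"),
  stringsAsFactors = FALSE
)

# Comparing anything against NA yields NA, not FALSE.
idx <- crime$zip == NA
print(idx)

# A data frame indexed by an all-NA logical vector keeps its row count,
# but every cell in the result is NA.
all_na <- crime[idx, ]
print(dim(all_na))
print(all(is.na(all_na)))

# which() drops the NA positions, so only real matches are returned.
clean <- crime[which(crime$zip == "78701"), ]
print(clean)
```

If a subset has the expected dimensions but contains only NA, printing the logical index itself (here `idx`) is a quick way to see whether the comparison produced NA values.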
I'm not sure why you are do.call-ing and t-ransposing your data. You can use something like dplyr's semi_join to get only the zip codes you want:
library(data.table)
library(dplyr)

# Assumption: the right-hand side of the two assignments below was lost from
# the page; reading the question's URLs with fread() is assumed here.
zipCodesOfData <- fread('https://data.austintexas.gov/resource/hcnj-rei3.csv') %>%
  mutate(`Zip Code` = ifelse(`Zip Code` == "", NA, `Zip Code`)) %>%
  na.omit() %>%
  select(`Zip Code`)

austin2014_data_raw <- fread('https://data.austintexas.gov/resource/7g8v-xxja.csv') %>%
  select(`GO Location Zip`, `GO Highest Offense Desc`, `Highest NIBRS/UCR Offense Description`) %>%
  semi_join(zipCodesOfData, by = c("GO Location Zip" = "Zip Code")) %>%
  rename(zipcode = `GO Location Zip`,
         highestOffenseDesc = `GO Highest Offense Desc`,
         NIBRS_OffenseDesc = `Highest NIBRS/UCR Offense Description`)
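To see what the filtering join does without downloading anything, here is a minimal sketch on toy data. The data frames `zips` and `crime` and their values are hypothetical stand-ins for the two Austin data sets; only the key column names are borrowed from the post.

```r
library(dplyr)

# Hypothetical stand-ins for the zip code table and the crime table.
zips <- data.frame(`Zip Code` = c("78701", "78704"),
                   check.names = FALSE, stringsAsFactors = FALSE)
crime <- data.frame(`GO Location Zip` = c("78701", "78702", "78704", "78701"),
                    offense = c("THEFT", "BURGLARY", "ASSAULT", "THEFT"),
                    check.names = FALSE, stringsAsFactors = FALSE)

# semi_join() keeps the rows of `crime` that have a match in `zips`.
# It adds no columns from `zips` and never duplicates rows of `crime`.
kept <- semi_join(crime, zips, by = c("GO Location Zip" = "Zip Code"))
print(kept)
```

Both key columns need compatible types for the join to work; a character zip column will not join to an integer one, so it is worth checking the column classes first.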
I think readr and dplyr can solve your problem. It's simple:
library(readr)
library(dplyr)
### SECTION 1
# Import data
austin2014_data_raw <- read_csv('https://data.austintexas.gov/resource/hcnj-rei3.csv', na = '')
glimpse(austin2014_data_raw)
nrow(austin2014_data_raw)
# Remove NAs
austin2014_data <- na.omit(austin2014_data_raw)
nrow(austin2014_data) # now there's one less row.
# Get zip codes
zipCodesOfData <- austin2014_data$`Zip Code`
### SECTION 2
# Import data
austinCrime2014_data_raw <- read_csv('https://data.austintexas.gov/resource/7g8v-xxja.csv', na = '')
glimpse(austinCrime2014_data_raw)
nrow(austinCrime2014_data_raw)
# Select and rename required columns
columnSelection_Crime <- c("GO Location Zip", "GO Highest Offense Desc", "Highest NIBRS/UCR Offense Description")
austinCrime_df <- select(austinCrime2014_data_raw, one_of(columnSelection_Crime))
names(austinCrime_df) <- c("zipcode", "highestOffenseDesc", "NIBRS_OffenseDesc")
glimpse(austinCrime_df)
nrow(austinCrime_df)
### SECTION 3
# Filter by zipcode
austinCrime2014_data_selected_zips <- filter(austinCrime_df, zipcode %in% zipCodesOfData)
glimpse(austinCrime2014_data_selected_zips)
nrow(austinCrime2014_data_selected_zips)
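One detail in the code above is worth spelling out: the zip codes are extracted with `$`, which gives a plain vector. In the question they were taken with `[, 1]`, which, depending on the class of the table and the package version, may return a one-column table instead. The sketch below uses made-up values and base R only, and shows how `%in%` behaves in each case; it is an illustration of the likely cause, not a confirmed diagnosis of the original session.

```r
# Hypothetical zip codes; not taken from the Austin data.
zip_df <- data.frame(zip = c("78701", "78704"), stringsAsFactors = FALSE)
crime_zip <- c("78701", "78702", "78704")

# Right-hand side is a one-column data frame (a list): %in% compares each
# value against the column as a whole, so nothing matches.
in_df <- crime_zip %in% zip_df

# Right-hand side is the column itself, extracted with $: values are
# matched element by element.
in_vec <- crime_zip %in% zip_df$zip

print(in_df)
print(in_vec)
```

Checking `class(zipCodesOfData)` before filtering shows which of the two cases applies.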
austinCrime_df is a matrix!
I didn't realize I could do that in the call that downloads the file! I'll have to look into dplyr! And yes, after posting I ended up removing the parts you referred to. I have added them back so that nobody is confused by what you quoted from my original post. Thanks!
Thank you for keeping it simple, and for introducing me to glimpse!
You're welcome! I'm a big fan of dplyr and all the other tools that go with it.