Two methods failed to subset a data set in R, requesting help

Tags: r, dataframe, subset, data-cleaning

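I am trying to make subsets of some data using R (the open-source statistical scripting language). I tried two methods, and neither worked. One returns a table with no data; the other returns a table in which every cell is "NA", although its dimensions appear to be correct.

I have laid out the code with comments:

First, I create the list of zip codes I will use to subset the data. The list comes from a data set I will be using, and it is called "zipCodesOfData".
Next, I download the crime data that I want to subset, and reduce it to just the columns I need.
The last part, section three, shows my attempts to filter the crime data against the zip code data with %in% and with filter.

Unfortunately, neither method works. I am hoping someone can point out my mistake, or recommend a different subsetting method for section three.

(As an aside, in section two I attempted to turn the list into a data frame, but it did not work. I am curious why, if anyone can shed light on that.)

Thanks for your time and assistance.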

####
#### Section zero: references and dependencies
####
# r's "choroplethr" library creator's blog for reference:
# http://www.arilamstein.com/blog/2015/06/25/learn-to-map-census-data-in-r/
# http://stackoverflow.com/questions/30787877/making-a-zip-code-choropleth-in-r-using-ggplot2-and-ggmap
# 
# library(choroplethr)
# library(choroplethrMaps)
# library(ggplot2)
# # use the devtools package from CRAN to install choroplethrZip from github
# # install.packages("devtools")
# library(devtools)
# install_github('arilamstein/choroplethrZip')
# library(choroplethrZip)
# library(data.table)
# 
####
#### Section one: the data set providing the zipcode we'll use to subset the crime set
####
austin2014_data_raw <- fread('https://data.austintexas.gov/resource/hcnj-rei3.csv')
names(austin2014_data_raw)
nrow(austin2014_data_raw)
## clean up: make any blank cells in column ZipCode say "NA" instead -> source:  http://stackoverflow.com/questions/12763890/exclude-blank-and-na-in-r
austin2014_data_raw[austin2014_data_raw$ZipCode==""] <- NA
# keep only rows that do not have "NA"
austin2014_data <- na.omit(austin2014_data_raw)
nrow(austin2014_data) # now there's one less row.

# selecting the first column, which is ZipCode
zipCodesOfData <- austin2014_data[,1]
View(zipCodesOfData)
# Now we have the zipcodes we need: zipCodesOfData

####
#### Section two: Crime data
####
# Crime by zipcode: https://data.austintexas.gov/dataset/Annual-Crime-2014/7g8v-xxja
#   (visualized: https://data.austintexas.gov/dataset/Annual-Crime-2014/8mst-ed5t )
# https://data.austintexas.gov/resource/<insertResourceNameHere>.csv  w/ resource "7g8v-xxja"

austinCrime2014_data_raw <- fread('https://data.austintexas.gov/resource/7g8v-xxja.csv')
View(austinCrime2014_data_raw)
nrow(austinCrime2014_data_raw)

# First, let's remove the data we don't need
names(austinCrime2014_data_raw)
columnSelection_Crime <- c("GO Location Zip", "GO Highest Offense Desc", "Highest NIBRS/UCR Offense Description")
austinCrime2014_data_selected_columns <- subset(austinCrime2014_data_raw, select=columnSelection_Crime)
names(austinCrime2014_data_selected_columns)
nrow(austinCrime2014_data_selected_columns)


####
#### Section Three: The problem: I am unable to make subsets with the two following methods.
####
# Neither of these methods work: 

# Attempt 1:

austinCrime2014_data_selected_columns <- austinCrime2014_data_selected_columns[austinCrime2014_data_selected_columns$`GO Location Zip` %in% zipCodesOfData , ]
View(austinCrime2014_data_selected_columns) # No data in the table

# Attempt 2:

# This initially told me an error:
# Then, I installed dplyr and the error went away.  
library(dplyr)
# However, it still doesn't create anything-- just an empty set w/ headers
austinCrime2014_data_selected_zips <- filter(austinCrime2014_data_selected_columns, `GO Location Zip` %in% zipCodesOfData)
View(austinCrime2014_data_selected_zips)

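I'm not sure why you're do.call-ing and t-ransposing your data. You can just use something like dplyr's semi_join to get only the zip codes you want:

library(data.table)
library(dplyr)
#> -------------------------------------------------------------------------
#> data.table + dplyr code now lives in dtplyr.
#> Please library(dtplyr)!
#> -------------------------------------------------------------------------
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Assumption: the fread() calls and the second object name are inferred from
# the question's two download URLs; the assignments were lost in this copy.
zipCodesOfData <- fread('https://data.austintexas.gov/resource/hcnj-rei3.csv') %>%
    mutate(`Zip Code` = ifelse(`Zip Code` == "", NA, `Zip Code`)) %>%
    na.omit() %>%
    select(`Zip Code`)

austinCrime2014_data_raw <- fread('https://data.austintexas.gov/resource/7g8v-xxja.csv') %>%
    select(`GO Location Zip`, `GO Highest Offense Desc`, `Highest NIBRS/UCR Offense Description`) %>%
    semi_join(zipCodesOfData, by = c("GO Location Zip" = "Zip Code")) %>%
    rename(zipcode = `GO Location Zip`,
           highestOffenseDesc = `GO Highest Offense Desc`,
           NIBRS_OffenseDesc = `Highest NIBRS/UCR Offense Description`)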
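
To make the filtering join easier to follow without downloading anything, here is a minimal sketch on made-up data. The tibbles `zip_lookup` and `crime` and their rows are hypothetical stand-ins for the two Austin data sets; only the column names `Zip Code` and `GO Location Zip` come from the thread.

```r
library(dplyr)

# Hypothetical stand-ins for the two data sets (made-up rows).
zip_lookup <- tibble(`Zip Code` = c(78701, 78702, 78704))
crime <- tibble(
  `GO Location Zip` = c(78701, 78745, 78704, 78701, 78660),
  offense = c("THEFT", "BURGLARY", "THEFT", "ASSAULT", "THEFT")
)

# semi_join() keeps the rows of `crime` that have a match in `zip_lookup`
# and returns only the columns of `crime`.
crime_in_zips <- semi_join(crime, zip_lookup,
                           by = c("GO Location Zip" = "Zip Code"))

# anti_join() is the complement: crime rows whose zip is not in the lookup.
crime_elsewhere <- anti_join(crime, zip_lookup,
                             by = c("GO Location Zip" = "Zip Code"))

print(crime_in_zips)
print(crime_elsewhere)
```

Because semi_join only filters, it never adds columns from the lookup table and never duplicates rows of the left-hand table, which is usually what is wanted when the second table is just a list of allowed keys.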
I think readr and dplyr can solve your problem. It's simple:

library(readr)
library(dplyr)

### SECTION 1

# Import data
austin2014_data_raw <- read_csv('https://data.austintexas.gov/resource/hcnj-rei3.csv', na = '')
glimpse(austin2014_data_raw)
nrow(austin2014_data_raw)

# Remove NAs
austin2014_data <- na.omit(austin2014_data_raw)
nrow(austin2014_data) # now there's one less row.

# Get zip codes
zipCodesOfData <- austin2014_data$`Zip Code`

### SECTION 2

# Import data
austinCrime2014_data_raw <- read_csv('https://data.austintexas.gov/resource/7g8v-xxja.csv', na = '')
glimpse(austinCrime2014_data_raw)
nrow(austinCrime2014_data_raw)

# Select and rename required columns
columnSelection_Crime <- c("GO Location Zip", "GO Highest Offense Desc", "Highest NIBRS/UCR Offense Description")
austinCrime_df <- select(austinCrime2014_data_raw, one_of(columnSelection_Crime))
names(austinCrime_df) <- c("zipcode", "highestOffenseDesc", "NIBRS_OffenseDesc")
glimpse(austinCrime_df)
nrow(austinCrime_df)

### SECTION 3

# Filter by zipcode
austinCrime2014_data_selected_zips <- filter(austinCrime_df, zipcode %in% zipCodesOfData)
glimpse(austinCrime2014_data_selected_zips)
nrow(austinCrime2014_data_selected_zips)
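
One plausible reason the `%in%` attempts in the question came back empty, while extracting the column with `$` works: `%in%` expects a plain vector on its right-hand side. The base-R sketch below uses made-up data and assumes the zip codes ended up in a one-column table rather than a vector; the question's actual object is not shown, so that is an assumption, and the names `zips` and `crime` are hypothetical.

```r
# Made-up data; column names are illustrative only.
zips <- data.frame(zip = c("78701", "78702", "78704"),
                   stringsAsFactors = FALSE)
crime <- data.frame(zip = c("78701", "78745", "78704", "78701"),
                    offense = c("THEFT", "BURGLARY", "THEFT", "ASSAULT"),
                    stringsAsFactors = FALSE)

# A one-column data frame is still a list of columns, not a vector of zips.
zip_table <- zips[, "zip", drop = FALSE]
# Extracting the column itself gives a character vector.
zip_vector <- zips[["zip"]]

# %in% against the table compares each zip with the column as a whole,
# so nothing matches.
in_table <- crime$zip %in% zip_table
# %in% against the vector compares element by element.
in_vector <- crime$zip %in% zip_vector

print(in_table)
print(in_vector)
print(crime[in_vector, ])
```

Checking `class(zipCodesOfData)` and `str(zipCodesOfData)` before filtering is a quick way to see which of the two cases applies.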

Comments:

austinCrime_df is a matrix.

I didn't realize I could do that as part of the file-download call! I'll have to look into dplyr. And yes, after posting I ended up removing the parts you referenced. So that nobody is confused about what you were referring to in my original post, I have added them back. Thanks!

Thanks for keeping it simple, and for introducing me to glimpse!

You're welcome! I'm a big fan of dplyr and all its companion tools.