R: Determine the threshold that maximally separates two groups based on a continuous variable?


Suppose I have 200 subjects, 100 in group A and 100 in group B, and for each subject I measure some continuous parameter:

require(ggplot2)
set.seed(100)

value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep('A', 100), rep('B', 100))

data <- data.frame(value, group)

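# Overlaid histograms of the two groups (geom_bar would count each unique value separately)
ggplot(data = data, aes(x = value)) +
  geom_histogram(aes(fill = group), bins = 30, position = "identity", alpha = 0.5)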

A simple approach is to write a function that computes the accuracy for a given threshold:

accuracy = Vectorize(function(th) mean(c("A", "B")[(value > th) + 1] == group))

Then use optimize to find the maximum:
optimize(accuracy, c(min(value), max(value)), maximum=TRUE)
# $maximum
# [1] 8.050888
# 
# $objective
# [1] 0.86
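
One caveat worth keeping in mind: accuracy as a function of the threshold is a step function that is flat between consecutive data values, while optimize() is designed for smooth, unimodal functions, so it may settle on a local plateau. Below is a minimal sketch of an exhaustive alternative that evaluates every distinct cut point. It assumes the same simulated data as the question; the names candidates, acc and best are illustrative and do not come from the original answer.

```r
# Recreate the example data from the question
set.seed(100)
value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep("A", 100), rep("B", 100))

# Accuracy of the rule "predict B when value > th, otherwise A"
accuracy <- function(th) mean(ifelse(value > th, "B", "A") == group)

# Candidate cut points: midpoints between consecutive sorted unique values.
# Any threshold between two neighbouring values gives the same accuracy,
# so these midpoints cover every distinct split of the data.
v <- sort(unique(value))
candidates <- (head(v, -1) + tail(v, -1)) / 2

acc <- vapply(candidates, accuracy, numeric(1))
best <- which(acc == max(acc))  # several thresholds may tie for the maximum

data.frame(threshold = candidates[best], accuracy = acc[best])
```

If several thresholds tie, any of them is equally good by this criterion, and a secondary rule (for example the middle of the tied range) would be needed to pick one.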


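I got the answer I needed, thanks to the help of @Thomas and @BenBolker.
Summary

The problem with my attempt to solve this through logistic regression was that I had not specified family = binomial.
Given a glm fit, the dose.p() function in MASS does the job for me.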
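
To illustrate why the missing family argument matters: glm() defaults to family = gaussian, which fits an ordinary linear model rather than a logistic regression. The following is a small sketch under the same simulated data; fit_default and fit_binomial are names chosen here for illustration.

```r
# Same simulated data as in the code below, with groups coded 0/1
set.seed(100)
value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep(0, 100), rep(1, 100))
data <- data.frame(value, group)

# Without a family argument, glm() fits a gaussian (linear) model
fit_default <- glm(group ~ value, data = data)

# With family = binomial, glm() fits a logistic regression (logit link)
fit_binomial <- glm(group ~ value, data = data, family = binomial)

family(fit_default)$family   # "gaussian"
family(fit_binomial)$family  # "binomial"
family(fit_binomial)$link    # "logit"

# The linear fit is not constrained to [0, 1]; the logistic fit is
range(fitted(fit_default))
range(fitted(fit_binomial))
```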

Code
# Include libraries
require(ggplot2)
require(MASS)

# Set seed
set.seed(100)

# Put together some dummy data
value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep(0, 100), rep(1, 100))
data <- data.frame(value, group)

# Plot the distribution -- visually
# The answer appears to be b/t 7 and 8
ggplot(data = data, aes(x = value)) +
  geom_histogram(aes(fill = factor(group)), bins = 30, position = "identity", alpha = 0.5)

# Fit a glm model, specifying the binomial distribution
my.glm <- glm(group~value, data = data, family = binomial)
b0 <- coef(my.glm)[[1]]
b1 <- coef(my.glm)[[2]]

# See what the probability function looks like
lr <- function(x, b0, b1) {
  prob <- 1 / (1 + exp(-1*(b0 + b1*x)))
  return(prob)                  
}

# The line appears to cross 0.5 just above 7.5
x <- 0:12
y <- lr(x, b0, b1)
lr.val <- data.frame(x, y)
ggplot(lr.val, aes(x = x, y = y)) +
  geom_line()

# The inverse of this function computes the threshold for a given probability
inv.lr <- function(p, b0, b1) {
  x <- (log(p / (1 - p)) - b0)/b1
  return(x)
}

# With the betas from this function, we get 7.686814
inv.lr(0.5, b0, b1)

# Or, feeding the glm model into dose.p from MASS, we get the same answer
dose.p(my.glm, p = 0.5)
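
At p = 0.5 the log-odds log(p / (1 - p)) are zero, so the inverse function reduces to x = -b0 / b1. The sketch below checks that shortcut against predict() and dose.p(), and reports the in-sample accuracy of the resulting cut. It assumes the same simulated data and a positive slope (higher values mean group 1); fit, threshold and acc_glm are illustrative names.

```r
library(MASS)  # provides dose.p()

# Same simulated data as above
set.seed(100)
value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep(0, 100), rep(1, 100))
data <- data.frame(value, group)

fit <- glm(group ~ value, data = data, family = binomial)
b <- coef(fit)

# Solve b0 + b1 * x = 0 for x: the value where the fitted probability is 0.5
threshold <- unname(-b[1] / b[2])

# Check 1: the fitted probability at the threshold should be 0.5
p_at_threshold <- predict(fit, newdata = data.frame(value = threshold),
                          type = "response")

# Check 2: dose.p() should return the same value
dose <- as.numeric(dose.p(fit, p = 0.5))

# In-sample accuracy of the rule "group 1 when value > threshold"
# (assumes the slope b1 is positive)
acc_glm <- mean(as.integer(value > threshold) == group)

c(threshold = threshold, dose.p = dose, accuracy = acc_glm)
```

Note that this threshold is where the model is indifferent between the two groups, which need not coincide exactly with the threshold that maximizes classification accuracy on the sample.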

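I don't know of such a function, but if you know how to compute the "breakpoint" or "threshold", you should be able to write one yourself.

I'm not sure I fully understand your question (specifically, are you reassigning the groups based on the computed threshold?), but I wonder whether you have looked at clustering methods, such as the kmeans function in the stats library?

I think you are looking for a binary classifier. Logistic regression is one approach. Estimate the model and solve for the threshold. Your data puts it in the neighborhood of 7.7.

@blindJesse: No, I hadn't. This looks very useful for when I need to do this sort of thing for more than one variable. Thanks.

Try the dose.p function from the MASS package ...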
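
For comparison, the kmeans suggestion from the comments can be sketched as follows. This is only an illustration under the same simulated data: k-means ignores the known group labels, so it answers a slightly different question (where the values naturally split) than the supervised approaches above. The names km, km_threshold and km_acc are hypothetical.

```r
# Same simulated data as in the question
set.seed(100)
value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep("A", 100), rep("B", 100))

# Two-cluster k-means on the single variable; nstart guards against bad starts
km <- kmeans(value, centers = 2, nstart = 10)
centers <- sort(as.numeric(km$centers))

# In one dimension the boundary between two clusters is the midpoint
# of the two cluster centers
km_threshold <- mean(centers)

# How well that unsupervised cut happens to recover the true groups
km_acc <- mean(ifelse(value > km_threshold, "B", "A") == group)

c(threshold = km_threshold, accuracy = km_acc)
```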