Python: table whose columns represent row values

python sql r shell

What is the best way to create a table of 1 and 0 column values from data like this:

id,string
1,"x,y,z"
2,"x,z"
3,"x"

I need a table like this:

id,x,y,z,a,b,c
1,1,1,1,0,0,0
2,1,0,1,0,0,0
3,1,0,0,0,0,0

In addition, the full list of all possible unique values in string is predefined. I have a csv with the list as follows:

col
x
y
z
a
b
c
.
.
.
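For reference, here is a minimal Python sketch of the whole transformation using only the standard library. It assumes both inputs are small CSV texts shaped like the ones above (the names build_indicator_csv, data_text and cols_text are hypothetical); the column order comes from the predefined list, so unused values still get a column of zeros.

```python
import csv
import io


def build_indicator_csv(data_text: str, cols_text: str) -> str:
    """Turn `id,string` rows into `id,<col1>,<col2>,...` rows of 1/0 flags."""
    # The predefined list: a one-column CSV with the header "col".
    cols = [row["col"] for row in csv.DictReader(io.StringIO(cols_text))]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", *cols])
    for row in csv.DictReader(io.StringIO(data_text)):
        # The csv reader unquotes "x,y,z"; split it into exact items.
        items = {item.strip() for item in row["string"].split(",")}
        writer.writerow([row["id"], *(1 if c in items else 0 for c in cols)])
    return out.getvalue()


data_text = 'id,string\n1,"x,y,z"\n2,"x,z"\n3,"x"\n'
cols_text = "col\nx\ny\nz\na\nb\nc\n"
print(build_indicator_csv(data_text, cols_text), end="")
```

Splitting into exact items, rather than searching for substrings, avoids false matches between values where one contains another.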

In SQL, you can use case:

select id,
       (case when string like '%x%' then 1 else 0 end) as x,
       (case when string like '%y%' then 1 else 0 end) as y,
       . . .
from t;

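There may be simpler formulations, depending on the database. Also, this assumes that the values do not overlap, as is the case in your question. For example, "apple" and "pineapple" would cause problems.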
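The overlap problem can be illustrated in Python: string LIKE '%v%' behaves roughly like a substring test, so "apple" matches inside "pineapple". A common workaround, shown here as an assumption rather than part of the answer above, is to wrap both the value and the whole string in the separator before comparing; in SQL that needs string concatenation, whose operator differs between databases. The function names are hypothetical.

```python
def like_match(value: str, s: str) -> int:
    """Roughly mimics `s LIKE '%value%'`: a plain substring test."""
    return int(value in s)


def delimited_match(value: str, s: str, sep: str = ",") -> int:
    """Wrap both sides in the separator so only whole items can match."""
    return int(f"{sep}{value}{sep}" in f"{sep}{s}{sep}")


s = "pineapple,banana"
print(like_match("apple", s))        # 1: false positive
print(delimited_match("apple", s))   # 0: "apple" is not a whole item
print(delimited_match("banana", s))  # 1
```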

You can do this by reshaping:

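library(dplyr)
library(stringi)
library(tidyr)

'id,string
1,"x,y,z"
2,"x,z"
3,"x"' %>% 
  read.csv(text = .) %>%
  # split each string on "," into a list column
  mutate(string_split = 
           string %>% 
           stri_split_fixed(",") ) %>%
  # one row per (id, item) pair
  unnest(string_split) %>%
  mutate(value = 1) %>%
  # one column per item that occurs (a, b, c are not created); missing cells get 0,
  # and the original string column is kept alongside id
  spread(string_split, value, fill = 0)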
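The same reshape (split, unnest to long form, then spread to wide form with 0 as the fill) can be sketched in plain Python. Like spread(), it only creates columns for values that actually occur, so a, b and c do not appear unless they are added explicitly. The name reshape_wide is hypothetical.

```python
from typing import Dict, List, Tuple


def reshape_wide(rows: List[Tuple[int, str]]) -> Tuple[List[str], Dict[int, Dict[str, int]]]:
    # "unnest": one (id, item) pair per comma-separated item
    long_form = [(row_id, item) for row_id, s in rows for item in s.split(",")]

    # "spread": one column per observed item (sorted here), 0 for missing cells
    columns = sorted({item for _, item in long_form})
    wide = {row_id: dict.fromkeys(columns, 0) for row_id, _ in rows}
    for row_id, item in long_form:
        wide[row_id][item] = 1
    return columns, wide


columns, wide = reshape_wide([(1, "x,y,z"), (2, "x,z"), (3, "x")])
print(columns)  # ['x', 'y', 'z']
for row_id, flags in wide.items():
    print(row_id, [flags[c] for c in columns])
```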

You can split the strings and then use %in% to match the split values against the predefined list of possible values.
For example:
mydf <- read.csv(text = 'id,string\n1,"x,y,z"\n2,"x,z"\n3,"x"')

matches <- c("x", "y", "z", "a", "b", "c")

cbind(mydf[1], 
      `colnames<-`(t(vapply(strsplit(as.character(mydf$string), ",", TRUE), 
                            function(x) {
                              matches %in% x
                            }, 
                            numeric(length(matches)))), 
                   matches))
#   id x y z a b c
# 1  1 1 1 1 0 0 0
# 2  2 1 0 1 0 0 0
# 3  3 1 0 0 0 0 0
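The %in% idea translates directly to Python: for each row, test every predefined value for membership in that row's split items. Because the loop runs over the predefined list, unused values such as a, b and c still get zeros, and an item that is not in the list is silently ignored (the extra item "w" below is a made-up example; flag_matrix is a hypothetical name).

```python
from typing import List


def flag_matrix(strings: List[str], matches: List[str]) -> List[List[int]]:
    rows = []
    for s in strings:
        items = set(s.split(","))  # like strsplit(..., ",", TRUE)
        rows.append([int(m in items) for m in matches])  # like matches %in% x
    return rows


matches = ["x", "y", "z", "a", "b", "c"]
for row in flag_matrix(["x,y,z", "x,z", "x", "x,w"], matches):
    print(row)
```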

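Here is another option using table and melt in R. We split the 'string' column by ',' into a list, set the names of the list elements to 'id', melt the list into a data.frame, convert the 'value' column to a factor whose levels also include 'a', 'b', 'c', get the table, and cbind it with the 'id' column.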

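library(reshape2)
# example data from the question; strsplit() needs a character column
df1 <- read.csv(text = 'id,string\n1,"x,y,z"\n2,"x,z"\n3,"x"', stringsAsFactors = FALSE)
# melt the named list into (value, L1) rows, add 'a', 'b', 'c' as extra factor levels
# (as.character() keeps this working whether melt returns character or factor),
# then cross-tabulate L1 (the id) against value
tbl <- table(transform(melt(setNames(strsplit(df1$string, ','), df1$id)),
        value = factor(value, levels = union(unique(as.character(value)), letters[1:3])))[2:1])
cbind(df1['id'], as.data.frame.matrix(tbl))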
#  id x y z a b c
#1  1 1 1 1 0 0 0
#2  2 1 0 1 0 0 0
#3  3 1 0 0 0 0 0
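The table() approach cross-tabulates (id, value) pairs against a fixed set of factor levels, so it produces counts rather than plain membership flags. A rough Python analogue uses collections.Counter; note that a repeated item such as "x,x" (a made-up example) gives 2 rather than 1, which may or may not be what you want. The name crosstab is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def crosstab(rows: Dict[int, str], levels: List[str]) -> Dict[int, List[int]]:
    # "melt": flatten to (id, value) pairs
    pairs = [(row_id, v) for row_id, s in rows.items() for v in s.split(",")]
    counts = Counter(pairs)
    # "table" with fixed levels: every level is a column, absent pairs count 0
    return {row_id: [counts[(row_id, lvl)] for lvl in levels] for row_id in rows}


levels = ["x", "y", "z", "a", "b", "c"]
result = crosstab({1: "x,y,z", 2: "x,z", 3: "x", 4: "x,x"}, levels)
for row_id, row_counts in result.items():
    print(row_id, row_counts)
```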

It would help if you could add some code. In other words, what have you already tried?

You may also need to add a factor in there to get the other values.