Get the most frequent string in each row of a single column in R, Python, or Excel

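I have a data frame like image1 and I want to convert it into image2. I have tried R, Python, and Excel, but failed with all of them. The Excel formula =INDEX(AV2:AW2,MODE(MATCH(AV2:AW2,AV2:AW2,0))) gives me #N/A. The k2 column should hold the most frequent element in each row of the knumbers column. Any help is appreciated. Best, 齐鲁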


In R, you can split each string on commas, count the frequencies with table, and take the most frequent string:

df$k2 <- sapply(strsplit(df$knumbers, ','), function(x) 
                 names(sort(table(x), decreasing = TRUE)[1]))
Python solution:

# Initialise pandas, and mode in session: 
import pandas as pd
from statistics import mode

# Scalar denoting the full path to file (including file name): filepath => string scalar
filepath = ''

# Read in the Excel sheet: df => Data Frame 
df = pd.read_excel(filepath)

# Find the modal element per row: k2 => string vector
df['k2'] = df['knumbers'].astype(str).str.split(',').map(mode)
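
To try the idea without the Excel file, here is a minimal self-contained sketch that runs on the sample knumbers values posted with this thread. The helper name most_frequent is hypothetical and not part of the answer above. It uses collections.Counter instead of statistics.mode, on the assumption that you want ties resolved to the first item seen (which most_common does on Python 3.7+) rather than an error on older Python versions.

```python
from collections import Counter


def most_frequent(cell):
    """Return the most frequent comma-separated item in a cell (hypothetical helper)."""
    # Split on commas and strip stray whitespace around each item.
    items = [item.strip() for item in str(cell).split(",")]
    # most_common(1) yields [(item, count)]; ties keep first-seen order.
    return Counter(items).most_common(1)[0][0]


# Sample knumbers values taken from the data in this thread.
knumbers = [
    "K10401",
    "K10413,K10413,K10412",
    "K13844,K13844,K13845",
    "K19206,K19207,K19207",
]

k2 = [most_frequent(cell) for cell in knumbers]
print(k2)  # ['K10401', 'K10413', 'K13844', 'K19207']
```

With a pandas data frame, the same helper could be applied as df['k2'] = df['knumbers'].map(most_frequent).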
Base R solution:

# Define a function to retrieve the modal element in a factor/character vector: mode_stat => function
mode_stat <- function(chr_vec){names(sort(table(as.character(chr_vec)), decreasing = TRUE)[1])}

# Apply the function to a list of split knumber strings: k2 => character vector
df$k2 <- sapply(strsplit(df$knumbers, ","), mode_stat)

Sample data:

df <- structure(list(Total = c(446, 346, 332, 308), knumbers = c("K10401", 
"K10413,K10413,K10412", "K13844,K13844,K13845", "K19206,K19207,K19207"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))