Frequency of Characters in Strings as Columns in a Data Frame Using R
I have a data frame 'initial' of the following format:
> head(initial)
      strings
1     a,a,b,c
2       a,b,c
3 a,a,a,a,a,b
4     a,a,b,c
5       a,b,c
6 a,a,a,a,a,b
I want to get the data frame 'final', with one count column per element:
> head(final)
      strings a b c
1     a,a,b,c 2 1 1
2       a,b,c 1 1 1
3 a,a,a,a,a,b 5 1 0
4     a,a,b,c 2 1 1
5       a,b,c 1 1 1
6 a,a,a,a,a,b 5 1 0
The following code generates both data frames; rep(..., 100) keeps the number of rows high:
initial <- data.frame(strings = rep(c("a,a,b,c", "a,b,c", "a,a,a,a,a,b"), 100))
final <- data.frame(strings = rep(c("a,a,b,c", "a,b,c", "a,a,a,a,a,b"), 100),
                    a = rep(c(2, 1, 5), 100),
                    b = rep(c(1, 1, 1), 100),
                    c = rep(c(1, 1, 0), 100))
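As a quick sanity check of what `final` is meant to contain, the sketch below rebuilds both data frames and verifies each expected column against a direct per-row count. The helper `count_element` is hypothetical (it is not from the question or any package), and it assumes every element is a single character: with `fixed = TRUE` substring matching, "a" would also be counted inside a longer token such as "ab".

```r
# Rebuild the example data exactly as in the question.
initial <- data.frame(strings = rep(c("a,a,b,c", "a,b,c", "a,a,a,a,a,b"), 100))
final <- data.frame(strings = rep(c("a,a,b,c", "a,b,c", "a,a,a,a,a,b"), 100),
                    a = rep(c(2, 1, 5), 100),
                    b = rep(c(1, 1, 1), 100),
                    c = rep(c(1, 1, 0), 100))

# Hypothetical helper: count occurrences of one single-character element
# in each comma-separated string. gregexpr() returns -1 for "no match";
# regmatches() turns that into character(0), so lengths() reports 0.
count_element <- function(x, element) {
  x <- as.character(x)
  lengths(regmatches(x, gregexpr(element, x, fixed = TRUE)))
}

# Each expected column in 'final' should equal the per-row count.
for (col in c("a", "b", "c")) {
  stopifnot(all(count_element(initial$strings, col) == final[[col]]))
}
```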
What is the fastest way to achieve this? Any help is appreciated.
We can use base R methods for this task. Split the 'strings' column (strsplit(...)), set the names of the output list to the sequence of rows, stack it to convert it to a data.frame with key/value columns, get the frequency with table, convert the result to a 'data.frame', and cbind it with the original dataset.
cbind(df1, as.data.frame.matrix(
  table(stack(setNames(strsplit(as.character(df1$strings), ","),
                       1:nrow(df1)))[2:1])))
#           strings a b c d
# 1         a,b,c,d 1 1 1 1
# 2     a,b,b,d,d,d 1 2 0 3
# 3 a,a,a,a,b,c,d,d 4 1 1 2
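To see what each stage of the one-liner produces, here is the same base R pipeline unrolled into named intermediate steps. The example `df1` is an assumption reconstructed from the printed output, and the intermediate names (`pieces`, `long`, `counts`) are illustrative only.

```r
# Example data assumed from the printed output above.
df1 <- data.frame(strings = c("a,b,c,d", "a,b,b,d,d,d", "a,a,a,a,b,c,d,d"),
                  stringsAsFactors = FALSE)

# 1. Split each string into a character vector of elements.
pieces <- strsplit(as.character(df1$strings), ",")

# 2. Name each list element by its row number so the row id survives stacking.
pieces <- setNames(pieces, seq_len(nrow(df1)))

# 3. stack() gives a long data frame: 'values' (element) and 'ind' (row id).
long <- stack(pieces)

# 4. Cross-tabulate row id (rows) against element (columns).
#    With fewer than 10 rows, lexical and numeric order of the ids coincide.
counts <- table(long[c("ind", "values")])

# 5. Convert the contingency table to a plain data frame and bind it on.
result <- cbind(df1, as.data.frame.matrix(counts))
print(result)
```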
Or we can use mtabulate from qdapTools after splitting the column.
library(qdapTools)
cbind(df1, mtabulate(strsplit(as.character(df1$strings), ",")))
#           strings a b c d
# 1         a,b,c,d 1 1 1 1
# 2     a,b,b,d,d,d 1 2 0 3
# 3 a,a,a,a,b,c,d,d 4 1 1 2
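If adding the qdapTools dependency is not an option, a base R stand-in for `mtabulate` can be sketched with `match()` and `tabulate()`: each row's elements are mapped to positions in a shared, sorted vocabulary and counted into a fixed number of bins. This is a substitute technique, not how `mtabulate` is implemented, and `df1` is again assumed from the printed output.

```r
# Example data assumed from the printed output above.
df1 <- data.frame(strings = c("a,b,c,d", "a,b,b,d,d,d", "a,a,a,a,b,c,d,d"),
                  stringsAsFactors = FALSE)

# Split once and build the shared vocabulary of distinct elements.
pieces <- strsplit(as.character(df1$strings), ",")
vocab <- sort(unique(unlist(pieces)))

# For each row, match() turns elements into vocabulary positions and
# tabulate() counts them into length(vocab) bins (0 for absent elements).
# rbind() stacks the per-row count vectors into a rows x elements matrix.
counts <- do.call(rbind, lapply(pieces, function(p) {
  tabulate(match(p, vocab), nbins = length(vocab))
}))
colnames(counts) <- vocab

result <- cbind(df1, as.data.frame(counts))
print(result)
```

Because every row is tabulated against the same vocabulary, the columns line up without any reshaping step.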
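Update
For the new dataset 'initial', the second method works as is. If you need the first method with the correct row order, convert the 'ind' column to factor class with its levels specified as the unique elements of 'ind'. Otherwise, depending on the R version, the row ids may be ordered as strings, so that "10" comes before "2".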
df1 <- stack(setNames(strsplit(as.character(initial$strings), ","),
                      seq_len(nrow(initial))))
df1$ind <- factor(df1$ind, levels = unique(df1$ind))
cbind(initial, as.data.frame.matrix(table(df1[2:1])))
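The ordering problem only shows up once there are ten or more rows, because row ids stored as strings sort lexically. The sketch below uses a hypothetical 12-row `initial` to show that lexical order and then applies the update's fix; whether `stack()` itself already preserves the order may depend on the R version, so setting the levels explicitly is the safe choice.

```r
# Hypothetical data with more than 9 rows, so row ids "10", "11", ... exist.
initial <- data.frame(strings = rep(c("a,a,b,c", "a,b,c", "a,a,a,a,a,b"), 4),
                      stringsAsFactors = FALSE)

# Lexical sorting puts "10" before "2"; this is the order table() would use
# if the row-id factor had default (sorted) levels.
print(sort(as.character(seq_len(nrow(initial)))))

df1 <- stack(setNames(strsplit(as.character(initial$strings), ","),
                      seq_len(nrow(initial))))

# Force the factor levels to follow first appearance, i.e. the original row order.
df1$ind <- factor(df1$ind, levels = unique(df1$ind))

out <- cbind(initial, as.data.frame.matrix(table(df1[2:1])))
print(out)
```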