Regex: partial matching in R with multiple matches
I'm using the code below to partially match each name to one category, and I have a follow-up question: suppose I had an additional category, fish, and wanted "dog fish" to be categorized as both fish and canine. Is that possible?
```r
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger", "black panther",
                         "short cat", "red bird", "short bird stuffed", "big eagle",
                         "bad sparrow", "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14)

# define the regexes at the beginning of the code
regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"))

# create a vector the same length as the data frame
output_vector <- character(nrow(d))

# for each regex...
for (i in seq_along(regexes)) {
  # grep through d$name and, where it matches, insert the relevant 'tag' into
  # the output vector (a later match overwrites any earlier tag)
  output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]
}

# insert the now-filled output vector into the data frame
d$species <- output_vector
```
Desired output:

```
#                 name label      species
#1           brown cat     1       feline
#2            blue cat     2       feline
#3            big lion     3       feline
#4          tall tiger     4       feline
#5       black panther     5       feline
#6           short cat     6       feline
#7            red bird     7        avian
#8  short bird stuffed     8        avian
#9           big eagle     9        avian
#10        bad sparrow    10        avian
#11           dog fish    11 canine, fish
#12           head dog    12       canine
#13       brown yorkie    13       canine
#14  lab short bulldog    14       canine
```
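With the loop above, each assignment to `output_vector` replaces whatever tag a row already had, so a row can only ever keep one label. A minimal sketch of one way around that, staying with the loop approach, is to collect every matching tag per row in a list and paste them together at the end. The `(fish)` pattern and its `fish` label are an assumption added for this example; they are not part of the original `regexes`.

```r
d <- data.frame(
  name = c("brown cat", "blue cat", "big lion", "tall tiger", "black panther",
           "short cat", "red bird", "short bird stuffed", "big eagle",
           "bad sparrow", "dog fish", "head dog", "brown yorkie",
           "lab short bulldog"),
  label = 1:14,
  stringsAsFactors = FALSE
)

# ASSUMPTION: the fish entry is hypothetical, added to illustrate multiple matches.
# Tags are appended in list order, so "canine" comes before "fish".
regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"),
                c("(fish)", "fish"))

# one list element per row; each starts empty (NULL) and collects matching tags
tags <- vector("list", nrow(d))
for (rx in regexes) {
  hit <- grepl(rx[1], d$name)
  # append this tag to the rows that match instead of overwriting them
  tags[hit] <- lapply(tags[hit], c, rx[2])
}

# collapse each row's tags into one string, e.g. "canine, fish"
d$species <- vapply(tags, paste, character(1), collapse = ", ")
d
```

A row that matches nothing would get an empty string, which is what `character(nrow(d))` gave in the original code.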
The original Stack Overflow question is here: partial string matching in R
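I'd do this through a cross join. `merge()` on two data frames with no columns in common pairs every row of `d` with every row of a lookup table, and the filter then keeps only the pairs where the name contains the partial string. A name that contains several partials survives on several rows, which is what allows more than one category per name.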
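```r
library(dplyr)
library(stringi)

# one row per partial string and the category it implies;
# "fish" is included so that "dog fish" can match two categories
key <- tibble(
  partial  = c("cat", "lion", "tiger", "panther", "bird", "eagle", "sparrow",
               "dog", "yorkie", "bulldog", "fish"),
  category = c("feline", "feline", "feline", "feline", "avian", "avian", "avian",
               "canine", "canine", "canine", "fish"))

# merge() with no shared columns is a cross join; keep pairs where name contains partial
d %>%
  merge(key) %>%
  filter(stri_detect_fixed(name, partial))
```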
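The cross join returns one row per matching (name, partial) pair, so it still needs a step to get to the desired one-row-per-animal shape. For example, "lab short bulldog" contains both `dog` and `bulldog` and comes back twice as canine. A minimal sketch of the same idea in base R, which drops repeated categories and then collapses the rest per name, might look like this. It swaps `stri_detect_fixed()` for `grepl(fixed = TRUE)` via `mapply()` and `dplyr` for `unique()`/`aggregate()` so it runs without extra packages; the `fish` key row is again an assumption.

```r
d <- data.frame(
  name = c("brown cat", "blue cat", "big lion", "tall tiger", "black panther",
           "short cat", "red bird", "short bird stuffed", "big eagle",
           "bad sparrow", "dog fish", "head dog", "brown yorkie",
           "lab short bulldog"),
  label = 1:14,
  stringsAsFactors = FALSE
)

# ASSUMPTION: the "fish" row is hypothetical, added to show multiple matches
key <- data.frame(
  partial  = c("cat", "lion", "tiger", "panther", "bird", "eagle", "sparrow",
               "dog", "yorkie", "bulldog", "fish"),
  category = c(rep("feline", 4), rep("avian", 3), rep("canine", 3), "fish"),
  stringsAsFactors = FALSE
)

# no shared column names, so merge() returns the Cartesian product of d and key
pairs <- merge(d, key)

# keep the pairs whose name contains the partial string (fixed match, not regex)
keep <- mapply(grepl, pairs$partial, pairs$name,
               MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
pairs <- pairs[keep, c("name", "label", "category")]

# drop repeated categories (e.g. "dog" and "bulldog" both give "canine")
pairs <- unique(pairs)

# one row per animal, with its categories joined by ", "
out <- aggregate(category ~ name + label, data = pairs,
                 FUN = function(x) paste(x, collapse = ", "))
out <- out[order(out$label), ]
out
```

Note that an inner-join style filter like this drops names that match no partial at all; every name in this example matches at least one.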