R: obtaining a subset of a column that matches certain criteria
Let's say I have a table of data on students in a school. I want to look at the family size of students who are male (1) and are at least considered "tall". How would I do this in R?
I can seem to figure out how to get the column of family size for all students, student_data$family_size, but I can't figure out how to narrow it down further.
    family_size ... gender ... height
1             6          1      tall
2             3          0      tall
3             5          1      tall
4             4          1      tall
5            10          0      short
6             2          1      average
So I want:
    family_size
1             6
2             5
3             4
I'm not sure how the indexing would turn out; maybe it corresponds to the original indexing of the first table, but that's not important.
Also, I'm not sure if I've uploaded it as a data frame or not: when I execute typeof(student_data), it returns "list".
We can use subset. It has subset and select arguments, to which we pass a logical index to subset the rows and a column index or name to select the columns, respectively. In the OP's post, it was mentioned to extract the rows that have 'male' gender, i.e. represented by 1 in the binary column. So, gender==1 gives a logical TRUE/FALSE vector, converting 1 to TRUE and other values (0 here) to FALSE. The second condition checks for rows that have the 'tall' substring in the 'height' column. We use grepl to match the substring 'tall' in the 'height' column, couple both conditions with &, and select the column 'family_size'.
subset(df1, gender==1 & grepl('tall', height), select= family_size)
#  family_size
#1           6
#3           5
#4           4
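To see how the two conditions combine, here is a small sketch that builds each logical vector separately before passing the result to subset. The intermediate names is_male, is_tall and keep are only illustrative, and df1 is rebuilt here with data.frame() using the same values as the data at the end of this answer.

```r
# Same values as df1 in this answer, rebuilt so the snippet runs on its own
df1 <- data.frame(
  family_size = c(6L, 3L, 5L, 4L, 10L, 2L),
  gender      = c(1L, 0L, 1L, 1L, 0L, 1L),
  height      = c("very tall", "tall", "tall", "tall", "very short", "average"),
  stringsAsFactors = FALSE
)

# Illustrative intermediate vectors (names are not from the original post)
is_male <- df1$gender == 1             # TRUE where gender is 1
is_tall <- grepl("tall", df1$height)   # TRUE where height contains "tall"
keep    <- is_male & is_tall           # TRUE only where both conditions hold

# Rows are kept where 'keep' is TRUE; only family_size is selected
result <- subset(df1, keep, select = family_size)
print(result)
```

Note that grepl('tall', height) also matches "very tall", which is what the "at least tall" requirement seems to call for.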
Or we can use [ instead of subset. [ is the recommended option for use inside functions. Its default option is drop=TRUE, so if we are subsetting a single column, we might end up with a vector. To avoid that, we can use drop=FALSE.
df1[with(df1, gender==1 & grepl('tall', height)), 'family_size', drop=FALSE]
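The effect of drop can be checked directly. This is a minimal sketch, again rebuilding df1 with the same values; the names idx, as_vector and as_frame are only for illustration.

```r
# Same values as df1 in this answer, rebuilt so the snippet runs on its own
df1 <- data.frame(
  family_size = c(6L, 3L, 5L, 4L, 10L, 2L),
  gender      = c(1L, 0L, 1L, 1L, 0L, 1L),
  height      = c("very tall", "tall", "tall", "tall", "very short", "average"),
  stringsAsFactors = FALSE
)

# Logical row index: male and height containing "tall"
idx <- with(df1, gender == 1 & grepl("tall", height))

as_vector <- df1[idx, "family_size"]                # default drop = TRUE
as_frame  <- df1[idx, "family_size", drop = FALSE]  # keeps the data.frame

print(is.data.frame(as_vector))  # FALSE: a plain integer vector
print(is.data.frame(as_frame))   # TRUE: a one-column data.frame
```

If only the values are needed, the vector form is fine; drop=FALSE matters when later code expects a data frame with the original row names.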
data
df1 <- structure(list(family_size = c(6L, 3L, 5L, 4L, 10L, 2L),
    gender = c(1L, 0L, 1L, 1L, 0L, 1L),
    height = c("very tall", "tall", "tall", "tall", "very short", "average")),
    .Names = c("family_size", "gender", "height"),
    class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
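On the side question about typeof(student_data) returning "list": a data frame is stored internally as a list of columns, so that result by itself does not mean the data was loaded incorrectly. A quick check, using a small hypothetical student_data built here only for illustration:

```r
# Hypothetical stand-in for the OP's student_data
student_data <- data.frame(family_size = c(6L, 3L), gender = c(1L, 0L))

print(typeof(student_data))         # "list": the underlying storage type
print(class(student_data))          # "data.frame"
print(is.data.frame(student_data))  # TRUE
```

So class() or is.data.frame() is the more useful check here than typeof().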