best way to create response curves for a GLM species distribution model in R? -
i'm running binomial glm predict probability of species occurrence, training on 1 dataset , testing model on dataset: here's data
trainingdata<-read.csv("trainingdata.csv")[,-1] trainingdata[,1]<-as.factor(trainingdata[,1]) trainingdata[,4]<-as.factor(trainingdata[,4]) testdata<-read.csv("testdata.csv")[,-1] testdata[,1]<-as.factor(testdata[,1]) testdata[,4]<-as.factor(testdata[,4]) mod<-glm(presence~var1+var2+var3, family=binomial, data=trainingdata) probs=predict(mod, testdata, type="response")
what best way (or function) create response curves plot relationship between probability of presence , each predictor variable?
thanks!
the marginal probabilities can calculated predict.glm type = "terms", since each of terms calculated remaining variables set @ mean values.
converted probabilty scale plogis(term + intercept).
second, because data set contains , combination of continuous values , factors predictor variables, separate plots made each type , combined grid.arrange.
although answers question directly based on glm model presented, still recommend examining spatial autocorrelation of both predictor , response variables, have impact on final model.
library(reshape2) library(dplyr) library(tidyr) library(ggplot2) library(gridextra) trainingdata <- read.csv("~/downloads/trainingdata.csv", header = true) trainingdata[['presence']] <- as.factor(trainingdata[['presence']]) trainingdata[['var3']] <- as.factor(trainingdata[['var3']]) trainingdata[['x']] <- null # not used in model testdata <- read.csv("~/downloads/testdata.csv", header = true) testdata[['presence']] <- as.factor(testdata[['presence']]) testdata[['var3']] <- as.factor(testdata[['var3']]) testdata[['x']] <- null
presence/absence model
mod <- glm(presence ~ var1 + var2 + var3, family = binomial, data = trainingdata)
get predicted probabilities each of centered variables (i.e remaining variables set mean).
mod_terms <- predict(mod, newdata = testdata, type = "terms") mod_prob <- data.frame(idx = 1:nrow(testdata), plogis(mod_terms + attr(mod_terms, "constant"))) mod_probg <- mod_prob %>% gather(variable, probability, -idx)
melt test data long format
testdata['idx'] <- 1:nrow(testdata) # add index data testdata[['x']] <- null # drop x variable since not used in model data_long <- melt(testdata, id = c("presence","idx")) data_long[['value']] <- as.numeric(data_df[['value']])
merge testdata predictions , separate data containing continuous (var1 , var2) , factors (var3).
# merge testdata predictions data_df <- merge(data_long, mod_probg, = c("idx", "variable")) data_df <- data_df %>% arrange(variable, value) data_continuous <- data_df %>% filter(., variable != "var3") %>% transform(value = as.numeric(value)) %>% arrange(variable, value) data_factor <- data_df %>% filter(., variable == "var3") %>% transform(value = as.factor(value))%>% arrange(idx)
ggplot output
g_continuous <- ggplot(data_continuous, aes(x = value, y = probability)) + geom_point()+ facet_wrap(~variable, scales = "free_x") g_factor <- ggplot(data = data_factor, aes(x = value, y = probability)) + geom_boxplot() + facet_wrap(~variable) grid.arrange(g_continuous, g_factor, nrow = 1)
Comments
Post a Comment