best way to create response curves for a GLM species distribution model in R? -


i'm running binomial glm predict probability of species occurrence, training on 1 dataset , testing model on dataset: here's data

trainingdata<-read.csv("trainingdata.csv")[,-1] trainingdata[,1]<-as.factor(trainingdata[,1]) trainingdata[,4]<-as.factor(trainingdata[,4])  testdata<-read.csv("testdata.csv")[,-1] testdata[,1]<-as.factor(testdata[,1]) testdata[,4]<-as.factor(testdata[,4])  mod<-glm(presence~var1+var2+var3, family=binomial, data=trainingdata)  probs=predict(mod, testdata, type="response") 

what best way (or function) create response curves plot relationship between probability of presence , each predictor variable?

thanks!

the marginal probabilities can calculated predict.glm type = "terms", since each of terms calculated remaining variables set @ mean values.
converted probabilty scale plogis(term + intercept).

second, because data set contains , combination of continuous values , factors predictor variables, separate plots made each type , combined grid.arrange.

although answers question directly based on glm model presented, still recommend examining spatial autocorrelation of both predictor , response variables, have impact on final model.

library(reshape2) library(dplyr) library(tidyr) library(ggplot2) library(gridextra)  trainingdata <- read.csv("~/downloads/trainingdata.csv", header = true) trainingdata[['presence']] <- as.factor(trainingdata[['presence']]) trainingdata[['var3']] <- as.factor(trainingdata[['var3']]) trainingdata[['x']] <- null # not used in model  testdata <- read.csv("~/downloads/testdata.csv", header = true) testdata[['presence']] <- as.factor(testdata[['presence']]) testdata[['var3']] <- as.factor(testdata[['var3']]) testdata[['x']] <- null 

presence/absence model

mod <- glm(presence ~ var1 + var2 + var3, family = binomial, data = trainingdata) 

get predicted probabilities each of centered variables (i.e remaining variables set mean).

mod_terms <- predict(mod, newdata = testdata, type = "terms") mod_prob <- data.frame(idx = 1:nrow(testdata), plogis(mod_terms +      attr(mod_terms, "constant"))) mod_probg <- mod_prob %>% gather(variable, probability, -idx) 

melt test data long format

testdata['idx'] <- 1:nrow(testdata) # add index data testdata[['x']] <- null # drop x variable since not used in model  data_long <- melt(testdata, id = c("presence","idx"))  data_long[['value']] <- as.numeric(data_df[['value']]) 

merge testdata predictions , separate data containing continuous (var1 , var2) , factors (var3).

# merge testdata predictions data_df <- merge(data_long, mod_probg, = c("idx", "variable")) data_df <- data_df %>% arrange(variable, value)   data_continuous <- data_df %>% filter(., variable != "var3") %>%      transform(value = as.numeric(value)) %>% arrange(variable, value)   data_factor <- data_df %>% filter(., variable == "var3") %>%     transform(value = as.factor(value))%>%     arrange(idx)  

ggplot output

g_continuous <- ggplot(data_continuous, aes(x = value, y = probability)) + geom_point()+  facet_wrap(~variable, scales = "free_x")   g_factor <-  ggplot(data = data_factor, aes(x = value, y = probability)) + geom_boxplot() + facet_wrap(~variable)   grid.arrange(g_continuous, g_factor, nrow = 1) 

enter image description here


Comments

Popular posts from this blog

resizing Telegram inline keyboard -

command line - How can a Python program background itself? -

php - "cURL error 28: Resolving timed out" on Wordpress on Azure App Service on Linux -