Thursday, April 4, 2013

Logistic regression in BUGS: model fit and performance using Chi-square and deviance

Having narrowly escaped being an April-fools baby, this week saw me celebrating my birthday (no, I won't say which one!). The great thing about being an Easter birthday is that I can legitimately scoff a lot of chocolate.

Project-wise, I've managed to code a full two path models using my PhD data and test the independence claims in each model, as well as the overall model, following some methods in Clough (2012). This has been quite an achievement for me. The models run in R, though I can't say whether they're totally correct or not yet. This exercise has been useful because I've had to code logistic (Bernoulli), Poisson and regular Normal linear regression models in BUGS. I've also had to tackle some issue around data standardisation prior to modelling.

Another issue I wanted to resolve was how to calculate something akin to Bayesian p-values, or a measure of the fit, for a BUGS logistic regression model. Here is some code provided by Richard Duncan that does just this using a Chi-square goodness of fit tests and deviance measure. However, I haven't managed to get this working on my own data yet.

# load packages
  library(rjags)
  library(R2jags)
  library(mcmcplots)
  load.module("glm")          # loads the glm module for JAGS, which
                              # may help with convergence
  
# set working directory
setwd("c:\\pgrad\\kirsty mcgregor")

# generate some data
  unlogit <- function(x) exp(x)/(1+exp(x))
  
# an explanatory variable
  x <- seq(1, 10, 0.1)
# probability as a function of x on the logit scale  
  p <- unlogit(-1 + 0.3*x)
  plot(p ~ x)
  
# now generate some bernoulli data using this probability info
  y <- rbinom(length(p), 1, p)
  y
  
# logistic regression model in R
  summary(m1 <- glm(y ~ x, family=binomial))
  
  N <- length(y)
  
# in JAGS
  mod <- "model
  {
  for(i in 1:N) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- b0 + b1*x[i]

# calculate goodness of fit statistic for logistic regression model using data
    predicted[i] <- p[i]
    res.y[i] <- ((y[i] - predicted[i]) / sqrt(predicted[i]*(1-predicted[i])))^2

# calculate goodness of fit statistic for logistic regression model using new predicted data
    y.rep[i] ~ dbern(p[i])
    res.y.rep[i] <- ((y.rep[i] - predicted[i]) / sqrt(predicted[i]*(1-predicted[i])))^2
  }

  fit <- sum(res.y[])           # test statistic for data   
  fit.rep <- sum(res.y.rep[])   # test statistic for new predicted data   
  test <- step(fit.rep - fit)   # Test whether new data set more extreme
  bpvalue <- mean(test)   # Bayesian p-value 

  #priors
  b0 ~ dnorm(0, 0.0001)
  b1 ~ dnorm(0, 0.0001)
}"


  # write model
  write(mod, "modelRD.txt")
  
  set.seed(rnbinom(1, mu=200, size=1))
  mod <- jags(model = "modelRD.txt",
              data = list(N=N, y=y, x=x),
              inits = function() list(b0=rnorm(1), b1=rnorm(1)), 
              param = c("b0", "b1", "bpvalue"),
              n.chains = 3,
              n.iter =20000,
              n.burnin = 10000)
  

# put output into mcmc form and plot chains  
  out <- as.mcmc(mod)
  mcmcplot(out, parms=c("b0", "b1"))

# save and view output  
  all.sum <- mod$BUGSoutput$summary
  all.sum


Reading this week:
Clough, Y. 2012. A generalized approach to modeling and estimating indirect effects in ecology. Ecology 93:1809–1815. http://dx.doi.org/10.1890/11-1899.1

Imai, K., Keele, L., and Yamamoto, T. 2010. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science 25: 51–71. http://dx.doi.org/10.1214/10-STS321

Morin, L., Paini, D.R., Randall, R.P. 2013. Can global weed assemblages be used to predict future weeds? PLoS ONE 8(2): e55547.  http://dx.doi.org/10.1371/journal.pone.0055547

Bechara, F.C., Reis, A., Bourscheid, K., Vieira, N.K., and Trentin, B.E. 2013. Reproductive biology and early establishment of Pinus elliottii var. elliottii in Brazilian sandy coastal plain vegetation: implications for biological invasion. Scientia Agricola 70: 88-92. Available at: http://www.scielo.br/pdf/sa/v70n2/05.pdf