Kirsty F. McGregor: Logistic regression in BUGS: model fit and performance using Chi-square and deviance

Having narrowly escaped being an April-fools baby, this week saw me celebrating my birthday (no, I won't say which one!). The great thing about being an Easter birthday is that I can legitimately scoff a lot of chocolate.

Project-wise, I've managed to code a full two path models using my PhD data and test the independence claims in each model, as well as the overall model, following some methods in Clough (2012). This has been quite an achievement for me. The models run in R, though I can't say whether they're totally correct or not yet. This exercise has been useful because I've had to code logistic (Bernoulli), Poisson and regular Normal linear regression models in BUGS. I've also had to tackle some issue around data standardisation prior to modelling.

Another issue I wanted to resolve was how to calculate something akin to Bayesian p-values, or a measure of the fit, for a BUGS logistic regression model. Here is some code provided by Richard Duncan that does just this using a Chi-square goodness of fit tests and deviance measure. However, I haven't managed to get this working on my own data yet.

# load packages

library(rjags)

library(R2jags)

library(mcmcplots)

load.module("glm") # loads the glm module for JAGS, which

# may help with convergence

# set working directory

setwd("c:\\pgrad\\kirsty mcgregor")

# generate some data

unlogit <- function(x) exp(x)/(1+exp(x))

# an explanatory variable

x <- seq(1, 10, 0.1)

# probability as a function of x on the logit scale

p <- unlogit(-1 + 0.3*x)

plot(p ~ x)

# now generate some bernoulli data using this probability info

y <- rbinom(length(p), 1, p)

# logistic regression model in R

summary(m1 <- glm(y ~ x, family=binomial))

N <- length(y)

# in JAGS

mod <- "model

{

for(i in 1:N) {

y[i] ~ dbern(p[i])

logit(p[i]) <- b0 + b1*x[i]

# calculate goodness of fit statistic for logistic regression model using data

predicted[i] <- p[i]

res.y[i] <- ((y[i] - predicted[i]) / sqrt(predicted[i]*(1-predicted[i])))^2

# calculate goodness of fit statistic for logistic regression model using new predicted data

y.rep[i] ~ dbern(p[i])

res.y.rep[i] <- ((y.rep[i] - predicted[i]) / sqrt(predicted[i]*(1-predicted[i])))^2

}

fit <- sum(res.y[]) # test statistic for data

fit.rep <- sum(res.y.rep[]) # test statistic for new predicted data

test <- step(fit.rep - fit) # Test whether new data set more extreme

bpvalue <- mean(test) # Bayesian p-value

#priors

b0 ~ dnorm(0, 0.0001)

b1 ~ dnorm(0, 0.0001)

# write model

write(mod, "modelRD.txt")

set.seed(rnbinom(1, mu=200, size=1))

mod <- jags(model = "modelRD.txt",

data = list(N=N, y=y, x=x),

inits = function() list(b0=rnorm(1), b1=rnorm(1)),

param = c("b0", "b1", "bpvalue"),

n.chains = 3,

n.iter =20000,

n.burnin = 10000)

# put output into mcmc form and plot chains

out <- as.mcmc(mod)

mcmcplot(out, parms=c("b0", "b1"))

# save and view output

all.sum <- mod$BUGSoutput$summary

all.sum

Reading this week:

Clough, Y. 2012. A generalized approach to modeling and estimating indirect effects in ecology. Ecology 93:1809–1815. http://dx.doi.org/10.1890/11-1899.1

Imai, K., Keele, L., and Yamamoto, T. 2010. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science 25: 51–71. http://dx.doi.org/10.1214/10-STS321

Morin, L., Paini, D.R., Randall, R.P. 2013. Can global weed assemblages be used to predict future weeds? PLoS ONE 8(2): e55547. http://dx.doi.org/10.1371/journal.pone.0055547

Bechara, F.C., Reis, A., Bourscheid, K., Vieira, N.K., and Trentin, B.E. 2013. Reproductive biology and early establishment of Pinus elliottii var. elliottii in Brazilian sandy coastal plain vegetation: implications for biological invasion. Scientia Agricola 70: 88-92. Available at: http://www.scielo.br/pdf/sa/v70n2/05.pdf

Thursday, April 4, 2013

Logistic regression in BUGS: model fit and performance using Chi-square and deviance