Having narrowly escaped being an April Fools' baby, I celebrated my birthday this week (no, I won't say which one!). The great thing about having an Easter birthday is that I can legitimately scoff a lot of chocolate.
Project-wise, I've managed to code two full path models using my PhD data and to test the independence claims in each model, as well as the overall model, following methods in Clough (2012). This has been quite an achievement for me. The models run in R, though I can't yet say whether they're totally correct. The exercise has been useful because I've had to code logistic (Bernoulli), Poisson and ordinary Normal linear regression models in BUGS, and I've also had to tackle some issues around standardising the data prior to modelling.
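As an aside, the standardisation I mean is just centring and scaling the continuous predictors before handing them to BUGS/JAGS, which tends to help the chains mix and puts the slope priors on a comparable scale. Here is a minimal sketch of that idea for a Poisson regression, using made-up data and illustrative names (counts, rain.z) rather than anything from my actual path models:
# sketch only: standardise a predictor, then fit a Poisson regression in JAGS
library(R2jags)
rain <- runif(100, 0, 50)                   # a continuous predictor (made-up)
counts <- rpois(100, exp(0.5 + 0.05*rain))  # a Poisson response (made-up)
rain.z <- (rain - mean(rain)) / sd(rain)    # centre and scale before modelling
pois.mod <- "model
{
  for(i in 1:N) {
    counts[i] ~ dpois(lambda[i])
    log(lambda[i]) <- a0 + a1*rain.z[i]
  }
  a0 ~ dnorm(0, 0.0001)
  a1 ~ dnorm(0, 0.0001)
}"
write(pois.mod, "poisson_sketch.txt")
pois.fit <- jags(data = list(N=length(counts), counts=counts, rain.z=rain.z),
                 inits = function() list(a0=rnorm(1), a1=rnorm(1)),
                 parameters.to.save = c("a0", "a1"),
                 model.file = "poisson_sketch.txt",
                 n.chains = 3, n.iter = 5000, n.burnin = 1000)
The coefficients then refer to a one-standard-deviation change in the predictor, so they need back-transforming if you want them on the original scale.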
Another issue I wanted to resolve was how to calculate something akin to a Bayesian p-value, or some other measure of fit, for a BUGS logistic regression model. Below is some code provided by Richard Duncan that does just this using a chi-square goodness-of-fit test and a deviance measure. However, I haven't managed to get it working on my own data yet.
# load packages
library(rjags)
library(R2jags)
library(mcmcplots)
load.module("glm") # load the JAGS glm module, which may help with convergence
# set working directory
setwd("c:\\pgrad\\kirsty mcgregor")
# generate some data
unlogit <- function(x) exp(x)/(1+exp(x)) # inverse logit
# an explanatory variable
x <- seq(1, 10, 0.1)
# probability as a function of x on the logit scale
p <- unlogit(-1 + 0.3*x)
plot(p ~ x)
# now generate some bernoulli data using this probability info
y <- rbinom(length(p), 1, p)
y
# logistic regression model in R
summary(m1 <- glm(y ~ x, family=binomial))
N <- length(y)
# in JAGS
mod <- "model
{
for(i in 1:N) {
y[i] ~ dbern(p[i])
logit(p[i]) <- b0 + b1*x[i]
# calculate goodness of fit statistic for logistic regression model using data
predicted[i] <- p[i]
res.y[i] <- ((y[i] - predicted[i]) / sqrt(predicted[i]*(1-predicted[i])))^2
# calculate goodness of fit statistic for logistic regression model using new predicted data
y.rep[i] ~ dbern(p[i])
res.y.rep[i] <- ((y.rep[i] - predicted[i]) / sqrt(predicted[i]*(1-predicted[i])))^2
}
fit <- sum(res.y[]) # test statistic for data
fit.rep <- sum(res.y.rep[]) # test statistic for new predicted data
test <- step(fit.rep - fit) # Test whether new data set more extreme
bpvalue <- mean(test) # Bayesian p-value
#priors
b0 ~ dnorm(0, 0.0001)
b1 ~ dnorm(0, 0.0001)
}"
# write model
write(mod, "modelRD.txt")
set.seed(rnbinom(1, mu=200, size=1))
mod <- jags(data = list(N=N, y=y, x=x),
            inits = function() list(b0=rnorm(1), b1=rnorm(1)),
            parameters.to.save = c("b0", "b1", "bpvalue"),
            model.file = "modelRD.txt",
            n.chains = 3,
            n.iter = 20000,
            n.burnin = 10000)
# put output into mcmc form and plot chains
out <- as.mcmc(mod)
mcmcplot(out, parms=c("b0", "b1"))
# save and view output
all.sum <- mod$BUGSoutput$summary
all.sum
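Once this runs on my own data, the thing to look at will be the posterior mean of bpvalue: a value near 0.5 suggests the replicated data sets are about as extreme as the observed data (adequate fit), while values near 0 or 1 flag a lack of fit. Assuming the summary rows are named after the monitored parameters, it can be pulled out with something like:
all.sum["bpvalue", "mean"]  # Bayesian p-value; ~0.5 indicates adequate fit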
Reading this week:
Clough, Y. 2012. A generalized approach to modeling and estimating indirect effects in ecology. Ecology 93:1809–1815. http://dx.doi.org/10.1890/11-1899.1
Imai, K., Keele, L., and Yamamoto, T. 2010. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science 25: 51–71. http://dx.doi.org/10.1214/10-STS321
Morin, L., Paini, D.R., and Randall, R.P. 2013. Can global weed assemblages be used to predict future weeds? PLoS ONE 8(2): e55547. http://dx.doi.org/10.1371/journal.pone.0055547
Bechara, F.C., Reis, A., Bourscheid, K., Vieira, N.K., and Trentin, B.E. 2013. Reproductive biology and early establishment of Pinus elliottii var. elliottii in Brazilian sandy coastal plain vegetation: implications for biological invasion. Scientia Agricola 70: 88–92. Available at: http://www.scielo.br/pdf/sa/v70n2/05.pdf