mcmc-sampling-stan

安装量: 35
排名: #19849

安装

npx skills add https://github.com/letta-ai/skills --skill mcmc-sampling-stan
MCMC Sampling with Stan
Overview
This skill provides procedural guidance for implementing Markov Chain Monte Carlo (MCMC) sampling using Stan and RStan. It covers the complete workflow from environment setup through model specification to convergence validation, with emphasis on avoiding common pitfalls in hierarchical Bayesian modeling.
When to Use This Skill
Installing and configuring RStan in R environments
Writing Stan models for Bayesian inference
Implementing hierarchical/multilevel models
Configuring MCMC sampling parameters
Diagnosing and validating MCMC convergence
Pre-Implementation Checklist
Before writing any code, verify the following in order:
1. System Dependencies First
Critical
Always check system dependencies before attempting R package installation.

Check for C++ compiler and build tools

which g++ || which clang++ dpkg -l | grep -E "(build-essential|r-base-dev)"

Debian/Ubuntu

Required system packages for RStan: build-essential or equivalent C++ toolchain r-base-dev for R development headers Sufficient memory (Stan compilation is memory-intensive) 2. RStan Installation Strategy Install RStan with explicit version pinning when specified:

If specific version required (e.g., 2.32.7)

install.packages ( "rstan" , repos = "https://cloud.r-project.org/" )

Verify installation and version

library ( rstan ) packageVersion ( "rstan" ) Common installation failures : Missing C++ compiler → Install build-essential first Compilation errors → Check g++ version compatibility Memory exhaustion → Close other applications, increase swap 3. Environment Verification Before proceeding with model development:

Verify RStan loads without errors

library ( rstan )

Test Stan compilation with minimal model

stan_model
(
model_code
=
"parameters { real x; } model { x ~ normal(0,1); }"
)
Stan Model Specification
Hierarchical Model Structure
For hierarchical Bayesian models, follow this block structure:
data
{
// Declare all input data with types and constraints
int
<
lower
=
0
>
N
;
// Number of observations
int
<
lower
=
0
>
y
[
N
]
;
// Observed counts
int
<
lower
=
0
>
n
[
N
]
;
// Trial sizes
}
parameters
{
// Hyperparameters (population-level)
real
<
lower
=
0
>
alpha
;
real
<
lower
=
0
>
beta
;
// Group-level parameters
real
<
lower
=
0
,
upper
=
1
>
theta
[
N
]
;
}
model
{
// Hyperprior (if using custom/improper priors)
// Example: p(α,β) ∝ (α+β)^(-5/2)
target
+=
-
2.5
*
log
(
alpha
+
beta
)
;
// Prior for group parameters
theta
~
beta
(
alpha
,
beta
)
;
// Likelihood
y
~
binomial
(
n
,
theta
)
;
}
Custom Prior Implementation
When implementing custom or improper priors:
Prior Form
Stan Implementation
p(x) ∝ x^a
target += a * log(x);
p(x) ∝ (a+b)^c
target += c * log(a + b);
Flat/improper
No explicit statement needed
Document the mathematical transformation
Always comment the relationship between the mathematical prior and the target += statement. MCMC Sampling Configuration Recommended Control Parameters For hierarchical models with potential sampling difficulties: fit <- sampling ( model , data = stan_data , chains = 4 ,

Multiple chains for convergence assessment

iter

100000 ,

Total iterations (including warmup)

seed

1 ,

Reproducibility

control

list ( adapt_delta = 0.95 ,

Increase for divergent transitions

max_treedepth

15

Increase for complex posteriors

) ) Parameter Selection Rationale Parameter Default When to Increase Why adapt_delta 0.8 Divergent transitions Smaller step sizes improve exploration max_treedepth 10 "Maximum treedepth" warnings Allows longer trajectories iter 2000 Low effective sample size More samples for inference warmup iter/2 Slow convergence More adaptation time Warmup Considerations Default warmup is 50% of iter For 100,000 iterations: 50,000 warmup + 50,000 sampling per chain Explicitly specify if different warmup proportion needed: sampling ( ... , iter = 100000 , warmup = 25000 )

25% warmup

Convergence Diagnostics Required Validation Steps Always verify these after sampling :

1. Check Rhat (potential scale reduction factor)

summary ( fit ) $ summary [ , "Rhat" ]

Target: All Rhat < 1.01 (ideally < 1.005)

2. Check effective sample size

summary ( fit ) $ summary [ , "n_eff" ]

Target: n_eff > 400 for reliable inference

3. Check for divergent transitions

get_num_divergent ( fit )

Target: 0 divergent transitions

4. Check treedepth saturation

get_num_max_treedepth ( fit )

Target: 0 or minimal saturation

Diagnostic Interpretation
Issue
Indicator
Resolution
Poor mixing
Rhat > 1.01
Increase iter, reparameterize
Inefficient sampling
Low n_eff
Increase iter, adjust adapt_delta
Geometric problems
Divergent transitions
Increase adapt_delta, reparameterize
Complex posterior
Max treedepth hits
Increase max_treedepth
Common Pitfalls and Solutions
1. Installation Order Errors
Wrong
Install RStan → Installation fails → Check dependencies
Right
Check dependencies → Install prerequisites → Install RStan
2. Missing Convergence Checks
Wrong
Run sampling → Extract means → Report results
Right
Run sampling → Check Rhat/n_eff/divergences → Validate → Extract results 3. Improper Prior Issues When using improper priors like p(α,β) ∝ (α+β)^(-5/2): Ensure parameters are constrained ( ) Verify posterior is proper (converges) Check for boundary behavior near zero 4. Command Syntax in Shell Avoid piping R output with shell redirection that causes parsing errors:

Problematic

Rscript -e "code" 2

&1 | grep .. .

Safer

Rscript
-e
"code"
>
output.txt
2
>
&1
grep
..
. output.txt
Verification Strategy
Stepwise Verification Approach
Environment
Verify RStan installation before model development
Compilation
Test Stan model compiles without errors
Sampling
Run with reduced iterations first to catch issues
Diagnostics
Check all convergence metrics before trusting results
Results
Extract posteriors only after validation passes Sanity Checks for Results Posterior means should be plausible given prior and data 95% credible intervals should have reasonable width Multiple chains should show similar posteriors Results should be consistent across different seeds (approximately) References For detailed Stan documentation and examples, consult: Stan User's Guide: Model specification and best practices RStan Getting Started: Installation and basic usage Stan Functions Reference: Available distributions and functions
返回排行榜