This tutorial demonstrates how to perform a Mendelian Randomization (MR) analysis using multiple data sources. We explore the causal relationship between C-reactive protein (CRP) and coronary heart disease (CHD), followed by a deeper look at interleukin-6 (IL-6), a potential upstream regulator of CRP. The results highlight the contrast between observational and MR-based evidence and suggest that IL-6 signalling may play a more direct causal role in CHD risk.
gwasvcf_to_TwoSampleMR
vcf_crp <- VariantAnnotation::readVcf("data/ieugwas/CRP_ebi-a-GCST005067_sub.vcf.gz")
crp_exposure <- gwasvcf_to_TwoSampleMR(vcf_crp, "exposure")
crp_exposure <- crp_exposure %>% filter(pval.exposure < 5*10^-8)
vcf_chd <- VariantAnnotation::readVcf("data/ieugwas/CHD_ebi-a-GCST000998_sub.vcf.gz")
chd_outcome <- gwasvcf_to_TwoSampleMR(vcf_chd, "outcome")
harm_data <- harmonise_data(exposure_dat = crp_exposure, outcome_dat = chd_outcome)
MRInstruments
vignette("MRBase")
data(gwas_catalog)
exposure_gwas <- subset(gwas_catalog, grepl("GCST007615", STUDY.ACCESSION) &
Phenotype_simple == "C-reactive protein levels")
exposure_gwas <- exposure_gwas[exposure_gwas$pval < 5*10^-8, ]
exposure_data <- format_data(exposure_gwas)
head(exposure_data)
## SNP chr.exposure gene.exposure effect_allele.exposure
## 1 rs10832027 11 ARNTL A
## 2 rs111269058 7 BCL7B T
## 3 rs11130206 3 BSN C
## 4 rs112635299 14 SERPINA1 T
## 5 rs11265263 1 CRP A
## 6 rs116971887 16 SALL1 T
## other_allele.exposure beta.exposure se.exposure pval.exposure units.exposure
## 1 G 0.023 0.004081633 3e-08 unit increase
## 2 C -0.054 0.006122449 3e-19 unit decrease
## 3 G -0.025 0.004081633 4e-10 unit decrease
## 4 G -0.107 0.016836735 2e-10 unit decrease
## 5 <NA> -0.273 0.007653061 1e-197 unit decrease
## 6 G -0.118 0.012244898 9e-25 unit decrease
## eaf.exposure exposure mr_keep.exposure
## 1 0.67 C-reactive protein levels (unit increase) TRUE
## 2 0.12 C-reactive protein levels (unit decrease) TRUE
## 3 0.60 C-reactive protein levels (unit decrease) TRUE
## 4 0.02 C-reactive protein levels (unit decrease) TRUE
## 5 0.07 C-reactive protein levels (unit decrease) FALSE
## 6 0.05 C-reactive protein levels (unit decrease) TRUE
## pval_origin.exposure units.exposure_dat id.exposure
## 1 reported unit increase IuPDPk
## 2 reported unit decrease 34CpJc
## 3 reported unit decrease 34CpJc
## 4 reported unit decrease 34CpJc
## 5 reported unit decrease 34CpJc
## 6 reported unit decrease 34CpJc
# The new update requires authentication through accsess token. More details at https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
# Check available outcomes
#ao <- available_outcomes()
#head(ao)
# Extract CHD outcome data
#chd_outcome_api <- extract_outcome_data(
# snps = exposure_data$SNP,
# outcomes = 'ieu-a-7'
#)
read_exposure_data
exposure <- read_exposure_data(
filename = "data/gwas_catalogue/CRP_gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
exposure$exposure <- "CRP"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)
outcome <- read_outcome_data(
snps = exposure$SNP,
filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
outcome$outcome <- "CHD"
This part investigates the causal cascade of IL-6 signaling, C-reactive protein (CRP), and coronary heart disease (CHD) using Mendelian Randomization (MR). We’ll test whether genetic proxies for IL-6, IL6R, and CRP are causally associated with each other and with CHD.
Summary of the cascade:
dat <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
dplyr::tibble(rsid=dat$SNP, pval=dat$pval.exposure, id=dat$exposure),
clump_kb = 10000,
clump_r2 = 0.001,
clump_p = 0.99,
pop = "EUR",
opengwas_jwt = get_opengwas_jwt(),
bfile = NULL,
plink_bin = NULL
)
#the ld_clump function uses API to clump provided SNPs using LD information from 1kg data. If this fails to run then we can also use local plink binaries for LD reference. link: https://mrcieu.github.io/gwasglue/index.html
#clump <- ld_clump(
# dplyr::tibble(rsid=dat$SNP, pval=dat$pval.exposure, id=dat$exposure),
# pop = "EUR",
# plink_bin = genetics.binaRies::get_plink_binary(),
# bfile = "data/1kg.v3/EUR"
#)
clump <- readRDS("data/clump_crp.rds")
harm_clump <- merge(dat, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "id")) %>% unique()
res <- mr(harm_clump, method_list = c("mr_egger_regression", "mr_ivw"))
## Analysing 'CrGgMg' on 'DxhV47'
res
## id.exposure id.outcome outcome exposure method nsnp
## 1 CrGgMg DxhV47 CHD CRP MR Egger 223
## 2 CrGgMg DxhV47 CHD CRP Inverse variance weighted 223
## b se pval
## 1 -0.006549546 0.11068717 0.9528688
## 2 0.070479077 0.06541103 0.2812659
mr_heterogeneity(harm_clump)
## id.exposure id.outcome outcome exposure method Q
## 1 CrGgMg DxhV47 CHD CRP MR Egger 442.9316
## 2 CrGgMg DxhV47 CHD CRP Inverse variance weighted 444.4240
## Q_df Q_pval
## 1 221 5.527723e-17
## 2 222 5.384369e-17
mr_pleiotropy_test(harm_clump)
## id.exposure id.outcome outcome exposure egger_intercept se pval
## 1 CrGgMg DxhV47 CHD CRP 0.002345081 0.002717589 0.3891129
# Plots
p1 <- mr_scatter_plot(res, harm_clump)
p1
## $CrGgMg.DxhV47
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 CrGgMg DxhV47
res_single <- mr_singlesnp(harm_clump)
p2 <- mr_forest_plot(res_single)
p2
## $CrGgMg.DxhV47
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 CrGgMg DxhV47
res_loo <- mr_leaveoneout(harm_clump)
p3 <- mr_leaveoneout_plot(res_loo)
p3
## $CrGgMg.DxhV47
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 CrGgMg DxhV47
# Generate report
#mr_report(harm_clump, output_path = "figure/")
Exercise: - Do we observe a causal effect of CRP on CHD? - Are the MR assumptions violated based on pleiotropy and heterogeneity tests?
Let’s test whether IL-6 levels are causally associated with CHD risk.
exposure <- read_exposure_data(
filename = "data/gwas_catalogue/IL6-gwas_catalogue_sub.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
exposure$exposure <- "IL-6"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)
outcome <- read_outcome_data(
snps = exposure$SNP,
filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
outcome$outcome <- "CHD"
il6_data <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
dplyr::tibble(rsid=il6_data$SNP, pval=il6_data$pval.exposure,exposure=il6_data$exposure),
clump_kb = 10000,
clump_r2 = 0.001,
clump_p = 0.99,
pop = "EUR",
opengwas_jwt = get_opengwas_jwt(),
bfile = NULL,
plink_bin = NULL
)
clump <- readRDS("data/il6_clump.rds")
harm_clump_il6 <- merge(il6_data, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "exposure")) %>% unique()
#filter for variant rs2228145
res_il6 <- mr_singlesnp(harm_clump_il6)
res_il6 <- res_il6[1,] %>%
mutate(
ci.lower = b - 1.96 * se,
ci.upper = b + 1.96 * se
)
p2 <- ggplot(res_il6, aes(x = b, y = SNP)) +
geom_point(size = 3, color = "steelblue") +
geom_errorbar(aes(xmin = ci.lower, xmax = ci.upper), width = 0.1, color = "steelblue") +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(
title = paste0("Causal Effect of ", res_il6$exposure, " on ", res_il6$outcome),
x = "Causal Estimate",
y = "SNP"
) +
theme_minimal(base_size = 14)
p2
Exercise: - Is IL-6 likely to be causally associated with CHD?
Let’s test whether IL6R levels influence CRP levels.
exposure <- read_exposure_data(
filename = "data/gwas_catalogue/IL6R-gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
exposure$exposure <- "IL6R"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)
outcome <- read_outcome_data(
snps = exposure$SNP,
filename = "data/gwas_catalogue/CRP_gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
outcome$outcome <- "CRP"
il6R_data <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
dplyr::tibble(rsid=il6_data$SNP, pval=il6_data$pval.exposure,exposure=il6_data$exposure),
clump_kb = 10000,
clump_r2 = 0.001,
clump_p = 0.99,
pop = "EUR",
opengwas_jwt = get_opengwas_jwt(),
bfile = NULL,
plink_bin = NULL
)
clump <- readRDS("data/il6R_crp_clump.rds")
harm_clump_il6R_crp <- merge(il6R_data, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "exposure")) %>% unique()
res_il6R_crp <- mr(harm_clump_il6R_crp, method_list = c("mr_egger_regression", "mr_ivw"))
res_il6R_crp
## id.exposure id.outcome outcome exposure method nsnp
## 1 lU5rYS Jd0U6J CRP IL6R MR Egger 6
## 2 lU5rYS Jd0U6J CRP IL6R Inverse variance weighted 6
## b se pval
## 1 -0.10293813 0.01899183 5.616537e-03
## 2 -0.07310965 0.01580093 3.711434e-06
mr_heterogeneity(harm_clump_il6R_crp)
## id.exposure id.outcome outcome exposure method Q
## 1 lU5rYS Jd0U6J CRP IL6R MR Egger 102.6186
## 2 lU5rYS Jd0U6J CRP IL6R Inverse variance weighted 211.6416
## Q_df Q_pval
## 1 4 2.724098e-21
## 2 5 9.161419e-44
mr_pleiotropy_test(harm_clump_il6R_crp)
## id.exposure id.outcome outcome exposure egger_intercept se pval
## 1 lU5rYS Jd0U6J CRP IL6R 0.01425723 0.00691607 0.1082725
p1_il6R <- mr_scatter_plot(res_il6R_crp, harm_clump_il6R_crp)
p1_il6R
## $lU5rYS.Jd0U6J
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 lU5rYS Jd0U6J
res_single_il6R <- mr_singlesnp(harm_clump_il6R_crp)
p2_il6R <- mr_forest_plot(res_single_il6R)
p2_il6R
## $lU5rYS.Jd0U6J
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 lU5rYS Jd0U6J
Exercise: - Is there evidence that IL6R variants causally affect CRP levels? - Does this support IL6R as an upstream regulator of inflammation?
Finally, let’s test whether IL6R levels have a causal effect on CHD.
exposure <- read_exposure_data(
filename = "data/gwas_catalogue/IL6R-gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
exposure$exposure <- "IL6R"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)
outcome <- read_outcome_data(
snps = exposure$SNP,
filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
outcome$outcome <- "CHD"
il6R_data <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
dplyr::tibble(rsid=il6R_data$SNP, pval=il6R_data$pval.exposure,exposure=il6R_data$exposure),
clump_kb = 10000,
clump_r2 = 0.001,
clump_p = 0.99,
pop = "EUR",
opengwas_jwt = get_opengwas_jwt(),
bfile = NULL,
plink_bin = NULL
)
clump <- readRDS("data/clump_il6R_chd.rds")
harm_clump_il6R <- merge(il6R_data, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "exposure")) %>% unique()
res_il6R <- mr(harm_clump_il6R, method_list = c("mr_egger_regression", "mr_ivw"))
res_il6R
## id.exposure id.outcome outcome exposure method nsnp
## 1 vGH6MB mDojjJ CHD IL6R MR Egger 5
## 2 vGH6MB mDojjJ CHD IL6R Inverse variance weighted 5
## b se pval
## 1 -0.08190643 0.02769831 0.059683306
## 2 -0.05416385 0.01851339 0.003437292
mr_heterogeneity(harm_clump_il6R)
## id.exposure id.outcome outcome exposure method Q
## 1 vGH6MB mDojjJ CHD IL6R MR Egger 3.388334
## 2 vGH6MB mDojjJ CHD IL6R Inverse variance weighted 5.232818
## Q_df Q_pval
## 1 3 0.3355362
## 2 4 0.2642321
mr_pleiotropy_test(harm_clump_il6R)
## id.exposure id.outcome outcome exposure egger_intercept se pval
## 1 vGH6MB mDojjJ CHD IL6R 0.01619687 0.01267436 0.2911893
p1_il6R <- mr_scatter_plot(res_il6R, harm_clump_il6R)
p1_il6R
## $vGH6MB.mDojjJ
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 vGH6MB mDojjJ
res_single_il6R <- mr_singlesnp(harm_clump_il6R)
p2_il6R <- mr_forest_plot(res_single_il6R)
p2_il6R
## $vGH6MB.mDojjJ
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 vGH6MB mDojjJ
Exercise: - Do IL6R variants predict CHD risk? - Would this support therapeutic targeting of IL6R (e.g., using tocilizumab)?
exposure <- read_exposure_data(
filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
exposure$exposure <- "CHD"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)
outcome <- read_outcome_data(
snps = exposure$SNP,
filename = "data/gwas_catalogue/IL6R-gwas_catalogue.tsv.gz",
sep = "\t",
snp_col = "variant_id",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
)
outcome$outcome <- "IL6-R"
dat <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
dplyr::tibble(rsid=dat$SNP, pval=dat$pval.exposure, id=dat$exposure),
pop = "EUR",
plink_bin = genetics.binaRies::get_plink_binary(),
bfile = "data/1kg.v3/EUR"
)
harm_clump <- merge(dat, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "id")) %>% unique()
res <- mr(harm_clump, method_list = c("mr_egger_regression", "mr_ivw"))
res
## id.exposure id.outcome outcome exposure method nsnp
## 1 FAYMRq aeSiyn IL6-R CHD MR Egger 15
## 2 FAYMRq aeSiyn IL6-R CHD Inverse variance weighted 15
## b se pval
## 1 0.09366950 0.14412990 0.5270770
## 2 0.08418497 0.04742827 0.0758986
p1 <- mr_scatter_plot(res, harm_clump)
p1
## $FAYMRq.aeSiyn
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 FAYMRq aeSiyn
mr_heterogeneity(harm_clump)
## id.exposure id.outcome outcome exposure method Q
## 1 FAYMRq aeSiyn IL6-R CHD MR Egger 16.00903
## 2 FAYMRq aeSiyn IL6-R CHD Inverse variance weighted 16.01507
## Q_df Q_pval
## 1 13 0.2486426
## 2 14 0.3124550
mr_pleiotropy_test(harm_clump)
## id.exposure id.outcome outcome exposure egger_intercept se pval
## 1 FAYMRq aeSiyn IL6-R CHD -0.001371454 0.0195887 0.9452493
These results suggest that targeting IL6R may influence CHD risk through inflammatory pathways without CRP being a causal intermediary. This has direct implications for drug repurposing and therapeutic targeting.
MR effect estimates are often reported as odds ratios (OR) with confidence intervals (CI).
# Assuming `mr_results` contains columns: method, b, se
mr_results_odds <- res_il6R %>%
mutate(
OR = exp(b),
CI_lower = exp(b - 1.96 * se),
CI_upper = exp(b + 1.96 * se)
)
# Plotting Odds Ratios with ggplot2
ggplot(mr_results_odds, aes(x = OR, y = method)) +
geom_point() +
geom_errorbar(aes(xmin = CI_lower, xmax = CI_upper), width = 0) +
geom_vline(xintercept = 1, linetype = "dashed") +
xlab("Odds Ratio") +
ylab("MR Method") +
theme_bw()
# This plot helps visualize the causal effect of LDL on CAD across different MR methods and whether the 95% confidence intervals exclude the null (OR = 1).
This tutorial demonstrated how to conduct MR using multiple data formats (VCF, TSV, API) and data harmonization pipelines. The CRP–CHD analysis showed no evidence of a causal effect, while IL-6—upstream of CRP—demonstrated a potential causal effect on CHD, supporting the hypothesis that IL-6 may be a better intervention target. Reverse causality check confirms that the uni-directional relationship between iL6 -> CHD.