Introduction

This tutorial demonstrates how to perform a Mendelian Randomization (MR) analysis using multiple data sources. We explore the causal relationship between C-reactive protein (CRP) and coronary heart disease (CHD), followed by a deeper look at interleukin-6 (IL-6), a potential upstream regulator of CRP. The results highlight the contrast between observational and MR-based evidence and suggest that IL-6 signalling may play a more direct causal role in CHD risk.

1. Accessing GWAS Data from Different Sources

1.1 From Local VCF Files Using gwasvcf_to_TwoSampleMR

vcf_crp <- VariantAnnotation::readVcf("data/ieugwas/CRP_ebi-a-GCST005067_sub.vcf.gz")
crp_exposure <- gwasvcf_to_TwoSampleMR(vcf_crp, "exposure")
crp_exposure <- crp_exposure %>% filter(pval.exposure < 5*10^-8)

vcf_chd <- VariantAnnotation::readVcf("data/ieugwas/CHD_ebi-a-GCST000998_sub.vcf.gz")
chd_outcome <- gwasvcf_to_TwoSampleMR(vcf_chd, "outcome")

harm_data <- harmonise_data(exposure_dat = crp_exposure, outcome_dat = chd_outcome)

1.2 From MRBase API Using MRInstruments

vignette("MRBase")
data(gwas_catalog)
exposure_gwas <- subset(gwas_catalog, grepl("GCST007615", STUDY.ACCESSION) &
                          Phenotype_simple == "C-reactive protein levels")
exposure_gwas <- exposure_gwas[exposure_gwas$pval < 5*10^-8, ]
exposure_data <- format_data(exposure_gwas)
head(exposure_data)
##           SNP chr.exposure gene.exposure effect_allele.exposure
## 1  rs10832027           11         ARNTL                      A
## 2 rs111269058            7         BCL7B                      T
## 3  rs11130206            3           BSN                      C
## 4 rs112635299           14      SERPINA1                      T
## 5  rs11265263            1           CRP                      A
## 6 rs116971887           16         SALL1                      T
##   other_allele.exposure beta.exposure se.exposure pval.exposure units.exposure
## 1                     G         0.023 0.004081633         3e-08  unit increase
## 2                     C        -0.054 0.006122449         3e-19  unit decrease
## 3                     G        -0.025 0.004081633         4e-10  unit decrease
## 4                     G        -0.107 0.016836735         2e-10  unit decrease
## 5                  <NA>        -0.273 0.007653061        1e-197  unit decrease
## 6                     G        -0.118 0.012244898         9e-25  unit decrease
##   eaf.exposure                                  exposure mr_keep.exposure
## 1         0.67 C-reactive protein levels (unit increase)             TRUE
## 2         0.12 C-reactive protein levels (unit decrease)             TRUE
## 3         0.60 C-reactive protein levels (unit decrease)             TRUE
## 4         0.02 C-reactive protein levels (unit decrease)             TRUE
## 5         0.07 C-reactive protein levels (unit decrease)            FALSE
## 6         0.05 C-reactive protein levels (unit decrease)             TRUE
##   pval_origin.exposure units.exposure_dat id.exposure
## 1             reported      unit increase      IuPDPk
## 2             reported      unit decrease      34CpJc
## 3             reported      unit decrease      34CpJc
## 4             reported      unit decrease      34CpJc
## 5             reported      unit decrease      34CpJc
## 6             reported      unit decrease      34CpJc
# The new update requires authentication through accsess token. More details at https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
# Check available outcomes
#ao <- available_outcomes()
#head(ao)

# Extract CHD outcome data
#chd_outcome_api <- extract_outcome_data(
#  snps = exposure_data$SNP,
#  outcomes = 'ieu-a-7'
#)

1.3 From Local TSV Files Using read_exposure_data

exposure <- read_exposure_data(
  filename = "data/gwas_catalogue/CRP_gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
exposure$exposure <- "CRP"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)

outcome <- read_outcome_data(
  snps = exposure$SNP,
  filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
outcome$outcome <- "CHD"

Overview

This part investigates the causal cascade of IL-6 signaling, C-reactive protein (CRP), and coronary heart disease (CHD) using Mendelian Randomization (MR). We’ll test whether genetic proxies for IL-6, IL6R, and CRP are causally associated with each other and with CHD.

Summary of the cascade:

  1. IL6R → CRP: Evidence supports a strong causal effect, consistent with IL6R as an upstream driver of inflammation.
  2. CRP → CHD: Evidence does not support a causal effect, suggesting CRP may be a marker rather than a driver of CHD.
  3. IL-6 → CHD: Results suggest a potential causal role for IL-6 in CHD risk, but pleiotropy and heterogeneity must be considered.
  4. IL6R → CHD: Provides a druggable link; may support IL6R-targeting interventions like tocilizumab.

Harmonize and clump

dat <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
  dplyr::tibble(rsid=dat$SNP, pval=dat$pval.exposure, id=dat$exposure),
  clump_kb = 10000,
  clump_r2 = 0.001,
  clump_p = 0.99,
  pop = "EUR",
  opengwas_jwt = get_opengwas_jwt(),
  bfile = NULL,
  plink_bin = NULL
)
#the ld_clump function uses API to clump provided SNPs using LD information from 1kg data. If this fails to run then we can also use local plink binaries for LD reference. link: https://mrcieu.github.io/gwasglue/index.html
#clump <- ld_clump(
#  dplyr::tibble(rsid=dat$SNP, pval=dat$pval.exposure, id=dat$exposure),
#  pop = "EUR",
#  plink_bin = genetics.binaRies::get_plink_binary(),
#  bfile = "data/1kg.v3/EUR"
#)
clump <- readRDS("data/clump_crp.rds")
harm_clump <- merge(dat, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "id")) %>% unique()

1. MR Analysis of CRP on CHD

Let’s test whether CRP levels have a causal effect on coronary heart disease.

res <- mr(harm_clump, method_list = c("mr_egger_regression", "mr_ivw"))
## Analysing 'CrGgMg' on 'DxhV47'
res
##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      CrGgMg     DxhV47     CHD      CRP                  MR Egger  223
## 2      CrGgMg     DxhV47     CHD      CRP Inverse variance weighted  223
##              b         se      pval
## 1 -0.006549546 0.11068717 0.9528688
## 2  0.070479077 0.06541103 0.2812659
mr_heterogeneity(harm_clump)
##   id.exposure id.outcome outcome exposure                    method        Q
## 1      CrGgMg     DxhV47     CHD      CRP                  MR Egger 442.9316
## 2      CrGgMg     DxhV47     CHD      CRP Inverse variance weighted 444.4240
##   Q_df       Q_pval
## 1  221 5.527723e-17
## 2  222 5.384369e-17
mr_pleiotropy_test(harm_clump)
##   id.exposure id.outcome outcome exposure egger_intercept          se      pval
## 1      CrGgMg     DxhV47     CHD      CRP     0.002345081 0.002717589 0.3891129
# Plots
p1 <- mr_scatter_plot(res, harm_clump)
p1
## $CrGgMg.DxhV47

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      CrGgMg     DxhV47
res_single <- mr_singlesnp(harm_clump)
p2 <- mr_forest_plot(res_single)
p2
## $CrGgMg.DxhV47
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      CrGgMg     DxhV47
res_loo <- mr_leaveoneout(harm_clump)
p3 <- mr_leaveoneout_plot(res_loo)
p3
## $CrGgMg.DxhV47
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      CrGgMg     DxhV47
# Generate report
#mr_report(harm_clump, output_path = "figure/")

Exercise: - Do we observe a causal effect of CRP on CHD? - Are the MR assumptions violated based on pleiotropy and heterogeneity tests?

2. MR Analysis of IL-6 on CHD

Let’s test whether IL-6 levels are causally associated with CHD risk.

exposure <- read_exposure_data(
  filename = "data/gwas_catalogue/IL6-gwas_catalogue_sub.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
exposure$exposure <- "IL-6"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)

outcome <- read_outcome_data(
  snps = exposure$SNP,
  filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
outcome$outcome <- "CHD"

il6_data <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
  dplyr::tibble(rsid=il6_data$SNP, pval=il6_data$pval.exposure,exposure=il6_data$exposure),
  clump_kb = 10000,
  clump_r2 = 0.001,
  clump_p = 0.99,
  pop = "EUR",
  opengwas_jwt = get_opengwas_jwt(),
  bfile = NULL,
  plink_bin = NULL
)
clump <- readRDS("data/il6_clump.rds")
harm_clump_il6 <- merge(il6_data, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "exposure")) %>% unique()
#filter for variant rs2228145
res_il6 <- mr_singlesnp(harm_clump_il6)
res_il6 <- res_il6[1,] %>%
  mutate(
    ci.lower = b - 1.96 * se,
    ci.upper = b + 1.96 * se
  )
p2 <- ggplot(res_il6, aes(x = b, y = SNP)) +
  geom_point(size = 3, color = "steelblue") +
  geom_errorbar(aes(xmin = ci.lower, xmax = ci.upper), width = 0.1, color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = paste0("Causal Effect of ", res_il6$exposure, " on ", res_il6$outcome),
    x = "Causal Estimate",
    y = "SNP"
  ) +
  theme_minimal(base_size = 14)
p2

Exercise: - Is IL-6 likely to be causally associated with CHD?

3. MR Analysis of IL6R on CRP

Let’s test whether IL6R levels influence CRP levels.

exposure <- read_exposure_data(
  filename = "data/gwas_catalogue/IL6R-gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
exposure$exposure <- "IL6R"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)

outcome <- read_outcome_data(
  snps = exposure$SNP,
  filename = "data/gwas_catalogue/CRP_gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
outcome$outcome <- "CRP"

il6R_data <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
  dplyr::tibble(rsid=il6_data$SNP, pval=il6_data$pval.exposure,exposure=il6_data$exposure),
  clump_kb = 10000,
  clump_r2 = 0.001,
  clump_p = 0.99,
  pop = "EUR",
  opengwas_jwt = get_opengwas_jwt(),
  bfile = NULL,
  plink_bin = NULL
)
clump <- readRDS("data/il6R_crp_clump.rds")
harm_clump_il6R_crp <- merge(il6R_data, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "exposure")) %>% unique()

res_il6R_crp <- mr(harm_clump_il6R_crp, method_list = c("mr_egger_regression", "mr_ivw"))
res_il6R_crp
##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      lU5rYS     Jd0U6J     CRP     IL6R                  MR Egger    6
## 2      lU5rYS     Jd0U6J     CRP     IL6R Inverse variance weighted    6
##             b         se         pval
## 1 -0.10293813 0.01899183 5.616537e-03
## 2 -0.07310965 0.01580093 3.711434e-06
mr_heterogeneity(harm_clump_il6R_crp)
##   id.exposure id.outcome outcome exposure                    method        Q
## 1      lU5rYS     Jd0U6J     CRP     IL6R                  MR Egger 102.6186
## 2      lU5rYS     Jd0U6J     CRP     IL6R Inverse variance weighted 211.6416
##   Q_df       Q_pval
## 1    4 2.724098e-21
## 2    5 9.161419e-44
mr_pleiotropy_test(harm_clump_il6R_crp)
##   id.exposure id.outcome outcome exposure egger_intercept         se      pval
## 1      lU5rYS     Jd0U6J     CRP     IL6R      0.01425723 0.00691607 0.1082725
p1_il6R <- mr_scatter_plot(res_il6R_crp, harm_clump_il6R_crp)
p1_il6R
## $lU5rYS.Jd0U6J

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      lU5rYS     Jd0U6J
res_single_il6R <- mr_singlesnp(harm_clump_il6R_crp)
p2_il6R <- mr_forest_plot(res_single_il6R)
p2_il6R
## $lU5rYS.Jd0U6J

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      lU5rYS     Jd0U6J

Exercise: - Is there evidence that IL6R variants causally affect CRP levels? - Does this support IL6R as an upstream regulator of inflammation?

4. MR Analysis of IL6R on CHD

Finally, let’s test whether IL6R levels have a causal effect on CHD.

exposure <- read_exposure_data(
  filename = "data/gwas_catalogue/IL6R-gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
exposure$exposure <- "IL6R"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)

outcome <- read_outcome_data(
  snps = exposure$SNP,
  filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
outcome$outcome <- "CHD"

il6R_data <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
  dplyr::tibble(rsid=il6R_data$SNP, pval=il6R_data$pval.exposure,exposure=il6R_data$exposure),
  clump_kb = 10000,
  clump_r2 = 0.001,
  clump_p = 0.99,
  pop = "EUR",
  opengwas_jwt = get_opengwas_jwt(),
  bfile = NULL,
  plink_bin = NULL
)
clump <- readRDS("data/clump_il6R_chd.rds")
harm_clump_il6R <- merge(il6R_data, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "exposure")) %>% unique()

res_il6R <- mr(harm_clump_il6R, method_list = c("mr_egger_regression", "mr_ivw"))
res_il6R
##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      vGH6MB     mDojjJ     CHD     IL6R                  MR Egger    5
## 2      vGH6MB     mDojjJ     CHD     IL6R Inverse variance weighted    5
##             b         se        pval
## 1 -0.08190643 0.02769831 0.059683306
## 2 -0.05416385 0.01851339 0.003437292
mr_heterogeneity(harm_clump_il6R)
##   id.exposure id.outcome outcome exposure                    method        Q
## 1      vGH6MB     mDojjJ     CHD     IL6R                  MR Egger 3.388334
## 2      vGH6MB     mDojjJ     CHD     IL6R Inverse variance weighted 5.232818
##   Q_df    Q_pval
## 1    3 0.3355362
## 2    4 0.2642321
mr_pleiotropy_test(harm_clump_il6R)
##   id.exposure id.outcome outcome exposure egger_intercept         se      pval
## 1      vGH6MB     mDojjJ     CHD     IL6R      0.01619687 0.01267436 0.2911893
p1_il6R <- mr_scatter_plot(res_il6R, harm_clump_il6R)
p1_il6R
## $vGH6MB.mDojjJ

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      vGH6MB     mDojjJ
res_single_il6R <- mr_singlesnp(harm_clump_il6R)
p2_il6R <- mr_forest_plot(res_single_il6R)
p2_il6R
## $vGH6MB.mDojjJ

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      vGH6MB     mDojjJ

Exercise: - Do IL6R variants predict CHD risk? - Would this support therapeutic targeting of IL6R (e.g., using tocilizumab)?

5: Reverse Causality: CHD -> IL6R

exposure <- read_exposure_data(
  filename = "data/gwas_catalogue/CHD_gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
exposure$exposure <- "CHD"
exposure <- exposure %>% filter(pval.exposure < 5*10^-8)

outcome <- read_outcome_data(
  snps = exposure$SNP,
  filename = "data/gwas_catalogue/IL6R-gwas_catalogue.tsv.gz",
  sep = "\t",
  snp_col = "variant_id",
  beta_col = "beta",
  se_col = "standard_error",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  eaf_col = "effect_allele_frequency",
  pval_col = "p_value"
)
outcome$outcome <- "IL6-R"
dat <- harmonise_data(exposure_dat = exposure, outcome_dat = outcome)
clump <- ld_clump(
  dplyr::tibble(rsid=dat$SNP, pval=dat$pval.exposure, id=dat$exposure),
  pop = "EUR",
  plink_bin = genetics.binaRies::get_plink_binary(),
  bfile = "data/1kg.v3/EUR"
)

harm_clump <- merge(dat, clump, by.x = c("SNP", "pval.exposure", "exposure"), by.y = c("rsid", "pval", "id")) %>% unique()
res <- mr(harm_clump, method_list = c("mr_egger_regression", "mr_ivw"))
res
##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      FAYMRq     aeSiyn   IL6-R      CHD                  MR Egger   15
## 2      FAYMRq     aeSiyn   IL6-R      CHD Inverse variance weighted   15
##            b         se      pval
## 1 0.09366950 0.14412990 0.5270770
## 2 0.08418497 0.04742827 0.0758986
p1 <- mr_scatter_plot(res, harm_clump)
p1
## $FAYMRq.aeSiyn

## 
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
##   id.exposure id.outcome
## 1      FAYMRq     aeSiyn
mr_heterogeneity(harm_clump)
##   id.exposure id.outcome outcome exposure                    method        Q
## 1      FAYMRq     aeSiyn   IL6-R      CHD                  MR Egger 16.00903
## 2      FAYMRq     aeSiyn   IL6-R      CHD Inverse variance weighted 16.01507
##   Q_df    Q_pval
## 1   13 0.2486426
## 2   14 0.3124550
mr_pleiotropy_test(harm_clump)
##   id.exposure id.outcome outcome exposure egger_intercept        se      pval
## 1      FAYMRq     aeSiyn   IL6-R      CHD    -0.001371454 0.0195887 0.9452493

These results suggest that targeting IL6R may influence CHD risk through inflammatory pathways without CRP being a causal intermediary. This has direct implications for drug repurposing and therapeutic targeting.

Add-on: Reporting MR Estimates as Odds Ratios

MR effect estimates are often reported as odds ratios (OR) with confidence intervals (CI).

# Assuming `mr_results` contains columns: method, b, se

mr_results_odds <- res_il6R %>%
  mutate(
    OR = exp(b),
    CI_lower = exp(b - 1.96 * se),
    CI_upper = exp(b + 1.96 * se)
  )

# Plotting Odds Ratios with ggplot2
ggplot(mr_results_odds, aes(x = OR, y = method)) +
  geom_point() +
  geom_errorbar(aes(xmin = CI_lower, xmax = CI_upper), width = 0) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  xlab("Odds Ratio") +
  ylab("MR Method") +
  theme_bw()

# This plot helps visualize the causal effect of LDL on CAD across different MR methods and whether the 95% confidence intervals exclude the null (OR = 1).

Conclusion

This tutorial demonstrated how to conduct MR using multiple data formats (VCF, TSV, API) and data harmonization pipelines. The CRP–CHD analysis showed no evidence of a causal effect, while IL-6—upstream of CRP—demonstrated a potential causal effect on CHD, supporting the hypothesis that IL-6 may be a better intervention target. Reverse causality check confirms that the uni-directional relationship between iL6 -> CHD.