Identifying and Classifying Causal Biochemical Biomarkers for Breast Cancer: A Mendelian Randomization Study | BMC Medicine

Analysis plan

Our prospective plan was to perform a series of two-sample univariable MR (UVMR) analyses to examine the associations of each UKB biochemical biomarker with liability to overall, ER-positive, and ER-negative breast cancer. Where the UVMR analyses showed significant associations, we performed additional multivariable MR (MVMR) analyses to adjust for known risk factors, as well as bidirectional analyses. Finally, we ranked the nominally significant biomarkers by importance using a multivariable Bayesian approach [4]. Our analysis follows the guidelines for performing MR investigations [5], and our reporting follows the Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization (STROBE-MR) guidelines (Additional File 2: Checklist S1) [6]. The study protocol was not pre-registered.

Study populations

Our study used summary-level exposure data from the UKB study [7] and summary-level outcome data from the Breast Cancer Association Consortium (BCAC) [8]. The BCAC includes ~6000 samples from the UK [8], which corresponds to a sample overlap of at most approximately 1.4% between the exposure and outcome samples. Our data include only women of European ancestry, to reduce bias due to population stratification.

Exposure data

We obtained publicly available summary-level genome-wide association study (GWAS) statistics on the levels of 34 serum, urine, and red blood cell biomarkers, body mass index (BMI), and frequency of alcohol intake in unrelated female participants of White British ancestry (n = 194,174) in the UKB cohort, from the GWAS by Neale et al. [9]. Genotypes and levels of the 34 biomarkers were collected at the UKB baseline assessment between 2006 and 2010, using various techniques and laboratory instruments from different vendors [7, 10]. The GWAS used age, age², and the first 20 principal components (PCs) as covariates [11]. We used GWAS results for rank-based inverse normal transformed traits, because many of the quantitative biomarker traits were not normally distributed. Most of the women (at least 59%) in the UKB cohort were postmenopausal [12]. More information on the UKB biomarker panel and the original UKB study can be found elsewhere [3, 7].
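The rank-based inverse normal transformation replaces each value by the standard normal quantile of its rank, so the transformed trait is approximately normal whatever the shape of the original distribution. The sketch below is a minimal illustration only; the Blom offset (c = 3/8), the average-rank treatment of ties, and the function name `rank_inverse_normal` are assumptions, not the documented settings of the UKB GWAS pipeline.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation (hypothetical helper).

    Assumption: Blom offset c = 3/8 and average ranks for ties; the offset
    used in the UKB GWAS pipeline is not stated in this section.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    # Map each rank to a quantile in (0, 1), then to a standard normal score
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Skewed toy values become symmetric normal scores
print(rank_inverse_normal([0.1, 0.2, 0.4, 8.0, 95.0]))
```

Because only ranks enter the transformation, extreme values such as 95.0 in the toy example no longer dominate, which is the motivation given above for using the transformed traits.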

Outcome data

Publicly available GWAS summary statistics on overall breast cancer cases (n = 122,977) and controls (n = 105,974) of European ancestry were obtained from the BCAC [13]. Of the breast cancer cases, 69,501 were ER-positive and 21,468 were ER-negative, and most were diagnosed after menopause. More details on the original studies are described elsewhere [8, 14, 15].

Statistical analysis

Selection of instrumental variables

For each exposure, we selected single nucleotide polymorphisms (SNPs) associated at the genome-wide significance level (P < 5 × 10⁻⁸) and ensured their independence by removing those in linkage disequilibrium using PLINK clumping (r² < 0.001, clumping distance = 10,000 kb). We then harmonized the effect alleles between the exposure and outcome data.
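Harmonization makes the SNP-exposure and SNP-outcome effects refer to the same effect allele: when the outcome GWAS reports the opposite allele as the effect allele, the sign of its beta is flipped. The sketch below assumes a hypothetical `SnpAssoc` record and `harmonize` helper; it does not handle strand flips or palindromic (A/T, C/G) SNPs, which dedicated MR software treats separately.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SnpAssoc:
    """Summary statistics for one SNP in one GWAS (hypothetical container)."""
    snp: str
    effect_allele: str
    other_allele: str
    beta: float
    se: float


def harmonize(exposure: SnpAssoc, outcome: SnpAssoc) -> Optional[SnpAssoc]:
    """Express the outcome association relative to the exposure effect allele.

    Returns None when the allele pairs do not match on the reported strand.
    Strand flips and palindromic SNPs are deliberately not handled here.
    """
    if exposure.snp != outcome.snp:
        raise ValueError("cannot harmonize different SNPs")
    exp_alleles = (exposure.effect_allele.upper(), exposure.other_allele.upper())
    out_alleles = (outcome.effect_allele.upper(), outcome.other_allele.upper())
    if out_alleles == exp_alleles:
        return outcome  # already coded on the same effect allele
    if out_alleles == exp_alleles[::-1]:
        # Effect allele swapped: flip the sign of beta; the SE is unchanged.
        return replace(outcome, effect_allele=exp_alleles[0],
                       other_allele=exp_alleles[1], beta=-outcome.beta)
    return None  # incompatible alleles on this strand: drop the SNP


exposure = SnpAssoc("rs1", "A", "G", 0.10, 0.01)
outcome = SnpAssoc("rs1", "G", "A", 0.05, 0.02)
aligned = harmonize(exposure, outcome)
print(aligned.beta)  # -0.05
```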

In all of our MR analyses, SNPs had to satisfy three assumptions to be considered valid instrumental variables (IVs). The genetic variants must (1) be strongly associated with the exposure (the relevance assumption), (2) be independent of confounders of the exposure-outcome association (the independence assumption), and (3) affect the outcome only through their effect on the exposure (the exclusion restriction assumption).

Univariable analysis

The main univariable analysis was inverse-variance weighted (IVW) MR between each exposure and each outcome. The IVW method first estimates the Wald ratio for each SNP by dividing the SNP-outcome association by the SNP-exposure association, and then combines these ratios in a fixed-effect meta-analysis in which each ratio is weighted by its inverse variance, approximated from the variance of the SNP-outcome association [16]. We used P < 0.05 as the nominal significance threshold. We also derived false discovery rate (FDR)-adjusted P-values using the Benjamini-Hochberg (BH) method and used an FDR-adjusted P < 0.05 as the significance threshold. For exposures for which only one IV could be identified, the Wald ratio was estimated [17]. Results are reported as odds ratios (ORs) per standard deviation (SD) increase in genetically predicted biomarker concentration.
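The IVW and Wald-ratio calculations described above can be sketched as follows, together with the BH adjustment applied to the resulting P-values. This is a minimal illustration assuming lists of harmonized SNP-level associations; the function names are hypothetical, the standard error uses the first-order fixed-effect approximation, and the P-value uses a normal approximation.

```python
import math
from statistics import NormalDist


def wald_ratio(bx, by, se_y):
    """Single-SNP Wald ratio and its first-order standard error."""
    return by / bx, se_y / abs(bx)


def ivw_fixed(bx, by, se_y):
    """Fixed-effect IVW estimate (hypothetical helper).

    bx: SNP-exposure betas; by, se_y: SNP-outcome betas and SEs.
    Each Wald ratio is weighted by bx^2 / se_y^2 (first-order inverse variance).
    """
    weights = [x * x / (s * s) for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    z = estimate / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided normal P-value
    return estimate, se, p


def benjamini_hochberg(pvals):
    """BH-adjusted P-values (step-up procedure, monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


est, se, p = ivw_fixed([0.1, 0.2], [0.02, 0.04], [0.01, 0.01])
print(round(math.exp(est), 3))  # OR per SD: 1.221
```

Exponentiating the IVW estimate (a log-odds ratio per SD) gives the OR per SD in which results are reported.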

A common violation of the IV exclusion restriction assumption is horizontal pleiotropy, in which a genetic variant affects the outcome through a pathway that does not involve the exposure [18]. We therefore applied, for all exposures, several additional univariable methods that make different assumptions about the structure of pleiotropy: MR-Egger [19], the weighted median [20], and MR Pleiotropy RESidual Sum and Outlier (MR-PRESSO) [21]. MR-Egger allows for some directional pleiotropy in its causal effect estimate by making the additional Instrument Strength Independent of Direct Effect (InSIDE) assumption, which states that, across all instruments, the magnitude of the pleiotropic effect is independent of the strength of the genetic variant-exposure association [19]. The weighted median allows for sparse or balanced pleiotropy by limiting the influence of outlying variants [20]. MR-PRESSO allows for some directional pleiotropy by identifying and correcting for outliers [21].
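The point estimates of two of these robust methods can be sketched in a few lines. The sketch assumes harmonized SNP-level data and hypothetical function names; MR-Egger is shown as a weighted regression with an intercept after orienting SNPs so that the SNP-exposure association is positive, and the weighted median interpolates the 50th percentile of the weight-ordered Wald ratios. Standard errors (usually bootstrapped for the weighted median) and MR-PRESSO are not reproduced here.

```python
def mr_egger(bx, by, se_y):
    """MR-Egger point estimates (hypothetical helper, no standard errors).

    Weighted regression of by on bx with an intercept, weights 1 / se_y^2,
    after orienting every SNP so that its SNP-exposure association is positive.
    Returns (intercept, slope).
    """
    oriented = [(abs(x), y if x >= 0 else -y, 1.0 / (s * s))
                for x, y, s in zip(bx, by, se_y)]
    total_w = sum(w for _, _, w in oriented)
    x_bar = sum(w * x for x, _, w in oriented) / total_w
    y_bar = sum(w * y for _, y, w in oriented) / total_w
    sxy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in oriented)
    sxx = sum(w * (x - x_bar) ** 2 for x, _, w in oriented)
    slope = sxy / sxx                  # causal estimate under InSIDE
    intercept = y_bar - slope * x_bar  # average directional pleiotropy
    return intercept, slope


def weighted_median(bx, by, se_y):
    """Weighted median of SNP-specific Wald ratios (requires >= 2 SNPs).

    Uses first-order inverse-variance weights and linear interpolation
    between the two ratios whose standardized positions straddle 0.5.
    """
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [x * x / (s * s) for x, s in zip(bx, se_y)]
    pairs = sorted(zip(ratios, weights))
    total = sum(w for _, w in pairs)
    cum, positions = 0.0, []
    for _, w in pairs:
        cum += w
        positions.append((cum - 0.5 * w) / total)  # standardized midpoint
    below = max(i for i, s in enumerate(positions) if s < 0.5)
    b_lo, b_hi = pairs[below][0], pairs[below + 1][0]
    frac = (0.5 - positions[below]) / (positions[below + 1] - positions[below])
    return b_lo + (b_hi - b_lo) * frac


# One strongly pleiotropic SNP (last) barely moves the weighted median
bx = [0.1] * 5
by = [0.02, 0.021, 0.019, 0.02, 0.5]
print(weighted_median(bx, by, [0.01] * 5))
```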

Sensitivity analysis

We tested the robustness of our univariable results using MVMR [22, 23] and bidirectional MR. MVMR was used to adjust for previously reported risk factors, while bidirectional MR was used to rule out potential reverse causation.

We performed two-sample MVMR analyses for all seven biomarkers that were nominally significantly associated with overall breast cancer in IVW MR. We searched PhenoScanner [24, 25], a database of summary-level GWAS results, for associations at P < 10⁻⁸ with any variant used as an IV (Additional File 3: T1-T7), to identify traits that could represent pathways of horizontal pleiotropy. MVMR assumes that pleiotropic pathways operate through the risk factors included in the model [18]. For all MVMR analyses, we included SNPs that were associated at genome-wide significance (P < 5 × 10⁻⁸) with any exposure or risk factor in the MVMR model and were not in linkage disequilibrium (r² < 0.001, clumping distance = 10,000 kb).
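An IVW-style MVMR estimate can be sketched as a weighted multiple regression, without an intercept, of the SNP-outcome associations on the SNP-exposure associations of all exposures in the model. In this sketch the rows are assumed to be the union of instruments for the exposures and risk factors in the model, already clumped and harmonized; `mvmr_ivw` is a hypothetical name, and the fixed-effect standard errors shown here may be scaled differently by dedicated MR software.

```python
import numpy as np


def mvmr_ivw(bx, by, se_y):
    """Multivariable IVW MR (hypothetical helper).

    bx   : (n_snps, n_exposures) SNP-exposure associations, one column per
           exposure or risk factor in the model (e.g. a biomarker, BMI)
    by   : (n_snps,) SNP-outcome associations
    se_y : (n_snps,) standard errors of the SNP-outcome associations
    Returns direct-effect estimates and fixed-effect standard errors.
    """
    bx = np.asarray(bx, dtype=float)
    by = np.asarray(by, dtype=float)
    sw = 1.0 / np.asarray(se_y, dtype=float)  # square root of weights 1 / se_y^2
    X = bx * sw[:, None]
    y = by * sw
    # Weighted least squares without an intercept
    est, *_ = np.linalg.lstsq(X, y, rcond=None)
    cov = np.linalg.inv(X.T @ X)  # residual variance fixed at 1 (fixed effect)
    return est, np.sqrt(np.diag(cov))


# Toy example: the outcome is driven by the first exposure only
rng = np.random.default_rng(0)
bx = rng.normal(0.0, 0.1, size=(50, 2))
by = bx @ np.array([0.3, 0.0])
est, se = mvmr_ivw(bx, by, np.full(50, 0.01))
print(np.round(est, 3))
```

Each coefficient is the direct effect of one exposure holding the others fixed, which is what allows the lipid, BMI, alcohol, and hormone adjustments below.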

Because lipid traits are interrelated [26], we included high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, triglycerides, and lipoprotein A in the same MVMR models to estimate the direct association of each lipid with each outcome.

Because BMI [27] and alcohol intake [28] are associated with breast cancer risk, we included BMI and frequency of alcohol intake in the MVMR models for each of the seven biomarkers nominally significantly associated with overall breast cancer in IVW MR.

Because estrogen reduces the expression and activity of alkaline phosphatase (ALP) in breast cancer cells [29], and we could not obtain enough genetic variants for estradiol, we adjusted for testosterone and sex hormone-binding globulin (SHBG) in an MVMR model with ALP.

Significant associations between SHBG and breast cancer risk have been reported after adjustment for BMI in MVMR [28]. As in a previous study [30], we therefore included BMI and SHBG in the MVMR models.

Because of the low prior probability of an association between ALP and breast cancer, we performed bidirectional univariable MR analyses between genetically predicted ALP levels and liability to overall, ER-positive, and ER-negative breast cancer.

Exposure rankings

We used MR Bayesian model averaging (MR-BMA) to agnostically rank the causal importance of the seven biomarkers nominally significantly associated with overall breast cancer in IVW MR, taking potential pleiotropy into account [4]. Empirical P-values were calculated using a permutation approach [31] and adjusted for multiple testing using the BH method, with P < 0.05 as the significance threshold. We included all independent (r² < 0.001) genetic variants associated with any of the biomarkers at the genome-wide significance level (n = 460).
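The idea behind an empirical permutation P-value can be sketched generically: shuffle the SNP-outcome associations across SNPs to break their pairing with the SNP-exposure associations, recompute the statistic, and count how often the permuted value is at least as large as the observed one. In MR-BMA the statistic would be each biomarker's MIP; here `permutation_pvalue` and the toy cross-product statistic are hypothetical, and the exact permutation scheme used by MR-BMA is not reproduced.

```python
import random


def permutation_pvalue(stat_fn, bx, by, n_perm=1000, seed=1):
    """Empirical P-value for a statistic of paired SNP-level data (sketch).

    The SNP-outcome associations are shuffled across SNPs; the P-value is the
    share of permuted statistics >= the observed one, with a +1 correction
    so that it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = stat_fn(bx, by)
    shuffled = list(by)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the exposure-outcome pairing
        if stat_fn(bx, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy statistic (hypothetical): cross-product of exposure and outcome effects
def cross_product(x, y):
    return sum(a * b for a, b in zip(x, y))


bx = [i / 10 for i in range(1, 21)]
by = [2.0 * x for x in bx]
print(permutation_pvalue(cross_product, bx, by))
```

The resulting empirical P-values are then BH-adjusted in the same way as the IVW P-values.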

MR-BMA treats each combination of biomarkers (all single biomarkers, all pairs, all triplets, and so on) as a candidate model in an MVMR analysis using weighted regression. Each candidate model was assigned a posterior probability (PP), based on the regression goodness of fit, expressing the probability that the model contains the true set of causal biomarkers.

We then used model averaging in MR-BMA to assign each biomarker a marginal inclusion probability (MIP) and to report the model-averaged causal effect (MACE) of each biomarker on overall breast cancer. The MIP represents the probability that the biomarker is a causal determinant of breast cancer risk, and the MACE represents the weighted average direct causal effect of the biomarker on risk across all candidate models. The MIP was calculated by summing the posterior probabilities of all candidate models in which the biomarker is present. Because the MACE also averages over models in which the biomarker is absent (where its effect is zero), it underestimates the true causal effect of a biomarker on overall breast cancer; it should not be interpreted in absolute terms, but as an indication of the direction of effect and for comparing relative causal effects between biomarkers.
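The model enumeration, MIP, and MACE described above can be sketched as follows. This is a simplified illustration, not MR-BMA itself: the marginal likelihood of each candidate model is approximated with BIC from a weighted no-intercept regression (a swapped-in approximation), and each biomarker enters a model with an independent prior inclusion probability `prior_p`. The function name `bma_sketch` is hypothetical.

```python
import itertools
import math

import numpy as np


def bma_sketch(bx, by, se_y, prior_p=0.5):
    """Bayesian model averaging over all non-empty exposure subsets (sketch).

    Each subset is fitted by weighted regression (weights 1 / se_y^2, no
    intercept). Assumption: the marginal likelihood is approximated by BIC,
    which is NOT the formula used by MR-BMA itself.
    Returns (posterior probabilities, MIP per exposure, MACE per exposure).
    """
    bx = np.asarray(bx, dtype=float)
    sw = 1.0 / np.asarray(se_y, dtype=float)
    X_all = bx * sw[:, None]
    y = np.asarray(by, dtype=float) * sw
    n, d = X_all.shape
    log_post, effects, members = [], [], []
    for k in range(1, d + 1):
        for subset in itertools.combinations(range(d), k):
            X = X_all[:, list(subset)]
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = float(np.sum((y - X @ coef) ** 2))
            # -2 log-likelihood up to a constant (residual variance 1) plus penalty
            bic = rss + k * math.log(n)
            log_prior = k * math.log(prior_p) + (d - k) * math.log(1.0 - prior_p)
            log_post.append(log_prior - 0.5 * bic)
            full = np.zeros(d)
            full[list(subset)] = coef  # absent exposures keep an effect of 0
            effects.append(full)
            members.append(set(subset))
    log_post = np.array(log_post)
    pp = np.exp(log_post - log_post.max())
    pp /= pp.sum()  # posterior probability of each candidate model
    mip = np.array([pp[[j in m for m in members]].sum() for j in range(d)])
    mace = pp @ np.array(effects)  # zeros from absent models shrink the MACE
    return pp, mip, mace


# Toy example: only exposure 0 is causal
rng = np.random.default_rng(1)
bx = rng.normal(0.0, 0.1, size=(30, 3))
by = 0.4 * bx[:, 0]
pp, mip, mace = bma_sketch(bx, by, np.full(30, 0.01))
print(np.round(mip, 2), np.round(mace, 2))
```

The zeros contributed by models that exclude a biomarker are what shrink its MACE toward zero whenever its MIP is below 1.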

We used 0.5 as the prior probability of inclusion in the main analysis, reflecting a prior belief that half of the nominally significant biomarkers are causal, and prior probabilities of 0.25 and 0.75 in sensitivity analyses.
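Assuming the usual Bayesian model averaging prior, in which each of the d biomarkers enters a candidate model M independently with prior inclusion probability p, the prior on a model and the prior expected number of causal biomarkers are as follows (the expectation is computed before the empty model is excluded; d = 7 here):

```latex
P(M) \propto p^{|M|}\,(1-p)^{\,d-|M|}, \qquad \mathbb{E}\big[\,|M|\,\big] = d\,p

d = 7:\quad p = 0.25 \Rightarrow 1.75, \qquad
p = 0.5 \Rightarrow 3.5 \ \big(P(M) \propto 0.5^{7} \text{ for every } M\big), \qquad
p = 0.75 \Rightarrow 5.25
```

With p = 0.5 every candidate model is equally likely a priori, so the ranking is driven by the data; the 0.25 and 0.75 sensitivity analyses favour sparser and denser models, respectively.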


We used the TwoSampleMR [31], MendelianRandomization [32], MRPRESSO [33], and ieugwasr [34] R packages, as well as the MR-BMA GitHub repository, for our analyses in R (version 4.0.5). We searched for associations of the instruments with other traits using PhenoScanner [24, 25].
