Study design and data sources
We conducted a two-sample Mendelian randomization (MR) analysis to assess the causal associations of sex hormones with COVID-19 susceptibility and severity. MR is a causal inference approach that uses germline genetic variants as instrumental variables (IVs) to estimate the possible causal effects of modifiable risk factors on health outcomes. Compared with conventional observational analyses, this approach is less prone to non-genetic confounding and reverse causation bias 15,16.
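To illustrate how a two-sample MR estimate can be formed from summary statistics, the following is a minimal sketch of the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimator, two standard MR estimators. This passage does not state which estimators were used, so the choice of IVW here is an assumption, and the function names and numeric inputs are hypothetical, chosen only for illustration.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Each variant's Wald ratio is weighted by beta_exposure^2 / se_outcome^2,
    which is equivalent to a weighted regression of outcome effects on
    exposure effects through the origin.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    total_weight = sum(weights)
    return numerator / total_weight, math.sqrt(1.0 / total_weight)


# Hypothetical summary statistics for three variants (not taken from the study).
bx = [0.10, 0.20, 0.30]   # variant-exposure effects (e.g., hormone level GWAS)
by = [0.05, 0.10, 0.15]   # variant-outcome effects (e.g., log-odds of COVID-19)
se = [0.01, 0.01, 0.02]   # standard errors of the variant-outcome effects

estimate, std_error = ivw_estimate(bx, by, se)
print(estimate, std_error)
```

In this toy input every variant has the same Wald ratio, so the IVW estimate coincides with it; with real data the variants disagree and the weights determine how much each contributes.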
We used data from the UK Biobank (UKB) and the COVID-19 Host Genetics Initiative (HGI). Summary statistics on sex hormone levels (estradiol, total testosterone (TT), bioavailable testosterone (BT), and sex hormone binding globulin (SHBG)) were obtained from the largest genome-wide association studies (GWASs) of sex hormones 17,18, which included up to 230,454 women and 194,453 men of European ancestry in the UKB.
In the estradiol GWAS, individuals’ estradiol levels were analyzed as a binary phenotype, with values equal to or above the detection limit (175 pmol/L) considered as one group, and values below the limit as another group 18. Moreover, for quantitative analysis, individuals with estradiol levels below the detection limit were included by using censored regression modeling with a Tobit type I technique 19. This approach allowed analyzing estradiol levels as a continuous phenotype in a total of 163,985 women and 147,690 men 18.
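As a rough illustration of how a Tobit type I model handles values below a detection limit, the sketch below evaluates the log-likelihood of a left-censored normal model. It is a simplification under stated assumptions: the model is intercept-only (the GWAS would additionally include genotype and covariates), no transformation of estradiol is applied, and the function names and example values are hypothetical. Only the 175 pmol/L limit comes from the text.

```python
import math

DETECTION_LIMIT = 175.0  # pmol/L, the detection limit reported for estradiol


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(observations, mu, sigma, limit=DETECTION_LIMIT):
    """Log-likelihood of a left-censored (Tobit type I) normal model.

    `observations` holds measured values, with None marking a value that
    fell below the detection limit. Measured values contribute the normal
    density; censored values contribute the probability of lying below
    the limit.
    """
    total = 0.0
    for value in observations:
        if value is None:
            total += math.log(normal_cdf((limit - mu) / sigma))
        else:
            total += normal_logpdf((value - mu) / sigma) - math.log(sigma)
    return total


# Hypothetical measurements: two detected values and two below the limit.
sample = [210.0, 340.0, None, None]
print(tobit_loglik(sample, mu=200.0, sigma=80.0))
```

Maximizing this function over `mu` and `sigma` (and, in a regression setting, over the coefficients that define `mu`) lets censored individuals inform the fit rather than being dropped.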
Testosterone and SHBG levels were measured and analyzed as continuous phenotypes. The original GWAS of SHBG levels was performed both with and without adjustment for body mass index (BMI) in order to assess the potential impact of collider bias 20. In this study, we took potential collider bias into account by using summary data from both the BMI-unadjusted and the BMI-adjusted SHBG GWAS to estimate the causal effects of genetically predicted SHBG on COVID-19 susceptibility and severity.
For the outcomes in this study (i.e., COVID-19 susceptibility and severity), summary statistics were obtained from the largest GWAS of COVID-19 outcomes in individuals of European ancestry, conducted by the HGI (data freeze 6, excluding UKB and 23andMe participants) 21. Three COVID-19 related phenotypes were selected as the outcomes, each comparing cases with the general population as controls: (1) severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection (74,614 cases and 1,803,529 controls); (2) COVID-19 hospitalization (14,925 cases and 1,393,029 controls); and (3) COVID-19 critical illness (4,297 cases and 378,521 controls) (Table 1). Because no European ancestry GWAS of COVID-19 critical illness was available in data freeze 6, summary statistics for this outcome were obtained from data freeze 5 instead. This study used publicly available data and was not subject to institutional review board approval.