-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
8acdf44
commit dab8c4b
Showing
21 changed files
with
1,432 additions
and
32 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,175 @@ | ||
--- | ||
title: "Example analysis" | ||
author: "Yingying Yu" | ||
date: "2023-11-1" | ||
--- | ||
|
||
# Introduction | ||
|
||
![](menopause.png){style="float:right;" width="170"} Menopause is a natural biological event triggered by the decline in ovarian follicular function and a reduction in circulating blood estrogen levels. According to the World Health Organization (WHO), the majority of women experience menopause in their forties and fifties, typically defined as twelve consecutive months without menstruation, with no other underlying cause. Notably, during pregnancy, women do not have their regular menstrual period for approximately nine months due to the profound impact on ovarian activity. | ||
|
||
------------------------------------------------------------------------ | ||
|
||
# Research Objective | ||
|
||
The objective of this study is to investigate whether the age at which natural menopause occurs is influenced by the number of childbirths among American women. This research utilizes data from the 2017-18 National Health and Nutrition Examination Survey (NHANES). Our hypothesis posits that an increased age at menopause is associated with a higher number of childbirths. | ||
|
||
------------------------------------------------------------------------ | ||
|
||
# Methodology | ||
|
||
### Study population | ||
|
||
Performed by the National Center for Health Statistics (NCHS) within the Centers for Disease Control and Prevention (CDC), the NHANES is a program of cross-sectional studies collecting data regarding the health and nutritional status of Americans. This study is an analysis of the 2017-2018 data for female participants located in different counties across the nation, including demographics, socioeconomic, body measurements, reproductive health-related information gathered by interviews and physical examinations. The dataset can be download [here](https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017). | ||
|
||
### Measures | ||
|
||
The outcome of interest in this study is the age at natural menopause of a participant. Age at menopause was obtained from the self-reported question: 'About how old were you when you had your last menstrual period?' In this study, we defined age at menopause as the age in years when the woman had her last menstrual period. | ||
|
||
The primary covariate in this analysis is the number of childbirths. The number of childbirths was obtained from the self-reported questions: 'How many of your deliveries resulted in a live birth?' In this study, we used number of childbirths as a continuous variable (0--10 childbirths). | ||
|
||
After reviewing previous literatures, the following factors are included as additional covariates to address for potential confounding. Age in years was analyzed as continuous variable. Smoking status was dichotomized as "never" and "ever". Alcohol use was dichotomized as "never" and "ever". Educational level was categorized as "Less than high school diploma", "High school diploma", "College" and "College graduate or above". Body mass index was analyzed as continues variable measured in kg/m2. | ||
|
||
### Statistical Analysis | ||
|
||
Data of the outcome variable and covariates will be extracted from five sub-data files and combined into a single analytic dataset prior to analysis. | ||
|
||
```{r} | ||
# Load data | ||
library("foreign") | ||
BodyMeasure <- read.xport("data/BMX_J.XPT") | ||
Demographic <- read.xport("data/DEMO_J.XPT") | ||
Alcohol <- read.xport("data/ALQ_J.XPT") | ||
Smoking <- read.xport("data/SMQ_J.XPT") | ||
Reproductive <- read.xport("data/RHQ_J.XPT") | ||
BMI <- BodyMeasure[,c('SEQN','BMXBMI')] | ||
Demo <- Demographic[, c('SEQN','RIDAGEYR','RIDRETH3','DMDEDUC2')] | ||
AlcUse <- Alcohol[,c('SEQN','ALQ111')] | ||
SmoUse <- Smoking[,c('SEQN','SMQ020')] | ||
FeHealth <- Reproductive[,c('SEQN','RHQ060','RHQ171','RHQ420')] | ||
# Merge data | ||
data <- merge(BMI, Demo, by = "SEQN", all.x = T, all.y = F) | ||
data <- merge(AlcUse, data, by = "SEQN", all.x = T, all.y = F) | ||
data <- merge(SmoUse, data, by = "SEQN", all.x = F, all.y = T) | ||
data <- merge(FeHealth, data, by = "SEQN", all.x = T, all.y = F) | ||
# Change column names | ||
colnames(data) <- c('ID','age_meno','livebirths','pills','smoke','drink','bmi', | ||
'age_sur','race','education') | ||
head(data,5) | ||
``` | ||
|
||
```{r} | ||
# drop NA and weird values | ||
library("tidyr") | ||
data <- drop_na(data) | ||
data <- data[data$age_meno != 999, ] # exclude those dont know when is their last period | ||
data <- data[data$pills != 9, ] # exclude those dont know whether they ever taken birth control pills | ||
data <- data[data$education != 9, ] # exclude those dont know their education level | ||
head(data,5) | ||
``` | ||
|
||
```{r} | ||
# Make categorical variables | ||
data$race <- factor(data$race, labels = c("Mexican American","Other Hispanic","White","Black","Asian","Multi-Racial")) %>% relevel(data$race, ref = "Black") | ||
data$drink <- factor(data$drink, labels = c("Ever", "Never")) | ||
data$smoke <- factor(data$smoke, labels = c("Ever", "Never")) | ||
data$pills <- factor(data$pills, labels = c("Ever", "Never")) | ||
data$education[data$education == 1] <- 2 | ||
data$education <- factor(data$education, labels = c("Less Than High School Diploma", | ||
"High School Diploma","College","College Graduate Or Above")) | ||
data$lbgroup <- data$livebirths | ||
data$lbgroup[data$livebirths <= 2] <- "<=2" | ||
data$lbgroup[data$livebirths > 2 & data$livebirths < 6 ] <- "3-5" | ||
data$lbgroup[data$livebirths >= 6] <- ">=6" | ||
data$lbgroup <- factor(data$lbgroup, levels=c("<=2","3-5",">=6")) | ||
``` | ||
|
||
# Results | ||
|
||
### Participant Characteristics | ||
|
||
The mean age of participants the survey was done is 62.7 (SD = 11.3) years old and ranged from 22 to 80. Of the 1189 female participants, most of them are non-Hispanic White (38.7%), have some college experience (33.3%), drink alcohol at some point in their life (85.4%), and never smoke (64.1%). Table 1 below presents specific breakdowns of each variable considered by this study. | ||
|
||
```{r} | ||
#| warning: false | ||
# Plot distribution of age | ||
library(ggplot2) | ||
ggplot(data=data) + | ||
geom_bar(mapping=aes(x=age_sur)) + | ||
geom_vline(xintercept = mean(data$age_sur), color = "red", linetype = "dashed", size = 1) + | ||
labs(title = "Distribution of Participant Age at survey ", x = "Age", y = "Frequency") | ||
``` | ||
|
||
::: callout-note | ||
Note that the frequency of 80 years old is extremely high because it includes 80 years of age and over. | ||
::: | ||
|
||
```{r} | ||
#| warning: false | ||
ggplot(data = data) + | ||
geom_point(mapping = aes(x = age_meno, y = livebirths)) + | ||
facet_wrap(~ race, nrow = 2) + | ||
labs(title = "Distribution of Menopausal Age and Livebirths by Race/Ethnicity", | ||
x = "Menopausal Age", y = "Number of Livebirths") | ||
``` | ||
|
||
```{r} | ||
#| warning: false | ||
ggplot(data = data) + | ||
geom_smooth(mapping = aes(x = bmi, y = age_meno)) + | ||
labs(title = "Distribution of BMI and Menopausal Age", x = "BMI", | ||
y = "Menopausal Age") | ||
``` | ||
|
||
```{r} | ||
#| warning: false | ||
library("table1") | ||
label(data$age_meno) <- "Menopause Age" | ||
label(data$livebirths) <- "Number of Childbirths" | ||
label(data$age_sur) <- "Age" | ||
label(data$race) <- "Race" | ||
label(data$education) <- "Education" | ||
label(data$drink) <- "Alcohol use" | ||
label(data$smoke) <- "Smoking status" | ||
label(data$pills) <- "Oral contraceptive use" | ||
label(data$bmi) <- "BMI" | ||
units(data$bmi) <- "kg/m2" | ||
caption <- "Table 1. Characteristics of the study population according to number of livebirths" | ||
footnote <- "Note: Continuous variables were displayed as mean (SD) and categorical variables were displayed as number (percentage)." | ||
mytable <- table1(~ age_sur + race + education + drink + smoke + pills + bmi | lbgroup, data=data, footnote=footnote, caption=caption, overall=c(left="Total")) | ||
mytable | ||
``` | ||
|
||
|
||
### Regression Analysis | ||
|
||
Univariate and multivariate analyses were performed with simple and multiple linear regression, respectively. In univariate analysis, “umber of childbirth”, “age at survey”, and “smoking status” show statistically significance with our outcome variable. For race, only the subgroups “other Hispanic” and “Asian” are significant relative to the reference group. For education level, “high school diploma” and “college graduate or above” rose to the level of statistical significance. | ||
|
||
This association continued to appear after adjustment in the full model. After comparing between different combinations of covariates, the model with the lowest AIC score were picked as our final adjusted model shown in Table 2 below. In multivariable linear regression analyses, the number of childbirth was associated with the age of menopause, in which each additional childbirth is associated with a 0.34 (95% CI: 0.02, 0.66) increase of the menopausal age, while holding all other covariates constant. Future research will be needed to validify this association and explore the underlining biological mechanism. | ||
|
||
```{r} | ||
#| warning: false | ||
Slin1 = lm(age_meno ~ livebirths, data = data) | ||
Mlin3 = lm(age_meno ~ livebirths + age_sur + race + education + pills + smoke, data = data) | ||
summary(Mlin3) | ||
AIC(Mlin3) | ||
library(gtsummary) | ||
t1 <- tbl_regression(Slin1)%>%add_global_p() | ||
t2 <- tbl_regression(Mlin3)%>%add_global_p() | ||
tbl_merge <- tbl_merge(tbls = list(t1,t2), | ||
tab_spanner = c("**Unadjusted**","**Adjusted**")) | ||
tbl_merge | ||
``` | ||
|
||
|
||
|
||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,7 +9,6 @@ website: | |
- href: index.qmd | ||
text: Home | ||
- href: about.qmd | ||
text: Who am I | ||
- Example analysis.qmd | ||
|
||
format: | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Oops, something went wrong.