Skip to content

Commit

Permalink
11.1
Browse files Browse the repository at this point in the history
  • Loading branch information
yyingying00 committed Nov 2, 2023
1 parent 8acdf44 commit dab8c4b
Show file tree
Hide file tree
Showing 21 changed files with 1,432 additions and 32 deletions.
172 changes: 172 additions & 0 deletions Example analysis.qmd
Original file line number Diff line number Diff line change
@@ -1,3 +1,175 @@
---
title: "Example analysis"
author: "Yingying Yu"
date: "2023-11-1"
---

# Introduction

![](menopause.png){style="float:right;" width="170"} Menopause is a natural biological event triggered by the decline in ovarian follicular function and a reduction in circulating blood estrogen levels. According to the World Health Organization (WHO), the majority of women experience menopause in their forties and fifties, typically defined as twelve consecutive months without menstruation, with no other underlying cause. Notably, during pregnancy, women do not have their regular menstrual period for approximately nine months due to the profound impact on ovarian activity.

------------------------------------------------------------------------

# Research Objective

The objective of this study is to investigate whether the age at which natural menopause occurs is influenced by the number of childbirths among American women. This research utilizes data from the 2017-18 National Health and Nutrition Examination Survey (NHANES). Our hypothesis posits that an increased age at menopause is associated with a higher number of childbirths.

------------------------------------------------------------------------

# Methodology

### Study population

Performed by the National Center for Health Statistics (NCHS) within the Centers for Disease Control and Prevention (CDC), the NHANES is a program of cross-sectional studies collecting data regarding the health and nutritional status of Americans. This study is an analysis of the 2017-2018 data for female participants located in different counties across the nation, including demographics, socioeconomic, body measurements, reproductive health-related information gathered by interviews and physical examinations. The dataset can be download [here](https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017).

### Measures

The outcome of interest in this study is the age at natural menopause of a participant. Age at menopause was obtained from the self-reported question: 'About how old were you when you had your last menstrual period?' In this study, we defined age at menopause as the age in years when the woman had her last menstrual period.

The primary covariate in this analysis is the number of childbirths. The number of childbirths was obtained from the self-reported questions: 'How many of your deliveries resulted in a live birth?' In this study, we used number of childbirths as a continuous variable (0--10 childbirths).

After reviewing previous literatures, the following factors are included as additional covariates to address for potential confounding. Age in years was analyzed as continuous variable. Smoking status was dichotomized as "never" and "ever". Alcohol use was dichotomized as "never" and "ever". Educational level was categorized as "Less than high school diploma", "High school diploma", "College" and "College graduate or above". Body mass index was analyzed as continues variable measured in kg/m2.

### Statistical Analysis

Data of the outcome variable and covariates will be extracted from five sub-data files and combined into a single analytic dataset prior to analysis.

```{r}
# Load data
library("foreign")
BodyMeasure <- read.xport("data/BMX_J.XPT")
Demographic <- read.xport("data/DEMO_J.XPT")
Alcohol <- read.xport("data/ALQ_J.XPT")
Smoking <- read.xport("data/SMQ_J.XPT")
Reproductive <- read.xport("data/RHQ_J.XPT")
BMI <- BodyMeasure[,c('SEQN','BMXBMI')]
Demo <- Demographic[, c('SEQN','RIDAGEYR','RIDRETH3','DMDEDUC2')]
AlcUse <- Alcohol[,c('SEQN','ALQ111')]
SmoUse <- Smoking[,c('SEQN','SMQ020')]
FeHealth <- Reproductive[,c('SEQN','RHQ060','RHQ171','RHQ420')]
# Merge data
data <- merge(BMI, Demo, by = "SEQN", all.x = T, all.y = F)
data <- merge(AlcUse, data, by = "SEQN", all.x = T, all.y = F)
data <- merge(SmoUse, data, by = "SEQN", all.x = F, all.y = T)
data <- merge(FeHealth, data, by = "SEQN", all.x = T, all.y = F)
# Change column names
colnames(data) <- c('ID','age_meno','livebirths','pills','smoke','drink','bmi',
'age_sur','race','education')
head(data,5)
```

```{r}
# drop NA and weird values
library("tidyr")
data <- drop_na(data)
data <- data[data$age_meno != 999, ] # exclude those dont know when is their last period
data <- data[data$pills != 9, ] # exclude those dont know whether they ever taken birth control pills
data <- data[data$education != 9, ] # exclude those dont know their education level
head(data,5)
```

```{r}
# Make categorical variables
data$race <- factor(data$race, labels = c("Mexican American","Other Hispanic","White","Black","Asian","Multi-Racial")) %>% relevel(data$race, ref = "Black")
data$drink <- factor(data$drink, labels = c("Ever", "Never"))
data$smoke <- factor(data$smoke, labels = c("Ever", "Never"))
data$pills <- factor(data$pills, labels = c("Ever", "Never"))
data$education[data$education == 1] <- 2
data$education <- factor(data$education, labels = c("Less Than High School Diploma",
"High School Diploma","College","College Graduate Or Above"))
data$lbgroup <- data$livebirths
data$lbgroup[data$livebirths <= 2] <- "<=2"
data$lbgroup[data$livebirths > 2 & data$livebirths < 6 ] <- "3-5"
data$lbgroup[data$livebirths >= 6] <- ">=6"
data$lbgroup <- factor(data$lbgroup, levels=c("<=2","3-5",">=6"))
```

# Results

### Participant Characteristics

The mean age of participants the survey was done is 62.7 (SD = 11.3) years old and ranged from 22 to 80. Of the 1189 female participants, most of them are non-Hispanic White (38.7%), have some college experience (33.3%), drink alcohol at some point in their life (85.4%), and never smoke (64.1%). Table 1 below presents specific breakdowns of each variable considered by this study.

```{r}
#| warning: false
# Plot distribution of age
library(ggplot2)
ggplot(data=data) +
geom_bar(mapping=aes(x=age_sur)) +
geom_vline(xintercept = mean(data$age_sur), color = "red", linetype = "dashed", size = 1) +
labs(title = "Distribution of Participant Age at survey ", x = "Age", y = "Frequency")
```

::: callout-note
Note that the frequency of 80 years old is extremely high because it includes 80 years of age and over.
:::

```{r}
#| warning: false
ggplot(data = data) +
geom_point(mapping = aes(x = age_meno, y = livebirths)) +
facet_wrap(~ race, nrow = 2) +
labs(title = "Distribution of Menopausal Age and Livebirths by Race/Ethnicity",
x = "Menopausal Age", y = "Number of Livebirths")
```

```{r}
#| warning: false
ggplot(data = data) +
geom_smooth(mapping = aes(x = bmi, y = age_meno)) +
labs(title = "Distribution of BMI and Menopausal Age", x = "BMI",
y = "Menopausal Age")
```

```{r}
#| warning: false
library("table1")
label(data$age_meno) <- "Menopause Age"
label(data$livebirths) <- "Number of Childbirths"
label(data$age_sur) <- "Age"
label(data$race) <- "Race"
label(data$education) <- "Education"
label(data$drink) <- "Alcohol use"
label(data$smoke) <- "Smoking status"
label(data$pills) <- "Oral contraceptive use"
label(data$bmi) <- "BMI"
units(data$bmi) <- "kg/m2"
caption <- "Table 1. Characteristics of the study population according to number of livebirths"
footnote <- "Note: Continuous variables were displayed as mean (SD) and categorical variables were displayed as number (percentage)."
mytable <- table1(~ age_sur + race + education + drink + smoke + pills + bmi | lbgroup, data=data, footnote=footnote, caption=caption, overall=c(left="Total"))
mytable
```


### Regression Analysis

Univariate and multivariate analyses were performed with simple and multiple linear regression, respectively. In univariate analysis, “umber of childbirth”, “age at survey”, and “smoking status” show statistically significance with our outcome variable. For race, only the subgroups “other Hispanic” and “Asian” are significant relative to the reference group. For education level, “high school diploma” and “college graduate or above” rose to the level of statistical significance.

This association continued to appear after adjustment in the full model. After comparing between different combinations of covariates, the model with the lowest AIC score were picked as our final adjusted model shown in Table 2 below. In multivariable linear regression analyses, the number of childbirth was associated with the age of menopause, in which each additional childbirth is associated with a 0.34 (95% CI: 0.02, 0.66) increase of the menopausal age, while holding all other covariates constant. Future research will be needed to validify this association and explore the underlining biological mechanism.

```{r}
#| warning: false
Slin1 = lm(age_meno ~ livebirths, data = data)
Mlin3 = lm(age_meno ~ livebirths + age_sur + race + education + pills + smoke, data = data)
summary(Mlin3)
AIC(Mlin3)
library(gtsummary)
t1 <- tbl_regression(Slin1)%>%add_global_p()
t2 <- tbl_regression(Mlin3)%>%add_global_p()
tbl_merge <- tbl_merge(tbls = list(t1,t2),
tab_spanner = c("**Unadjusted**","**Adjusted**"))
tbl_merge
```





1 change: 0 additions & 1 deletion _quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@ website:
- href: index.qmd
text: Home
- href: about.qmd
text: Who am I
- Example analysis.qmd

format:
Expand Down
7 changes: 3 additions & 4 deletions about.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,8 @@
title: "About"
---

# Education
My journey has been fueled by a deep-rooted passion for making a meaningful impact in the field of oncology. My undergraduate major in biology laid the foundation for my initial career aspirations. However, over time, I found myself increasingly drawn to the world of data science and bioinformatics. I realized the potential for data-driven insights to transform our understanding of complex biological processes.

------------------------------------------------------------------------
Currently, I'm pursuing a master's degree in epidemiology, which has provided me with valuable knowledge and skills to better comprehend the broader context of disease patterns and public health. Through several hands-on experiences, I've become proficient in handling RNA sequencing data and utilizing numerous R packages related to genome-wide analysis. My goal now is to merge my biological expertise with data science and epidemiology to contribute to advancements in oncology research.

**MHS in Epidemiology** (2024) Johns Hopkins University\
**BA in Biology** (2022) New York University
I look forward to connecting with like-minded individuals who share my passion for driving positive change in the oncology field.
Binary file added badminton.JPG
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added data/ALQ_J.XPT
Binary file not shown.
Binary file added data/BMX_J.XPT
Binary file not shown.
Binary file added data/DEMO_J.XPT
Binary file not shown.
Binary file added data/RHQ_J.XPT
Binary file not shown.
Binary file added data/SMQ_J.XPT
Binary file not shown.
Loading

0 comments on commit dab8c4b

Please sign in to comment.