11.1

yyingying00 · Nov 2, 2023 · dab8c4b
1 parent 8acdf44
commit dab8c4b

Showing 21 changed files with 1,432 additions and 32 deletions.
diff --git a/Example analysis.qmd b/Example analysis.qmd
@@ -1,3 +1,175 @@
 ---
 title: "Example analysis"
+author: "Yingying Yu"
+date: "2023-11-01"
 ---
+
+# Introduction
+
+![](menopause.png){style="float:right;" width="170"} Menopause is a natural biological event triggered by the decline in ovarian follicular function and a reduction in circulating blood estrogen levels. According to the World Health Organization (WHO), most women experience menopause in their forties or fifties; it is typically defined as twelve consecutive months without menstruation with no other underlying cause. Notably, during pregnancy, women do not have their regular menstrual period for approximately nine months because of the profound impact of pregnancy on ovarian activity.
+
+------------------------------------------------------------------------
+
+# Research Objective
+
+The objective of this study is to investigate whether the age at which natural menopause occurs is influenced by the number of childbirths among American women. This research utilizes data from the 2017-18 National Health and Nutrition Examination Survey (NHANES). Our hypothesis posits that an increased age at menopause is associated with a higher number of childbirths.
+
+------------------------------------------------------------------------
+
+# Methodology
+
+### Study population
+
+Conducted by the National Center for Health Statistics (NCHS) within the Centers for Disease Control and Prevention (CDC), the NHANES is a program of cross-sectional studies that collects data on the health and nutritional status of Americans. This study analyzes the 2017-2018 data for female participants located in different counties across the nation, including demographic, socioeconomic, body measurement, and reproductive health information gathered through interviews and physical examinations. The dataset can be downloaded [here](https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017).
+
+### Measures
+
+The outcome of interest in this study is the age at natural menopause of a participant. Age at menopause was obtained from the self-reported question: 'About how old were you when you had your last menstrual period?' In this study, we defined age at menopause as the age in years when the woman had her last menstrual period.
+
+The primary covariate in this analysis is the number of childbirths. The number of childbirths was obtained from the self-reported question: 'How many of your deliveries resulted in a live birth?' In this study, we used the number of childbirths as a continuous variable (0--10 childbirths).
+
+After reviewing the previous literature, the following factors were included as additional covariates to account for potential confounding. Age in years was analyzed as a continuous variable. Smoking status was dichotomized as "never" and "ever". Alcohol use was dichotomized as "never" and "ever". Educational level was categorized as "Less than high school diploma", "High school diploma", "College" and "College graduate or above". Body mass index was analyzed as a continuous variable measured in kg/m2.
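+
+As a minimal sketch of the dichotomization described above, a yes/no survey item can be recoded with explicit levels so that "refused" and "don't know" responses become missing rather than being relabeled by position. The helper name `recode_yes_no` and the toy vector are hypothetical, and the sketch assumes the coding 1 = yes, 2 = no, 7 = refused, 9 = don't know; it is not the code used in this analysis.
+
+```r
+# Hypothetical helper: map a yes/no item to an "Ever"/"Never" factor.
+# Assumes 1 = yes and 2 = no; any other code (e.g. 7, 9) becomes NA.
+recode_yes_no <- function(x) {
+  factor(x, levels = c(1, 2), labels = c("Ever", "Never"))
+}
+
+toy <- c(1, 2, 2, 9, 1, 7)   # made-up responses for illustration
+recoded <- recode_yes_no(toy)
+table(recoded, useNA = "ifany")
+```
+
+Naming the levels explicitly means that an unexpected code cannot shift the labels, which can happen when only `labels` is supplied.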
+
+### Statistical Analysis
+
+Data for the outcome variable and covariates are extracted from five NHANES data files and merged into a single analytic dataset prior to analysis.
+
+```{r}
+# Load data
+library("foreign")
+BodyMeasure <- read.xport("data/BMX_J.XPT")
+Demographic <- read.xport("data/DEMO_J.XPT")
+Alcohol <- read.xport("data/ALQ_J.XPT")
+Smoking <- read.xport("data/SMQ_J.XPT")
+Reproductive <- read.xport("data/RHQ_J.XPT")
+BMI <- BodyMeasure[,c('SEQN','BMXBMI')] 
+Demo <- Demographic[, c('SEQN','RIDAGEYR','RIDRETH3','DMDEDUC2')]
+AlcUse <- Alcohol[,c('SEQN','ALQ111')]
+SmoUse <- Smoking[,c('SEQN','SMQ020')]
+FeHealth <- Reproductive[,c('SEQN','RHQ060','RHQ171','RHQ420')]
+
+# Merge data
+data <- merge(BMI, Demo, by = "SEQN", all.x = TRUE, all.y = FALSE)
+data <- merge(AlcUse, data, by = "SEQN", all.x = TRUE, all.y = FALSE)
+data <- merge(SmoUse, data, by = "SEQN", all.x = FALSE, all.y = TRUE)
+data <- merge(FeHealth, data, by = "SEQN", all.x = TRUE, all.y = FALSE)
+
+# Change column names
+colnames(data) <- c('ID','age_meno','livebirths','pills','smoke','drink','bmi',
+                    'age_sur','race','education')
+head(data,5)
+```
+
+```{r}
+# Drop missing values and non-informative response codes
+library("tidyr")
+data <- drop_na(data)
+data <- data[data$age_meno != 999, ] # exclude participants who don't know their age at last period
+data <- data[data$pills != 9, ] # exclude participants who don't know whether they ever took birth control pills
+data <- data[data$education != 9, ] # exclude participants who don't know their education level
+head(data,5)
+```
+
+```{r}
+# Make categorical variables
+data$race <- relevel(factor(data$race, labels = c("Mexican American","Other Hispanic","White","Black","Asian","Multi-Racial")), ref = "Black")
+data$drink <- factor(data$drink, labels = c("Ever", "Never"))
+data$smoke <- factor(data$smoke, labels = c("Ever", "Never"))
+data$pills <- factor(data$pills, labels = c("Ever", "Never"))
+data$education[data$education == 1] <- 2
+data$education <- factor(data$education, labels = c("Less Than High School Diploma",
+                                                    "High School Diploma","College","College Graduate Or Above"))
+
+data$lbgroup <- data$livebirths
+data$lbgroup[data$livebirths <= 2] <- "<=2"
+data$lbgroup[data$livebirths > 2 & data$livebirths < 6 ] <- "3-5"
+data$lbgroup[data$livebirths >= 6] <- ">=6"
+data$lbgroup <- factor(data$lbgroup, levels=c("<=2","3-5",">=6"))
+```
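+
+The grouping of live births into three bands can also be sketched with base R's `cut()`, which creates the factor in one step. The vector below is made-up illustration data, not values from the NHANES files, and the break points assume whole-number birth counts.
+
+```r
+# Made-up live-birth counts for illustration
+livebirths <- c(0, 2, 3, 5, 6, 10)
+
+# Intervals are right-closed by default: (-Inf, 2], (2, 5], (5, Inf)
+lbgroup <- cut(livebirths,
+               breaks = c(-Inf, 2, 5, Inf),
+               labels = c("<=2", "3-5", ">=6"))
+table(lbgroup)
+```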
+
+# Results
+
+### Participant Characteristics
+
+The mean age of participants at the time of the survey was 62.7 years (SD = 11.3), with a range of 22 to 80. Of the 1189 female participants, the largest groups were non-Hispanic White (38.7%) and those with some college experience (33.3%); most had consumed alcohol at some point in their lives (85.4%) and had never smoked (64.1%). Table 1 below presents specific breakdowns of each variable considered in this study.
+
+```{r}
+#| warning: false
+# Plot distribution of age
+library(ggplot2)
+ggplot(data=data) +
+  geom_bar(mapping=aes(x=age_sur)) +
+  geom_vline(xintercept = mean(data$age_sur), color = "red", linetype = "dashed", size = 1) +
+  labs(title = "Distribution of Participant Age at Survey", x = "Age", y = "Frequency")
+```
+
+::: callout-note
+Note that the count at age 80 is very high because NHANES top-codes age: the value 80 includes all participants aged 80 years and over.
+:::
+
+```{r}
+#| warning: false
+ggplot(data = data) + 
+  geom_point(mapping = aes(x = age_meno, y = livebirths)) + 
+  facet_wrap(~ race, nrow = 2) +
+  labs(title = "Distribution of Menopausal Age and Livebirths by Race/Ethnicity",
+       x = "Menopausal Age", y = "Number of Livebirths") 
+```
+
+```{r}
+#| warning: false
+ggplot(data = data) + 
+  geom_smooth(mapping = aes(x = bmi, y = age_meno)) +
+  labs(title = "Smoothed Trend of Menopausal Age by BMI", x = "BMI", 
+       y = "Menopausal Age") 
+```
+
+```{r}
+#| warning: false
+
+library("table1")
+label(data$age_meno) <- "Menopause Age"
+label(data$livebirths) <- "Number of Childbirths"
+label(data$age_sur) <- "Age"
+label(data$race) <- "Race"
+label(data$education) <- "Education"
+label(data$drink) <- "Alcohol use"
+label(data$smoke) <- "Smoking status"
+label(data$pills) <- "Oral contraceptive use"
+label(data$bmi) <- "BMI"
+
+units(data$bmi) <- "kg/m2"
+caption  <- "Table 1. Characteristics of the study population according to number of live births"
+footnote <- "Note: Continuous variables are displayed as mean (SD) and categorical variables are displayed as number (percentage)."
+mytable <- table1(~ age_sur + race + education + drink + smoke + pills + bmi | lbgroup, data=data, footnote=footnote, caption=caption, overall=c(left="Total"))
+mytable
+```
+
+
+### Regression Analysis
+
+Univariate and multivariate analyses were performed with simple and multiple linear regression, respectively. In univariate analysis, “number of childbirths”, “age at survey”, and “smoking status” showed statistically significant associations with the outcome variable. For race, only the subgroups “other Hispanic” and “Asian” were significant relative to the reference group. For education level, “high school diploma” and “college graduate or above” reached statistical significance.
+
+This association persisted after adjustment in the full model. After comparing different combinations of covariates, the model with the lowest AIC was selected as the final adjusted model, shown in Table 2 below. In the multivariable linear regression, the number of childbirths was associated with age at menopause: each additional childbirth was associated with a 0.34-year (95% CI: 0.02, 0.66) increase in age at menopause, holding all other covariates constant. Future research is needed to validate this association and explore the underlying biological mechanism.
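+
+The AIC comparison and the confidence interval reported above can be sketched as follows. The sketch uses the built-in `mtcars` data and hypothetical candidate models purely for illustration; it relies only on base R (`lm`, `AIC`, `confint`) and does not reproduce the models fitted in this analysis.
+
+```r
+# Candidate models on a built-in dataset (illustrative only)
+candidates <- list(
+  simple = lm(mpg ~ wt, data = mtcars),
+  medium = lm(mpg ~ wt + hp, data = mtcars),
+  full   = lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
+)
+
+# AIC for each candidate; lower values indicate a better trade-off
+# between goodness of fit and model complexity
+aic_scores <- sapply(candidates, AIC)
+best_name  <- names(which.min(aic_scores))
+best_model <- candidates[[best_name]]
+
+# Point estimate and 95% confidence interval for one coefficient
+estimate <- coef(best_model)["wt"]
+ci       <- confint(best_model, "wt", level = 0.95)
+```
+
+Here `estimate` and `ci` correspond to the kind of slope and interval quoted in the text, interpreted as the expected change in the outcome per one-unit increase in the covariate with the other covariates held constant.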
+
+```{r}
+#| warning: false
+Slin1 = lm(age_meno ~ livebirths, data = data)
+
+Mlin3 = lm(age_meno ~ livebirths + age_sur + race + education + pills + smoke, data = data)
+summary(Mlin3)
+AIC(Mlin3)
+
+library(gtsummary)
+t1 <- tbl_regression(Slin1) %>% add_global_p()
+t2 <- tbl_regression(Mlin3) %>% add_global_p()
+
+# Store the merged table under a name that does not shadow tbl_merge()
+merged_tbl <- tbl_merge(tbls = list(t1, t2),
+                        tab_spanner = c("**Unadjusted**", "**Adjusted**"))
+merged_tbl
+```
+
+
+
+
+
diff --git a/_quarto.yml b/_quarto.yml
@@ -9,7 +9,6 @@ website:
       - href: index.qmd
         text: Home
       - href: about.qmd
-        text: Who am I
       - Example analysis.qmd
 
 format:

diff --git a/about.qmd b/about.qmd
@@ -2,9 +2,8 @@
 title: "About"
 ---
 
-# Education
+My journey has been fueled by a deep-rooted passion for making a meaningful impact in the field of oncology. My undergraduate major in biology laid the foundation for my initial career aspirations. However, over time, I found myself increasingly drawn to the world of data science and bioinformatics. I realized the potential for data-driven insights to transform our understanding of complex biological processes.
 
-------------------------------------------------------------------------
+Currently, I'm pursuing a master's degree in epidemiology, which has provided me with valuable knowledge and skills to better comprehend the broader context of disease patterns and public health. Through several hands-on experiences, I've become proficient in handling RNA sequencing data and utilizing numerous R packages related to genome-wide analysis. My goal now is to merge my biological expertise with data science and epidemiology to contribute to advancements in oncology research.
 
-**MHS in Epidemiology** (2024) Johns Hopkins University\
-**BA in Biology** (2022) New York University
+I look forward to connecting with like-minded individuals who share my passion for driving positive change in the oncology field.
diff --git a/badminton.JPG b/badminton.JPG
diff --git a/data/ALQ_J.XPT b/data/ALQ_J.XPT
diff --git a/data/BMX_J.XPT b/data/BMX_J.XPT
diff --git a/data/DEMO_J.XPT b/data/DEMO_J.XPT
diff --git a/data/RHQ_J.XPT b/data/RHQ_J.XPT
diff --git a/data/SMQ_J.XPT b/data/SMQ_J.XPT