README: Titanic Survival Prediction

Author: Sandeep Kumawat

Batch: April 2024 (A48)

Domain: Data Science

Aim

The aim of this project is to build a model that predicts whether a Titanic passenger survived, based on a small set of passenger features.

Dataset

The data used in this project comes from a CSV file packaged as "archive.zip". It contains details about Titanic passengers, including their survival status (Survived), passenger class (Pclass), sex (Sex), and age (Age).
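As a minimal sketch of the loading step: pandas can read a CSV straight out of a zip archive, provided the archive holds exactly one file. Whether "archive.zip" is laid out that way is an assumption, and the helper name load_titanic is hypothetical, not taken from the notebook.

```python
import pandas as pd


def load_titanic(path="archive.zip"):
    """Read the passenger table from a zip archive holding a single CSV.

    Assumption: the archive contains exactly one file. pandas infers zip
    compression from the ".zip" extension and raises an error otherwise.
    """
    return pd.read_csv(path)
```

If the archive turns out to contain several files, extracting the CSV first with the standard zipfile module and passing its path to pd.read_csv is the usual alternative.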

Libraries Used

pandas
seaborn
numpy
matplotlib.pyplot
sklearn.linear_model.LogisticRegression
sklearn.model_selection.train_test_split
sklearn.preprocessing.LabelEncoder

Data Exploration and Preprocessing

The dataset was loaded into a pandas DataFrame; its shape and a preview of the first 10 rows were displayed using df.shape and df.head(10) respectively.
Descriptive statistics for the numerical columns were generated using df.describe(); the count row of that summary also shows which columns have missing values.
The number of passengers who survived versus those who did not was plotted using sns.countplot(x=df['Survived']).
Survival counts were then broken down by passenger class (Pclass) using sns.countplot(x=df['Survived'], hue=df['Pclass']).
Survival counts by sex were plotted the same way, using sns.countplot(x=df['Sex'], hue=df['Survived']).
The survival rate for each sex was computed using df.groupby('Sex')[['Survived']].mean().
The 'Sex' column was converted from categorical to numerical values using LabelEncoder from sklearn.preprocessing.
After encoding the 'Sex' column, columns not used by the model, such as 'Age', were dropped from the DataFrame.
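The preprocessing steps above can be sketched as follows. The rows are made up purely for illustration and are not taken from the real dataset; only the column names come from this README.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration only; not the real Titanic data.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Sex": ["female", "male", "female", "male", "female", "male"],
    "Age": [29.0, 22.0, None, 35.0, 41.0, 19.0],
})

# Count missing values per column explicitly.
missing = df.isna().sum()

# The mean of the 0/1 Survived column within each group is the survival rate.
survival_rate = df.groupby("Sex")[["Survived"]].mean()

# LabelEncoder assigns integer codes in sorted order: female -> 0, male -> 1.
encoder = LabelEncoder()
df["Sex"] = encoder.fit_transform(df["Sex"])

# Drop the column that is not used as a model feature.
df = df.drop(columns=["Age"])
```

Because the encoder sorts its classes, "male" maps to 1 here, which matches the Sex=1 value used in the sample prediction.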

Model Training

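The feature matrix X was built from the remaining feature columns of the DataFrame (the sample prediction below uses two of them, Pclass and the encoded Sex), and the target vector Y from the Survived column.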
The dataset was split into training and testing sets using train_test_split from sklearn.model_selection.
A logistic regression model was initialized and trained on the training data using LogisticRegression from sklearn.linear_model.
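A minimal sketch of the training step, again on made-up rows. The test_size and random_state values are assumptions chosen for reproducibility, not settings taken from the notebook; the model variable is named log to match the calls quoted in this README.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up, already-encoded rows for illustration only (Sex: female=0, male=1).
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Pclass": [1, 3, 2, 3, 1, 2, 1, 3, 2, 1, 3, 2],
    "Sex": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

X = df[["Pclass", "Sex"]]  # feature matrix
Y = df["Survived"]         # target vector

# Hold out a quarter of the rows for testing (assumed split ratio).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Fit a logistic regression classifier on the training portion.
log = LogisticRegression(random_state=0)
log.fit(X_train, Y_train)
```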

Model Prediction

The model was used to predict the survival status of passengers in the test set.
The predicted results were printed using log.predict(X_test).
The actual target values in the test set were printed using Y_test.
A sample prediction was made using log.predict([[2, 1]]), where the inputs are Pclass = 2 and Sex = male (encoded as 1).
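The prediction calls above can be sketched as follows, using made-up rows and an assumed split. The sample passenger is passed as a one-row DataFrame with the training column names rather than a bare list; this is a stylistic choice that avoids scikit-learn's feature-name warning when the model was fitted on a DataFrame.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up, already-encoded rows for illustration only (Sex: female=0, male=1).
X = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2, 1, 3, 2, 1, 3, 2],
    "Sex": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})
Y = pd.Series([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0], name="Survived")

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)
log = LogisticRegression(random_state=0).fit(X_train, Y_train)

# Predicted labels for the held-out passengers, next to the true labels.
predictions = log.predict(X_test)
print(predictions)
print(Y_test.to_numpy())

# One hypothetical passenger: Pclass=2, Sex=1 (male).
sample = pd.DataFrame([[2, 1]], columns=["Pclass", "Sex"])
sample_prediction = log.predict(sample)
print(sample_prediction)
```

Each predicted value is 0 (did not survive) or 1 (survived), in the same order as the rows passed to predict.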


Repository: rjsandeepkumawat/CODSOFT-TASK-1
