The objective of this endeavor is to construct a model capable of forecasting whether a Titanic passenger survived or perished, leveraging specified features.
The data utilized in this project is extracted from a CSV file named "archive.zip". It comprises details concerning Titanic passengers, encompassing their survival status, class (Pclass), gender (Gender), and age (Age).
- pandas
- seaborn
- numpy
- matplotlib.pyplot
- sklearn.linear_model.LogisticRegression
- sklearn.model_selection.train_test_split
- sklearn.preprocessing.LabelEncoder
-
The dataset was loaded into a pandas DataFrame, and its shape along with a preview of the first 10 rows were displayed using
df.shape
anddf.head(10)
respectively. -
Descriptive statistics for the numerical columns were generated using
df.describe()
to provide an overview of the data, including any missing values. -
Visualization of the count of passengers who survived versus those who did not was achieved through
sns.countplot(x=df['Survived'])
. -
Further visualization was conducted to examine the count of survivals concerning the passenger class (Pclass) using
sns.countplot(x=df['Survived'], hue=df['Pclass'])
. -
A similar visualization approach was employed to explore the count of survivals concerning gender, utilizing
sns.countplot(x=df['Sex'], hue=df['Survived'])
. -
To ascertain the survival rate by gender, calculations were performed and presented via
df.groupby('Sex')[['Survived']].mean()
. -
The 'Sex' column was transformed from categorical to numerical values using LabelEncoder from
sklearn.preprocessing
. -
Following the encoding of the 'Sex' column, non-essential columns like 'Age' were removed from the DataFrame.
- The feature matrix
X
and target vectorY
were created using relevant columns from the DataFrame. - The dataset was split into training and testing sets using
train_test_split
fromsklearn.model_selection
. - A logistic regression model was initialized and trained on the training data using
LogisticRegression
fromsklearn.linear_model
.
- The model was used to predict the survival status of passengers in the test set.
- The predicted results were printed using
log.predict(X_test)
. - The actual target values in the test set were printed using
Y_test
. - A sample prediction was made using
log.predict([[2, 1]])
with Pclass=2 and Sex=Male (1).