Answers for the exercises #2

Open
tedchou12 opened this issue May 16, 2021 · 2 comments

tedchou12 commented May 16, 2021

@JannesKlaas
Hi Jannes,

Thank you for the amazing book and the tutorials. I really appreciate the time and effort you have put into them.

I am wondering if you have posted the answers to the exercises anywhere for us to double-check against.
Since this is completely self-study, having answers to look up would be helpful to confirm that I am at least on the right track.

Thanks.

@JannesKlaas
Collaborator

Hey @tedchou12,
There are no solution files published anywhere. Most of the questions are quite open ended, and there are multiple ways to solve them. But once you have solved them, it should be quite obvious that you did.

If you are stuck anywhere or unsure, feel free to send me an email! I will give it a look :)

@tedchou12
Author

Sorry for asking another question so quickly...
In chapter 2 of the book, on fraud detection, I am trying to run the example locally, and the code is quite similar to what was described in the book:

import pandas as pd
import numpy as np
from sklearn.metrics import f1_score, confusion_matrix
from keras.layers import Embedding, Dense, Activation, Reshape, Input, Concatenate
from keras.models import Model, Sequential
from keras.optimizers import SGD
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Load the PaySim transaction log and make the balance column names consistent
df = pd.read_csv('PS_20174392719_1491204439457_log.csv')
df = df.rename(columns={'oldbalanceOrg':'oldBalanceOrig', 'newbalanceOrig':'newBalanceOrig',
                        'oldbalanceDest':'oldBalanceDest', 'newbalanceDest':'newBalanceDest'})


# One-hot encode the transaction type
df['type'] = 'type_' + df['type'].astype(str)
dummies = pd.get_dummies(df['type'])
df = pd.concat([df, dummies], axis=1)


# Derive the hour of day from the step counter and flag night-time transactions
df['hour'] = df['step'] % 24
df['isNight'] = np.where((2 <= df['hour']) & (df['hour'] <= 6), 1, 0)

# Drop identifiers and columns that are not used as features
del df['type']
del df['step']
del df['nameOrig']
del df['nameDest']
del df['type_CASH_IN']
del df['type_DEBIT']
del df['type_PAYMENT']
del df['hour']

# Separate the target from the features
y_df = df['isFraud']
x_df = df.drop('isFraud', axis=1)

y = y_df.values
X = x_df.values


# Hold out a test set, then carve a validation set out of the remaining training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.1, random_state=42)

# Oversample the minority (fraud) class, on the training set only
sm = SMOTE(random_state=42)
X_train_res, y_train_res = sm.fit_resample(X_train, y_train)

# Level 1: a single sigmoid unit (logistic regression)
alpha = 0.00001
model = Sequential()
model.add(Dense(1, input_dim=9))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=SGD(learning_rate=alpha), metrics=['acc'])

model.fit(X_train_res, y_train_res, epochs=5, batch_size=256, validation_data=(X_valid, y_valid))

y_pred = model.predict(X_test)
# Binarize the predicted probabilities at 0.5 (values exactly equal to 0.5 are left unchanged)
y_pred[y_pred > 0.5] = 1
y_pred[y_pred < 0.5] = 0

f1_s = f1_score(y_pred=y_pred, y_true=y_test)
print(f1_s)

# Level 2: one hidden tanh layer feeding a sigmoid output
alpha = 0.00001
model = Sequential()
model.add(Dense(16, input_dim=9))
model.add(Activation('tanh'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=SGD(learning_rate=alpha), metrics=['acc'])

model.fit(X_train_res, y_train_res, epochs=5, batch_size=256, validation_data=(X_valid, y_valid))

y_pred = model.predict(X_test)
# Binarize at 0.5, as above
y_pred[y_pred > 0.5] = 1
y_pred[y_pred < 0.5] = 0

f1_s = f1_score(y_pred=y_pred, y_true=y_test)
print(f1_s)

It can run, BUT the output and the F1 score seem off:

# for level 1
ted.chou@IITPC20-0109 ch2 % python3 fin_keras_predictive.py

2021-05-21 02:54:00.916335: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-21 02:54:01.352387: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
29935/29935 [==============================] - 27s 583us/step - loss: 479051.4881 - acc: 0.8800 - val_loss: 94051.3516 - val_acc: 0.8695
Epoch 2/5
29935/29935 [==============================] - 17s 554us/step - loss: 429615.8285 - acc: 0.8908 - val_loss: 4390205.5000 - val_acc: 0.5978
Epoch 3/5
29935/29935 [==============================] - 18s 589us/step - loss: 427322.5454 - acc: 0.8919 - val_loss: 117590.3438 - val_acc: 0.8590
Epoch 4/5
29935/29935 [==============================] - 17s 557us/step - loss: 435540.8527 - acc: 0.8911 - val_loss: 120546.6719 - val_acc: 0.8424
Epoch 5/5
29935/29935 [==============================] - 17s 582us/step - loss: 414160.2561 - acc: 0.8923 - val_loss: 124063.6250 - val_acc: 0.8490
0.01581431334622824

# for level 2
Epoch 1/5
29935/29935 [==============================] - 19s 621us/step - loss: 0.8930 - acc: 0.5381 - val_loss: 1.0736 - val_acc: 0.2382
Epoch 2/5
29935/29935 [==============================] - 20s 684us/step - loss: 0.7505 - acc: 0.6186 - val_loss: 0.7672 - val_acc: 0.5732
Epoch 3/5
29935/29935 [==============================] - 19s 636us/step - loss: 0.6106 - acc: 0.7352 - val_loss: 0.6389 - val_acc: 0.7062
Epoch 4/5
29935/29935 [==============================] - 19s 650us/step - loss: 0.5742 - acc: 0.7549 - val_loss: 0.6248 - val_acc: 0.7198
Epoch 5/5
29935/29935 [==============================] - 20s 654us/step - loss: 0.5547 - acc: 0.7691 - val_loss: 0.5958 - val_acc: 0.7760
0.009077803688785245

The two-layer network ends up with a lower F1 score than the one-layer network, which suggests I have done something wrong somewhere.
Another odd thing is that the training loss of the one-layer network is enormous.
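
One way I plan to dig into the low F1 is to print the confusion matrix (confusion_matrix is already imported above) to check whether the model is simply predicting almost everything as one class:

# With data this imbalanced, a low F1 usually means the predictions
# collapse onto a single class.
print(confusion_matrix(y_true=y_test, y_pred=y_pred))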

As far as I can tell, the code is exactly as it appears in the book. One thing I suspect is that I didn't normalize the data, but that step isn't mentioned in this chapter of the book either. A sketch of the scaling I have in mind is below.
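
If normalization is the issue, the fix would go right after the train/test split and before SMOTE. A minimal sketch, assuming sklearn's StandardScaler:

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply the same
# transform to the validation and test sets to avoid leakage.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)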

Thanks!
Ted
