Hi Jordi,

First of all, thanks so much for the notebooks. They really help me follow along with the course.
I have one question about your notebook 4, in `nnCostFunction`, where `J = ... np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)))`.
I think this does matrix multiplication, giving a 10×10 (or n_label × n_label) matrix. Let's call this cost matrix Jc. This Jc matrix contains not only how a set of predicted values for one label differs from its corresponding target (the diagonal elements), but also how it differs from the targets of the other labels (the off-diagonal elements). For example, the multiplication multiplies a column of predicted values `np.log(a3.T)` of one label (e.g. k) with all columns of the targets.
Then the code sums all elements of this matrix. This seems to over-calculate J. Instead of summing all the elements, I think only the diagonal elements are needed.
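To make the shapes concrete, here is a tiny sketch with made-up numbers (I use plain arrays and `@` here to stand in for the notebook's `np.matrix` objects and `*`; the regularization term is left out):

```python
import numpy as np

# Tiny made-up example, shapes chosen only for illustration: m = 4 examples, K = 3 labels.
rng = np.random.RandomState(0)
m, K = 4, 3
a3 = rng.uniform(0.05, 0.95, size=(m, K))          # predicted probabilities, shape (m, K)
y_matrix = np.eye(K)[rng.randint(0, K, size=m)]    # one-hot targets, shape (m, K)

Jc = np.log(a3).T @ y_matrix        # matrix product -> (K, K) "cost matrix"
print(Jc.shape)                     # (3, 3), i.e. n_label x n_label

sum_all = Jc.sum()                              # what np.sum over the product adds up
sum_diag = np.trace(Jc)                         # only the diagonal elements
elementwise = (y_matrix * np.log(a3)).sum()     # sum_i sum_k y_ik * log(a3_ik)

# sum_diag equals elementwise; sum_all also picks up the off-diagonal terms
print(sum_all, sum_diag, elementwise)
```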
Please use this picture to accompany my description, which might otherwise be confusing.

![img_20170829_155209](https://user-images.githubusercontent.com/26705716/29824563-3029bc38-8cd2-11e7-9d48-1eef31ebb50f.jpg)
Please let me know if I misunderstood the code.
Best regards and thanks again,
-Tua
The code you refer to above is the implementation of the Regularized Cost Function shown just above the code in the notebook and in section 1.4 of the Coursera exercise document. It will return a single, scalar value (not a matrix) assigned to variable J.
I am not sure I understand what you mean by 'over-calculating' the cost J.
I understand that the code is the implementation of the Regularized Cost Function shown above it.
What I meant is: I think the `np.sum` in `J = -1*(1/m)*np.sum((np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)))` should be replaced with a sum over only the diagonal elements of `np.log(a3.T)*(y_matrix)+np.log(1-a3).T*(1-y_matrix)`, because `np.sum` adds up all the elements of, e.g., the output matrix in the image above, and that will not be the same as the Regularized Cost Function the code refers to.
For simplicity, I only wrote out the output of `np.log(a3.T)*(y_matrix)`, but the same argument applies to `np.log(1-a3).T*(1-y_matrix)`.
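Here is a small sketch of the change I have in mind, with made-up arrays standing in for `a3` and `y_matrix` and the regularization term left out (again using plain arrays and `@` in place of `np.matrix` and `*`):

```python
import numpy as np

# Made-up stand-ins with the exercise's (m, K) shapes, for illustration only.
rng = np.random.RandomState(1)
m, K = 5, 3
a3 = rng.uniform(0.05, 0.95, size=(m, K))
y_matrix = np.eye(K)[rng.randint(0, K, size=m)]

# current: sums every entry of the K x K products
J_current = -(1.0 / m) * np.sum(np.log(a3).T @ y_matrix + np.log(1 - a3).T @ (1 - y_matrix))

# proposed: keep only the diagonal, which matches the element-wise form of the cost
J_diag = -(1.0 / m) * np.trace(np.log(a3).T @ y_matrix + np.log(1 - a3).T @ (1 - y_matrix))
J_elem = -(1.0 / m) * np.sum(y_matrix * np.log(a3) + (1 - y_matrix) * np.log(1 - a3))

# J_diag == J_elem; J_current differs whenever the off-diagonal entries are non-zero
print(J_current, J_diag, J_elem)
```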