
Comparing changes

base repository: PRML/PRMLT
base: v1.1
head repository: PRML/PRMLT
compare: master
Showing with 785 additions and 496 deletions.
  1. +126 −0 Contents.m
  2. +21 −0 LICENSE
  3. +18 −17 README.md
  4. +2 −5 chapter01/entropy.m
  5. +7 −8 chapter04/logitBin.m
  6. +0 −39 chapter05/mlp.m
  7. +63 −0 chapter05/mlpClass.m
  8. +19 −0 chapter05/mlpClassPred.m
  9. +0 −13 chapter05/mlpPred.m
  10. +62 −0 chapter05/mlpReg.m
  11. +17 −0 chapter05/mlpRegPred.m
  12. +1 −1 chapter06/knGauss.m
  13. +8 −11 chapter06/knKmeans.m
  14. +0 −1 chapter07/rvmRegSeq.m
  15. +12 −0 chapter08/MRF/mrfBethe.m
  16. +56 −0 chapter08/MRF/mrfBp.m
  17. +11 −0 chapter08/MRF/mrfGibbs.m
  18. +21 −0 chapter08/MRF/mrfIsGa.m
  19. +34 −0 chapter08/MRF/mrfMf.m
  20. 0 chapter08/{ → NaiveBayes}/nbBern.m
  21. 0 chapter08/{ → NaiveBayes}/nbBernPred.m
  22. +2 −2 chapter08/{ → NaiveBayes}/nbGauss.m
  23. 0 chapter08/{ → NaiveBayes}/nbGaussPred.m
  24. +0 −11 chapter08/betheEnergy.m
  25. +0 −9 chapter08/gibbsEnergy.m
  26. +0 −20 chapter08/im2mrf.m
  27. +0 −30 chapter08/isingMeanField.m
  28. +0 −62 chapter08/mrfBelProp.m
  29. +0 −55 chapter08/mrfExpProp.m
  30. +0 −31 chapter08/mrfMeanField.m
  31. +1 −1 chapter09/kmeansRnd.m
  32. +1 −1 chapter09/kmedoids.m
  33. +5 −5 chapter09/linRegEm.m
  34. +3 −3 chapter10/linRegVb.m
  35. +3 −3 chapter10/rvmRegVb.m
  36. +1 −3 chapter11/discreteRnd.m
  37. +6 −6 chapter12/ppcaVb.m
  38. +2 −2 chapter13/HMM/hmmEm.m
  39. +2 −2 chapter13/LDS/kalmanFilter.m
  40. +17 −17 chapter13/LDS/kalmanSmoother.m
  41. +43 −36 chapter13/LDS/ldsEm.m
  42. +20 −0 chapter13/LDS/ldsPca.m
  43. +15 −13 chapter13/LDS/ldsRnd.m
  44. +0 −1 chapter14/mixLinReg.m
  45. +7 −0 common/log1mexp.m
  46. +5 −4 common/log1pexp.m
  47. +3 −3 demo/ch04/logitBin_demo.m
  48. +32 −9 demo/ch05/mlp_demo.m
  49. +22 −0 demo/ch06/knKmeans_demo.m
  50. +1 −1 demo/ch07/rvmBinEm_demo.m
  51. +1 −1 demo/ch07/rvmBinFp_demo.m
  52. +34 −42 demo/ch08/mrf_demo.m
  53. +0 −13 demo/ch09/rvmBinEm_demo.m
  54. +2 −4 demo/ch12/ppcaVb_demo.m
  55. +79 −11 demo/ch13/lds_demo.m
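To reproduce this file list locally, a diffstat over the same refs should work (assuming the v1.1 tag and master branch are present in the clone):

```console
git clone https://github.com/PRML/PRMLT.git
cd PRMLT
git diff --stat v1.1 master
```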
126 changes: 126 additions & 0 deletions Contents.m
@@ -0,0 +1,126 @@
% CHAPTER01
% condEntropy - Compute conditional entropy z=H(x|y) of two discrete variables x and y.
% entropy - Compute entropy z=H(x) of a discrete variable x.
% jointEntropy - Compute joint entropy z=H(x,y) of two discrete variables x and y.
% mutInfo - Compute mutual information I(x,y) of two discrete variables x and y.
% nmi - Compute normalized mutual information I(x,y)/sqrt(H(x)*H(y)) of two discrete variables x and y.
% nvi - Compute normalized variation information z=(1-I(x,y)/H(x,y)) of two discrete variables x and y.
% relatEntropy - Compute relative entropy (a.k.a. KL divergence) z=KL(p(x)||p(y)) of two discrete variables x and y.
% CHAPTER02
% logDirichlet - Compute log pdf of a Dirichlet distribution.
% logGauss - Compute log pdf of a Gaussian distribution.
% logKde - Compute log pdf of kernel density estimator.
% logMn - Compute log pdf of a multinomial distribution.
% logMvGamma - Compute logarithm of the multivariate Gamma function.
% logSt - Compute log pdf of a Student's t distribution.
% logVmf - Compute log pdf of a von Mises-Fisher distribution.
% logWishart - Compute log pdf of a Wishart distribution.
% CHAPTER03
% linReg - Fit linear regression model y=w'x+w0
% linRegFp - Fit empirical Bayesian linear model with Mackay fixed point method (p.168)
% linRegPred - Compute linear regression model response y = w'*X+w0 and likelihood
% linRnd - Generate data from a linear model p(t|w,x)=G(w'x+w0,sigma), sigma=sqrt(1/beta)
% CHAPTER04
% binPlot - Plot binary classification result for 2d data
% fda - Fisher (linear) discriminant analysis
% logitBin - Logistic regression for binary classification optimized by Newton-Raphson method.
% logitBinPred - Prediction of binary logistic regression model
% logitMn - Multinomial regression for multiclass problem (Multinomial likelihood)
% logitMnPred - Prediction of multiclass (multinomial) logistic regression model
% sigmoid - Sigmoid function
% softmax - Softmax function
% CHAPTER05
% mlpClass - Train a multilayer perceptron neural network for classification with backpropagation
% mlpClassPred - Multilayer perceptron classification prediction
% mlpReg - Train a multilayer perceptron neural network for regression with backpropagation
% mlpRegPred - Multilayer perceptron regression prediction
% CHAPTER06
% kn2sd - Transform a kernel matrix (or inner product matrix) to a squared distance matrix
% knCenter - Center the data in the kernel space
% knGauss - Gaussian (RBF) kernel K = exp(-|x-y|^2/(2s^2))
% knKmeans - Perform kernel kmeans clustering.
% knKmeansPred - Prediction for kernel kmeans clustering
% knLin - Linear kernel (inner product)
% knPca - Kernel PCA
% knPcaPred - Prediction for kernel PCA
% knPoly - Polynomial kernel k(x,y)=(x'y+c)^o
% knReg - Gaussian process (kernel) regression
% knRegPred - Prediction for Gaussian Process (kernel) regression model
% sd2kn - Transform a squared distance matrix to a kernel matrix.
% CHAPTER07
% rvmBinFp - Relevance Vector Machine (ARD sparse prior) for binary classification.
% rvmBinPred - Predict the label for binary logistic regression model
% rvmRegFp - Relevance Vector Machine (ARD sparse prior) for regression
% rvmRegPred - Compute RVM regression model response y = w'*X+w0 and likelihood
% rvmRegSeq - Sparse Bayesian Regression (RVM) using sequential algorithm
% CHAPTER08
% MRF
% mrfBethe - Compute Bethe energy
% mrfBp - Undirected graph belief propagation for MRF
% mrfGibbs - Compute Gibbs energy
% mrfIsGa - Construct a latent Ising MRF with Gaussian observation
% mrfMf - Mean field for MRF
% NaiveBayes
% nbBern - Naive Bayes classifier with independent Bernoulli.
% nbBernPred - Prediction of naive Bayes classifier with independent Bernoulli.
% nbGauss - Naive Bayes classifier with independent Gaussian
% nbGaussPred - Prediction of naive Bayes classifier with independent Gaussian.
% CHAPTER09
% kmeans - Perform kmeans clustering.
% kmeansPred - Prediction for kmeans clustering
% kmeansRnd - Generate samples from a Gaussian mixture distribution with common variances (kmeans model).
% kmedoids - Perform k-medoids clustering.
% kseeds - Perform kmeans++ seeding
% linRegEm - Fit empirical Bayesian linear regression model with EM (p.448 chapter 9.3.4)
% mixBernEm - Perform EM algorithm for fitting the Bernoulli mixture model.
% mixBernRnd - Generate samples from a Bernoulli mixture distribution.
% mixGaussEm - Perform EM algorithm for fitting the Gaussian mixture model.
% mixGaussPred - Predict label and responsibility for Gaussian mixture model.
% mixGaussRnd - Generate samples from a Gaussian mixture model.
% rvmBinEm - Relevance Vector Machine (ARD sparse prior) for binary classification.
% rvmRegEm - Relevance Vector Machine (ARD sparse prior) for regression
% CHAPTER10
% linRegVb - Variational Bayesian inference for linear regression.
% mixGaussEvidence - Variational lower bound of the model evidence (log of marginal likelihood)
% mixGaussVb - Variational Bayesian inference for Gaussian mixture.
% mixGaussVbPred - Predict label and responsibility for Gaussian mixture model trained by VB.
% rvmRegVb - Variational Bayesian inference for RVM regression.
% CHAPTER11
% dirichletRnd - Generate samples from a Dirichlet distribution.
% discreteRnd - Generate samples from a discrete distribution (multinomial).
% Gauss - Class for Gaussian distribution used by Dirichlet process
% gaussRnd - Generate samples from a Gaussian distribution.
% GaussWishart - Class for Gaussian-Wishart distribution used by Dirichlet process
% mixDpGb - Collapsed Gibbs sampling for Dirichlet process (infinite) mixture model.
% mixDpGbOl - Online collapsed Gibbs sampling for Dirichlet process (infinite) mixture model.
% mixGaussGb - Collapsed Gibbs sampling for Dirichlet process (infinite) Gaussian mixture model (a.k.a. DPGM).
% mixGaussSample - Generate samples from a Gaussian mixture model with Gaussian-Wishart prior.
% CHAPTER12
% fa - Perform EM algorithm for factor analysis model
% pca - Principal component analysis
% pcaEm - Perform EM-like algorithm for PCA (by Sam Roweis).
% pcaEmC - Perform constrained EM-like algorithm for PCA.
% ppcaEm - Perform EM algorithm to maximize likelihood of probabilistic PCA model.
% ppcaRnd - Generate data from probabilistic PCA model
% ppcaVb - Perform variational Bayesian inference for probabilistic PCA model.
% CHAPTER13
% HMM
% hmmEm - EM algorithm to fit the parameters of HMM model (a.k.a. Baum-Welch algorithm)
% hmmFilter - HMM forward filtering algorithm.
% hmmRnd - Generate a data sequence from a hidden Markov model.
% hmmSmoother - HMM smoothing algorithm (normalized forward-backward or normalized alpha-beta algorithm).
% hmmViterbi - Viterbi algorithm (calculated in log scale to improve numerical stability).
% LDS
% kalmanFilter - Kalman filter (forward algorithm for linear dynamic system)
% kalmanSmoother - Kalman smoother (forward-backward algorithm for linear dynamic system)
% ldsEm - EM algorithm for parameter estimation of linear dynamic system.
% ldsPca - Subspace method for learning linear dynamic system.
% ldsRnd - Generate a data sequence from linear dynamic system.
% CHAPTER14
% adaboostBin - Adaboost for binary classification (weak learner: kmeans)
% adaboostBinPred - Prediction of binary Adaboost
% mixLinPred - Prediction function for mixture of linear regression
% mixLinReg - Mixture of linear regression
% mixLinRnd - Generate data from mixture of linear model
% mixLogitBin - Mixture of logistic regression model for binary classification optimized by Newton-Raphson method
% mixLogitBinPred - Prediction function for mixture of logistic regression
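Contents.m is MATLAB's standard per-folder index, so `help` can display it once the package is installed, and the listed functions can be called directly. A minimal sketch using a few of the Chapter 1 functions above (the random data is made up for illustration):

```matlab
% Minimal sketch, assuming init.m has added the toolbox folders to the path.
x = randi(4, 1, 1000);   % a discrete variable with 4 states
y = randi(4, 1, 1000);   % a second, independent variable
Hx  = entropy(x);        % H(x), close to log2(4) = 2 bits here
Ixy = mutInfo(x, y);     % I(x,y), near 0 for independent x and y
z   = nmi(x, y);         % normalized mutual information in [0,1]
```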
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2018 Mo Chen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
35 changes: 18 additions & 17 deletions README.md
@@ -1,36 +1,37 @@
Introduction
-------
This package is a Matlab implementation of the algorithms described in the classical machine learning textbook:
This Matlab package implements machine learning algorithms described in the great textbook:
Pattern Recognition and Machine Learning by C. Bishop ([PRML](http://research.microsoft.com/en-us/um/people/cmbishop/prml/)).

Note: this package requires Matlab **R2016b** or later, since it utilizes a new syntax of Matlab called [Implicit expansion](https://cn.mathworks.com/help/matlab/release-notes.html?rntext=implicit+expansion&startrelease=R2016b&endrelease=R2016b&groupby=release&sortby=descending) (a.k.a. broadcasting in Python).
It is written purely in Matlab language. It is self-contained. There is no external dependency.

Description
-------
While developing this package, I stick to following principles
Note: this package requires Matlab **R2016b** or later, since it utilizes a new Matlab syntax called [Implicit expansion](https://cn.mathworks.com/help/matlab/release-notes.html?rntext=implicit+expansion&startrelease=R2016b&endrelease=R2016b&groupby=release&sortby=descending) (a.k.a. broadcasting). It also requires Statistics Toolbox (for some simple random number generators) and Image Processing Toolbox (for reading image data).

* Succinct: The code is extremely terse. Minimizing the number of lines is a primal target. As a result, the core of the algorithms can be easily spot.
* Efficient: Many tricks for making Matlab scripts fast were applied (eg. vectorization and matrix factorization). Many functions are even comparable with C implementations. Usually, functions in this package are orders faster than Matlab builtin ones which provide the same functionality (eg. kmeans). If anyone have found any Matlab implementation that is faster than mine, I am happy to further optimize.
* Robust: Many tricks for numerical stability are applied, such as probability computation in log scale and square root matrix update to enforce matrix symmetry, etc.
* Learnable: The code is heavily commented. Reference formulas in PRML book are indicated for corresponding code lines. Symbols are in sync with the book.
* Practical: The package is designed not only to be easily read, but also to be easily used to facilitate ML research. Many functions in this package are already widely used (see [Matlab file exchange](http://www.mathworks.com/matlabcentral/fileexchange/?term=authorid%3A49739)).
Design Goal
-------
* Succinct: The code is extremely compact. Minimizing code length is a major goal. As a result, the core of the algorithms can be easily spotted.
* Efficient: Many tricks for speeding up Matlab code are applied (e.g. vectorization, matrix factorization, etc.). Usually, functions in this package are orders of magnitude faster than Matlab builtin ones (e.g. kmeans).
* Robust: Many tricks for numerical stability are applied, such as computing probability in the logarithm domain (see the sketch after this list), square root matrix update to enforce matrix symmetry/PD, etc.
* Readable: The code is heavily commented. Corresponding formulas in PRML are annotated. Symbols are in sync with the book.
* Practical: The package is not only readable, but also meant to be easily used and modified to facilitate ML research. Many functions in this package are already widely used (see [Matlab file exchange](http://www.mathworks.com/matlabcentral/fileexchange/?term=authorid%3A49739)).
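The log-domain computation the Robust bullet refers to is what common/log1pexp.m (touched in this diff) exists for. A tiny sketch of the idea; the guarded formula here is an assumption about the technique, not a quote of that file:

```matlab
% Naive log(1+exp(a)) overflows for large a; the guarded form does not.
a = 800;
naive  = log(1 + exp(a));                  % Inf: exp(800) overflows double
stable = max(a, 0) + log1p(exp(-abs(a)));  % 800: mathematically identical
```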

Installation
-------
1. Download the package to your local path (e.g. PRMLT/) by running: `git clone https://github.com/PRML/PRMLT.git`.
1. Download the package to a local folder (e.g. ~/PRMLT/) by running:
```console
git clone https://github.com/PRML/PRMLT.git
```
2. Run Matlab and navigate to the folder (~/PRMLT/), then run the init.m script.

2. Run Matlab and navigate to PRMLT/, then run the init.m script.

3. Run some demos in PRMLT/demo directory. Enjoy!
3. Run some demos in ~/PRMLT/demo folder. Enjoy!
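Steps 2 and 3, spelled out at the MATLAB prompt (the clone location and choice of demo are just examples):

```matlab
cd ~/PRMLT      % folder from step 1
init            % init.m adds the toolbox subfolders to the path
cd demo/ch06    % any demo folder works
knKmeans_demo   % e.g. the kernel kmeans demo added in this diff
```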

FeedBack
-------
If you found any bug or have any suggestion, please do file issues. I am graceful for any feedback and will do my best to improve this package.
If you find any bug or have any suggestion, please do file issues. I am grateful for any feedback and will do my best to improve this package.

License
-------
Currently Released Under GPLv3

Released under MIT license

Contact
-------
7 changes: 2 additions & 5 deletions chapter01/entropy.m
@@ -6,10 +6,7 @@
% z: entropy z=H(x)
% Written by Mo Chen (sth4nth@gmail.com).
n = numel(x);
[u,~,x] = unique(x);
k = numel(u);
idx = 1:n;
Mx = sparse(idx,x,1,n,k,n);
Px = nonzeros(mean(Mx,1));
[~,~,x] = unique(x);
Px = accumarray(x, 1)/n;
Hx = -dot(Px,log2(Px));
z = max(0,Hx);
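Assembled from the hunk above, the post-change entropy.m reads roughly as follows; unique maps the raw values to indices 1..k, and accumarray counts them (the x(:) reshape is a defensive assumption, since accumarray wants a column):

```matlab
function z = entropy(x)
% Compute entropy z = H(x) of a discrete variable x (reconstructed sketch).
n = numel(x);
[~,~,x] = unique(x);          % map raw values to indices 1..k
Px = accumarray(x(:), 1)/n;   % empirical probability of each value
Hx = -dot(Px, log2(Px));
z = max(0, Hx);               % clamp tiny negative round-off to zero
```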
15 changes: 7 additions & 8 deletions chapter04/logitBin.m
@@ -1,16 +1,16 @@
function [model, llh] = logitBin(X, y, lambda, eta)
function [model, llh] = logitBin(X, y, lambda)
% Logistic regression for binary classification optimized by Newton-Raphson method.
% Input:
% X: d x n data matrix
% z: 1 x n label (0/1)
% y: 1 x n label (0/1)
% lambda: regularization parameter
% eta: step size
% alpha: step size
% Output:
% model: trained model structure
% llh: loglikelihood
% Written by Mo Chen (sth4nth@gmail.com).
if nargin < 4
eta = 1e-1;
alpha = 1e-1;
end
if nargin < 3
lambda = 1e-4;
@@ -20,18 +20,17 @@
tol = 1e-4;
epoch = 200;
llh = -inf(1,epoch);
h = 2*y-1;
w = rand(d,1);
for t = 2:epoch
a = w'*X;
llh(t) = -(sum(log1pexp(-h.*a))+0.5*lambda*dot(w,w))/n; % 4.89
if llh(t)-llh(t-1) < tol; break; end
llh(t) = (dot(a,y)-sum(log1pexp(a))-0.5*lambda*dot(w,w))/n; % 4.90
if abs(llh(t)-llh(t-1)) < tol; break; end
z = sigmoid(a); % 4.87
g = X*(z-y)'+lambda*w; % 4.96
r = z.*(1-z); % 4.98
Xw = bsxfun(@times, X, sqrt(r));
H = Xw*Xw'+lambda*eye(d); % 4.97
w = w-eta*(H\g);
w = w-alpha*(H\g); % 4.92
end
llh = llh(2:t);
model.w = w;
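A hedged usage sketch for the revised three-argument interface (synthetic, roughly separable data; the plotted log-likelihood should rise toward a plateau):

```matlab
% Sketch, assuming the PRMLT path has been set up by init.m.
d = 2; n = 200;
X = randn(d, n);
y = double(sum(X, 1) > 0);            % 1 x n labels in {0,1}
[model, llh] = logitBin(X, y, 1e-4);  % lambda = 1e-4, default step size
plot(llh); xlabel('iteration'); ylabel('log-likelihood');
```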
39 changes: 0 additions & 39 deletions chapter05/mlp.m

This file was deleted.

63 changes: 63 additions & 0 deletions chapter05/mlpClass.m
@@ -0,0 +1,63 @@
function [model, L] = mlpClass(X, y, k, lambda)
% Train a multilayer perceptron neural network for multiclass classification with backpropagation
% logistic activation function is used.
% Input:
% X: d x n data matrix
% y: 1 x n label vector
% k: T x 1 vector to specify number of hidden nodes in each layer
% lambda: regularization parameter
% Output:
% model: model structure
% L: (regularized cross entropy) loss
% Written by Mo Chen (sth4nth@gmail.com).
if nargin < 4
lambda = 1e-2;
end
eta = 1e-3;
tol = 1e-4;
maxiter = 50000;
L = inf(1,maxiter);

Y = sparse(y,1:numel(y),1);
k = [size(X,1);k(:);size(Y,1)];
T = numel(k)-1;
W = cell(T,1);
b = cell(T,1);
for t = 1:T
W{t} = randn(k(t),k(t+1));
b{t} = randn(k(t+1),1);
end
R = cell(T,1);
Z = cell(T+1,1);
Z{1} = X;
for iter = 2:maxiter
% forward
for t = 1:T-1
Z{t+1} = sigmoid(W{t}'*Z{t}+b{t}); % 5.10 5.113
end
Z{T+1} = softmax(W{T}'*Z{T}+b{T});

% loss
E = Z{T+1};
Wn = cellfun(@(x) dot(x(:),x(:)),W); % |W|^2
L(iter) = -dot(Y(:),log(E(:)))+0.5*lambda*sum(Wn);
if abs(L(iter)-L(iter-1)) < tol*L(iter-1); break; end

% backward
R{T} = Z{T+1}-Y;
for t = T-1:-1:1
df = Z{t+1}.*(1-Z{t+1}); % h'(a)
R{t} = df.*(W{t+1}*R{t+1}); % 5.66
end

% gradient descent
for t=1:T
dW = Z{t}*R{t}'+lambda*W{t}; % 5.67
db = sum(R{t},2);
W{t} = W{t}-eta*dW; % 5.43
b{t} = b{t}-eta*db;
end
end
L = L(2:iter);
model.W = W;
model.b = b;
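In PRML's notation, the backward loop implements the back-propagation recursion that the % 5.66 comment points at, with the logistic derivative h'(a) = z(1-z), and the update step uses the regularized gradient of (5.67):

$$\delta^{(t)} = z^{(t+1)}\,(1-z^{(t+1)}) \odot \big(W^{(t+1)}\,\delta^{(t+1)}\big), \qquad \nabla_{W^{(t)}} E = Z^{(t)}\,(\delta^{(t)})^{\mathsf{T}} + \lambda W^{(t)}$$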
19 changes: 19 additions & 0 deletions chapter05/mlpClassPred.m
@@ -0,0 +1,19 @@
function [y, P] = mlpClassPred(model, X)
% Multilayer perceptron classification prediction
% logistic activation function is used.
% Input:
% model: model structure
% X: d x n data matrix
% Output:
% y: 1 x n label vector
% P: k x n probability matrix
% Written by Mo Chen (sth4nth@gmail.com).
W = model.W;
b = model.b;
T = length(W);
Z = X;
for t = 1:T-1
Z = sigmoid(W{t}'*Z+b{t});
end
P = softmax(W{T}'*Z+b{T});
[~,y] = max(P,[],1);
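An end-to-end sketch pairing mlpClass with mlpClassPred; kmeansRnd's (d, k, n) argument order is an assumption based on chapter09's generators:

```matlab
% Hedged sketch: train on synthetic clustered data, then predict on it.
[X, y] = kmeansRnd(2, 3, 500);         % 2-d data, 3 classes, 500 points
[model, L] = mlpClass(X, y, [10; 10]); % two hidden layers of 10 units each
[t, P] = mlpClassPred(model, X);
fprintf('training accuracy: %.2f\n', mean(t == y));
plot(L); xlabel('iteration'); ylabel('regularized loss');
```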