Here is a brief (three-page) demonstration of how Maximum Likelihood produces the same estimates as the standard OLS regression estimator that we have used to this point. We wouldn't expect you to be able to prove or derive this yourself, but we do think that at this point in the semester, you're probably able to read along and follow the argument.
As you're reading, note:
- How challenging are the log and the derivative that this approach requires? Could you take them yourself?
- Are you surprised to notice that we arrive at the same estimator from the MLE perspective as from the OLS perspective?
- The standard errors of the MLE estimates and the OLS estimates are slightly different. Why do you think this is? Where do you see an appeal to the idea of convergence in probability? What is the consequence of this convergence in probability? (A short numerical sketch follows this list.)
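As a companion to the reading, here is a minimal sketch (not from the reading itself) that fits the same model two ways on simulated data: once with the closed-form OLS estimator and once by numerically maximizing the Gaussian log-likelihood. The simulation setup, variable names, and the `neg_loglik` helper are illustrative assumptions; the point is simply that the coefficient estimates agree while the standard errors differ slightly, because the MLE variance estimate divides by n rather than n minus the number of parameters.

```python
# Minimal sketch: OLS vs. Gaussian MLE on simulated data (illustrative setup).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, k = 200, 2                                   # observations, slope coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# --- OLS: closed-form estimator and conventional standard errors ------------
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_ols
sigma2_ols = resid @ resid / (n - X.shape[1])   # residual variance divides by n - (k + 1)
se_ols = np.sqrt(np.diag(sigma2_ols * np.linalg.inv(X.T @ X)))

# --- MLE: maximize the Gaussian log-likelihood numerically ------------------
def neg_loglik(params):
    b, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                   # parameterize on the log scale to keep sigma > 0
    e = y - X @ b
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + (e @ e) / (2 * sigma**2)

start = np.concatenate([np.zeros(X.shape[1]), [0.0]])
fit = minimize(neg_loglik, start, method="BFGS")
beta_mle = fit.x[:-1]
sigma2_mle = np.exp(fit.x[-1]) ** 2             # MLE variance divides by n, not n - (k + 1)
se_mle = np.sqrt(np.diag(sigma2_mle * np.linalg.inv(X.T @ X)))

print(np.round(beta_ols, 4), np.round(beta_mle, 4))   # point estimates match
print(np.round(se_ols, 4), np.round(se_mle, 4))       # standard errors differ slightly
```

As n grows, the ratio n / (n - k - 1) goes to one, so the two sets of standard errors converge; that is where the appeal to convergence in probability enters.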