RELEASE-NOTES.txt

Stan Math Library Release Notes

======================================================================
v4.8.1 (23 January 2024)
======================================================================

- The distributions defined over Cholesky factors now check that their arguments are in support (#3007)
- Fixed an issue with the `inv_wishart_cholesky_rng` function (#3007)

======================================================================
v4.8.0 (16 January 2024)
======================================================================

 - Updated Boost headers to v1.81 (#2955)
 - Added the Dirichlet-Multinomial distribution to the Stan Math library (dirichlet_multinomial_lpmf, dirichlet_multinomial_log, and dirichlet_multinomial_rng). (#2979)
 - `binomial_logit_glm_lpmf` distribution added (#2946)
 - Improved numerical stability of Bernoulli CDF functions (#2784)
 - Replaced the `std::vector`s for read/write events in `matrix_cl` with `tbb::concurrent_vector`s (#2905)
 - Added framework for generic higher-order autodiff support through finite-differences. (#2929)
 - Minor cleanup of numerical stability, constants, compound functions in OpenCL kernels (#2934)
 - Fixes aliasing issue in adjoint accumulation for OpenCL (#2943)
 - Improve numerical stability of `binomial_logit` distribution (#2945)
 - Automatically detect UCRT toolchain use on Windows (#2948)
 - Update multiple functions to delegate to the compound functions (`log1p_exp`, etc.) where possible (#2950)
 - Fixed the RPATH to the TBB library being passed twice on the command line (#2952)
 - Split `eigendecompose` mix tests to fix CI memory issues (#2953)
 - Fix distribution test failures with row-vector integer inputs to bernoulli CDF functions (#2954)
 - Simplified usage of ref-type deduction (#2958)
 - Added OpenCL support for the `binomial_logit_glm` distribution (#2960)
 - Fixed a build configuration issue where the optional tbbmalloc and tbbmalloc_proxy libraries were built unconditionally. (#2969)
 - Remove unused meta requires (#2976)
 - Bugfix for uninitialized `var_value<Matrix>` types and assignment when the rhs of the assignment is an expression (#2978)
 - Invert the input matrix in-place which reduces the memory requirement for this function. (#2981)
 - Fixes the rng function for the Inverse Wishart Cholesky distribution. (#2984)
 - Fixed a compilation issue with the `check_symmetric` function on complex-valued inputs under LLVM 17 (#2991)
 - Fixed missing include guard for OpenCL rev headers (#2996)
 - Add helpful context to the failure message for the ad testing framework (#2997)
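
As a rough illustration of the Dirichlet-Multinomial log probability mass function listed above, the sketch below evaluates the textbook formula with `std::lgamma`. The name `dirichlet_multinomial_lpmf_sketch` and its signature are hypothetical and chosen for this example only; it is not the library's implementation, which also has to handle autodiff types and argument validation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Stan Math API): log PMF of the
// Dirichlet-Multinomial distribution for counts n and concentrations alpha.
//   log p = lgamma(a0) + lgamma(N + 1) - lgamma(N + a0)
//         + sum_k [lgamma(n_k + alpha_k) - lgamma(alpha_k) - lgamma(n_k + 1)]
// where a0 = sum(alpha) and N = sum(n).
double dirichlet_multinomial_lpmf_sketch(const std::vector<int>& n,
                                         const std::vector<double>& alpha) {
  assert(n.size() == alpha.size());
  double alpha_sum = 0.0;
  double n_sum = 0.0;
  double terms = 0.0;
  for (std::size_t k = 0; k < n.size(); ++k) {
    // Counts must be nonnegative and concentrations strictly positive.
    assert(n[k] >= 0 && alpha[k] > 0.0);
    alpha_sum += alpha[k];
    n_sum += n[k];
    terms += std::lgamma(n[k] + alpha[k]) - std::lgamma(alpha[k])
             - std::lgamma(n[k] + 1.0);
  }
  return std::lgamma(alpha_sum) + std::lgamma(n_sum + 1.0)
         - std::lgamma(n_sum + alpha_sum) + terms;
}
```

With `alpha = {1, 1}` and two trials, each of the three possible count vectors has probability 1/3, which makes a convenient sanity check for a sketch like this.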

======================================================================
v4.7.0 (5 September 2023)
======================================================================

- Improved accuracy of `matrix_exp_multiply`. (#2619)
- Added framework for vectorising ternary functions (#2642)
- Added `select()` function for vectorised ternary operations, as well as the `any()` and `all()` boolean reduction functions (#2853)
- Added a minimum number of iterations (5) to the `grad_2f1` function to avoid early convergence (#2858)
- Fixed some spelling errors in documentation (#2908)
- Added new function in `stan::math::internal`, `finite_diff_hessian_vector_product_auto` (#2914)
- Fixed an issue with the return type of `eigenvalues` being incorrect (#2915)
- Delete jquery files from vendored Boost documentation (#2916)
- Fixed fvar/higher-order autodiff compatibility with GLM distributions (#2917)
- Fixed a type issue in the tests for inv_wishart_cholesky. (#2920)
- The constraint tolerance can now be adjusted by defining the `STAN_MATH_CONSTRAINT_TOLERANCE` macro before including Stan headers. The default value remains `1E-8`. (#2921)
- Use -dumpfullversion if available in makefiles (#2922)
- Vectorised binary signature for `log_sum_exp` (#2930)
- Added new functions `qr_thin`, `eigendecompose_sym`, `eigendecompose`, `complex_schur_decompose`, `svd`, and `csr_extract`. Each of these is equivalent to calling several existing functions, but should be more efficient due to sharing work. For example, `svd(m)` is equivalent to `(svd_U(m), singular_values(m), svd_V(m))`. (#2931)
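
The `STAN_MATH_CONSTRAINT_TOLERANCE` entry above relies on the usual "define before include" pattern. The sketch below shows that pattern in a single file. The `#ifndef` fallback is an assumption about how a header could honour the override, not a copy of the Stan Math source, and `constraint_tolerance_in_effect` is a hypothetical name. In a real project the `#define` would sit above the include of the Stan headers, or the macro would be set with the compiler's `-D` option.

```cpp
// User code: override the tolerance BEFORE the library headers are included.
#define STAN_MATH_CONSTRAINT_TOLERANCE 1E-6

// Stand-in for what a header might do (an assumption, not copied from
// Stan Math): fall back to the documented default only when the macro
// has not already been defined.
#ifndef STAN_MATH_CONSTRAINT_TOLERANCE
#define STAN_MATH_CONSTRAINT_TOLERANCE 1E-8
#endif

// Hypothetical constant that picks up whichever value is in effect.
constexpr double constraint_tolerance_in_effect
    = STAN_MATH_CONSTRAINT_TOLERANCE;
```

Because the user's definition comes first, the fallback is skipped and the constant takes the overridden value.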

======================================================================
v4.6.2 (15 May 2023)
======================================================================

- Fix an issue in Stan, see release notes.

======================================================================
v4.6.1 (2 May 2023)
======================================================================

- Fix an issue when compiling with newer versions of LLVM's libc++ (#2892)

======================================================================
v4.6.0 (20 April 2023)
======================================================================

- Updated to Eigen 3.4 (#2583)
- operands_and_partials expanded to accept 8 parameters. (#2833)
- Added new multidimensional integration routine `hcubature`. (#2838)
- Added eigenvalues and eigenvectors overloads which accept matrices with complex values. (#2846)
- Allowed Sundials to be linked to separately installed version. (#2861)
- Fixed `is_nan` to compile under the Microsoft compiler suite. (#2870)
- Fixed `csr_to_dense_matrix` to work in case of matrix with empty first row. (#2877)
- Fixed a bug where `eigenvalues_sym` would return a matrix with a dynamic number of rows equal to 1, rather than a vector. (#2884)
- Updated `eigenvalues_sym` return type for RcppEigen compatibility. (#2887)
- Various testing improvements. (#2832, #2864, #2869, #2878, #2880)
- Fixed an issue with building TBB on gcc 13. (#2890)

======================================================================
v4.5.0 (21 November 2022)
======================================================================

 - Added the generalised hypergeometric function. (#2510)
 - Added var<Matrix> overloads for `pow()`, `owens_t()`, `log_inv_logit`. (#2546, #2787, #2806)
 - Add tuple overload and cleanup code for promote_scalar. (#2706)
 - New gradient function allows evaluated gradient to be stored into an array, enabling memory reuse when stan/math called via a FFI. (#2743)
 - Added function for casting `double` and `std::vector<double>` types to `int` and `std::vector<int>` types(#2771)
 - Added `hypergeometric_2F1` function. (#2792)
 - Fix bug with vectorised `pow()` incorrectly forcing `Eigen::Matrix` return instead of `Eigen::Array`. (#2793)
 - Exposed the `hypergeometric_3F2` function and improved its numerical stability. (#2797)
 - Fixed incorrect return from `log_inv_logit_diff` with positive infinity first argument. (#2798)
 - Added reverse-mode specializations for `fft2` and `inv_fft2`. (#2800)
 - Removed the cause of the out-of-order initializer compiler warning.(#2808)
 - Vectorized `atan2()`.(#2812)
 - Added `complex_schur_decompose_t` and `complex_schur_decompose_u`.(#2814)
 - Vectorized the `conj()` function. (#2817)
 - Stan's algebraic solvers now support variadic arguments. (#2820)
 - Fixed forward mode autodiff for FFT functions. (#2821)
 - Added `log_sum_exp_signed` function for computing `log_sum_exp` while respecting signs of arguments and tracking the sign of the result. (#2829)
 - `stan_print` can now print std::tuple types. (#2835)
 - `stan::math::minus` can now accept `std::vector`. (#2840)
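
To make the idea behind `log_sum_exp_signed` concrete, here is a minimal scalar sketch of a signed log-sum-exp. The function name, the `std::pair` return type and the argument order are assumptions made for this example and do not reproduce the library's signature. Each input is a log-magnitude paired with a sign of +1 or -1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical helper (not the Stan Math signature): returns
// {log|sa * exp(a) + sb * exp(b)|, sign of that sum}, with sa, sb in {+1, -1}.
std::pair<double, int> log_sum_exp_signed_sketch(double a, int a_sign,
                                                 double b, int b_sign) {
  assert((a_sign == 1 || a_sign == -1) && (b_sign == 1 || b_sign == -1));
  const double neg_inf = -std::numeric_limits<double>::infinity();
  const double hi = std::max(a, b);
  const double lo = std::min(a, b);
  if (a_sign == b_sign) {
    if (hi == neg_inf) {
      return {neg_inf, a_sign};  // both terms are zero
    }
    // Same sign: ordinary log_sum_exp, shifted by the larger input.
    return {hi + std::log1p(std::exp(lo - hi)), a_sign};
  }
  if (a == b) {
    // Equal magnitudes with opposite signs cancel; the sign is arbitrary.
    return {neg_inf, 1};
  }
  // Opposite signs: log of a difference; the larger magnitude sets the sign.
  return {hi + std::log1p(-std::exp(lo - hi)), a > b ? a_sign : b_sign};
}
```

For example, combining `log(3)` with sign +1 and `log(1)` with sign -1 represents 3 - 1, so the sketch returns `log(2)` with sign +1.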

======================================================================
v4.4.0 (1 July 2022)
======================================================================

Complex:
 - Added fast Fourier transform (FFT) implementations of 1D and 2D discrete Fourier transform and their inverses. (#2686, #2750)
 - Added type traits for detecting whether a type has a value type that is complex. (#2694)
 - Added vectorized versions of get_real and get_imag. (#2700)
 - Vectorized complex `abs()`, `to_complex`. (#2737, #2749)

New functions:
 - Added the incomplete beta function inverse. (#2637)
 - Added the `wishart_cholesky_lpdf`, which is the Cholesky parameterization of the Wishart distribution of both the input matrix and the scale matrix. (#2710)
 - Added the inverse Wishart Cholesky parameterization: the `lpdf`/`lupdf` functions are available along with `inv_wishart_cholesky_rng`. (#2713)
 - Added the Cholesky parameterization of the multivariate Student's T distribution. (#2715)
 - Added the quantile function of the standard normal density function with log input. (#2744)

Misc:
 - Upgraded Sundials to version 6.1.1. (#2692)
 - Updated Boost to 1.78.0. (#2725)
 - Added namespace qualifiers to `size` and `apply` calls for C++17 compatibility. (#2693)
 - Enabled address sanitizer and fixed it when running inside Docker in Flatiron CI. (#2702)
 - Made `arena_allocator` usable for all std containers. (#2708)
 - Added gradient calculation for the a2 input of the `grad_2F1` function and fixed incorrect return values when `z` is between -1 and 0. (#2721)
 - Updated the signatures of `eigenvalues` and `eigenvectors` to use our pseudo-concept require templates. (#2728)
 - `norm1` and `norm2` were extended with the `apply_vector_unary` to accept general std::vectors as well as the Eigen vectors previously supported. (#2742)
 - Cleaned up compiler warnings for unused variables and non-consistent use of class and struct for ops_partials_edge. (#2757)
 - Fixed the check for `y` in `gamma_lpdf` so that it now errors if y is not a positive finite value. (#2758)
 - Improved numerical stability of gamma_lcdf gradients (#2767)

======================================================================
v4.3.2 (25 March 2022)
======================================================================

- Fixed template error for fma with matrix inputs.
- Fixed bug that did not allow for `var<Matrix>` types with different compile-time rows/cols to be assigned to one another.

======================================================================
v4.3.1 (2 March 2022)
======================================================================

- Fixed a bug with to_matrix(row_vector) and to_matrix(vector) that caused compile-time issues when the result was not stored in a temporary variable.

======================================================================
v4.3.0 (14 February 2022)
======================================================================

 - Changed constants to `static constexpr` for efficiency, added Euler's Gamma constant. (#2478)
 - Added `lmultiply()` that parallels `multiply_log()` so that the language doesn't have to rename. (#2596)
 - Cleaned up tests to include <CL/opencl.hpp> instead of <CL/cl2.hpp>. (#2610)
 - Added missing implementations of function signatures that shadow math function signatures in the `std::` namespace. (#2612)
 - Fixed the bug where 2x2 matrix exponential function outputs NaN for matrices with large eigenvalues. (#2616)
 - Added compiler flag `-D_BOOST_LGAMMA` to allow users to force use of Boost `lgamma` implementation. (#2618)
 - Added more informative error messages for ODE solvers. (#2620)
 - Forced linker to respect rpath set at compile time. (#2627)
 - Upgraded Sundials to 6.0.0. (#2629)
 - Added `norm1()` and `norm2()`. (#2636)
 - Added missing `von_mises_cdf_log` and `von_mises_ccdf_log` signatures. (#2640)
 - Refactored the DAE solver.(#2644)
 - Added the inverse of the complementary error function - `inv_erfc()`. (#2645)
 - Updated member functions for accessing tangents in scalar `fvar` types. (#2652)
 - Added support for the new matrix type for GLM functions. (#2655)

======================================================================
v4.2.1 (21 October 2021)
======================================================================

- Fixed issues with Adjoint ODE memory management
- Fixed included OpenCL headers

======================================================================
v4.2.0 (5 October 2021)
======================================================================

Miscellaneous:
 - Updated Powell and Newton solvers to use an adjoint method to propagate derivatives in reverse mode which should result in modest speed-up. Added variadic interfaces (`algebra_solver_powell_impl` and `algebra_solver_newton_impl`). (#2421)
 - Cleanup unused local typedefs. (#2505)
 - Fix tbb initialization so that if STAN_THREADS is not defined then the number of threads is set to 1. (#2514)
 - Bugfix for range checks not overriding STAN_THREADS. (#2530)
 - Bugfix for vectorized log include order. (#2542)
 - Added vectorized checks called by the stanc compiler. (#2556)
 - Adds an overload for the constrain functions on whether to accumulate jacobians into log probability argument.(#2559)
 - Updated algorithm for inv_Phi which is approximately 2x faster with precision of 16 digits. Based on the Fortran algorithm described in Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484.(#2566)
 - Vectorized unconstrain and constrain functions. (#2574, #2580)
 - Fixed generalized inverse so that it works for less than full rank symmetric matrices. (#2577)

Varmat:
 - Added `var<Matrix>` overloads for `append_row()`, `append_col()`, `rep_vector()`, `rep_row_vector()`, `to_vector()`, `divide()`. (#2484, #2487, #2521, #2557)
 - Added nested vectorized functions for the new matrix type. (#2502)
 - Added support for basic slicing and dimension query operators on `var<Matrix>` types. (#2507)
 - Added several unary functions for `var<Matrix>` as well as division.(#2527)
 - Allow accumulator to accept `var<Matrix>` matrix types(#2535)

Complex:
 - Unary vectorized functions support containers of complex numbers(#2549)
 - Added complex scalar functions get_real(), get_imag(), and to_complex.(#2554)

OpenCL:
 - Added prim-only OpenCL implementations for `qr_Q`, `qr_R`, `qr_thin_Q`, `qr_thin_R`. (#2479)
 - Fixed a bug for kernel generator operation `block`, `eigenvalues_sym`, `eigenvectors_sym`. (#2479, #2512)
 - Added OpenCL implementation for `cumulative_sum`. (#2483)
 - Fixed a bug that made OpenCL `identity_matrix` unusable. (#2499)
 - Added reverse mode for indexing.(#2511)
 - Fixed a number of bugs related to conversions of `arena_matrix_cl` into `matrix_cl`. Before this PR such a conversion invoked kernel generator and made a copy of data. Now only references to data are updated.(#2538)
 - Fixes kernel generator allocating more local memory than it needed.(#2541)
 - Added OpenCL implementation of sorting (`sort_asc` and `sort_desc`).(#2550)
 - Bugfix OpenCL kernels so that they can no longer modify a const `matrix_cl`.(#2553)

======================================================================
v4.1.0 (2 June 2021)
======================================================================

 - Added the Cash-Karp numerical integrator to improve numerical integration of ODEs with semi-stiffness and/or rapid oscillations.(#2336)
 - Added the quantile function.(#2398)
 - Added custom reverse mode for diag_pre_multiply() and diag_post_multiply() functions.(#2405, #2453)
 - Optimized `multi_normal_cholesky` for non-autodiff covariance. (#2439)
 - Updated Sundials to 5.7.0.(#2441)
 - Improved memory safety of nested parallelism.(#2445)
 - Updated TBB to 2020.3.(#2447)
 - Added the `STAN_NO_RANGE_CHECKS` macro which turns off bounds and range checks.(#2423, #2437)
 - Optimized `gp_*_cov` functions, especially for large amount of data.(#2464)
 - Fixed compilation errors when using `unsigned` and `long` types with `apply_scalar_unary`.(#2469)
 - Added the implementation of the loglogistic probability density function.(#2477)
 - Adds reverse mode specialization for `csr_matrix_times_vector` (sparse data, dense parameter).(#2462)
 - Allow tbb init to set the number of threads by an argument.(#2455)
 - Fixed a bug with expressions in poisson distribution functions.(#2414)
 - Fixed the off by one error in set_zero_all_adjoints_nested.(#2399)
 - Fixed bug with printing Eigen expressions.(#2436)
 - Refactored operands and partials to avoid extra allocations.(#2418)
 - Tidied up distributions C++ code.(#2352)
 - Updated the integrate_1d internal interface in preparation for closures.(#2397)
 - Added docs for new contributors with a getting started guide and docs for contributing new distributions.(#2350, #2466)
 - Added an ODE testing framework.(#2432)
 - Replaced the finite difference approximation of the Hessian from one that is based on function calls to one that is based on gradients.(#2348)
 - Updated code generation for expression tests.(#2419)
 - Fixed a bug in expression tests and benchmark generation, where downloading `stanc.exe` did not work on Windows.(#2480)
- Varmat:
    - Add `rep_*` utility functions for new matrix type(#2358)
    - `var<Matrix>` overloads for digamma, distance, Phi, inv_Phi, Phi_approx, sqrt, tail, tgamma, rows_dot_self, fma, offset_multiplier, bessel first and second kind, beta, binary log loss, ceil, erf, erfc, exp2, expm1, falling_factorial and floor (#2362, #2378, #2396, #2461)
    - Added lb/ub/lub_constrain specializations.(#2373, #2382, #2387, #2379)
    - Added script to automatically check stanc3 signatures for varmat compatibility.(#2434)
- OpenCL:
    - Fixed OpenCL implementations of distributions mostly not working with row vectors.(#2360)
    - Added prim and rev OpenCL implementations for `to_matrix`, `to_vector`, `to_row_vector`, `to_array_1d`, `to_array_2d`, `append_array`, `reverse`, `symmetrize_from_lower_tri`, `symmetrize_from_upper_tri` and `trace`.(#2377, #2383, #2388)
    - Added OpenCL functions `rep_matrix`, `rep_vector`, `rep_row_vector`, `rep_array` and `identity_matrix`.(#2388)
    - Added operator %.()
    - Reorganized how work is distributed between threads in generated kernels that use colwise reductions (including all distributions), significantly improving GPU performance.(#2392)
    - Removed `.triangularTranspose()` member function from `matrix_cl` and `TriangularMapCL` enum. `.triangularTranspose()` is replaced by `symmetrize_from_lower_tri()`.(#2393)
    - Added support for two dimensional reductions to kernel generator.(#2403)
    - Added OpenCL implementations for functions `log_mix`, `log_softmax`, `log_sum_exp`, `rank`, `sd`, `softmax` and `variance`.(#2426)
    - Added OpenCL implementations for `ub_constrain`, `lb_constrain`, `lub_constrain`, `offset_multiplier_constrain` and `unit_vector_constrain`.(#2427)
    - Added OpenCL implementation for `prod` function and kernel generator operation for rowwise, colwise and 2d product.(#2433)
    - Added OpenCL implementations for functions: `bernoulli_cdf`, `bernoulli_lcdf`, `bernoulli_lccdf`, `cauchy_cdf`, `cauchy_lcdf`, `cauchy_lccdf`.(#2446)
    - Added OpenCL implementations for functions `double_exponential_cdf`, `double_exponential_lcdf`, `double_exponential_lccdf`, `exp_mod_normal_cdf`, `exp_mod_normal_lcdf` and `exp_mod_normal_lccdf`.(#2449)
    - Added OpenCL implementations for functions `exponential_cdf`, `exponential_lcdf`, `exponential_lccdf`, `frechet_cdf`, `frechet_lcdf` and `frechet_lccdf`.(#2450)
    - Added OpenCL implementations for functions `gumbel_cdf`, `gumbel_lcdf`, `gumbel_lccdf`, `logistic_cdf`, `logistic_lcdf` and `logistic_lccdf`.(#2451)
    - Added a new kernel generator operation that allows writing custom OpenCL code.(#2454)
    - Added OpenCL implementations for functions `pareto_cdf`, `pareto_lccdf`, `pareto_lcdf`, `pareto_type_2_cdf`, `pareto_type_2_lccdf`, and `pareto_type_2_lcdf`.(#2456)
    - Added OpenCL implementations for functions: `rayleigh_cdf`, `rayleigh_lccdf`, `rayleigh_lcdf`, `skew_double_exponential_cdf`, `skew_double_exponential_lccdf`, `skew_double_exponential_lcdf` and `skew_double_exponential_lpdf`.(#2457)
    - Added OpenCL implementations for functions `lognormal_cdf`, `lognormal_lccdf`, `lognormal_lcdf`, `normal_cdf`, `normal_lccdf`, `normal_lcdf`.(#2458)
    - Added OpenCL implementations for functions `std_normal_cdf`, `std_normal_lccdf`, `std_normal_lcdf`, `uniform_cdf`, `uniform_lccdf` and `uniform_lcdf`.(#2459)
    - Added OpenCL implementations for functions `weibull_cdf`, `weibull_lccdf` and `weibull_lcdf`.(#2460)
    - Removed unused OpenCL kernels and checks.(#2463)
    - Added OpenCL prim implementation for functions: `gp_exponential_cov`, `gp_matern32_cov`, `matern_52_cov` and both prim and rev implementation for `gp_dot_prod_cov`.(#2471)
    - Added reference (`ref_type`) for kernel generator expressions.(#2404)
    - Added typecast operation to kernel generator.(#2472)

======================================================================
v4.0.1 (17 February 2021)
======================================================================

- Fixed issue with `cholesky_decompose` not propagating derivatives correctly.
- Fixed OpenCL implementations of distributions mostly not working with row vectors.
- Fixed handling of input expressions with reduce_sum.

======================================================================
v4.0.0 (26 January 2021)
======================================================================

New functions:  

    - Add `symmetrize_from_lower_tri(matrix m)` function to `stan::math`.(#2209)
    - Added the Moore-Penrose generalized inverse, available with the `generalized_inverse` function.(#2225)
    - Implement `svd_U` and `svd_V`, add gradients to `singular_values`.(#2286)
    - Added linspaced_int_array.(#2312)
    
Miscellaneous improvements and bugfixes:  

    - Speedup `reduce_sum` parallelism.(#2162)
    - Make gradients for negative binomial and 2F1 function more robust for boundary values.(#2175)
    - Fixed problem with integrate_1d tolerances(#2183)
    - Fixed a bug in check_matching_dims that prevented checking vectors of Eigen matrices of vars.(#2220)
    - Updated log-likelihood calculation for `ordered_probit_lpmf` for increased numerical stability(#2229)
    - Updated TBB makefiles to support building on M1 Macs.(#2208)
    - Updated `mdivide_left` to avoid doing QR decomposition in reverse passes. (#2230)
    - Upgraded to Eigen 3.3.9.(#2238)
    - Upgraded Sundials to 5.6.1.(#2242, #2283)
    - Use immediately invoked lambdas in size and range error checks to improve code caching. (#2255)
    - Added a python script that can generate and run benchmarks for most of function signatures in math.(#2260)
    - `scalar_type_t` now decays references for `var_value<T>` types(#2264)
    - Added `adjoint_of()` function that accesses `.adj()` of `var`s, but analogous to `value_of()` also works on prim types (returning a dummy object).(#2270)
    - Wishart and inverse Wishart functions now check that symmetric positive definite inputs are actually symmetric positive definite.(#2281)
    - Fix potential segfault when mixing reduce_sum() and print().(#2319)
    - Fixed a bug in dirichlet_lpdf and multi_normal_cholesky_lpdf, where function overloads that do broadcasting produced wrong derivatives. (#2331)

Expressions:  

#2093, #2131, #2133, #2136, #2138, #2139, #2150, #2151, #2186, #2190, #2205

    - Generalized all remaining prim functions.
    - Prim functions now return expressions where applicable.
    - Removed the requirement for linear indexing from prim functions.
    - Improved expressions testing: no longer relying on a separate file of functions that do not support expressions.

OpenCL:  

#2117, #2155, #2176, #2177, #2180, #2181, #2215, #2173, #2174, #2184, #2185, #2191, #2216, #2217, #2217, #2219, #2221, #2222, #2226, #2231, #2236, #2236, #2250, #2253, #2258, #2259, #2263, #2267, #2270, #2272, #2273, #2274, #2275, #2294, #2295, #2296, #2311, #2316

    - Added OpenCL reverse mode support for lpdf/lpmf functions: bernoulli_lpmf, bernoulli_logit_lpmf, beta_lpdf, beta_proportion_lpdf, binomial_lpmf, cauchy_lpdf, chi_square_lpdf, double_exponential_lpdf, exp_mod_normal_lpdf, exponential_lpdf, frechet_lpdf, gamma_lpdf, gumbel_lpdf, inv_chi_square_lpdf, inv_gamma_lpdf, logistic_lpdf, lognormal_lpdf, neg_binomial_lpmf, neg_binomial_2_lpmf, neg_binomial_2_log_lpmf, normal_lpdf, pareto_lpdf, pareto_type_2_lpdf, poisson_lpmf, poisson_log_lpmf, rayleigh_lpdf, scaled_inv_chi_square_lpdf, skew_normal_lpdf, std_normal_lpdf, student_t_lpdf, uniform_lpdf and weibull_lpdf.
    - Added OpenCL reverse mode support for acos, acosh, add, add_diag, asin, asinh, atan, atanh, beta, block, cbrt, ceil, cholesky_decompose, col, cols, columns_dot_product, columns_dot_self, cos, cosh, crossprod, diag_matrix, diagonal, diag_post_multiply, diag_pre_multiply, digamma, dims, distance, dot_product, dot_self, elt_divide, elt_multiply, erf, erfc, exp, exp2, expm1, fabs, floor, head, hypot, inv, inv_cloglog, inv_logit, inv_Phi, inv_sqrt, inv_square, lbeta, ldexp, lgamma, lmultiply, log, log10, log1m, log1m_exp, log1m_inv_logit, log1p, log1p_exp, log2, log_diff_exp, log_inv_logit, log_inv_logit_diff, logit, mdivide_left_tri_low, mdivide_right_tri_low, mean, multiply, multiply_log, num_elements, Phi, Phi_approx, pow, round, row, rows, rows_dot_product, rows_dot_self, segment, sin, sinh, size, sqrt, square, squared_distance, sub_col, sub_row, subtract, sum, tail, tan, tanh, tcrossprod, tgamma, transpose and trunc.
    - Reduced amount of copying when using OpenCL. CPUs and integrated GPUs can use host memory in place. Transferring data to discrete GPUs can avoid one copy to pinned memory on host side if the new `pinned_matrix` class is used. Depending on the OpenCL implementation, optimization for CPUs and discrete GPUs may also require use of this class. (#2219)
    - Improved kernel generator: simplified writing of broadcasting tests, fixed a bug when using same operations on different sets of unique matrices, fixed `check_cl`, extended support for integer arguments, added support for kernel expressions returning a scalar, added compound assignment operators, made transpose kernel generator operation assignable.
    - Bugfixed `arena_matrix_cl` so it can be assigned a `matrix_cl`.(#2270)

Varmat:  

    - Added varmat implementations of cholesky_decompose, columns_dot_self, determinant, dot_self, inverse, log_determinant, matrix_power, multiply_lower_tri_self_transpose, and tcrossprod, add, operator+, operator-, quad_form, trace_quad_form, transpose (member function), variance, sd, log_softmax, log_sum_exp, divide, trace_gen_quad_form, dot_product, columns_dot_product, rows_dot_product, squared_distance, mdivide_left_spd, mdivide_left_tri, mdivide_left, log_determinant_spd, mdivide_left_ldlt, mdivide_right_ldlt, log_determinant_ldlt, trace_gen_inv_quad_form_ldlt, trace_inv_quad_form_ldlt (#2105, #2106, #2115, #2163, #2164, #2168, #2169, #2171, #2172, #2196, #2199, #2212, #2213, #2230, #2232, #2266, #2280, #2305)
    - Adds `var_value<Matrix>` overloads for array and column vector converters.(#2278)
    - Make multivariate distributions work with `varmat`(#2284)
    - Adds sequence views for `var<matrix>` to make some of the distributions var matrix compatible(#2292)
    - `to_var_value` and `from_var_value` now work with `std::vector` types(#2299)
    - Add more methods for taking blocks of var matrices.(#2188)
    - Updated matrix constraints to work with `varmat`(#2254)
    - Adds arc trig functions for `var<matrix>` along with several other unary operators (#2256)

======================================================================
v3.4.0 (26 October 2020)
======================================================================

 - Generalize signatures to accept general Eigen expressions (#1844, #1931, #1944, #1945, #1944, #2022, #2023, #2028, #2029, #2056, #2058, #2061, #2062, #2066, #2068, #2069, #2070, #2072, #2073, #2074, #2075, #2081, #2082, #2087, #2088, #2089, #2104, #2109, #2110, #2118, #2119, #2134, #2137, #2140)
 - Add templates to `var` and `vari` so that autodiff can use multiple floating point types.(#1915)
 - Made binomial_lpmf work more reliably when the probability parameter is 0.0 or 1.0(#1988)
 - Fixed requires docs(#1990)
 - fixed doxygen documentation of `hmm_marginal`(#1994)
 - Vectorised binary scalar functions (#1987, #2002, #2005) 
 - Added `reverse_pass_callback`, a simpler and faster alternative to `adj_jac_apply`. `dot_product` implementation that accepts pointers has been removed.(#2011)
 - Added missing implementations for the unary plus.(#2032)
 - Reduced some duplicate ODE test code(#2039)
 - Fix problems with higher order gradients in probability test framework(#2042)
 - forward_as requires types match exactly -- not just be convertible(#2054)
 - C0 in gaussian_dlm_obs_lpdf and gaussian_dlm_obs_rng can now be positive semidefinite(#2143)
 - Made behaviour of `fmin` and `fmax` with equal inputs equivalent across reverse- and forward-mode(#2063)
 - Cleaned up the use of `<Eigen/*>` outside the main Eigen header file.(#2080)
 - fixed numerical issue with `quad_form_sym`.(#2096)
 - Added function for creating an ordered integer sequence(#2108)
 - The states returned by hmm_hidden_state_rng now live on {1, 2, ..., K}, rather than {0, 1, ..., K - 1}.(#2113)
 - Removed a duplicated check function.(#2126)

var_value: 
 - Allowed `vari` to hold an Eigen type(#1952)
 - `operands_and_partials` now supports `var_value` operands.(#1970)
 - Added views for var_value<Eigen::Matrix> class(#2024)
 - Added functions for conversion between `var_value` and Eigen matrix of vars and a metaprogram that determines return type of a function that accepts a mixture of `var_value`s and Eigen matrices and propagates `var_value` if present.(#2047)
 - Adds softmax function for `var_value` matrices and testing scheme for checking array of structs vs struct of arrays matrix var implementation.(#2050)
 - Adds views to `vari_value<Matrix>` types that allow for compound slicing expressions.(#2064)
 - Enabled assigning to blocks and other views into `var_value`.(#2065)
 - Adds multiplication function for var matrices and a test suite for checking the correctness of binary functions that accept var matrices as an input.(#2091)
 - Added var and var matrix specializations of the elementwise product.(#2121)
 
Testing: 
  - Updated test values for pow() to fix false positives in the testing framework(#2123)
  - Upgraded Google Test to v1.10.x.(#1995)
  - Added Google Benchmark as a test dependency.(#1995)
  - Added testing framework for checking that functions exposed to stan language accept Eigen Expressions.(#1980, #2027)
  - Removed unit tests for reduce_sum threading that caused false positives.(#2033)
  - Added more tests for vectorized probability functions(#2085)
  - Added makefile variables that will be used for optimizations in the upstream interfaces (#2020)
  - Jumbo tests are used for all test/unit/math unit tests.(#2057)
  - Added address sanitizer to continuous integration.(#2146)

Building:
 - Removed the use of the GNU cut utility.(#2126)
 - Improved error messaging on Windows if the C++ toolchain was installed in a folder with spaces.(#2009)
 - Added flags to suppress warnings when building TBB.(#1993)
 - Switched `/bin/bash` with `/usr/bin/env bash` in the makefiles.(#1998)
 
OpenCL backend: 
  - Added support for elementwise checks to kernel generator(#1977)
  - Removed `matrix_cl<var>` specialization.(#2021)
  - OpenCL implementation `normal_id_glm_lpdf` can also be used when derivatives of `x` or `y` are needed.(#2034)
  - OpenCL implementations of `bernoulli_logit_glm_lpmf` and `poisson_log_glm_lpmf` can also be used when derivatives of `x` are needed.(#2035)
  - Added index operations to kernel generator.(#2051)
  - Implemented indexing for kernel generator expressions.(#2052)
  - OpenCL implementations of `neg_binomial_2_log_glm_lpmf`, `ordered_logistic_glm_lpmf` and `categorical_logit_glm_lpmf` can also be used when derivatives of `x` are needed.(#2055)
  - Added an option to select the OpenCL device to use at runtime.(#2067)
  - Simplified tests of GPU GLMs by moving repeated code into a common utility header.(#2097)
  - Added OpenCL reverse mode implementations of matrix multiply and sum.(#2099)
  - vari_base can now be constructed with matrix_cl value and adjoint. Copying between host and device is possible for vars containing Eigen::Matrix/matrix_cl. Adjoints are propagated over these copies.(#1967)


======================================================================
v3.3.0 (28 July 2020)
======================================================================

 - New functions is_nonnegative and is_positive_finite to parallel check_nonnegative and check_positive_finite. They signal failure by returning false instead of by throwing std::domain_error.(#1798)
 - Functions check_not_nan, check_nonnegative, check_positive, check_finite, check_positive_finite, is_not_nan, is_nonnegative, is_positive, is_scal_finite, and is_positive_finite now operate on nested containers.(#1798)
 - Clearer error messages when csr_u_to_z is called with out of range indices.(#1798)
 - check_positive now throws domain error when given an unsigned 0.(#1798)
 - Adds documentation for the OpenCL modules.(#1804)
 - OpenCL kernel for matrix-vector multiply is replaced with kernel generator.(#1816)
 - Added optional broadcasting to kernel generator.(#1817)
 - Adds additional testing for checking type traits of Eigen objects and modifies type traits for Eigen matrices so detecting an Eigen matrix will only be true if neither rows nor columns are equal to 1.(#1827)
 - Generalized functions with names starting with letters e-l to accept general Eigen expressions.(#1828)
 - Updates to Eigen 3.3.7(#1832)
 - Extended Eigen scalar traits to complex autodiff types to enable complex autodiff matrix operations.(#1838)
 - Added eigenvalues(), eigenvectors(), pseudo_eigenvalues(), pseudo_eigenvectors(), and complex_schur_decompose() functions.(#1838)
 - Added a function for more efficient extraction of values and derivatives from Eigen containers of autodiff variables.(#1841)
 - Generalized functions with names starting with m to p to accept general Eigen expressions.(#1845)
 - Generalized functions with names starting with q and r.(#1847)
 - Generalized functions starting with s and t.(#1848)
 - Cleaned up the use of Boost headers.(#1851)
 - Fixed issue where doing nested gradients while there is an ongoing chain would cause segfaults.(#1856)
 - Added support for `int add(int,int)` and `real add(real,real)`.(#1858)
 - Added OpenCL support for prim versions of add, col, inv, inv_cloglog, inv_logit, inv_sqrt, inv_square, row, dims, cols.(#1859)
 - Added argument checks to OpenCL version of block() and renamed `is_valid_expression` to `is_valid_kernel_expression` and `require_all_valid_expressions*` to `require_all_valid_kernel_expressions*`.(#1859)
 - Added floating point classification functions (`isfinite`, `isnan`, `isinf`) to kernel generator.(#1860)
 - Added diagonal operation to kernel generator.(#1862)
 - Adds holder operation to kernel generator.(#1865)
 - Added matrix concatenation operations (`append_row` and `append_col`) to kernel generator.(#1867)
 - Add two functions for hmm models that (i) evaluate the marginal posterior probability of each latent state and (ii) sample latent states from their posterior distribution.(#1868)
 - Renamed elewise_multiplication/elewise_division to elt_multiply/elt_divide in the kernel generator and added OpenCL support to /prim functions crossprod, tcrossprod, fabs, log1m_inv_logit, logit, divide(#1869)
 - Updated `cpplint` to conform to `pypi`(#1871)
 - `value_of` and `value_of_rec` can now return Eigen expressions.(#1872)
 - Reverse mode differentiation of `stan::math::eigenvectors_sym` and `stan::math::eigenvalues_sym` now returns symmetric matrix (previously lower triangular).(#1878)
 - Refactored Jenkins CI stages.(#1881)
 - Implements both branches of the Lambert W function.(#1882)
 - Added -Wno-int-in-bool-context and -Wno-attributes compiler flags to silence warnings due to compiler bugs.(#1885)
 - Fixed bug causing wrong log-probability values to be returned when using the lognormal distribution with an integer argument for sigma.(#1894)
 - Unified EXPECT_MATRIX_* and EXPECT_STD_VECTOR_* macros in the Math unit tests.(#1895)
 - Added Github Actions to CI.(#1898)
 - Generalized GLM functions so they accept arbitrary Eigen expressions.(#1899)
 - Added framework for vectorizing/broadcasting binary scalar functions, and utilities for testing the values and gradients of these(#1907)
 - Fixed a bug where the appropriate specialization for the beta function was not called for forward-mode autodiff variables(#1909)
 - Added holder operation for Eigen expressions. `value_of`, `value_of_rec`, `as_column_vector_or_scalar` and `as_array_or_scalar` now work with rvalue inputs.(#1914)
 - Generalized functions related to Bernoulli distribution to accept Eigen expressions.(#1916)
 - Vectorized binary functions extended to take complex variables as inputs(#1917)
 - Fixed a bug that resulted in wrong result when constructing `matrix_cl` from row-major matrix or expression.(#1919)
 - Generalized functions related to Bernoulli distribution to accept Eigen expressions.(#1925)
 - Add log-parametrized multinomial distribution.(#1927)
 - Replaced OpenCL kernels for `bernoulli_logit_glm_lpmf` and `poisson_log_glm_lpmf` with kernel generator implementations.(#1929)
 - Generalized functions related to beta distribution to accept general Eigen expressions.(#1930)
 - Fixed one sided checks in `test_repeat_as_vector` in distribution tests.(#1933)
 - Simplified using `operands_and_partials` by making `broadcast_array` sum the argument when assigned to.(#1934)
 - Extended check_matching_dims to handle arbitrary containers.(#1936)
 - Bugfix for require style templates(#1942)
 - Fix segfaults when compiling `mix` tests for `pow` on Windows with g++ 4.9.3.(#1951)
 - Extends binary vectorisation framework to take combinations of `real` and `int` containers as inputs.(#1966)
 - Fixed generated OpenCL kernels using colwise reductions to work on OpenCL implementations that require local variables to be declared at the top level of kernel function.(#1975)
 - Added variadic ODE interfaces (`ode_bdf`, `ode_adams`, `ode_rk45`, `ode_bdf_tol`, `ode_adams_tol` and `ode_rk45_tol`) and deprecated the old interfaces (`integrate_ode_rk45`, `integrate_ode_bdf`, `integrate_ode_adams`). It is important to switch to the new interfaces because the deprecated interfaces are now much slower than they previously were (we have seen slowdowns of up to 30%).(#1641)
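
The `is_*` functions listed at the top of this release differ from their `check_*` counterparts only in how they report failure. The following is a minimal sketch of that contrast, not the library's implementation: the names `is_nonnegative_sketch` and `check_nonnegative_sketch`, their signatures, and the choice to treat NaN as a failure are assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the Stan Math signatures.

// Reports failure by returning false.
inline bool is_nonnegative_sketch(const std::vector<double>& xs) {
  for (double x : xs) {
    if (!(x >= 0.0)) {  // written this way so NaN also counts as a failure
      return false;
    }
  }
  return true;
}

// Reports failure by throwing std::domain_error for the first bad element.
inline void check_nonnegative_sketch(const char* function, const char* name,
                                     const std::vector<double>& xs) {
  for (std::size_t i = 0; i < xs.size(); ++i) {
    if (!(xs[i] >= 0.0)) {
      throw std::domain_error(std::string(function) + ": " + name + "["
                              + std::to_string(i) + "] is "
                              + std::to_string(xs[i])
                              + ", but must be nonnegative");
    }
  }
}
```

The boolean form suits callers that want to branch on validity; the throwing form suits argument validation at the top of a function.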

======================================================================
v3.2.0 (22 April 2020)
======================================================================

### Feature summary:
* Added new `reduce_sum` utility for parallelizing calculations over multiple cores of one computer
* Sundials upgrade to version 5.2.0
* Support for size 0 matrices [Ben: This needs checked by someone in release candidate, it might be just a Math-facing feature]
* discrete_range_lpmf/lcdf/lccdf/rngs/etc. added for uniform discrete random variable
* Added row vector, array and int_array construction utilities
* Adds reverse function for vectors, row vectors, and std::vectors
* Added release notes section to pull request to make assembling release notes easier (this must be filled out for new pulls)
* Added apply operator (standin for C++17 apply)
* Added support for complex numbers at all levels of the autodiff library
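
For readers unfamiliar with the C++17 facility that the apply operator stands in for, here is a minimal C++14 sketch of the usual index-sequence technique. It is an illustration under assumptions, not the library's code: the `sketch` namespace and the helper `apply_impl` are hypothetical names.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

// Expands the tuple elements into a call to f using a pack of indices.
template <class F, class Tuple, std::size_t... I>
constexpr decltype(auto) apply_impl(F&& f, Tuple&& t,
                                    std::index_sequence<I...>) {
  return std::forward<F>(f)(std::get<I>(std::forward<Tuple>(t))...);
}

// Calls f with the elements of tuple t as its arguments.
template <class F, class Tuple>
constexpr decltype(auto) apply(F&& f, Tuple&& t) {
  return apply_impl(
      std::forward<F>(f), std::forward<Tuple>(t),
      std::make_index_sequence<
          std::tuple_size<std::decay_t<Tuple>>::value>{});
}

}  // namespace sketch
```

Calling `sketch::apply(f, std::make_tuple(2, 3))` invokes `f(2, 3)`.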

### Bugfix summary:
* Fixed problems with vectorizing neg_binomial_* functions that led to wrong answers
* Improved lbeta to be more numerically stable with one large and one small argument
* Improved numerical stability with binomial_coefficient_log, neg_binomial_2_lpmf, and neg_binomial_2_log_lpmf computations
* Fixed problem with wrong gradients for large arguments to log_sum_exp
* Fixed bug where normal_id_glm did not work when sigma is not an autodiff type
* Fixed problem with makefiles on Windows
* Fix ode gradients with respect to t0
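
These notes do not describe how the `log_sum_exp` fix was made. As background only, the sketch below shows the standard way to keep a two-argument log-sum-exp and its partial derivative finite for large inputs, by working relative to the larger argument. The function names are hypothetical, and NaN inputs are not handled.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Illustrative only; not the library's implementation.
// Computes log(exp(a) + exp(b)) without overflowing for large a or b.
inline double log_sum_exp_sketch(double a, double b) {
  const double inf = std::numeric_limits<double>::infinity();
  const double hi = std::max(a, b);
  const double lo = std::min(a, b);
  if (hi == inf || lo == -inf) {
    return hi;  // avoids inf - inf; the smaller term contributes nothing
  }
  // exp(lo - hi) is in (0, 1], so it cannot overflow.
  return hi + std::log1p(std::exp(lo - hi));
}

// Partial derivative with respect to a for finite inputs:
// exp(a) / (exp(a) + exp(b)), evaluated as exp(a - log_sum_exp(a, b)).
inline double log_sum_exp_grad_a_sketch(double a, double b) {
  return std::exp(a - log_sum_exp_sketch(a, b));
}
```

A naive `std::log(std::exp(a) + std::exp(b))` overflows to infinity once an argument exceeds roughly 709 in double precision, which is the regime the sketch is meant to cover.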

There was also extensive work in various parts of the Math library adding docs, cleaning up distribution functions, adding tests, generalizing functions to work with special Eigen temporary types, working with new template functions (`return_type_t`, etc.), and otherwise making quality of life improvements.

Thanks to everyone who submitted and reviewed pull requests! The pull request submitters for this release are (in no particular order): @t4c1, @mcol, @martinmodrak, @rok-cesnovar, @serban-nicusor-toptal, @SteveBronder, @kedartal, @wds15, @bob-carpenter, @peterwicksstringfield, @andrjohns, @pgree, @IvanYashchuk, and @bbbales2.

### Pull request details:
|Contributor                                                                                                               |Title                                                                   |
|:-------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------|
|[t4c1](https://github.com/t4c1) : ([#1824](https://github.com/stan-dev/math/pull/1824))                                   |Add unary operations minus and logical negation to kernel generator     |
|[wds15](https://github.com/wds15) : ([#1819](https://github.com/stan-dev/math/pull/1819))                                 |Feature/issue 1818 upgrade sundials 520                                 |
|[SteveBronder](https://github.com/SteveBronder) : ([#1813](https://github.com/stan-dev/math/pull/1813))                   |adds reduce_sum and tests                                               |
|[t4c1](https://github.com/t4c1) : ([#1802](https://github.com/stan-dev/math/pull/1802))                                   |Add extended kernel generator docs                                      |
|[t4c1](https://github.com/t4c1) : ([#1801](https://github.com/stan-dev/math/pull/1801))                                   |Add support for device functions to kernel generator                    |
|[SteveBronder](https://github.com/SteveBronder) : ([#1800](https://github.com/stan-dev/math/pull/1800))                   |Adds auxilary functions needed for reduce_sum                           |
|[t4c1](https://github.com/t4c1) : ([#1797](https://github.com/stan-dev/math/pull/1797))                                   |Add broadcasting to kernel generator                                    |
|[SteveBronder](https://github.com/SteveBronder) : ([#1791](https://github.com/stan-dev/math/pull/1791))                   |adds an apply function and cleans adj_jac_apply to use it               |
|[peterwicksstringfield](https://github.com/peterwicksstringfield) : ([#1780](https://github.com/stan-dev/math/pull/1780)) |Feature/elementwise check                                               |
|[bob-carpenter](https://github.com/bob-carpenter) : ([#1774](https://github.com/stan-dev/math/pull/1774))                 |Feature/0123 complex funs                                               |
|[t4c1](https://github.com/t4c1) : ([#1769](https://github.com/stan-dev/math/pull/1769))                                   |Add transposition to kernel generator                                   |
|[t4c1](https://github.com/t4c1) : ([#1767](https://github.com/stan-dev/math/pull/1767))                                   |avoid redundant buffer allocations in kernel generator                  |
|[t4c1](https://github.com/t4c1) : ([#1751](https://github.com/stan-dev/math/pull/1751))                                   |Allow kernel generator to calculate multiple outputs in single kernel   |
|[t4c1](https://github.com/t4c1) : ([#1740](https://github.com/stan-dev/math/pull/1740))                                   |Add colwise reductions to kernel generator                              |
|[bob-carpenter](https://github.com/bob-carpenter) : ([#1736](https://github.com/stan-dev/math/pull/1736))                 |Feature/1734 return scalar meta                                         |
|[andrjohns](https://github.com/andrjohns) : ([#1727](https://github.com/stan-dev/math/pull/1727))                         |Eigen::Map and elementwise functions                                    |
|[t4c1](https://github.com/t4c1) : ([#1726](https://github.com/stan-dev/math/pull/1726))                                   |Bugfix common subexpression elimination in kernel generator             |
|[bob-carpenter](https://github.com/bob-carpenter) : ([#1720](https://github.com/stan-dev/math/pull/1720))                 |Feature/0123 complex spec                                               |
|[mcol](https://github.com/mcol) : ([#1716](https://github.com/stan-dev/math/pull/1716))                                   |Add discrete_range_cdf, discrete_range_lcdf and discrete_range_lccdf    |
|[martinmodrak](https://github.com/martinmodrak) : ([#1706](https://github.com/stan-dev/math/pull/1706))                   |Using local_nested_autodiff for all instances of nested autodiff        |
|[mcol](https://github.com/mcol) : ([#1680](https://github.com/stan-dev/math/pull/1680))                                   |Add discrete_range_rng(lower, upper) and discrete_range_lpmf            |
|[wds15](https://github.com/wds15) : ([#1675](https://github.com/stan-dev/math/pull/1675))                                 |avoid vari on chain-stack if var is constructed from an arithmetic type |
|[martinmodrak](https://github.com/martinmodrak) : ([#1657](https://github.com/stan-dev/math/pull/1657))                   |Improved behavior of expect_near_rel                                    |
|[mcol](https://github.com/mcol) : ([#1650](https://github.com/stan-dev/math/pull/1650))                                   |Add reverse                                                             |
|[mcol](https://github.com/mcol) : ([#1636](https://github.com/stan-dev/math/pull/1636))                                   |Add row vector, array and int_array construction utilities              |
|[t4c1](https://github.com/t4c1) : ([#1623](https://github.com/stan-dev/math/pull/1623))                                   |Add rowwise reductions to kernel generator                              |
|[martinmodrak](https://github.com/martinmodrak) : ([#1575](https://github.com/stan-dev/math/pull/1575))                   |Additional tests for distributions                                      |
|[martinmodrak](https://github.com/martinmodrak) : ([#1830](https://github.com/stan-dev/math/pull/1830))                   |More stable implementation of neg_binomial_2_log_lpmf                   |

## Fixes

|Contributor                                                                                                               |Title                                                                               |
|:-------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
|[mitzimorris](https://github.com/mitzimorris) : ([#1842](https://github.com/stan-dev/math/pull/1842))                     |Bugfix/1839 make shell linux                                                        |
|[bob-carpenter](https://github.com/bob-carpenter) : ([#1837](https://github.com/stan-dev/math/pull/1837))                 |return value type for complex abs; fixes #1836                                      |
|[bbbales2](https://github.com/bbbales2) : ([#1834](https://github.com/stan-dev/math/pull/1834))                           |Fix ode gradients with respect to t0 (Issue #1833)                                  |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1825](https://github.com/stan-dev/math/pull/1825))                   |Add licensing info to log_modified_bessel_first_kind                                |
|[bbbales2](https://github.com/bbbales2) : ([#1810](https://github.com/stan-dev/math/pull/1810))                           |Added arithmetic version of pow (Fixes: #1809)                                      |
|[t4c1](https://github.com/t4c1) : ([#1807](https://github.com/stan-dev/math/pull/1807))                                   |fixed normal_id_glm for const sigma                                                 |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1795](https://github.com/stan-dev/math/pull/1795))                   |Makefiles: Add $(strip) to findfiles function                                       |
|[SteveBronder](https://github.com/SteveBronder) : ([#1794](https://github.com/stan-dev/math/pull/1794))                   |Cleanup requires with macros                                                        |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1789](https://github.com/stan-dev/math/pull/1789))                   |Reorganize Jenkins test stages                                                      |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1786](https://github.com/stan-dev/math/pull/1786))                   |Add a release notes section in the pull request template                            |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1784](https://github.com/stan-dev/math/pull/1784))                   |Makefiles: strip newlines in findiles and fix runChecks call                        |
|[serban-nicusor-toptal](https://github.com/serban-nicusor-toptal) : ([#1777](https://github.com/stan-dev/math/pull/1777)) |Moved verify changes Jenkins step to shared libraries.                              |
|[pgree](https://github.com/pgree) : ([#1772](https://github.com/stan-dev/math/pull/1772))                                 |Changed calculation of log_sum_exp(x1, x2)                                          |
|[peterwicksstringfield](https://github.com/peterwicksstringfield) : ([#1771](https://github.com/stan-dev/math/pull/1771)) |More documentation fixups.                                                          |
|[SteveBronder](https://github.com/SteveBronder) : ([#1768](https://github.com/stan-dev/math/pull/1768))                   |[WIP][WIP] Add docs and generic value type signatures for reduce_sum                |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1765](https://github.com/stan-dev/math/pull/1765))                   |Cleanup duplicated overloads of scalar std:: functions                              |
|[mcol](https://github.com/mcol) : ([#1762](https://github.com/stan-dev/math/pull/1762))                                   |Remove fwd versions of qr_Q and qr_R                                                |
|[mcol](https://github.com/mcol) : ([#1761](https://github.com/stan-dev/math/pull/1761))                                   |Move size_zero() calls after other consistency checks (L-Z)                         |
|[mcol](https://github.com/mcol) : ([#1758](https://github.com/stan-dev/math/pull/1758))                                   |Move size_zero() calls after other consistency checks (A-I)                         |
|[peterwicksstringfield](https://github.com/peterwicksstringfield) : ([#1756](https://github.com/stan-dev/math/pull/1756)) |Documentation fixups                                                                |
|[t4c1](https://github.com/t4c1) : ([#1754](https://github.com/stan-dev/math/pull/1754))                                   |Generalize */fun starting with cr-d                                                 |
|[t4c1](https://github.com/t4c1) : ([#1747](https://github.com/stan-dev/math/pull/1747))                                   |Generalize */fun startimg with chole-cov                                            |
|[mcol](https://github.com/mcol) : ([#1744](https://github.com/stan-dev/math/pull/1744))                                   |Reuse intermediate computations in distributions part 1                             |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1743](https://github.com/stan-dev/math/pull/1743))                   |Update license.md year to 2020                                                      |
|[t4c1](https://github.com/t4c1) : ([#1732](https://github.com/stan-dev/math/pull/1732))                                   |Generalize */fun starting with a-chol2                                              |
|[mcol](https://github.com/mcol) : ([#1730](https://github.com/stan-dev/math/pull/1730))                                   |Avoid 'Blocked loading mixed active content' errors                                 |
|[mcol](https://github.com/mcol) : ([#1728](https://github.com/stan-dev/math/pull/1728))                                   |Keep computations in log space and simplify expressions in beta binomial_*cdf       |
|[serban-nicusor-toptal](https://github.com/serban-nicusor-toptal) : ([#1724](https://github.com/stan-dev/math/pull/1724)) |Run CI/CD only on Source Code                                                       |
|[pgree](https://github.com/pgree) : ([#1722](https://github.com/stan-dev/math/pull/1722))                                 |remove discontinuity in lub_constrain; fixes #1441                                  |
|[mcol](https://github.com/mcol) : ([#1718](https://github.com/stan-dev/math/pull/1718))                                   |Remove size zero checks from check_multiplicable                                    |
|[mcol](https://github.com/mcol) : ([#1717](https://github.com/stan-dev/math/pull/1717))                                   |Remove promote_common and common_type                                               |
|[mcol](https://github.com/mcol) : ([#1715](https://github.com/stan-dev/math/pull/1715))                                   |Cleanup mdivide_* and tests                                                         |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1712](https://github.com/stan-dev/math/pull/1712))                   |fix develop                                                                         |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1711](https://github.com/stan-dev/math/pull/1711))                   |Bugfix/include what you use in /prim/fun                                            |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1707](https://github.com/stan-dev/math/pull/1707))                   |Add namespace qualifiers to size()                                                  |
|[mcol](https://github.com/mcol) : ([#1705](https://github.com/stan-dev/math/pull/1705))                                   |Check for positive definiteness only after computing the LLT factors                |
|[mcol](https://github.com/mcol) : ([#1701](https://github.com/stan-dev/math/pull/1701))                                   |Don't throw when computing the inverse of a size zero matrix                        |
|[mcol](https://github.com/mcol) : ([#1698](https://github.com/stan-dev/math/pull/1698))                                   |Improve tests and documentation for quad_form and quad_form_sym                     |
|[mcol](https://github.com/mcol) : ([#1688](https://github.com/stan-dev/math/pull/1688))                                   |Add positive definite checks to mdivide_left_spd                                    |
|[mcol](https://github.com/mcol) : ([#1685](https://github.com/stan-dev/math/pull/1685))                                   |Allow multiply for size 0 inputs                                                    |
|[kedartal](https://github.com/kedartal) : ([#1684](https://github.com/stan-dev/math/pull/1684))                           |Explicitly call stan::math::size (fix ambiguous call versus std::size)              |
|[mcol](https://github.com/mcol) : ([#1682](https://github.com/stan-dev/math/pull/1682))                                   |Check for consistent sizes before comparing arguments                               |
|[mcol](https://github.com/mcol) : ([#1678](https://github.com/stan-dev/math/pull/1678))                                   |Reorganize files after flatten                                                      |
|[mcol](https://github.com/mcol) : ([#1673](https://github.com/stan-dev/math/pull/1673))                                   |Remove Boost deprecation warning                                                    |
|[mcol](https://github.com/mcol) : ([#1665](https://github.com/stan-dev/math/pull/1665))                                   |make develop build again                                                            |
|[mcol](https://github.com/mcol) : ([#1663](https://github.com/stan-dev/math/pull/1663))                                   |correctness fixes in neg_binomial_* functions                                       |
|[mcol](https://github.com/mcol) : ([#1661](https://github.com/stan-dev/math/pull/1661))                                   |Move vectorize files to meta folder                                                 |
|[t4c1](https://github.com/t4c1) : ([#1660](https://github.com/stan-dev/math/pull/1660))                                   |Generalize view and size functions                                                  |
|[mcol](https://github.com/mcol) : ([#1655](https://github.com/stan-dev/math/pull/1655))                                   |Cleanup in poisson_cdf, poisson_lccdf and poisson_lcdf                              |
|[serban-nicusor-toptal](https://github.com/serban-nicusor-toptal) : ([#1653](https://github.com/stan-dev/math/pull/1653)) |Fix permission denied in Jenkins pipeline                                           |
|[mcol](https://github.com/mcol) : ([#1649](https://github.com/stan-dev/math/pull/1649))                                   |Clean up code to use index_type_t,  return_type_t and partials_return_t             |
|[t4c1](https://github.com/t4c1) : ([#1628](https://github.com/stan-dev/math/pull/1628))                                   |Generalize operator-like functions                                                  |
|[martinmodrak](https://github.com/martinmodrak) : ([#1614](https://github.com/stan-dev/math/pull/1614))                   |Improved numerical stability of binomial_coefficient_log                            |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1567](https://github.com/stan-dev/math/pull/1567))                   |Cleanup the use of find in makefile                                                 |
|[SteveBronder](https://github.com/SteveBronder) : ([#1525](https://github.com/stan-dev/math/pull/1525))                   |Generic var templates for operators and std::iterator_trait var/fvar specialization |
|[martinmodrak](https://github.com/martinmodrak) : ([#1497](https://github.com/stan-dev/math/pull/1497))                   |Fixing negative binomial phi cutoff                                                 |
|[t4c1](https://github.com/t4c1) : ([#1796](https://github.com/stan-dev/math/pull/1796))                                   |Bugfix: make kernel generator wait for events on matrices                           |
|[mcol](https://github.com/mcol) : ([#1674](https://github.com/stan-dev/math/pull/1674))                                   |Document template parameters in /prim                                               |
|[mcol](https://github.com/mcol) : ([#1668](https://github.com/stan-dev/math/pull/1668))                                   |Rename spaced_* to linspaced_*                                                      |
|[wds15](https://github.com/wds15) : ([#1821](https://github.com/stan-dev/math/pull/1821))                                 |clarify README.md                                                                   |


max_size, max_size_mvt, check_consistent_sizes, and check_consistent_sizes_mvt have been simplified and now take any number of arguments.
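
As an illustration of what "any number of arguments" means here, the following sketch shows one way a variadic size helper can be written with a parameter pack. It is a sketch under assumptions rather than the library's code: the `sketch` namespace and the `size_of` helper are hypothetical, only `double` and `std::vector` arguments are handled, and a scalar is assumed to have size 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace sketch {

// Assumption for this sketch: a scalar argument counts as size 1.
inline std::size_t size_of(double) { return 1; }

// A std::vector argument reports its element count.
template <class T>
std::size_t size_of(const std::vector<T>& v) {
  return v.size();
}

// Returns the largest size among one or more arguments.
template <class T1, class... Ts>
std::size_t max_size(const T1& x1, const Ts&... xs) {
  // Expands to std::max({size_of(x1), size_of(x2), ...}).
  return std::max({size_of(x1), size_of(xs)...});
}

}  // namespace sketch
```

With a three-element vector `a` and a five-element vector `b`, `sketch::max_size(a, 2.0, b)` evaluates to 5.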

======================================================================
v3.1.0 (24 January 2020)
======================================================================

## Features
We upgraded the Boost library to version 1.72.0.

[SteveBronder](https://github.com/SteveBronder) put in a ton of effort to refactor the [math docs site](http://mc-stan.org/math/) and clean up the doxygen. Steve also added var specialization for matrix_cl. [t4c1](https://github.com/t4c1) generalized signatures of element-wise matrix functions which now accept and return expressions, added broadcasting to the GLM functions and extended that functionality to the GPU GLM functions. He also added JIT compilation for dynamically generating OpenCL kernels, which can now handle matrices, scalars, unary and binary element-wise matrix operations and the block() function. This will enable much faster adoption of GPU support for the remaining Stan Math functions. Tadej also optimized the matrix_cl constructor for the case of constructing from an rvalue and extended matrix_cl to accept Eigen::Map.
[peterwicksstringfield](https://github.com/peterwicksstringfield) added the specialized derivative for matrix/scalar and the matrix power function. [wds15](https://github.com/wds15) added a change to the makefiles that enables easier building of custom .cpp files that use Stan Math. [mcol](https://github.com/mcol) added implementations of std_normal_cdf, std_normal_lcdf, std_normal_lccdf and std_normal_rng. [rok-cesnovar](https://github.com/rok-cesnovar) simplified turning on threading in the makefiles.

## Developer-aimed features
[bob-carpenter](https://github.com/bob-carpenter) extended the autodiff test framework for ternary functions and consolidated most of rev and fwd tests into mix. Bob also improved test coverage for all real-valued functions of two or fewer arguments. [rok-cesnovar](https://github.com/rok-cesnovar) flattened the arr, scal and mat subfolders to simplify the codebase, renamed the length() and length_mvt() functions to size() and size_mvt(), cleaned up the unit test names and added a restriction to enforce unique test names in the Stan Math codebase.
[mcol](https://github.com/mcol) cleaned up the use of constants in the codebase and added missing documentation for template parameters. [SteveBronder](https://github.com/SteveBronder) extended the require metaprogramming traits to cover is_container and is_string_convertible.

## Fixes
[PhilClemson](https://github.com/PhilClemson) improved numerical precision of normal lcdf. [martinmodrak](https://github.com/martinmodrak) fixed some notable edge cases for neg_binomial_2 and neg_binomial_2_log. [SteveBronder](https://github.com/SteveBronder) fixed a bug with the beta_binomial_lpmf function. [rok-cesnovar](https://github.com/rok-cesnovar) fixed a bug in poisson_log_glm that caused the sampling statement to fail and resolved include-what-you-use issues in the codebase.

[mcol](https://github.com/mcol) has cleaned up a lot of the codebase: simplifying and optimizing the exp_mod_normal_*, VectorBuilder, pareto_type_2, poisson_log_lpmf, poisson_lpmf, and beta binomial CDF/CCDF functions. Marco also did an amazing job fixing and cleaning the behaviour and adding testing for edge cases in:

* corr_matrix_transform
* determinant
* hypergeometric_lpmf
* inverse
* inverse_spd
* log_determinant
* log_determinant_spd
* log_sum_exp
* matrix_exp_multiply
* matrix_exp_pade
* mdivide_left_ldlt
* mdivide_right_ldlt
* scale_matrix_exp_multiply
* singular_values
* trace_inv_quad_form_ldlt
* trace_gen_inv_quad_form_ldlt
* von_mises_lpdf
* von_mises_rng

Last but not least he cleaned up the use of trailing underscores in our codebase and compiler warnings when building the Kinsol library, removed the use of some boost functions that are now available in C++11, cleaned up some uninitialized variables in tests, fixed the bug that caused overpromotion in multiply functions, cleaned up unnecessary checks in _lpmf/_lpdf functions, renamed dirichlet_lpmf to dirichlet_lpdf and stan::math::domain_error to throw_domain_error, removed uses of boost::promote_args and boost/math/tools/promotion.hpp, and removed the unused functions stan::math::sub(), locscale_constrain and locscale_free.

[wds15](https://github.com/wds15) fixed the static init order bug. [bgoodri](https://github.com/bgoodri) fixed a bug that issued a warning for catching without a constant reference. [peterwicksstringfield](https://github.com/peterwicksstringfield) cleaned up some potentially misleading docs in negated require template metaprograms and other documentation and also cleaned up the use of ibeta and fixed the loop order in loops dealing with Eigen matrices. [seantalts](https://github.com/seantalts) refactored integrate_1d to use std::ostream* instead of std::ostream&. [mitzimorris](https://github.com/mitzimorris) cleaned up the doxygen config file of unneeded directives.

======================================================================
v3.0.0 (18 October 2019)
======================================================================

## Features

PR [#1180](https://github.com/stan-dev/math/pull/1180) brought the [Intel TBB](https://github.com/intel/tbb) into Stan as a dependency, which we will be using in the future for CPU parallelism all across Stan! The TBB is an excellent framework that will let Stan utilize nested parallelism across the algorithms and gradient evaluations. The Intel TBB library is licensed under the Apache 2.0 license. This dependency implies an additional restriction as compared to the new BSD license alone. The Apache 2.0 license is incompatible with GPL-2 licensed code if the software is distributed as a unitary binary. Refer to the Apache 2.0 evaluation page on the Stan Math [wiki](https://github.com/stan-dev/math/wiki/Apache-2.0-License-Evaluation).

With @t4c1's large contributions, we now have GPU/OpenCL support for many of our glm functions! @rok-cesnovar added an OpenCL reverse mode specialization for `multiplication` and `mdivide_left_tri` while @t4c1 added the OpenCL specialization for `gp_exp_quad_cov`.

Some other nice features include @andrjohns vectorizing the Dirichlet distribution, @IvanYashchuk implementing a reverse mode specialization for inverse, and @yizhang-yiz with @charlesm93 adding fixed point algebra solvers based on SUNDIALS' KINSOL scheme.

Internally, @bob-carpenter added a new AD testing framework, which both replaced 18,539 lines of code with 2,500 and simultaneously increased our test coverage! @SteveBronder added a type traits metaprogramming scheme so that we can make use of more generic templating in a lot of our code. Last but not least, @andrjohns standardized a lot of our code to use standard library functions instead of our hand-rolled methods.
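
As a rough illustration of what a `require_*`-style type trait scheme enables, the sketch below builds two aliases on top of `std::enable_if_t` and uses them to select between overloads. It shows the general technique only and is not the library's actual definitions: the `sketch` namespace, both alias names and the `describe` function are hypothetical names chosen for this example.

```cpp
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical alias: well-formed only when T is an arithmetic type.
template <typename T>
using require_arithmetic_t
    = std::enable_if_t<std::is_arithmetic<T>::value>;

// Hypothetical negated alias: well-formed only when T is not arithmetic.
template <typename T>
using require_not_arithmetic_t
    = std::enable_if_t<!std::is_arithmetic<T>::value>;

// Overload that participates only for arithmetic arguments.
template <typename T, require_arithmetic_t<T>* = nullptr>
std::string describe(const T&) {
  return "arithmetic";
}

// Overload that participates for every other argument type.
template <typename T, require_not_arithmetic_t<T>* = nullptr>
std::string describe(const T&) {
  return "other";
}

}  // namespace sketch
```

With these definitions, `sketch::describe(1.5)` resolves to the first overload and `sketch::describe(std::string("x"))` to the second. The constraint sits in the template parameter list, so the function signature itself stays uncluttered.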

We are now using TBB for threading in map_rect. With performance tests on a non-trivial map_rect model we have observed speedups of up to 20% on Windows, 70% on Linux and 30% on MacOS. Speedups were observed for both Intel and AMD CPUs. On MacOS we observed 25-30% speedups even for single threaded models when using tbbmalloc.

![Chart](https://user-images.githubusercontent.com/5857231/66257806-3158c780-e7a6-11e9-9c8c-fbf6bcde8fd5.png)

## Fixes

@wds15 patched the way we use `lgamma` so that it's faster in concurrent settings. A speedy patch came in from @t4c1 when @jgabry reported intercept only glm specializations with size zero matrices could give the wrong output. We had several patches and code cleanups in the OpenCL code, mostly testing and improving the type trait system around the OpenCL methods. @nhuurre patched `log_sum_exp` and `log_diff_exp` so that the methods respected boundary conditions a bit better. Stan also now uses `clang-tidy`, which gives us an automated way to keep the code base standardized.
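
To make the kind of boundary condition involved in `log_sum_exp` concrete: the usual max-shifted formula computes `lo - hi`, which is NaN when both arguments are the same infinity. The sketch below is a generic two-argument version with that case handled explicitly. It is not the code from the patch, and `log_sum_exp_sketch` is a hypothetical name used only here.

```cpp
#include <cmath>

// Hypothetical helper, not the library implementation.
// Computes log(exp(a) + exp(b)) using the max-shift trick.
inline double log_sum_exp_sketch(double a, double b) {
  // Equal infinities would give inf - inf = NaN below, so return early.
  if (std::isinf(a) && a == b) {
    return a;
  }
  const double hi = a > b ? a : b;
  const double lo = a > b ? b : a;
  // exp(lo - hi) is at most 1, so it cannot overflow.
  return hi + std::log1p(std::exp(lo - hi));
}
```

With this guard, two negative-infinity inputs give negative infinity, and a single negative-infinity input leaves the other argument unchanged.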

## Features List

|Contributor                                                                                               |Title                                                                                          |
|:---------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|
|[bob-carpenter](https://github.com/bob-carpenter) : ([#1384](https://github.com/stan-dev/math/pull/1384)) |Feature/1382 remove fvar nan checks                                                            |
|[wds15](https://github.com/wds15) : ([#1376](https://github.com/stan-dev/math/pull/1376))                 |integrate Intel TBB                                                                            |
|[yizhang-yiz](https://github.com/yizhang-yiz) : ([#1371](https://github.com/stan-dev/math/pull/1371))     |Feature fp solver                                                                              |
|[t4c1](https://github.com/t4c1) : ([#1366](https://github.com/stan-dev/math/pull/1366))                   |Gpu ordered_logistic_glm_lpmf and categorical_logit_glm_lpmf                                   |
|[t4c1](https://github.com/t4c1) : ([#1365](https://github.com/stan-dev/math/pull/1365))                   |Gpu neg_binomial_2_log_glm                                                                     |
|[andrjohns](https://github.com/andrjohns) : ([#1363](https://github.com/stan-dev/math/pull/1363))         |Issue 1362 - Vectorised Dirichlet distribution                                                 |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1355](https://github.com/stan-dev/math/pull/1355))   |Feature/issue 1354 Implement matrix_cl overloads for rep_vector, rep_row_vector and rep_matrix |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1353](https://github.com/stan-dev/math/pull/1353))   |Revert GPU caching                                                                             |
|[t4c1](https://github.com/t4c1) : ([#1350](https://github.com/stan-dev/math/pull/1350))                   |Gpu poisson bernoulli glms                                                                     |
|[SteveBronder](https://github.com/SteveBronder) : ([#1344](https://github.com/stan-dev/math/pull/1344))   |Adds require_* template type traits                                                            |
|[charlesm93](https://github.com/charlesm93) : ([#1339](https://github.com/stan-dev/math/pull/1339))       |Feature/issue 1115 newton solver                                                               |
|[IvanYashchuk](https://github.com/IvanYashchuk) : ([#1334](https://github.com/stan-dev/math/pull/1334))   |Implemented reverse mode for inverse                                                           |
|[t4c1](https://github.com/t4c1) : ([#1333](https://github.com/stan-dev/math/pull/1333))                   |Implement normal_id_glm_lpdf in OpenCL                                                         |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1329](https://github.com/stan-dev/math/pull/1329))   |Feature/Issue 1294 Rewrite the test-math-dependencies script in Python                         |
|[SteveBronder](https://github.com/SteveBronder) : ([#1323](https://github.com/stan-dev/math/pull/1323))   |Adds const ref and ref returns for to_var/fvar methods                                         |
|[andrjohns](https://github.com/andrjohns) : ([#1318](https://github.com/stan-dev/math/pull/1318))         |Issue 1010 - Replace hand-coded math with standard library c++11 functions                     |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1303](https://github.com/stan-dev/math/pull/1303))   |Feature/issue 1221 Use OpenCL in rev/mdivide_left_tri                                          |
|[andrjohns](https://github.com/andrjohns) : ([#1296](https://github.com/stan-dev/math/pull/1296))         |issue 1279 - Remove deprecated Eigen content from math headers                                 |
|[t4c1](https://github.com/t4c1) : ([#1293](https://github.com/stan-dev/math/pull/1293))                   |OpenCL matrix multiplication optimizations                                                     |
|[andrjohns](https://github.com/andrjohns) : ([#1283](https://github.com/stan-dev/math/pull/1283))         |Refactor rev/mat with eigen plugin methods                                                     |
|[SteveBronder](https://github.com/SteveBronder) : ([#1281](https://github.com/stan-dev/math/pull/1281))   |Add a double template to matrix_cl                                                             |
|[bob-carpenter](https://github.com/bob-carpenter) : ([#1262](https://github.com/stan-dev/math/pull/1262)) |Feature/1258 ad test core                                                                      |
|[t4c1](https://github.com/t4c1) : ([#1252](https://github.com/stan-dev/math/pull/1252))                   |Implement ordinal regression GLM (ordered_logistic_glm_lpmf)                                   |
|[t4c1](https://github.com/t4c1) : ([#1206](https://github.com/stan-dev/math/pull/1206))                   |opencl prim gp_exp_quad_cov                                                                    |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1305](https://github.com/stan-dev/math/pull/1305))   |Feature/issue 1221 Use OpenCL in rev/multiply                                                  |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1278](https://github.com/stan-dev/math/pull/1278))   |Feature/1221 OpenCL primitive multiply                                                         |
|[t4c1](https://github.com/t4c1) : ([#1299](https://github.com/stan-dev/math/pull/1299))                   |mdivide_right_tri can use OpenCL                                                               |
|[wds15](https://github.com/wds15) : ([#1180](https://github.com/stan-dev/math/pull/1180))                 |Feature/intel tbb lib                                                                          |

## Fixes

|Contributor                                                                                             |Title                                                                           |
|:-------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|
|[wds15](https://github.com/wds15) : ([#1401](https://github.com/stan-dev/math/pull/1401))               |Bugfix/tbb cleanup                                                              |
|[t4c1](https://github.com/t4c1) : ([#1399](https://github.com/stan-dev/math/pull/1399))                 |bugfix intercept only GLMs                                                      |
|[wds15](https://github.com/wds15) : ([#1395](https://github.com/stan-dev/math/pull/1395))               |allow spaces in path leading to stan-directory in makefiles                     |
|[SteveBronder](https://github.com/SteveBronder) : ([#1392](https://github.com/stan-dev/math/pull/1392)) |Add /lib/tbb/** to the .gitignore                                               |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1375](https://github.com/stan-dev/math/pull/1375)) |Fix bug in stack_alloc_test                                                     |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1369](https://github.com/stan-dev/math/pull/1369)) |Bugfix/remove unused vectorize test                                             |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1364](https://github.com/stan-dev/math/pull/1364)) |Reorganize /opencl and add missing matrix_cl overloads                          |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1361](https://github.com/stan-dev/math/pull/1361)) |Remove const qualifier from matrix_cl rows & cols                               |
|[t4c1](https://github.com/t4c1) : ([#1358](https://github.com/stan-dev/math/pull/1358))                 |Split opencl glm function                                                       |
|[SteveBronder](https://github.com/SteveBronder) : ([#1356](https://github.com/stan-dev/math/pull/1356)) |Bugfix for making matrix_cls from temporaries                                   |
|[SteveBronder](https://github.com/SteveBronder) : ([#1341](https://github.com/stan-dev/math/pull/1341)) |Refactor Type Traits                                                            |
|[SteveBronder](https://github.com/SteveBronder) : ([#1340](https://github.com/stan-dev/math/pull/1340)) |Refactor/clang tidy cleanup                                                     |
|[SteveBronder](https://github.com/SteveBronder) : ([#1337](https://github.com/stan-dev/math/pull/1337)) |Update OpenCL Headers                                                           |
|[SteveBronder](https://github.com/SteveBronder) : ([#1331](https://github.com/stan-dev/math/pull/1331)) |Moves if statements for scal/prob/beta-binomial out of for loops                |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1330](https://github.com/stan-dev/math/pull/1330)) |Remove EXPECT_DEATH unit tests that fail when -NDEBUG is set                    |
|[SteveBronder](https://github.com/SteveBronder) : ([#1327](https://github.com/stan-dev/math/pull/1327)) |Adds clang-tidy to makefile                                                     |
|[t4c1](https://github.com/t4c1) : ([#1314](https://github.com/stan-dev/math/pull/1314))                 |fix matrix_cl_view test                                                         |
|[t4c1](https://github.com/t4c1) : ([#1311](https://github.com/stan-dev/math/pull/1311))                 |Fixed matrix_cl copying and moving                                              |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1310](https://github.com/stan-dev/math/pull/1310)) |Cleanup/issue #1301 remove unnecessary Boost and other compiler flags           |
|[rok-cesnovar](https://github.com/rok-cesnovar) : ([#1304](https://github.com/stan-dev/math/pull/1304)) |Re-apply #1278 OpenCL prim multiply                                             |
|[SteveBronder](https://github.com/SteveBronder) : ([#1298](https://github.com/stan-dev/math/pull/1298)) |make key of map for opencl kernel options into a string                         |
|[SteveBronder](https://github.com/SteveBronder) : ([#1291](https://github.com/stan-dev/math/pull/1291)) |Changes all prim files to use *_return_type_t instead of typename *_return_type |
|[nhuurre](https://github.com/nhuurre) : ([#1290](https://github.com/stan-dev/math/pull/1290))           |Bugfix/646 log_sum_exp and log_diff_exp boundaries                              |
|[SteveBronder](https://github.com/SteveBronder) : ([#1289](https://github.com/stan-dev/math/pull/1289)) |Refactor for enable_if functions                                                |
|[SteveBronder](https://github.com/SteveBronder) : ([#1286](https://github.com/stan-dev/math/pull/1286)) |Removes extra loops in Jacobian calculations                                    |
|[t4c1](https://github.com/t4c1) : ([#1266](https://github.com/stan-dev/math/pull/1266))                 |Added triangularity attribute to matrix_cl                                      |
|[t4c1](https://github.com/t4c1) : ([#1261](https://github.com/stan-dev/math/pull/1261))                 |GLM tests improvements                                                          |
|[wds15](https://github.com/wds15) : ([#1255](https://github.com/stan-dev/math/pull/1255))               |Bugfix/issue 1250 lgamma                                                        |

======================================================================
v2.20.0 (18 July 2019)
======================================================================

New Features
------------
- OpenCL rectangular matrix multiply optimization (#1196)
- Change name of opencl `copy` function (#1217)
- Add better doc for all traits (#377)
- Math repo runs tests on Windows (#951)

Bug Fixes
---------
- Length 0 checks should come after checking for mismatched sizes (#336)
- abs(real) translates to C++ but fails to compile (#907)
- prim/mat/meta/is_vector_like.hpp weirdness through check_finite.hpp (#79)
- Clang 4.0.0 fails tests that _should_ fail already (#561)


Other
-----
- Fix code style in stan/math/rev/mat/fun/multiply.hpp (#598)
- Remove unnecessary using statements from vectorized functions (#426)
- factor size zero tests into variadic function (#589)
- Update cholesky_decompose to use inplace decomposition per Eigen 3.3 (#359)


v2.19.1 (18 Apr 2019)
======================================================================

This is a release out of cycle to address a performance bug in v.2.19 (stan-dev/stan#2748)

New Features
------------
- GLM Optimizations, especially for small number of attributes. (#1185)
- Simplifies several prim/mat/fun functions using Eigen's vectorisation. (#1092)
    The following functions have been changed to use Eigen under the hood; usage remains the same:
    * add
    * add_diag
    * apply_scalar_unary
    * cholesky_corr_constrain
    * columns_dot_product
    * cumulative_sum
    * diag_post_multiply
    * diag_pre_multiply
    * divide_columns
    * elt_divide
    * elt_multiply
    * log_softmax
    * log_sum_exp
    * max
    * mean
    * min
    * prod
    * quad_form
    * quad_form_diag
    * quad_form_sym
    * rows_dot_product
    * softmax
    * subtract
    * variance
- Adds two new opencl kernels for operations matrix*vector and vector*matrix. (#1192)
- OpenCL rectangular matrix multiply optimization (#1196)

Bug Fixes
---------
- Fixing missing bounds checks in some lpdfs (#1101)

Other
-----

- Add option of having device functions shared across opencl kernels. (#1201)
- New version of `to_size_t` to get rid of an `assert` (#1172)


v2.19.0 (20 Mar 2019)
======================================================================
Slow to arrive, but fast to compute: Stan has GPU support!

Stan 2.19 brings GPU-optimized computation to Stan users. The first supported
function is Cholesky decomposition, the main bottleneck of many common
statistical models. Activating GPU support is easy - only a few lines are added
to the configuration and no changes have to be made to the Stan model. Cholesky
decompositions of larger matrices (including their gradients, when dealing with
parameters) are then automatically transferred to the GPU with speedups ranging
from 10x to 30x, depending on matrix size and GPU.

Other GPU-optimized matrix algebra primitives and common statistical models are
soon to follow: matrix multiplication, lower triangular inverse,
eigendecomposition, GP covariance functions and several GLMs. The implementation
is based on OpenCL, so it can be used with any GPU and GPU programming-savvy
users can also add their own custom OpenCL kernels.

New Features
------------
- GPU
  - matrix multiplication (#974)
  - inverse of lower triangular matrix (#1028)
  - Operator overloading for GPU functions (#1056)
  - Cholesky decomposition (#1058)
  - specialized reverse-mode implementation for cholesky decompose (#1117)
- Host doxygen API doc on https://mc-stan.org/math/ (#500)
- Makefile completely rewritten (#581, #954, #1041, #1043, #1087)
- Adding `beta_proportion` distribution (#1018)
- adjoint vector-Jacobian product form of precomputed gradients for reverse (#876)
- Add alternative inv_logit parameterization to prevent underflow (#874)
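
The underflow concern behind the `inv_logit` item can be illustrated with a branch on the sign of the argument. For very negative `x`, the single formula `1 / (1 + exp(-x))` overflows `exp(-x)` and returns exactly zero, while `exp(x) / (1 + exp(x))` still returns a small positive value until `exp(x)` itself underflows. This is a generic sketch of that well-known technique, not necessarily what #874 implements, and `inv_logit_sketch` is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical helper, not the library implementation.
// Evaluates 1 / (1 + exp(-x)) without overflowing exp for large |x|.
inline double inv_logit_sketch(double x) {
  if (x >= 0.0) {
    // exp(-x) is in (0, 1], so the denominator is in (1, 2].
    return 1.0 / (1.0 + std::exp(-x));
  }
  // For negative x, use the algebraically equivalent form so that
  // exp is only ever called with a negative argument.
  const double e = std::exp(x);
  return e / (1.0 + e);
}
```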


Bug Fixes
---------
- Improved derivative for Gamma CDF w.r.t. alpha (#525)
- `value_of` returned the wrong type (#968)
- `sum` returned the wrong type (#987)
- `matrix_exp` incorrectly passed the argument by value (#769)
- Unit testing with Windows on Jenkins (#1046)
- `gp_cov_exp_quad` was computing the ARD mixing up rows and cols (#984)
- Fixing GoodGammaP for gcc 7.3 (#1063)

Other
-----

- Clarity on what's being tested in Math (one compiler per OS) (#943)
- Updated GitHub templates (#911)
- Improve ODE speed (#1049)
- Fix tests for threading (#1058)
- Upgrade Google Test to v1.8.1 (#1051)
- Upgrade Sundials to v4.1.0 (#1097)
- Matrix exponential action:
  - A fast implementation was added (#771), but it had errors (#)
  - Currently, a slow implementation is in the codebase.
- Improve the codebase:
  - Code spacing (#587)
  - Using variadic template parameters for `return_type`, `partials_return_type`,
    and `include_summands` (#977)
  - Fixing math constants definitions for Windows (#986)
  - Avoid ambiguous instantiation of `math::sqrt()` by implementing for `double`
    and `int` (#712)
  - Clean up GPU code:
    - Separate OpenCL kernel access into its own class (#973)
    - `read_only` and `write_only` decorators in GPU kernels fail in Windows (#1034)
  - Fixing uninitialized values in tests:
    - `bernoulli_logit_glm_lpdf` test (#995)
    - `check_greater` test (#819)
    - `gp_exponential_cov_test` failing (#1150)
  - Updating template parameters of matern32 (#981)
  - Update `gp_dot_prod_cov` (#979)
  - Deprecating old GP covariance function names (#756)
  - Fixed compiler warnings in `test-headers` (#1110)
  - Adding required headers (#1106)
  - Turn test-math-dependencies warnings into failures on Jenkins (#1078)
  - Replace `boost::type_traits` with `std::` versions (#1126)
  - Fix doxygen errors (#1139)
  - Clean up anonymous namespace usage (#1006)
  - Setting STAN_NUM_THREADS to illegal value should produce an error (#947)


v.2.18.1 (24 December 2018)
======================================================================
Bugfix release - major bug in threading where certain thread count and
job size combinations could result in incorrect output. See
https://discourse.mc-stan.org/t/bug-in-map-rect-with-threading-in-stan-2-18-0/7056
for more details.

Bug Fixes
--------------------------------------------------
* Fix threading batch size bug (#1075)

v.2.18.0 (13 July 2018)
======================================================================
This is our first release with parallelism (through C++11 threads or MPI)!

New Features
--------------------------------------------------
* Support for MPI parallelization with map_rect
* Support for threaded AD and a threaded map_rect implementation (#809)
* New periodic covariance function (#921)
* Integrated Boost's 1 dimensional integrator (#913)
* Added thin QR decomposition (#900)
* New `add_diag` function for adding a real or vector to the diagonal of a matrix (#871)
* New `log_inv_logit_diff` function for the log of the difference of two inverse logits (#856)
* New dot product kernel for GPs (#834)
* Vectorized many more RNGs (#833, #722, #622)
* Add `matrix_exp_action` to calculate `exp(At)*B` (#830)
* `log_mix` vectorized (#751, #664)
* New integrators from CVODES (`integrate_ode_adams`, `_bdf`) (#735)
* New `std_normal` (#609)
* New `std_normal_log` (#728)
* New GLM primitive `normal_id_glm` (#665)
* New `chol2inv` that computes a matrix's inverse from its Cholesky factor (#649)
* New `poisson_log_glm`: Poisson regression with log link (#647)
* New vectorized `ordered_probit` (#645)
* New `log_modified_bessel_first_kind` that expands where Bessel 1 function can be used (#640)
* New bernoulli logit GLM (#608)
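
For the `log_inv_logit_diff` item above, the identity `inv_logit(x) - inv_logit(y) = (exp(x) - exp(y)) / ((1 + exp(x)) * (1 + exp(y)))` lets the log of the difference be evaluated without first forming the difference, which would round to zero when both arguments are very negative. The following is a minimal sketch of that derivation, assuming `x > y`; the three function names are hypothetical and this is not presented as the library's code.

```cpp
#include <cmath>

// Hypothetical helper: log(1 + exp(a)) without overflow for large a.
inline double log1p_exp_sketch(double a) {
  return a > 0.0 ? a + std::log1p(std::exp(-a))
                 : std::log1p(std::exp(a));
}

// Hypothetical helper: log(1 - exp(a)) for a < 0.
// Switches formula at -log(2) to keep precision near both ends.
inline double log1m_exp_sketch(double a) {
  return a > -0.6931471805599453 ? std::log(-std::expm1(a))
                                 : std::log1p(-std::exp(a));
}

// Hypothetical helper: log(inv_logit(x) - inv_logit(y)), assuming x > y.
// Uses log(exp(x) - exp(y)) = x + log(1 - exp(y - x)).
inline double log_inv_logit_diff_sketch(double x, double y) {
  return x + log1m_exp_sketch(y - x) - log1p_exp_sketch(x)
         - log1p_exp_sketch(y);
}
```

For moderate inputs this agrees with the direct computation `log(inv_logit(x) - inv_logit(y))`, and it stays finite for inputs such as `x = -800`, `y = -801`, where the direct computation returns negative infinity.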

Bug Fixes
--------------------------------------------------
* Rising and falling factorial were not accepting negative arguments (#636)
* Fix normalizing constant for LKJ distribution (#628)

Other
--------------------------------------------------
* Tweaks to `effective_sample_size` for performance and upstream compatibility (#865)
* Switch to sundials package combining CVODES and IDAS, updating versions (#779, #744)
* Now publishing the Math doxygen (http://mc-stan.org/math/)
* Faster `dirichlet_lpdf` accepting more vector types (#788)
* Upgrade to Boost 1.66 (#766)
* Improved derivatives for Gamma CDF (#780)
* Sped up `multi_normal_cholesky` (#753)
* More efficient `mdivide_right_tri` (#715)
* More efficient `integrate_ode_rk45_grad_test` (#714)
* Make algebraic solver easier to use (#703, #697)
* Begin thinking about internal compatibility for complex numbers (#643)
* Faster `pow` (#642)


v.2.17.0 (05 September 2017)
======================================================================

New Features
--------------------------------------------------
* New algebraic solver! (stan-dev/stan#2023, #516)
* append_array now supports vectors of vectors

Other
--------------------------------------------------
* C++11 (and some of 14) now supported; see https://github.com/stan-dev/stan/wiki/Supported-C---Compilers-and-Language-Features
* Updated to Boost 1.64.0 (#599)
* Makefile refactoring (#602, others)
* New forward-mode test kit (#0557, #0568)
* replace copy args with refs (#346)

v.2.16.0 (15 June 2017)
======================================================================

New Features
--------------------------------------------------
* New append_array function
* Add categorical_logit_rng function

Bug Fixes
--------------------------------------------------
* Align gamma_* function parameter names with documentation

Other
--------------------------------------------------
* Update to Eigen 3.3.3
* Support g++ 4.9
* Fix overload logic in mdivide_left_tri_low so that it calls the var version of mdivide_left_tri where appropriate.
* Check consistent size of state and dy_dt in ode_system
* OperandsAndPartials refactor with new multivariate / nested container support
* Update LLT to inplace decomposition per eigen 3.3 doc
* Disable printf functions from CVODES


v.2.15.0 (13 April 2017)
======================================================================

New Features
--------------------------------------------------

* Efficient blocking algorithm for gradient of the Cholesky (#384)
* New distribution functions _lpdf / _lpmf / _lcdf / _lccdf to replace _log function (#320)
* Univariate normal distribution on sufficient statistics (#38)
* New to_matrix function for real arrays (real[]) (#467)
* New specialization of stan::math::array_builder for matrix types (#496)

Bug Fixes
--------------------------------------------------
* Fixes to hypergeometric functions (#487)

Other
--------------------------------------------------
* Speedup for categorical_rng (#503)
* Speedup for non-stiff ODE integration (#512)
* Refactor VectorView into scalar_seq_view (#464)


v.2.14.0 (26 December 2016)
======================================================================

Bug Fixes
--------------------------------------------------
* neg_binomial_2_log problem with large values of phi (#463)

Other
--------------------------------------------------
* Remove check_equal() function (#398)

v2.13.0 (25 November 2016)
======================================================================

New Features
--------------------------------------------------
* vectorization of unary functions (#347)
* specialized gradients for cov_exp_quad for amplitude and/or length-scale
  parameters and input data (#353)
* math library is consistent with Eigen 3.2.9 and 3.3.0 (#432, #429)
* added choose(n, k) function (#400)
* added bernoulli_logit_rng function (#258)

Bug Fixes
--------------------------------------------------
* beta_rng no longer returns NaN
* fixed memory issue with cholesky_decompose in reverse-mode (#415)
* no more multiple definitions when including Stan headers in
  multiple source files (#394)
* fixed broken _MSC_VER conditionals and remove math.h includes (#370)

Other
--------------------------------------------------
* chi-square test in RNG testing pulled into utility function (#412)
* error check functions are now all void functions (#382)
* remove math.h includes (#277)
* update to Boost 1.62.0 (#435)


v2.12.0 (6 September 2016)
======================================================================

New Features
--------------------------------------------------
* update to Eigen 3.2.9 (#362)
* lots of unary functions vectorized (#202)
* reverse-mode gradients for matrix/matrix multiply,
  matrix/vector multiply, and vector/vector multiply
  received rewrites and significant speed ups for
  large matrices/vectors (#323)

Bug Fixes
--------------------------------------------------
* fixed exception handling with CVODES for g++6 (#348)
* squashed warnings with g++6 (#345)
* enforce symmetry on cov_matrix parameter inverse transform (#342)
* added int overload for log(), exp(), log2(), and log1p() to resolve
  ambiguity with g++6 (#327)
* sort_asc and sort_desc check for NaN (#253)
* wiener_log: raise exception for tau >= y (#150)
* stopped printing matrices in check functions (#146)
* added check for positive definiteness in multi_normal_rng (#44)


Other
--------------------------------------------------
* added multi-core to travis builds (#373)
* fixed makefile to recognize N_TESTS (#367)
* updated error message for check_bounded() (#332)
* made the use of stan::math namespace within the math library
  more consistent (#331)
* reimplemented integer division (#329)
* added test for multiple translation units (#328)
* updated costs for Eigen::NumTraits (#233)
* worked on getting the Math library working with Eigen 3.3-beta (#187)


v2.11.0 (27 July 2016)
======================================================================

New Features
--------------------------------------------------
* vectorization of many unary math functions (#202)

Bug Fixes
--------------------------------------------------
* throw exception on integer divide by 0 (#321)
* allow inclusion of Stan headers in multiple translation units (#311)


v2.10.0 (15 June 2016)
======================================================================

New Features
--------------------------------------------------
* append_row and append_col for scalars (stan-dev/stan #1347)
* PYTHON2 property in makefile for 2.x requirement for
  cpplint (#300)
* refactored general ODE interface and add CVODES bdf
  integrator (#262, #293, #281, #284, #288)
* rename integrate_ode to integrate_ode_rk45 (#292)
* update to Eigen 3.2.8 (#256)
* efficiency improvement in value_of and value_of_rec (#249)
* generalize dependencies in runTests.py (#247)
* untangle include dependencies, add finer headers (#246)
* std::vector version of value_of and value_of_rec (#243)
* doc for is_vector_like and more tests (#241)
* issue and pull-request templates on GitHub (#237)
* top-level includes by shape and autodiff type (#236)
* upgrade to Boost 1.60 (#221)
* unary vectorization infrastructure (#202)
* ODE integrator fails if precision not reachable (#179)

Bug Fixes
--------------------------------------------------
* fix elementwise division argument cast problem with
  operator/(matrix, var) not compiling (#290)
* fix QR decomposition segfault for non-square matrices (#286)
* fix off-by-one bug in nested Jacobian autodiff in degenerate
  set_zero_all_adjoints_nested() call (#278)
* squash compiler warnings in OperandsAndPartials (#274)
* ifdef out residual Sundials (CVODES) print statements (#257)
* fix binomial coefficient branch to remove integer division and
  rounding (#239)
* refine clean-all make target to remove .d files (#234)
* fix Jacobian bug returning transposed result (#230)
* include stdexcept so container_view header compiles (#223)


v2.9.0 (3 December 2015)
======================================================================
New Features
--------------------------------------------------
* Added container_view (#103)
* Improved derivative calculation for beta_cdf (#124)
* Gradients computed for cholesky_decompose (#125)
* Added functions for sparse matrix multiplication (#159)
* Changed the unit_vector transformation (#204)

Bug Fixes
--------------------------------------------------
* Check inputs for cholesky_decompose (#48)
* Fixed documentation for wiener.hpp (#140)

Other
--------------------------------------------------
* Cleaned up couple_ode_system for clarity (#180)
* Additional forward mode specializations (#16, #17)
* Removed unused chainable base class (#183)
* Updated clang++ compiler options to suppress warnings (#197)
* Fixed some unit tests (#199)
* Replaced top-level using statements (#207)


v2.8.0 (8 September 2015)
======================================================================
New Features
--------------------------------------------------
* Added standard normal quantile function (#148)
* Add sparse matrix (csr format) by dense vector multiplication (#127)
* Faster derivative calculation for cholesky (#125)
* Fixes for MSVC 2008 and 2010 (#134)

Bug Fixes
--------------------------------------------------
* Fixed unit test for `mdivide_right_spd` (#153)
* Reduced compiler warnings for sign comparisons for sign functions (#151)
* Fixed gitignore for test directory (#149)
* Better assignment error messages (#146)


v2.7.0 (8 July 2015; first independent release)
======================================================================
First release
--------------------------------------------------
* Previously included as part of Stan
* Release numbering started at release number of Stan
* All work done up to and including the release of Stan Math 2.7.0
  is documented under the Stan C++ 2.7.0 release notes