Update cudnn-batch to next #70

alreadydone · 2018-11-19T02:38:09Z

No description provided.

Case-sensitive coordinates are a thing in SGF, not GTP. Pull request leela-zero#1793.

* Install to ${CMAKE_INSTALL_BINDIR}, some distros like to put games in /usr/games.
* Store/load leelaz_opencl_tuning and load weights file from system directories, i.e. ~/.local/share/leela-zero on Unix.
* Better error reporting when network weights file is not found.

Pull request leela-zero#1618.

* MCTS: Skip current expanding child when doing uct select. The search thread should explore other nodes in this case, which saves the search from some useless work. It benefits batching support too. Before this change, all threads could be busy waiting for the first node being expanded. Give the expanding node a huge virtual loss instead to avoid a crash when only one child exists. Pull request leela-zero#1794.
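The last point can be sketched as follows. This is a minimal illustration under assumptions, not the engine's code: `Child`, `select_child` and the penalty constant are hypothetical names. A child that another thread is expanding is penalized heavily instead of being removed from the candidates, so threads move on to its siblings, while a lone child can still be returned.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical, simplified view of a child node during UCT selection.
struct Child {
    double uct_value;  // value the selection would normally maximize
    bool expanding;    // true while another thread is expanding this child
};

// Returns the index of the child to descend into.
std::size_t select_child(const std::vector<Child>& children) {
    assert(!children.empty());
    // Assumed magnitude: it only needs to dwarf any realistic UCT value.
    constexpr double kExpandingPenalty = 1e6;

    std::size_t best = 0;
    double best_value = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < children.size(); ++i) {
        double value = children[i].uct_value;
        if (children[i].expanding) {
            value -= kExpandingPenalty;  // de-prioritize, but keep selectable
        }
        if (value > best_value) {
            best_value = value;
            best = i;
        }
    }
    return best;
}
```

Because the penalty is finite, the function always returns a valid index, which is the property the commit message relies on for the single-child case.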

Necessary for Clang. Fixes issue leela-zero#1809. Pull request leela-zero#1811.

As demanded by GTP, improving the input handling of GameState::play_textmove in the process (which now would crash if given a pass or resignation). Pull request leela-zero#1814.

Updated README to compile under Linux with Boost filesystem. Required after 73f1f93. Pull request leela-zero#1813.

Pull request leela-zero#1827.

Pull request leela-zero#1824.


According to @Atarust at leela-zero#1806 this fixes kernel compilation error with his configuration. No performance difference. Pull request leela-zero#1820.

Copying on weight construction keeps a copy of the weights on the host memory, at least for recent NVIDIA GPUs. Creating a buffer and then copying later on doesn't, and this saves memory. Pull request leela-zero#1818.

Pull request leela-zero#1826.

* Thread-safe UCTNodePointer. This makes almost all UCTNodePointer operations thread-safe. The only exceptions are destructors and when it is 'moved out'. Should even handle concurrent inflate() calls properly. Uses atomic operations to emulate locks only when needed. This includes support for re-expansion by forcibly moving the state back to INITIAL on a single-thread context. Pull request leela-zero#1764.

Avoid having duplicate copies of the network weights in memory. Pull request leela-zero#1795.

Fixes issue leela-zero#1837. Pull request leela-zero#1838.

Fixes clang warning. Pull request leela-zero#1841.

When doing auto precision detection, make sure the prior implementation is destroyed before trying a new implementation. Pull request leela-zero#1842.

* Count memory consumption of a search tree by introducing a referencer for UCTNodePointer and UCTNode.
* NNCache: Add method to get estimated memory consumption.
* Extend Network with methods to estimate network size, network cache size and resize cache.
* Estimate total memory consumption as estimated network size + number of gpus * 85MB + estimated tree size * 1.1 + estimated cache size * 1.1
* Add command `lz-setoption` which behaves like set_option from UCI spec.
* Add option to set maximum memory consumption in MB.
* Add option to configure ratio of memory reserved for nn cache and search tree.
* Add command 'lz-estimatememory' which shows estimated memory consumption.
* Initialize maximum tree size and cache size after the network initialization.

Pull request leela-zero#1741.
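The estimate described in this commit can be written out as a short sketch. The function names and parameters are hypothetical, the 1.1 factor is approximated with integer arithmetic, and treating "MB" as 1024 * 1024 bytes is an assumption; the real implementation may differ on all three points.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: "MB" in the commit message means 1024 * 1024 bytes.
constexpr std::size_t kMiB = 1024 * 1024;

// Hypothetical helper combining the terms listed above, in bytes:
// network + gpus * 85MB + tree * 1.1 + cache * 1.1
std::size_t estimate_total_memory(std::size_t network_bytes,
                                  std::size_t num_gpus,
                                  std::size_t tree_bytes,
                                  std::size_t cache_bytes) {
    const std::size_t gpu_overhead = num_gpus * 85 * kMiB;
    // x + x / 10 approximates x * 1.1 without floating point.
    const std::size_t tree = tree_bytes + tree_bytes / 10;
    const std::size_t cache = cache_bytes + cache_bytes / 10;
    return network_bytes + gpu_overhead + tree + cache;
}

// Hypothetical check against a user-supplied limit given in MB.
bool fits_budget(std::size_t estimated_bytes, std::size_t max_memory_mb) {
    assert(max_memory_mb > 0);
    // Round the estimate up to whole MB before comparing.
    return (estimated_bytes + kMiB - 1) / kMiB <= max_memory_mb;
}
```

For example, a 100 MB network on one GPU with a 1000 MB tree and a 500 MB cache comes to 100 + 85 + 1100 + 550 = 1835 MB under these assumptions.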

Follow up to pull request leela-zero#1741.

Fixes issue leela-zero#1843.

Pull request leela-zero#1852.

Pull request leela-zero#1867.

If a node is fully expanded but is reverted to INITIAL state, there is no chance it returns to EXPANDED state. Don't revert nodes to INITIAL state if it is fully expanded. Some additional small bugfixes. Pull request leela-zero#1851.
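The state handling this fix refers to can be illustrated with a small atomic state machine. This is a sketch under assumptions: the class, its method names and the `fully_expanded` flag are hypothetical and only show one way the rule "never revert a fully expanded node to INITIAL" could be encoded.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class ExpandState {
public:
    enum class State : std::uint8_t { INITIAL, EXPANDING, EXPANDED };

    // Try to become the single thread that expands the node.
    bool acquire_expanding() {
        State expected = State::INITIAL;
        return m_state.compare_exchange_strong(expected, State::EXPANDING);
    }

    // Expansion finished normally.
    void expand_done() {
        const State previous = m_state.exchange(State::EXPANDED);
        assert(previous == State::EXPANDING);
        (void)previous;  // silence unused warning when asserts are disabled
    }

    // Expansion abandoned. A node that already holds all of its children
    // stays EXPANDED, since nothing would ever move it out of INITIAL again.
    void expand_cancel(bool fully_expanded) {
        const State next = fully_expanded ? State::EXPANDED : State::INITIAL;
        const State previous = m_state.exchange(next);
        assert(previous == State::EXPANDING);
        (void)previous;
    }

    bool is_expanded() const { return m_state.load() == State::EXPANDED; }

private:
    std::atomic<State> m_state{State::INITIAL};
};
```

A thread waiting for the EXPANDED state would otherwise wait forever on a node that was reverted to INITIAL with no work left to trigger another expansion.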

Minor comment fixups.

* Add policy prior in analysis output.
* Store policy as float instead of string in OutputAnalysisData.

Pull request leela-zero#1836.

We preferably store the analysis in the original format for sorting, and only do the conversion for display at display time. Don't use the completely meaningless tag "N" for move policy prior. We'll use "prior" instead. Winrate is currently output in 1/100th of a percent, so we'll use the same format for priors. I'm not sure why winrate is not just using floats, but I assume GUIs now already rely on this, and it might avoid some weird bugs related to locale.
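As a sketch of the display-time conversion described here: the value stays a float internally and is turned into an integer in 1/100th of a percent only when printed. The function names are hypothetical, and rounding to nearest (rather than truncating) is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Convert a probability in [0, 1] to 1/100th of a percent (0..10000).
// Assumption: round to nearest; the engine may truncate instead.
int to_hundredths_of_percent(float probability) {
    assert(probability >= 0.0f && probability <= 1.0f);
    return static_cast<int>(std::lround(probability * 10000.0f));
}

// Format the policy prior with the "prior" tag mentioned above.
std::string format_prior(float prior) {
    return "prior " + std::to_string(to_hundredths_of_percent(prior));
}
```

Printing an integer also sidesteps locale-dependent decimal separators, which matches the concern raised in the commit message.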

Pull request leela-zero#1874.

This adds Eigen as a default matrix/vector library via a submodule. This has a load of advantages:

* It can be used as a replacement for a cBLAS library when it is not available, cannot be found, or is outdated compared to the compiler or CPU.
* Because Eigen is header only, it significantly reduces the build prerequisites.
* The Eigen code paths are much more readable from a mathematical perspective.
* Eigen can optimize more heavily for known matrix sizes. The current code doesn't yet take advantage of this, though.

The downsides:

* Eigen might be a bit slower than other BLAS libraries. (Nevertheless, on my system it is faster than OpenBLAS.)
* Binaries built with Eigen are optimized for the CPU they were compiled on and don't port as well to other CPUs. So you need separate binaries for wider client support.

* Default Eigen in CMake, add tests.

Default the Eigen library in CMake, as it's the fastest for most contemporary CPUs and configurations, and the easiest to build. We can optionally use BLAS by adding the USE_BLAS define, and will try to locate BLAS/OpenBLAS if so. This is useful for binaries for distribution such as our releases or distros. Split all tests to cover both Eigen and BLAS. Update build instructions to remove BLAS as a dependency, use CMake on all Unixy platforms, and use HTTPS. Pull request leela-zero#1692.

See discussion in pull request leela-zero#1642. This adds an optional side to move in the lz-analyze command, instead of only a posting interval. This makes the format more consistent with all other GTP commands. We check the amount and format of the arguments so we are backwards compatible with GUIs that send the old format, i.e. with only a posting interval. Pull request leela-zero#1872.
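A minimal sketch of such backwards-compatible argument handling, assuming C++17. `AnalyzeArgs`, `parse_lz_analyze` and the accepted color spellings are assumptions made for illustration, not the engine's actual parser.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct AnalyzeArgs {
    std::optional<char> side;  // 'b' or 'w' if given, empty for the old format
    int interval = 0;          // posting interval
};

// Accepts "lz-analyze <interval>" (old) and "lz-analyze <color> <interval>" (new).
std::optional<AnalyzeArgs> parse_lz_analyze(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);

    if (tokens.size() < 2 || tokens.size() > 3 || tokens[0] != "lz-analyze") {
        return std::nullopt;
    }

    AnalyzeArgs args;
    if (tokens.size() == 3) {
        const std::string& color = tokens[1];
        if (color == "b" || color == "black") {
            args.side = 'b';
        } else if (color == "w" || color == "white") {
            args.side = 'w';
        } else {
            return std::nullopt;
        }
    }

    // The interval is always the last token and must be a short digit string.
    const std::string& number = tokens.back();
    if (number.size() > 9) return std::nullopt;
    for (unsigned char ch : number) {
        if (!std::isdigit(ch)) return std::nullopt;
    }
    args.interval = std::stoi(number);
    assert(args.interval >= 0);
    return args;
}
```

Checking both the number of arguments and their format is what lets an old-style command with only an interval keep working.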

Rework the Network initialization to pull out the OpenCL benchmarking for precision autodetection. Add support to ForwardPipe to report whether it is necessary to run the benchmarks. If the answer is no, and fp16 compute works, we assume that's what we want. This avoids the benchmarking overhead on modern AMD cards and probably on the latest ones from NVIDIA too. Pull request leela-zero#1873.

Should fix issue leela-zero#1921.

Required on macOS, and probably other platforms. Fixes issue leela-zero#1901. Pull request leela-zero#1910.

Implement a few more parameters that can be set via lz-setoption, specifically visits, playouts, pondering, resign threshold and the lag buffer. We currently don't check the provided values against the reported min/max values but rely on the UI not to mess up. This could be addressed in a refactoring. Similarly, commandline and setoption values should probably be treated in a unified way. Remove the bogus boolean return value from GTP processing functions. Minor style fix for old GTP code. Pull request leela-zero#1927.

We need to call UCTNode::create_children() even if we aren't expanding because that moves our node's state from INITIAL to EXPANDED. Pull request leela-zero#1928.

If the tuner failed during precision autodetection, its error output on stdout was read as a GTP message. Pull request leela-zero#1935.

* Fall back to single precision when the GPU claims fp16 support but it doesn't work.
* Net initialization fixes:
  - Try at least one selfcheck eval when autodetecting precision
  - Revive selfcheck when using Eigen

Pull request leela-zero#1934.

Fixes issue leela-zero#1938.

Many lz-setoption commands are forgetting to add the closing GTP = if they are successful. This will freeze GUIs. Fixes issue leela-zero#1940.
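For context, a successful GTP response starts with `=` (optionally followed by the command id) and ends with two consecutive newlines; a controller waits for that terminator before sending the next command. The sketch below assumes C++17, and `gtp_success` is a hypothetical helper rather than the engine's own printing function.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Build a successful GTP response: "=[id] result\n\n".
std::string gtp_success(std::optional<int> id, const std::string& result) {
    assert(!id || *id >= 0);
    std::string out = "=";
    if (id) out += std::to_string(*id);
    out += ' ';
    out += result;
    out += "\n\n";  // blank line terminates the response
    return out;
}
```

A command handler that returns without emitting this response leaves the GUI waiting indefinitely, which is the freeze described above.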

We've supported HTTPS on the server side for a while now, make it the default.

Colab has updated and the instructions here probably no longer work. They should probably be hosted elsewhere, too.

Define a variable closer to the usage point.

* Separate FPU-reduction setting for root.
* Removed fpu_root_reduction.

Pull request leela-zero#1960.

Link to Google Cloud tutorial on Google Docs. Pull request leela-zero#1961.

Delete outdated questions and answers. Pull request leela-zero#196.

Disabling input buffering on Windows causes breakage that looks like input buffering stays enabled. This was accounted for in the code, but the #define check was against a non-default flag, and a different one than the one used elsewhere.

Pull request leela-zero#1975.

Even though SGF defaults to size 19 boards, we should not try to set up a board that size if LZ has not been compiled to support it. Pull request leela-zero#1964.

Without this, it's empirically not possible to load the current 256x40 networks on a 32-bit machine.

If we are trying to auto-select the best device for OpenCL, never select a CPU. This will cause the engine to refuse to run when people are trying to run the OpenCL version without a GPU or without GPU drivers, instead of selecting any slow and suboptimal (and empirically extremely broken) OpenCL-on-CPU drivers. Falling back to CPU-only would be another reasonable alternative, but doesn't provide an alert in case the GPU drivers are missing. Improves behavior of issue leela-zero#1994.
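The selection rule can be sketched without any OpenCL calls, assuming C++17. `Device`, its `score` field and `pick_device` are hypothetical stand-ins for whatever the engine queries from the OpenCL runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of an OpenCL device.
struct Device {
    std::string name;
    bool is_gpu;  // false for CPU devices
    int score;    // assumed heuristic ranking, higher is better
};

// Returns the index of the best GPU, or nothing if no GPU is present.
std::optional<std::size_t> pick_device(const std::vector<Device>& devices) {
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (!devices[i].is_gpu) continue;  // never auto-select a CPU
        if (!best || devices[i].score > devices[*best].score) {
            best = i;
        }
    }
    assert(!best || devices[*best].is_gpu);
    return best;
}
```

Returning an empty result when only CPU devices exist lets the caller refuse to run and report the missing GPU or driver, rather than silently running on a slow device.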

Fix full tuner for heterogeneous GPUs and auto precision detection.

* --full-tuner implies --tune-only
* --full-tuner requires an explicit precision

Fixes leela-zero#1973. Pull request leela-zero#2004.

Very minor speedup of about 2% with batch size of 1. With batch size of 5 there is a speedup of about 5% with half precision and 12% with single precision. Out transformation memory accesses are almost completely coalesced with the new kernel. Pull request leela-zero#2014.

From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.

* CPUPipe : change winograd transformation constants to an equation. Combined with a series of strength reduction changes, improves netbench by about 8%.
* Convert some std::array into individual variables. For some reason this allows gcc to optimize the code better, improving netbench by 2%.

Pull request leela-zero#2021.

Use hard-coded equations instead of matrix multiplication. Pull request leela-zero#2023.

Fix Validation -k option by reading its value before the parser is reused. Pull request leela-zero#2024.

TFiFiE and others added 30 commits September 3, 2018 14:56

Don't use "void" as function parameter.

6eecb1e

Pull request leela-zero#1765.

Isolate and clean up text-to-vertex conversion.

0549816

Case-sensitive coordinates are a thing in SGF, not GTP. Pull request leela-zero#1793.

Convert string before variadic function call.

1042cb6

Necessary for Clang. Fixes issue leela-zero#1809. Pull request leela-zero#1811.

Always expect 2 arguments after "play" command.

51cba90

As demanded by GTP, improving the input handling of GameState::play_textmove in the process (which now would crash if given a pass or resignation). Pull request leela-zero#1814.

Update README with new boost dependencies.

b290f47

Updated README to compile under Linux with Boost filesystem. Required after 73f1f93. Pull request leela-zero#1813.

Fix boost package reference for VS2017 build.

5d4bd2f

Pull request leela-zero#1827.

Added missing files to MSVC 2015 projects.

5bd2ef4

Pull request leela-zero#1824.

Make Winograd matrices global.

dd95cab

According to @Atarust at leela-zero#1806 this fixes kernel compilation error with his configuration. No performance difference. Pull request leela-zero#1820.

OpenCL : Don't copy on weight construction.

5412e66

Copying on weight construction keeps a copy of the weights on the host memory, at least for recent NVIDIA GPUs. Creating a buffer and then copying later on doesn't, and this saves memory. Pull request leela-zero#1818.

Winograd filter transform and CPU in transform optimization.

7e13bf0

Pull request leela-zero#1826.

Pass network weight as a std::shared_ptr class.

cd48427

Avoid having duplicate copies of the network weights in memory. Pull request leela-zero#1795.

Fix vectorized Winograd transform.

0a0d134

Fixes issue leela-zero#1837. Pull request leela-zero#1838.

Remove unused lambda capture.

c21c8a4

Fixes clang warning. Pull request leela-zero#1841.

Reduce network memory usage when autodetecting.

cff3917

When doing auto precision detection, make sure the prior implementation is destroyed before trying a new implementation. Pull request leela-zero#1842.

Assorted style nits and minor bugfixes.

c6999fc

Follow up to pull request leela-zero#1741.

Fix "NN eval" so it is never the search result.

aaf1038

Fixes issue leela-zero#1843.

Update .gitignore to include ".vs/".

71c6a36

Pull request leela-zero#1852.

Add some more const correctness.

bf2e767

Pull request leela-zero#1867.

Fixes assert failure on wait_expanded().

dac5a1f

If a node is fully expanded but is reverted to INITIAL state, there is no chance it returns to EXPANDED state. Don't revert nodes to INITIAL state if it is fully expanded. Some additional small bugfixes. Pull request leela-zero#1851.

Only run assertion logic in debug mode.

8abf0d2

Minor comment fixups.

Make lz-analyze output policy prior.

a0f60cb

* Add policy prior in analysis output.
* Store policy as float instead of string in OutputAnalysisData.

Pull request leela-zero#1836.

Fix memory estimation for auto-detected gpu.

142199c

Pull request leela-zero#1874.

gcp and others added 30 commits October 12, 2018 09:26

Update README.md.

cd1de6e

Should fix issue leela-zero#1921.

Add GNUInstallDirs include.

6881787

Required on macOS, and probably other platforms. Fixes issue leela-zero#1901. Pull request leela-zero#1910.

Fix assert-fail when memory is completely full.

4830a95

We need to call UCTNode::create_children() even if we aren't expanding because that moves our node's state from INITIAL to EXPANDED. Pull request leela-zero#1928.

Report tuner errors to stderr.

7f5073e

If the tuner failed during precision autodetection, its error output on stdout was read as a GTP message. Pull request leela-zero#1935.

Update OpenCL headers link.

2e079fc

Fixes issue leela-zero#1938.

Add missing GTP terminator for lz-setoption cases.

8a57a85

Many lz-setoption commands are forgetting to add the closing GTP = if they are successful. This will freeze GUIs. Fixes issue leela-zero#1940.

Switch AutoGTP to HTTPS.

4bd7cd4

We've supported HTTPS on the server side for a while now, make it the default.

Remove COLAB Readme.

a1a4af8

Colab has updated and the instructions here probably no longer work. They should probably be hosted elsewhere, too.

Update links and Todo in README.

ac88220

Remove reference to Colab README.

fc54323

Tiny style fix.

82d5f25

Define a variable closer to the usage point.

Separate FPU-reduction variable for root.

b2a40e4

* Separate FPU-reduction setting for root.
* Removed fpu_root_reduction.

Pull request leela-zero#1960.

Link to instructions for running on the cloud.

40260b0

Link to Google Cloud tutorial on Google Docs. Pull request leela-zero#1961.

Update FAQ.md.

a0baa60

Delete outdated questions and answers. Pull request leela-zero#196.

Fix Windows flag check for input buffering.

2e4f3e6

Disabling input buffering on Windows causes breakage that looks like input buffering stays enabled. This was accounted for in the code, but the #define check was against a non-default flag, and a different one than the one used elsewhere.

Update AUTHORS.

d1225db

Bump version numbers.

4fd6e69

AutoGTP: update build dir of leelaz in README.md.

6d16497

Pull request leela-zero#1975.

Correctly initialize board when reading SGF.

1fe59c6

Even though SGF defaults to size 19 boards, we should not try to set up a board that size if LZ has not been compiled to support it. Pull request leela-zero#1964.

Increase memory limit for 32-bit builds.

5cd4d8f

Without this, it's empirically not possible to load the current 256x40 networks on a 32-bit machine.

Fix tuner for heterogeneous GPUs and auto precision.

6f58159

Fix full tuner for heterogeneous GPUs and auto precision detection.

* --full-tuner implies --tune-only
* --full-tuner requires an explicit precision

Fixes leela-zero#1973. Pull request leela-zero#2004.

Update OpenCL C++ headers.

c72cb3a

From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.

Convolve in/out performance optimization.

304f9c7

Use hard-coded equations instead of matrix multiplication. Pull request leela-zero#2023.

Validation: fix -k option.

fc8d080

Fix Validation -k option by reading its value before the parser is reused. Pull request leela-zero#2024.

copy branch

4ca33e9
