forked from Ttl/leela-zero
Update cudnn-batch to next #70
Open: alreadydone wants to merge 65 commits into cudnn-batch-tomerge from patch-27
Pull request leela-zero#1765.
Case-sensitive coordinates are a thing in SGF, not GTP. Pull request leela-zero#1793.
* Install to ${CMAKE_INSTALL_BINDIR}, some distros like to put games in /usr/games.
* Store/load leelaz_opencl_tuning and load weights file from system directories, i.e. ~/.local/share/leela-zero on Unix.
* Better error reporting when network weights file is not found.

Pull request leela-zero#1618.
MCTS: Skip the child that is currently being expanded when doing UCT select. The search thread should explore other nodes in this case, which saves the search from some useless work and also benefits batching support. Before this change, all threads could be busy-waiting for the first node to be expanded. Rather than skipping the expanding node outright, give it a huge virtual loss, which avoids a crash when only one child exists. Pull request leela-zero#1794.
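The selection rule described above can be sketched roughly as follows. This is a simplified illustration and not the project's UCTNode code: the `Child` struct, the `select_child` function, the PUCT formula shown, and the size of the penalty constant are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical, simplified child record (not the real UCTNode).
struct Child {
    float prior;     // policy prior for this move
    float winrate;   // mean value seen from the parent's side
    int visits;      // completed visits
    bool expanding;  // true while another thread is expanding this node
};

// Returns the index of the child with the best PUCT-style score.
// A child that is being expanded receives a huge penalty, so it is chosen
// only when there is no alternative (for example, a single child).
// Precondition: children is not empty.
inline std::size_t select_child(const std::vector<Child>& children,
                                int parent_visits, float c_puct) {
    constexpr float kHugeVirtualLoss = 1.0e6f;  // assumed magnitude
    const float sqrt_parent = std::sqrt(static_cast<float>(parent_visits));

    std::size_t best = 0;
    float best_score = -std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < children.size(); ++i) {
        const Child& c = children[i];
        float score = c.winrate
                    + c_puct * c.prior * sqrt_parent / (1.0f + c.visits);
        if (c.expanding) {
            score -= kHugeVirtualLoss;
        }
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

Because the penalty lowers the score instead of removing the child from consideration, the function still returns a valid index when the only child is the one being expanded.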
Necessary for Clang. Fixes issue leela-zero#1809. Pull request leela-zero#1811.
As demanded by GTP, improving the input handling of GameState::play_textmove in the process (which now would crash if given a pass or resignation). Pull request leela-zero#1814.
Updated README to compile under Linux with Boost filesystem. Required after 73f1f93. Pull request leela-zero#1813.
Pull request leela-zero#1824.
According to @Atarust at leela-zero#1806 this fixes kernel compilation error with his configuration. No performance difference. Pull request leela-zero#1820.
Copying on weight construction keeps a copy of the weights on the host memory, at least for recent NVIDIA GPUs. Creating a buffer and then copying later on doesn't, and this saves memory. Pull request leela-zero#1818.
Thread-safe UCTNodePointer

This makes almost all UCTNodePointer operations thread-safe. The only exceptions are destructors and when it is 'moved out'. Should even handle concurrent inflate() calls properly. Uses atomic operations to emulate locks only when needed. This includes support for re-expansion by forcibly moving the state back to INITIAL in a single-threaded context. Pull request leela-zero#1764.
Avoid having duplicate copies of the network weights in memory. Pull request leela-zero#1795.
Fixes issue leela-zero#1837. Pull request leela-zero#1838.
Fixes clang warning. Pull request leela-zero#1841.
When doing auto precision detection, make sure the prior implementation is destroyed before trying the new implementation. Pull request leela-zero#1842.
* Count memory consumption of a search tree by introducing a referencer for UCTNodePointer and UCTNode.
* NNCache: Add method to get estimated memory consumption.
* Extend Network with methods to estimate network size, network cache size and resize cache.
* Estimate total memory consumption as estimated network size + number of gpus * 85MB + estimated tree size * 1.1 + estimated cache size * 1.1
* Add command `lz-setoption` which behaves like set_option from UCI spec.
* Add option to set maximum memory consumption in MB.
* Add option to configure ratio of memory reserved for nn cache and search tree.
* Add command 'lz-estimatememory' which shows estimated memory consumption.
* Initialize maximum tree size and cache size after the network initialization.

Pull request leela-zero#1741.
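As a worked example of the estimate formula in this commit message, here is a minimal sketch. The function name `estimate_total_memory` and the interpretation of "MB" as 1024 * 1024 bytes are assumptions for illustration; the 1.1 factors are applied with integer arithmetic as 11/10.

```cpp
#include <cstdint>

// Assumption: "MB" in the commit message means 1024 * 1024 bytes.
constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;
constexpr std::uint64_t kPerGpuOverhead = 85 * kMiB;

// total = network + gpus * 85MB + tree * 1.1 + cache * 1.1
// Hypothetical helper, all sizes in bytes.
inline std::uint64_t estimate_total_memory(std::uint64_t network_bytes,
                                           unsigned num_gpus,
                                           std::uint64_t tree_bytes,
                                           std::uint64_t cache_bytes) {
    return network_bytes
         + num_gpus * kPerGpuOverhead
         + tree_bytes * 11 / 10
         + cache_bytes * 11 / 10;
}
```

For a 100 MB network on 2 GPUs with a 1000 MB tree and a 500 MB cache, this gives 100 + 170 + 1100 + 550 = 1920 MB.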
Follow up to pull request leela-zero#1741.
Pull request leela-zero#1852.
Pull request leela-zero#1867.
If a node is fully expanded but is reverted to INITIAL state, there is no chance it returns to EXPANDED state. Don't revert nodes to INITIAL state if it is fully expanded. Some additional small bugfixes. Pull request leela-zero#1851.
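The state handling described here can be pictured with a small atomic state machine. This is a sketch under assumptions, not the project's UCTNode: the class `NodeState` and its method names are hypothetical, and the real code tracks more than this.

```cpp
#include <atomic>
#include <cstdint>

enum class ExpandState : std::uint8_t { INITIAL, EXPANDING, EXPANDED };

// Hypothetical stand-in for the expansion state kept by a search node.
class NodeState {
public:
    // Try to become the single thread that expands this node.
    // Succeeds only when the node is still in INITIAL.
    bool acquire_expanding() {
        ExpandState expected = ExpandState::INITIAL;
        return m_state.compare_exchange_strong(expected,
                                               ExpandState::EXPANDING);
    }

    // The expanding thread finished creating children.
    void expand_done() { m_state.store(ExpandState::EXPANDED); }

    // Ask for a re-expansion. A fully expanded node is never reverted,
    // because nothing would ever move it back to EXPANDED.
    bool request_reexpand(bool fully_expanded) {
        if (fully_expanded) {
            return false;
        }
        ExpandState expected = ExpandState::EXPANDED;
        return m_state.compare_exchange_strong(expected,
                                               ExpandState::INITIAL);
    }

    ExpandState state() const { return m_state.load(); }

private:
    std::atomic<ExpandState> m_state{ExpandState::INITIAL};
};
```

The `fully_expanded` check is the point of the fix: reverting a node that has no further children to add would leave it in INITIAL with no path back to EXPANDED.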
Minor comment fixups.
* Add policy prior in analysis output.
* Store policy as float instead of string in OutputAnalysisData.

Pull request leela-zero#1836.
We preferably store the analysis in the original format for sorting, and only do the conversion for display at display time. Don't use the completely meaningless tag "N" for move policy prior. We'll use "prior" instead. Winrate is currently output in 1/100th of a percent, so we'll use the same format for priors. I'm not sure why winrate is not just using floats, but I assume GUIs now already rely on this, and it might avoid some weird bugs related to locale.
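A minimal sketch of the "keep floats for sorting, convert at display time" idea follows. The helper names and the exact field layout of the output string are assumptions for illustration, and rounding to the nearest integer is a choice made here, not something stated in the commit message.

```cpp
#include <cmath>
#include <string>

// Convert a probability in [0, 1] to units of 1/100th of a percent,
// e.g. 0.5 -> 5000. Rounding mode is an assumption of this sketch.
inline int to_hundredths_of_percent(float probability) {
    return static_cast<int>(std::lround(probability * 10000.0f));
}

// Hypothetical display helper: values stay floats until this point.
inline std::string format_analysis_fields(float winrate, float prior) {
    return "winrate " + std::to_string(to_hundredths_of_percent(winrate))
         + " prior " + std::to_string(to_hundredths_of_percent(prior));
}
```

Emitting integers also sidesteps the locale concern mentioned above, since no decimal separator is printed.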
This adds Eigen as a default matrix/vector library via a submodule. This has a number of advantages:

* It can be used as a replacement for a cBLAS library when one is not available, cannot be found, or is outdated compared to the compiler or CPU.
* Because Eigen is header-only, it significantly eases the build prerequisites.
* The Eigen code paths are much more readable from a mathematical perspective.
* Eigen can optimize more heavily for known matrix sizes. The current code doesn't yet take advantage of this, though.

The downsides:

* Eigen might be a bit slower than other BLAS libraries. (Nevertheless, on my system it is faster than OpenBLAS.)
* Binaries built with Eigen are optimized for the CPU they were compiled on and don't port as well to other CPUs, so you need separate binaries for wider client support.

Default Eigen in CMake, add tests.

Default the Eigen library in CMake, as it's the fastest for most contemporary CPUs and configurations, and the easiest to build. We can optionally use BLAS by adding the USE_BLAS define, and will try to locate BLAS/OpenBLAS if so. This is useful for binaries for distribution such as our releases or distros. Split all tests to cover both Eigen and BLAS. Update build instructions to remove BLAS as a dependency, use CMake on all Unixy platforms, and use HTTPS. Pull request leela-zero#1692.
See discussion in pull request leela-zero#1642. This adds an optional side to move in the lz-analyze command, instead of only a posting interval. This makes the format more consistent with all other GTP commands. We check the amount and format of the arguments so we are backwards compatible with GUIs that send the old format, i.e. with only a posting interval. Pull request leela-zero#1872.
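The backwards-compatible argument handling might look something like the sketch below. The function `parse_lz_analyze`, the `AnalyzeArgs` struct, the accepted color spellings, and the assumption that the interval is always required are all illustrative choices, not the project's actual parser.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct AnalyzeArgs {
    char side = '\0';  // 'b', 'w', or '\0' when no side was given
    int interval = 0;  // posting interval as sent by the GUI
};

// Accept only short, all-digit tokens so std::stoi cannot throw.
inline bool parse_interval(const std::string& tok, int& out) {
    if (tok.empty() || tok.size() > 9) return false;
    for (unsigned char ch : tok) {
        if (!std::isdigit(ch)) return false;
    }
    out = std::stoi(tok);
    return true;
}

// Accepts "lz-analyze <interval>" (old format) and
// "lz-analyze <color> <interval>" (new format).
inline bool parse_lz_analyze(const std::string& line, AnalyzeArgs& out) {
    std::istringstream in(line);
    std::string cmd, tok;
    if (!(in >> cmd) || cmd != "lz-analyze") return false;
    if (!(in >> tok)) return false;

    AnalyzeArgs args;
    std::string lower = tok;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    if (lower == "b" || lower == "black") args.side = 'b';
    else if (lower == "w" || lower == "white") args.side = 'w';

    // If a side was given, the interval is the next token.
    if (args.side != '\0' && !(in >> tok)) return false;
    if (!parse_interval(tok, args.interval)) return false;
    if (in >> tok) return false;  // reject trailing arguments

    out = args;
    return true;
}
```

Checking both the number and the format of the tokens is what lets a GUI that still sends only an interval keep working.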
Rework the Network initialization to pull out the OpenCL benchmarking for precision autodetection. Add support to ForwardPipe to report whether it is necessary to run the benchmarks. If the answer is no, and fp16 compute works, we assume that's what we want. This avoids the benchmarking overhead on modern AMD cards and probably on the latest ones from NVIDIA too. Pull request leela-zero#1873.
Should fix issue leela-zero#1921.
Required on macOS, and probably other platforms. Fixes issue leela-zero#1901. Pull request leela-zero#1910.
Implement a few more parameters that can be set via lz-setoption, specifically visits, playouts, pondering, resign threshold and the lag buffer. We currently don't check the provided values against the reported min/max values but rely on the UI not to mess up. This could be addressed in a refactoring. Similarly, commandline and setoption values should probably be treated in a unified way. Remove the bogus boolean return value from GTP processing functions. Minor style fix for old GTP code. Pull request leela-zero#1927.
We need to call UCTNode::create_children() even if we aren't expanding because that moves our node's state from INITIAL to EXPANDED. Pull request leela-zero#1928.
If the tuner failed during precision autodetection, its error output on stdout was read as a GTP message. Pull request leela-zero#1935.
* Fall back to single precision when the GPU claims fp16 support but it doesn't work.
* Net initialization fixes:
  - Try at least one selfcheck eval when autodetecting precision
  - Revive selfcheck when using Eigen

Pull request leela-zero#1934.
Fixes issue leela-zero#1938.
Many lz-setoption commands forget to emit the closing GTP `=` response when they succeed. This freezes GUIs. Fixes issue leela-zero#1940.
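For context, GTP expects every command to be answered with a line starting with `=` (success) or `?` (failure), optionally followed by the command id, and terminated by an empty line. A minimal sketch of a helper that always produces a complete response is shown below; `gtp_response` is a hypothetical name, not a function from the codebase.

```cpp
#include <string>

// Build a complete GTP response. Pass id < 0 when the command had no id.
// Hypothetical helper for illustration.
inline std::string gtp_response(bool success, int id,
                                const std::string& text) {
    std::string out(1, success ? '=' : '?');
    if (id >= 0) {
        out += std::to_string(id);
    }
    if (!text.empty()) {
        out += ' ';
        out += text;
    }
    out += "\n\n";  // responses end with two newlines
    return out;
}
```

A GUI that waits for this terminator will block indefinitely if a successful command returns without printing anything, which matches the freeze described above.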
We've supported HTTPS on the server side for a while now, make it the default.
Colab has updated and the instructions here probably no longer work. They should probably be hosted elsewhere, too.
Define a variable closer to the usage point.
* Separate FPU-reduction setting for root.
* Removed fpu_root_reduction.

Pull request leela-zero#1960.
Link to Google Cloud tutorial on Google Docs. Pull request leela-zero#1961.
Delete outdated questions and answers. Pull request leela-zero#196.
Disabling input buffering on Windows causes breakage that looks like input buffering stays enabled. This was accounted for in the code, but the #define check was against a non-default flag, and a different one than is used elsewhere.
Even though SGF defaults to size 19 boards, we should not try to set up a board that size if LZ has not been compiled to support it. Pull request leela-zero#1964.
Without this, it's empirically not possible to load the current 256x40 networks on a 32-bit machine.
If we are trying to auto-select the best device for OpenCL, never select a CPU. This will cause the engine to refuse to run when people are trying to run the OpenCL version without a GPU or without GPU drivers, instead of selecting any slow and suboptimal (and empirically extremely broken) OpenCL-on-CPU drivers. Falling back to CPU-only would be another reasonable alternative, but doesn't provide an alert in case the GPU drivers are missing. Improves behavior of issue leela-zero#1994.
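The selection policy can be illustrated without touching the OpenCL API. In this sketch the `DeviceInfo` struct, the `score` field, and `pick_best_device` are hypothetical; the real code queries OpenCL platforms and devices and uses its own ranking.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class DeviceType { CPU, GPU, Accelerator };

// Hypothetical, pre-queried device description.
struct DeviceInfo {
    std::string name;
    DeviceType type;
    int score;  // assumed ranking value, higher is better
};

// Returns the index of the best non-CPU device, or -1 when none exists.
// CPU devices are never selected, however high their score.
inline int pick_best_device(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (devices[i].type == DeviceType::CPU) {
            continue;
        }
        if (best < 0 || devices[i].score > devices[best].score) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A caller would treat -1 as a fatal error and refuse to run, which is what surfaces missing GPU drivers to the user.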
Fix full tuner for heterogeneous GPUs and auto precision detection.

* --full-tuner implies --tune-only
* --full-tuner requires an explicit precision

Fixes leela-zero#1973. Pull request leela-zero#2004.
Very minor speedup of about 2% with batch size of 1. With batch size of 5 there is a speedup of about 5% with half precision and 12% with single precision. Out transformation memory accesses are almost completely coalesced with the new kernel. Pull request leela-zero#2014.
From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.
* CPUPipe: change winograd transformation constants to an equation. Combined with a series of strength reduction changes, improves netbench by about 8%.
* Convert some std::array into individual variables. For some reason this allows gcc to optimize the code better, improving netbench by 2%.

Pull request leela-zero#2021.
Use hard-coded equations instead of matrix multiplication. Pull request leela-zero#2023.
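To illustrate what replacing a matrix multiplication with hard-coded equations means for a Winograd transform, here is a small sketch using the one-dimensional F(2,3) input transform. This tile size is chosen only because it is compact; it is an assumption for the example and not necessarily the transform or tile size changed by this commit.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// Generic version: multiply by the F(2,3) input transform matrix B^T.
inline Vec4 input_transform_matrix(const Vec4& d) {
    static const float Bt[4][4] = {
        {1.0f,  0.0f, -1.0f,  0.0f},
        {0.0f,  1.0f,  1.0f,  0.0f},
        {0.0f, -1.0f,  1.0f,  0.0f},
        {0.0f,  1.0f,  0.0f, -1.0f},
    };
    Vec4 out{};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out[i] += Bt[i][j] * d[j];
        }
    }
    return out;
}

// Hard-coded version: the same result written out as equations, with
// the multiplications by 0 and +/-1 removed.
inline Vec4 input_transform_equations(const Vec4& d) {
    return {{d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]}};
}
```

The matrix form performs 16 multiplications and 12 additions, while the written-out form needs only 4 additions or subtractions, which is where this kind of rewrite gets its speedup.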
Fix Validation -k option by reading its value before the parser is reused. Pull request leela-zero#2024.
No description provided.