Update squeeze-excitation + ladder/legality/liberty to gcp/next #97

Open
Wants to merge 102 commits into base branch patch-31.

alreadydone
Owner

No description provided.

Ttl and others added 30 commits August 6, 2018 15:14
The previous method is too strict for fp16 compute.

Since the lower precision of fp16 is still good enough to play at
the same strength as fp32, relax the self-check.

Pull request leela-zero#1698.
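A relaxed self-check can be pictured as a relative-error comparison whose tolerance depends on the compute precision. The sketch below assumes that idea; the function name `self_check_passes` and both tolerance values are hypothetical, since the PR does not state the actual thresholds.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tolerances for illustration only; the real values are not given.
constexpr float kFp32Tolerance = 1e-4f;
constexpr float kFp16Tolerance = 1e-2f;

// Returns true if every element of `result` is within a relative tolerance
// of `reference`, with a looser tolerance for half precision.
bool self_check_passes(const std::vector<float>& result,
                       const std::vector<float>& reference,
                       bool use_half) {
    const float tol = use_half ? kFp16Tolerance : kFp32Tolerance;
    if (result.size() != reference.size()) return false;
    for (std::size_t i = 0; i < result.size(); ++i) {
        const float diff = std::fabs(result[i] - reference[i]);
        // Scale by magnitude (floored at 1) so large outputs are not penalized
        // and near-zero references do not blow up the ratio.
        const float scale = std::fmax(1.0f, std::fabs(reference[i]));
        if (diff / scale > tol) return false;
    }
    return true;
}
```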
* Fix error calculation (missing batch_size divider).
* Better error reporting when no working configuration could be found.
* Change reference data to have fewer rounding errors with half precision.
* Replace BLAS reference SGEMM with custom code that gives transposed
  output like the OpenCL SGEMM.

Pull request leela-zero#1710.
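A reference SGEMM with transposed output can be as simple as a triple loop that writes each result element to its transposed position. This is a minimal sketch assuming row-major A (m x k) and B (k x n); the function name and the exact layout convention are assumptions, not the tuner's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference SGEMM computing C = A * B, but storing C transposed
// (n x m, row-major), i.e. element (i, j) of C is written at (j, i).
std::vector<float> reference_sgemm_transposed(const std::vector<float>& a,
                                              const std::vector<float>& b,
                                              std::size_t m, std::size_t n,
                                              std::size_t k) {
    std::vector<float> c_t(n * m, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c_t[j * m + i] = acc;
        }
    }
    return c_t;
}
```

Producing the same layout as the device kernel lets the check compare outputs element by element without an extra transpose step.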
Should save a tiny bit of memory.

Pull request leela-zero#1716.
Fall back to single precision net when half precision is broken, 
at least when detection mode is auto.

Pull request leela-zero#1726.
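The fallback rule can be sketched as follows, assuming a hypothetical builder callback that throws when a network fails its self-check. It is simplified: it tries half precision first and ignores any speed comparison the real autodetection may perform. Only in auto mode does it fall back; an explicitly requested precision propagates the error.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

enum class Precision { Auto, Single, Half };

// Hypothetical builder: constructs a net at the given precision and
// throws std::runtime_error if its self-check fails.
using Builder = std::function<void(Precision)>;

// Returns the precision that was successfully initialized.
Precision init_network(Precision mode, const Builder& build) {
    if (mode != Precision::Auto) {
        build(mode);  // explicit mode: failures reach the caller
        return mode;
    }
    try {
        build(Precision::Half);
        return Precision::Half;
    } catch (const std::runtime_error&) {
        // Half precision is broken: fall back to single precision.
        build(Precision::Single);
        return Precision::Single;
    }
}
```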
Some OpenCL buffers were allocated larger than needed.
Verified with oclgrind that the new sizes are correct.

Pull request leela-zero#1727.
Use smaller precision to store the weights to decrease the file size.

See discussion in issue leela-zero#1733.

Pull request leela-zero#1736.
* Network initialization restructuring

- Create one net at a time when doing fp16/fp32 autodetect.
  Saves some GPU memory.
- Create an internal lambda which initializes the nets.
- Use std::copy to copy vectors to reduce runtime.

* zeropad_U: loop reordering for performance optimization.

Plus other optimizations for zero-copying initialization.

Pull request leela-zero#1750.
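The combination of loop reordering and std::copy can be illustrated with a generic zero-padding routine. This is a sketch only: `zeropad` here pads a plain row-major matrix and does not reproduce the real zeropad_U tensor layout. Iterating rows in the outer loop keeps both source and destination contiguous, so each row becomes a single copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a row-major (rows x cols) matrix into a zero-filled
// (rows_pad x cols_pad) buffer.
std::vector<float> zeropad(const std::vector<float>& src,
                           std::size_t rows, std::size_t cols,
                           std::size_t rows_pad, std::size_t cols_pad) {
    assert(rows <= rows_pad && cols <= cols_pad && src.size() == rows * cols);
    std::vector<float> dst(rows_pad * cols_pad, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        // One contiguous std::copy per row instead of an element-wise inner loop.
        const auto src_row = src.begin() + static_cast<std::ptrdiff_t>(r * cols);
        const auto dst_row = dst.begin() + static_cast<std::ptrdiff_t>(r * cols_pad);
        std::copy(src_row, src_row + static_cast<std::ptrdiff_t>(cols), dst_row);
    }
    return dst;
}
```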
Fix some incorrect comments and shorten some excessively long
lines.
* Changed Validation and Game to support multiple GTP commands
  at startup but left the Validation options untouched.
* Separated engine options (as positional arguments) from match options.
  Replaced time settings option with ability to specify any GTP commands.
* Added --gtp-command options using the existing option parser.
  Also changed default binary options from -p 1600 to -v 3200.
* Each binary argument has to be preceded by "--".
* Changed to use Engine objects.
* Exits on failed GTP command.

Added printing of GTP commands in gameStart() so users can see what
commands are actually sent to each engine.

Pull request leela-zero#1652.
* Don't refer to stone locations as "squares".

* Use "vertex" for those in the "letterbox" representation.
* Otherwise, mostly use "intersection".
* Also, capture the number of all possible moves (i.e. including pass)
  in its own explicit constant.

* Clean up network constants.

Pull request leela-zero#1723.
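The terminology can be made concrete with index arithmetic. The sketch below assumes the "letterbox" layout is the board surrounded by a one-point border, so each row is board size + 2 wide; all constant and function names are illustrative, not the engine's actual identifiers.

```cpp
#include <cassert>

// Illustrative constants (names are hypothetical).
constexpr int kBoardSize = 19;
constexpr int kNumIntersections = kBoardSize * kBoardSize;  // 361
// All possible moves: every intersection plus pass.
constexpr int kPotentialMoves = kNumIntersections + 1;
// Letterbox rows include a one-point border on each side.
constexpr int kLetterboxWidth = kBoardSize + 2;

// Dense intersection index from board coordinates.
constexpr int to_intersection(int x, int y) { return y * kBoardSize + x; }

// Vertex index in the letterbox representation.
constexpr int to_vertex(int x, int y) {
    return (y + 1) * kLetterboxWidth + (x + 1);
}

// Map a letterbox vertex back to its intersection index.
constexpr int vertex_to_intersection(int vertex) {
    const int x = vertex % kLetterboxWidth - 1;
    const int y = vertex / kLetterboxWidth - 1;
    return to_intersection(x, y);
}
```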
Case-sensitive coordinates are a thing in SGF, not GTP.

Pull request leela-zero#1793.
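GTP vertices are case-insensitive and skip the letter I, unlike SGF coordinates. A minimal parser following those GTP rules might look like this; the function `parse_gtp_vertex` and struct `Move` are hypothetical and are not GameState's actual interface.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct Move { bool is_pass; int x; int y; };

// Parse "D4", "d4" or "pass" (any case). Returns std::nullopt for
// malformed or off-board input.
std::optional<Move> parse_gtp_vertex(std::string text, int board_size) {
    for (char& c : text) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (text == "pass") return Move{true, -1, -1};
    if (text.size() < 2) return std::nullopt;

    const char col = text[0];
    if (col < 'a' || col > 'z' || col == 'i') return std::nullopt;
    // Letters after 'i' shift down by one because 'i' is skipped.
    const int x = (col < 'i') ? col - 'a' : col - 'a' - 1;

    int y = 0;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) return std::nullopt;
        y = y * 10 + (text[i] - '0');
        if (y > board_size) return std::nullopt;  // also prevents overflow
    }
    y -= 1;  // GTP rows are 1-based, counted from the bottom
    if (x >= board_size || y < 0) return std::nullopt;
    return Move{false, x, y};
}
```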
* Install to ${CMAKE_INSTALL_BINDIR},
  some distros like to put games in /usr/games.

* Store/load leelaz_opencl_tuning and load weights file from
  system directories, i.e.
  ~/.local/share/leela-zero on Unix

* Better error reporting when network weights file is not found.

Pull request leela-zero#1618.
* MCTS: Skip the child currently being expanded during UCT selection.

The search thread should explore other nodes in this case, which saves
the search from some useless work.

This also benefits batching support. Before this change, all
threads could be busy waiting for the first node to be expanded.

Give the expanding node a huge virtual loss instead, to avoid a crash
when only one child exists.

Pull request leela-zero#1794.
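Penalizing an expanding child rather than excluding it can be sketched with a PUCT-style score. Assumptions: the `Child` struct, the penalty value, and approximating the parent visit count by the sum of child visits are all illustrative. Other threads steer away from a child under expansion, yet a lone child remains selectable, so selection never comes up empty.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-child statistics for this sketch.
struct Child {
    float prior;      // policy prior P(s, a)
    int visits;       // N(s, a)
    float value_sum;  // sum of backed-up values
    bool expanding;   // another thread is expanding this child
};

// Illustrative magnitude; the real virtual loss is not specified.
constexpr float kExpandingVirtualLoss = 1e6f;

// Returns the index of the selected child, or -1 if there are no children.
int select_child(const std::vector<Child>& children, float c_puct) {
    int parent_visits = 0;
    for (const auto& ch : children) parent_visits += ch.visits;
    const float sqrt_parent = std::sqrt(static_cast<float>(parent_visits));

    int best = -1;
    float best_score = std::numeric_limits<float>::lowest();
    for (std::size_t i = 0; i < children.size(); ++i) {
        const Child& ch = children[i];
        const float q = ch.visits > 0 ? ch.value_sum / ch.visits : 0.0f;
        const float u = c_puct * ch.prior * sqrt_parent / (1.0f + ch.visits);
        float score = q + u;
        if (ch.expanding) score -= kExpandingVirtualLoss;  // huge virtual loss
        if (score > best_score) {
            best_score = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```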
As demanded by GTP, improving the input handling of GameState::play_textmove 
in the process (which now would crash if given a pass or resignation).

Pull request leela-zero#1814.
Updated the README's Linux build instructions to use Boost filesystem.

Required after 73f1f93.

Pull request leela-zero#1813.
According to @Atarust in leela-zero#1806, this fixes a kernel compilation error with his
configuration. No performance difference.

Pull request leela-zero#1820.
Copying on weight construction keeps a copy of the weights in host memory,
at least for recent NVIDIA GPUs. Creating the buffer first and copying into it
afterwards does not, which saves memory.

Pull request leela-zero#1818.
* Thread-safe UCTNodePointer

This makes almost all UCTNodePointer operations thread-safe.
The only exceptions are destructors and the case where it is 'moved out'.
It should even handle concurrent inflate() calls properly.

Uses atomic operations to emulate locks only when needed.

This includes support for re-expansion by forcibly moving the state back
to INITIAL in a single-threaded context.

Pull request leela-zero#1764.
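Using atomics to emulate a lock only when needed can be sketched as a lazily inflated pointer. This is a simplified stand-in: the real UCTNodePointer packs its state into pointer bits, whereas here the state is a separate atomic flag, and the names `LazyNodePtr` and `Node` are hypothetical. The fast path is a single acquire load; only the thread that wins the compare-exchange allocates, and the others wait for the result to be published.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Node { int move; };

class LazyNodePtr {
public:
    explicit LazyNodePtr(int move) : m_move(move) {}
    // Like the original, destruction is not thread-safe.
    ~LazyNodePtr() { delete m_node.load(); }

    // Safe to call concurrently: exactly one thread allocates the node.
    Node* inflate() {
        Node* node = m_node.load(std::memory_order_acquire);
        if (node) return node;  // fast path, no "lock" taken
        bool expected = false;
        if (m_inflating.compare_exchange_strong(expected, true,
                                                std::memory_order_acq_rel)) {
            node = new Node{m_move};
            m_node.store(node, std::memory_order_release);
            return node;
        }
        // Another thread won the race: wait until it publishes the node.
        while ((node = m_node.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        return node;
    }

private:
    int m_move;
    std::atomic<bool> m_inflating{false};
    std::atomic<Node*> m_node{nullptr};
};
```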
Avoid having duplicate copies of the network weights in memory.

Pull request leela-zero#1795.
Fixes clang warning.

Pull request leela-zero#1841.
When doing auto precision detection, make sure the prior implementation
is destroyed before trying a new implementation.

Pull request leela-zero#1842.
* Count memory consumption of a search tree by introducing a
  referencer for UCTNodePointer and UCTNode.
* NNCache: Add method to get estimated memory consumption.
* Extend Network with methods to estimate network size, network cache
  size and resize cache.
* Estimate total memory consumption as
    estimated network size +
    number of GPUs * 85MB +
    estimated tree size * 1.1 +
    estimated cache size * 1.1
* Add command `lz-setoption` which behaves like set_option from UCI spec.
* Add option to set maximum memory consumption in MB.
* Add option to configure ratio of memory reserved for nn cache
  and search tree.
* Add command 'lz-estimatememory' which shows estimated memory consumption.
* Initialize maximum tree size and cache size after
  the network initialization.

Pull request leela-zero#1741.
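The total-memory estimate above translates directly into arithmetic. In this sketch the 85 MB per-GPU term and the 1.1 overhead factors come from the commit message; the function name and the choice of MB as the unit for every input are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Estimated total memory in MB, following the formula in the commit message.
double estimate_total_memory_mb(double network_mb, int num_gpus,
                                double tree_mb, double cache_mb) {
    constexpr double kPerGpuMb = 85.0;  // fixed per-GPU allowance
    constexpr double kOverhead = 1.1;   // slack on tree and cache estimates
    return network_mb
         + num_gpus * kPerGpuMb
         + tree_mb * kOverhead
         + cache_mb * kOverhead;
}
```

For example, a 100 MB network on 2 GPUs with a 1000 MB tree and a 500 MB cache gives 100 + 170 + 1100 + 550 = 1920 MB.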
ihavnoid and others added 30 commits November 17, 2018 23:45
* CPUPipe: change the Winograd transformation constants to equations.

Combined with a series of strength reduction changes,
improves netbench by about 8%.

* Convert some std::array into individual variables

For some reason this allows gcc to optimize the code better,
improving netbench by 2%.

Pull request leela-zero#2021.
Use hard-coded equations instead of matrix multiplication.

Pull request leela-zero#2023.
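Why hard-coded equations help shows up even in the smallest Winograd case. The sketch uses the 1-D F(2,3) input transform from Lavin and Gray rather than the larger transform the engine uses, so it illustrates the idea rather than the actual CPUPipe code. Multiplying by the constant matrix B^T wastes work on zeros and ±1 entries; writing the equations out leaves only additions and subtractions.

```cpp
#include <array>
#include <cassert>

// Generic version: multiply the input tile d by the constant matrix B^T.
std::array<float, 4> input_transform_matmul(const std::array<float, 4>& d) {
    static constexpr float BT[4][4] = {
        {1.0f,  0.0f, -1.0f,  0.0f},
        {0.0f,  1.0f,  1.0f,  0.0f},
        {0.0f, -1.0f,  1.0f,  0.0f},
        {0.0f,  1.0f,  0.0f, -1.0f},
    };
    std::array<float, 4> out{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out[i] += BT[i][j] * d[j];
    return out;
}

// Hard-coded equations: same result, no multiplications.
std::array<float, 4> input_transform_equations(const std::array<float, 4>& d) {
    return {d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]};
}
```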
Fix Validation -k option by reading its value before the parser is reused.

Pull request leela-zero#2024.
Simplify instructions, especially related to building and running
when wanting to contribute.

Based on pull request leela-zero#1983.
* Move Engine to Game.h and refactor autogtp to use it too.
* Fix initialization of job engines.

Pull request leela-zero#2029.
Generally speaking, passing a character pointer directly as the first (format)
argument can cause a format string bug (FSB).

Pull request leela-zero#2063.
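The format string bug and its standard fix can be shown in a few lines. This is a generic sketch; `format_safely` is a hypothetical helper, not the function touched by the PR. When a string is passed as the format, any `%` sequences inside it are interpreted as conversions; passing it as data to a literal `"%s"` format prints it verbatim.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Unsafe pattern (not called here): std::printf(msg);
// If msg holds untrusted text, specifiers such as "%s" or "%n" inside it
// would be interpreted by printf and read or write arbitrary memory.

// Safe pattern: the format string is a literal and msg is only data.
std::string format_safely(const char* msg) {
    char buf[256];
    std::snprintf(buf, sizeof(buf), "%s", msg);
    return std::string(buf);
}
```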
Update from upstream f0b7045.

Fixes warnings related to CL_TARGET_OPENCL_VERSION.
* Make AutoGTP URL parametric.
* Support for the sgfhash and movescount parameters in get-task.
* Automatic downloading of sgf and training files.
* Fix Management.cpp for older Qt5 versions.
* Added support for starting match games from a specified initial position.
* Tidy ValidationJob::init() like ProductionJob::init()
* Use existing QUuid method of generating random file 
  names instead of QTemporaryFile when fetching game data.

Moreover, we do not load training data in LeelaZ since it is not needed to start from
an arbitrary position.

Pull request leela-zero#2052.
* Add optional separate options for white in match game.
* Fixed loading of saved match order with optionsSecond.

Pull request leela-zero#2078.
See issue leela-zero#2032.

All contributors to the core engine have given their permission to
add an additional permission to link with NVIDIA's CUDA/cuDNN/TensorRT
libraries. This makes it possible to distribute the engine when built to
use those libraries.

Update the copyright notices to 2019.
Although the OpenCL driver is generally installed as part of the graphics driver
installation, mention the requirement explicitly in case it wasn't.

See pull request leela-zero#2138.
Calling get_eval() on a zero-visit node will assert-fail.
The original code could assert-fail on b.get_eval() if 'a' and 'b' both
had zero visits but suddenly 'a' gained an additional visit.

Pull request leela-zero#2110.
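One way to avoid the race described above is to read each visit count exactly once and base every decision on that snapshot. This sketch assumes visit counts only ever increase; the `Node` struct and `node_less` comparator are hypothetical and not necessarily how the PR fixed it. Once a node is seen with at least one visit, get_eval() on it can no longer assert.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node whose visit count other threads may increment.
struct Node {
    std::atomic<int> visits{0};
    float prior = 0.0f;
    float value_sum = 0.0f;

    float get_eval() const {
        const int v = visits.load();
        assert(v > 0 && "get_eval() requires at least one visit");
        return value_sum / v;
    }
};

// Ordering for picking the best child. Each count is loaded once, so a
// concurrent increment between reads cannot lead to get_eval() on a node
// that was observed with zero visits.
bool node_less(const Node& a, const Node& b) {
    const int a_visits = a.visits.load();
    const int b_visits = b.visits.load();
    if (a_visits != b_visits) return a_visits < b_visits;
    if (a_visits == 0) return a.prior < b.prior;  // both unvisited
    return a.get_eval() < b.get_eval();
}
```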
Follow up fix for pull request leela-zero#2114.
* AutoGTP: Allow specifying initial GTP commands.
  Also add support for white taking the first move in handicapped job games.
* AutoGTP: Refactored core loop for match games to avoid code duplication.
* Fixed white using black's match game settings after loading from an SGF by
  moving SGF loading into Game::gameStart() to before sending GTP commands
  (except handicap commands).
* Changed so that when an SGF file is loaded, AutoGTP determines whether
  handicap is in use from the SGF rather than from any starting GTP commands.

Pull request leela-zero#2096.
This includes some optimization improvements for newer GCC/Clang that
may be relevant to a lot of our users.

Pull request leela-zero#2151.
Fixes issue leela-zero#2167.

I could swear I fixed this before. Maybe I forgot to push?
* AutoGTP: Added full engine options and starting GTP commands 
  to SGF comments that are produced.
* Refactored Game::fixSgf().

Pull request leela-zero#2160.