Tensorcore -> fastexit #73

Merged: alreadydone merged 38 commits into 1t-batch-fastexit-tensor from tensorcore+ on Nov 24, 2018

Conversation

alreadydone (Owner)

No description provided.

gcp and others added 30 commits October 12, 2018 09:26
Required on macOS, and probably other platforms.

Fixes issue leela-zero#1901.

Pull request leela-zero#1910.
Implement a few more parameters that can be set via lz-setoption,
specifically visits, playouts, pondering, resign threshold and the lag
buffer.

We currently don't check the provided values against the reported
min/max values but rely on the UI not to mess up. This could be
addressed in a refactoring. Similarly, command-line and setoption values
should probably be treated in a unified way.

Remove the bogus boolean return value from GTP processing functions.

Minor style fix for old GTP code.

Pull request leela-zero#1927.
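A minimal sketch of the option dispatch described above, assuming simplified stand-ins for leela-zero's configuration globals (the real names and types live in config.h and GTP.cpp); as the commit notes, values are not yet clamped to the advertised min/max range:

```cpp
#include <string>

// Hypothetical stand-ins for leela-zero's global configuration variables.
static int  cfg_max_visits = 0;          // 0 means unlimited
static bool cfg_allow_pondering = true;

// Sketch of an lz-setoption dispatcher. The GUI is trusted to keep the
// value inside the reported min/max range for now.
static void execute_setoption(const std::string& name, const std::string& value) {
    if (name == "visits") {
        cfg_max_visits = std::stoi(value);
    } else if (name == "pondering") {
        cfg_allow_pondering = (value == "true");
    }
    // playouts, resign threshold and lag buffer would follow the same pattern.
}
```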
We need to call UCTNode::create_children() even if we aren't expanding
because that moves our node's state from INITIAL to EXPANDED.

Pull request leela-zero#1928.
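A hedged sketch of the state transition the commit above relies on; the enum values follow the commit text, while the class and member names are simplified stand-ins for UCTNode:

```cpp
#include <atomic>

// Simplified stand-in for UCTNode's expansion state machine.
class UCTNodeSketch {
public:
    enum class Status { INITIAL, EXPANDING, EXPANDED };

    // Even when no new children are actually needed, create_children() must
    // still run so the node advances from INITIAL to EXPANDED and later code
    // sees a consistent state.
    void create_children() {
        auto expected = Status::INITIAL;
        if (m_status.compare_exchange_strong(expected, Status::EXPANDING)) {
            // ... child generation happens here in the real implementation ...
            m_status.store(Status::EXPANDED);
        }
    }

private:
    std::atomic<Status> m_status{Status::INITIAL};
};
```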
If the tuner failed during precision autodetection, its error output on stdout
was read as a GTP message.

Pull request leela-zero#1935.
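The underlying rule is that a GTP engine's stdout is reserved for protocol output, so tuner and autodetection diagnostics belong on stderr. A trivial illustration (not the actual tuner code):

```cpp
#include <cstdio>

int main() {
    // GTP responses go to stdout: '=' plus result, terminated by a blank line.
    std::printf("= \n\n");
    // Diagnostics go to stderr, so a GUI never tries to parse them as GTP.
    std::fprintf(stderr, "Error during precision autodetection, falling back.\n");
    return 0;
}
```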
* Fall back to single precision when the GPU claims fp16 support
  but it doesn't work (sketched below).
* Net initialization fixes:
  - Try at least one selfcheck eval when autodetecting precision
  - Revive selfcheck when using Eigen

Pull request leela-zero#1934.
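A hedged sketch of the fallback logic from the first bullet above; the helper functions are invented placeholders, not the actual network or self-check API:

```cpp
#include <iostream>

// Hypothetical placeholders for the real device query and self-check eval.
static bool gpu_reports_fp16()          { return true;  } // pretend half is advertised
static bool selfcheck_passes(bool fp16) { return !fp16; } // pretend fp16 output is wrong

// Prefer fp16 only when it is both advertised and verified by at least one
// self-check eval; otherwise fall back to single precision.
static bool choose_half_precision() {
    if (gpu_reports_fp16() && selfcheck_passes(true)) {
        return true;
    }
    std::cerr << "fp16 self-check failed, falling back to single precision\n";
    return false;
}

int main() {
    std::cout << (choose_half_precision() ? "half" : "single")
              << " precision selected\n";
}
```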
Many lz-setoption commands are forgetting to add the closing GTP = if
they are successful. This will freeze GUIs.

Fixes issue leela-zero#1940.
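For context, a successful GTP response starts with = and is terminated by an empty line; a GUI that never sees the terminator blocks waiting for more output. A transcript sketch (the option name is illustrative):

```
# command from the GUI:
lz-setoption name pondering value false
# engine reply: '=' plus the (empty) result, followed by a blank line
=

```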
We've supported HTTPS on the server side for a while now, so make it the
default.
Colab has updated and the instructions here probably no longer work.
They should probably be hosted elsewhere, too.
Define a variable closer to the usage point.
* Separate FPU-reduction setting for root.
* Removed fpu_root_reduction.

Pull request leela-zero#1960.
Link to Google Cloud tutorial on Google Docs.

Pull request leela-zero#1961.
Delete outdated questions and answers.

Pull request leela-zero#196.
Disabling input buffering on Windows causes breakage that looks like
input buffering stays enabled. This was accounted for in the code, but
the #define check was against a non-default flag, and a different one than the
flag used elsewhere.
Even though SGF defaults to size 19 boards, we should not try
to set up a board that size if LZ has not been compiled to support
it.

Pull request leela-zero#1964.
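A minimal sketch of the guard, assuming a compile-time BOARD_SIZE constant like the one in leela-zero's config.h; the surrounding SGF parsing is omitted and the function name is invented:

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the board size the binary was compiled for.
constexpr int BOARD_SIZE = 19;

// SGF SZ handling: even though SGF defaults to 19x19, only accept a size this
// build actually supports.
void apply_sgf_boardsize(int sgf_size) {
    if (sgf_size != BOARD_SIZE) {
        throw std::runtime_error("SGF board size " + std::to_string(sgf_size) +
                                 " not supported by this build (BOARD_SIZE=" +
                                 std::to_string(BOARD_SIZE) + ")");
    }
    // ... proceed to set up the board ...
}
```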
Without this, it's empirically not possible to load the current 256x40
networks on a 32-bit machine.
If we are trying to auto-select the best device for OpenCL, never select
a CPU. This will cause the engine to refuse to run when people are
trying to run the OpenCL version without a GPU or without GPU drivers,
instead of selecting any slow and suboptimal (and empirically extremely
broken) OpenCL-on-CPU drivers.

Falling back to CPU-only would be another reasonable alternative, but
doesn't provide an alert in case the GPU drivers are missing.

Improves behavior of issue leela-zero#1994.
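A hedged sketch of the CPU-exclusion rule using the Khronos OpenCL C++ bindings; leela-zero's own device enumeration and scoring in OpenCL.cpp is more involved than this:

```cpp
// Sketch only: skip CPU devices during OpenCL auto-selection.
// Requires the Khronos C++ bindings (CL/opencl.hpp, formerly CL/cl2.hpp).
#include <CL/opencl.hpp>
#include <stdexcept>
#include <vector>

cl::Device pick_best_device() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    for (const auto& platform : platforms) {
        std::vector<cl::Device> devices;
        platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
        for (const auto& device : devices) {
            // Never auto-select a CPU: OpenCL-on-CPU drivers are slow and
            // frequently broken, so refusing to run is preferable.
            if (device.getInfo<CL_DEVICE_TYPE>() & CL_DEVICE_TYPE_CPU) {
                continue;
            }
            return device;  // real code scores devices instead of taking the first
        }
    }
    throw std::runtime_error("No suitable OpenCL GPU device found");
}
```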
Fix full tuner for heterogeneous GPUs and auto precision detection.

--full-tuner implies --tune-only
--full-tuner requires an explicit precision

Fixes leela-zero#1973.

Pull request leela-zero#2004.
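An example invocation consistent with the constraints above, assuming the usual leelaz --precision and -w flags; the weight file name is a placeholder:

```
# --full-tuner now implies --tune-only and refuses to guess the precision:
./leelaz --full-tuner --precision half -w best-network.gz
```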
Very minor speedup of about 2% with batch size of 1.
With batch size of 5 there is a speedup of about 5% with half precision
and 12% with single precision.

Output transformation memory accesses are almost completely coalesced
with the new kernel.

Pull request leela-zero#2014.
From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.
* CPUPipe: change Winograd transformation constants to an equation.

Combined with a series of strength-reduction changes,
this improves netbench by about 8%.

* Convert some std::array into individual variables (see the sketch below).

For some reason this allows gcc to optimize the code better,
improving netbench by 2%.

Pull request leela-zero#2021.
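An illustration of the scalar-replacement idea in the second bullet above (not the actual CPUPipe code): breaking a small fixed-size std::array into named locals can make it easier for gcc to keep the values in registers.

```cpp
#include <array>

// Before: a small scratch array the compiler may keep in memory.
float sum_before(const std::array<float, 4>& in) {
    std::array<float, 4> t;
    for (int i = 0; i < 4; i++) {
        t[i] = in[i] * 0.5f;
    }
    return t[0] + t[1] + t[2] + t[3];
}

// After: the same computation with individual variables, which gcc is
// empirically better at keeping in registers (the ~2% netbench gain).
float sum_after(const std::array<float, 4>& in) {
    const float t0 = in[0] * 0.5f;
    const float t1 = in[1] * 0.5f;
    const float t2 = in[2] * 0.5f;
    const float t3 = in[3] * 0.5f;
    return t0 + t1 + t2 + t3;
}
```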
Use hard-coded equations instead of matrix multiplication.

Pull request leela-zero#2023.
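A toy example of the same transformation, assuming a small fixed matrix rather than the actual Winograd matrices: the generic matrix-vector multiply is replaced by its unrolled closed-form expressions, so multiplications by 0 and 1 disappear.

```cpp
#include <array>

// Generic form: y = B * x for a fixed 2x3 matrix B = [[1, 1, 0], [0, 1, -1]].
std::array<float, 2> transform_generic(const std::array<float, 3>& x) {
    const float B[2][3] = {{1.0f, 1.0f, 0.0f}, {0.0f, 1.0f, -1.0f}};
    std::array<float, 2> y{};
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 3; j++) {
            y[i] += B[i][j] * x[j];
        }
    }
    return y;
}

// Hard-coded form: only the additions that are actually needed remain.
std::array<float, 2> transform_hardcoded(const std::array<float, 3>& x) {
    return {x[0] + x[1], x[1] - x[2]};
}
```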
Fix Validation -k option by reading its value before the parser is reused.

Pull request leela-zero#2024.
alreadydone merged commit f2a1c69 into 1t-batch-fastexit-tensor on Nov 24, 2018
alreadydone deleted the tensorcore+ branch on November 24, 2018 08:41