Tensorcore -> fastexit #73

Merged: alreadydone merged 38 commits into 1t-batch-fastexit-tensor from tensorcore+ on Nov 24, 2018

Conversation

alreadydone (Owner)

No description provided.

gcp and others added 30 commits October 12, 2018 09:26
Required on macOS, and probably other platforms.

Fixes issue leela-zero#1901.

Pull request leela-zero#1910.
Implement a few more parameters that can be set via lz-setoption,
specifically visits, playouts, pondering, resign threshold and the lag
buffer.

We currently don't check the provided values against the reported
min/max values but rely on the UI not to mess up. This could be
addressed in a refactoring. Similarly, command-line and setoption values
should probably be treated in a unified way.

Remove the bogus boolean return value from GTP processing functions.

Minor style fix for old GTP code.

Pull request leela-zero#1927.
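A minimal sketch of the option dispatch described above, assuming simplified stand-ins for leela-zero's configuration globals (the real names and types live in config.h and GTP.cpp); as the commit notes, values are not yet clamped to the advertised min/max range:

```cpp
#include <string>

// Hypothetical stand-ins for leela-zero's global configuration variables.
static int  cfg_max_visits = 0;          // 0 means unlimited
static bool cfg_allow_pondering = true;

// Sketch of an lz-setoption dispatcher. The GUI is trusted to keep the
// value inside the reported min/max range for now.
static void execute_setoption(const std::string& name, const std::string& value) {
    if (name == "visits") {
        cfg_max_visits = std::stoi(value);
    } else if (name == "pondering") {
        cfg_allow_pondering = (value == "true");
    }
    // playouts, resign threshold and lag buffer would follow the same pattern.
}
```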
We need to call UCTNode::create_children() even if we aren't expanding
because that moves our node's state from INITIAL to EXPANDED.

Pull request leela-zero#1928.
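A hedged sketch of the state transition the commit above relies on; the enum values follow the commit text, while the class and member names are simplified stand-ins for UCTNode:

```cpp
#include <atomic>

// Simplified stand-in for UCTNode's expansion state machine.
class UCTNodeSketch {
public:
    enum class Status { INITIAL, EXPANDING, EXPANDED };

    // Even when no new children are actually needed, create_children() must
    // still run so the node advances from INITIAL to EXPANDED and later code
    // sees a consistent state.
    void create_children() {
        auto expected = Status::INITIAL;
        if (m_status.compare_exchange_strong(expected, Status::EXPANDING)) {
            // ... child generation happens here in the real implementation ...
            m_status.store(Status::EXPANDED);
        }
    }

private:
    std::atomic<Status> m_status{Status::INITIAL};
};
```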
If the tuner failed during precision autodetection, its error output on stdout
was read as a GTP message.

Pull request leela-zero#1935.
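The underlying rule is that a GTP engine's stdout is reserved for protocol output, so tuner and autodetection diagnostics belong on stderr. A trivial illustration (not the actual tuner code):

```cpp
#include <cstdio>

int main() {
    // GTP responses go to stdout: '=' plus result, terminated by a blank line.
    std::printf("= \n\n");
    // Diagnostics go to stderr, so a GUI never tries to parse them as GTP.
    std::fprintf(stderr, "Error during precision autodetection, falling back.\n");
    return 0;
}
```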
* Fall back to single precision when the GPU claims fp16 support
  but it doesn't work (sketched below).
* Net initialization fixes:
  - Try at least one selfcheck eval when autodetecting precision
  - Revive selfcheck when using Eigen

Pull request leela-zero#1934.
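A hedged sketch of the fallback logic from the first bullet above; the helper functions are invented placeholders, not the actual network or self-check API:

```cpp
#include <iostream>

// Hypothetical placeholders for the real device query and self-check eval.
static bool gpu_reports_fp16()          { return true;  } // pretend half is advertised
static bool selfcheck_passes(bool fp16) { return !fp16; } // pretend fp16 output is wrong

// Prefer fp16 only when it is both advertised and verified by at least one
// self-check eval; otherwise fall back to single precision.
static bool choose_half_precision() {
    if (gpu_reports_fp16() && selfcheck_passes(true)) {
        return true;
    }
    std::cerr << "fp16 self-check failed, falling back to single precision\n";
    return false;
}

int main() {
    std::cout << (choose_half_precision() ? "half" : "single")
              << " precision selected\n";
}
```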
Many lz-setoption commands are forgetting to add the closing GTP = if
they are successful. This will freeze GUIs.

Fixes issue leela-zero#1940.
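For context, a successful GTP response starts with = and is terminated by an empty line; a GUI that never sees the terminator blocks waiting for more output. A transcript sketch (the option name is illustrative):

```
# command from the GUI:
lz-setoption name pondering value false
# engine reply: '=' plus the (empty) result, followed by a blank line
=

```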
We've supported HTTPS on the server side for a while now, so make it the
default.
Colab has updated and the instructions here probably no longer work.
They should probably be hosted elsewhere, too.
Define a variable closer to the usage point.
* Separate FPU-reduction setting for root.
* Removed fpu_root_reduction.

Pull request leela-zero#1960.
Link to Google Cloud tutorial on Google Docs.

Pull request leela-zero#1961.
Delete outdated questions and answers.

Pull request leela-zero#196.
Disabling input buffering on Windows causes breakage that looks like
input buffering stays enabled. This was accounted for in the code, but
the #define check was against a non-default flag, and a different one than the
flag used elsewhere.
Even though SGF defaults to size 19 boards, we should not try
to set up a board that size if LZ has not been compiled to support
it.

Pull request leela-zero#1964.
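A minimal sketch of the guard, assuming a compile-time BOARD_SIZE constant like the one in leela-zero's config.h; the surrounding SGF parsing is omitted and the function name is invented:

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the board size the binary was compiled for.
constexpr int BOARD_SIZE = 19;

// SGF SZ handling: even though SGF defaults to 19x19, only accept a size this
// build actually supports.
void apply_sgf_boardsize(int sgf_size) {
    if (sgf_size != BOARD_SIZE) {
        throw std::runtime_error("SGF board size " + std::to_string(sgf_size) +
                                 " not supported by this build (BOARD_SIZE=" +
                                 std::to_string(BOARD_SIZE) + ")");
    }
    // ... proceed to set up the board ...
}
```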
Without this, it's empirically not possible to load the current 256x40
networks on a 32-bit machine.
If we are trying to auto-select the best device for OpenCL, never select
a CPU. This will cause the engine to refuse to run when people are
trying to run the OpenCL version without a GPU or without GPU drivers,
instead of selecting any slow and suboptimal (and empirically extremely
broken) OpenCL-on-CPU drivers.

Falling back to CPU-only would be another reasonable alternative, but
doesn't provide an alert in case the GPU drivers are missing.

Improves behavior of issue leela-zero#1994.
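A hedged sketch of the CPU-exclusion rule using the Khronos OpenCL C++ bindings; leela-zero's own device enumeration and scoring in OpenCL.cpp is more involved than this:

```cpp
// Sketch only: skip CPU devices during OpenCL auto-selection.
// Requires the Khronos C++ bindings (CL/opencl.hpp, formerly CL/cl2.hpp).
#include <CL/opencl.hpp>
#include <stdexcept>
#include <vector>

cl::Device pick_best_device() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    for (const auto& platform : platforms) {
        std::vector<cl::Device> devices;
        platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
        for (const auto& device : devices) {
            // Never auto-select a CPU: OpenCL-on-CPU drivers are slow and
            // frequently broken, so refusing to run is preferable.
            if (device.getInfo<CL_DEVICE_TYPE>() & CL_DEVICE_TYPE_CPU) {
                continue;
            }
            return device;  // real code scores devices instead of taking the first
        }
    }
    throw std::runtime_error("No suitable OpenCL GPU device found");
}
```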
Fix full tuner for heterogeneous GPUs and auto precision detection.

--full-tuner implies --tune-only
--full-tuner requires an explicit precision

Fixes leela-zero#1973.

Pull request leela-zero#2004.
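An example invocation consistent with the constraints above, assuming the usual leelaz --precision and -w flags; the weight file name is a placeholder:

```
# --full-tuner now implies --tune-only and refuses to guess the precision:
./leelaz --full-tuner --precision half -w best-network.gz
```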
Very minor speedup of about 2% with batch size of 1.
With batch size of 5 there is a speedup of about 5% with half precision
and 12% with single precision.

Output transformation memory accesses are almost completely coalesced
with the new kernel.

Pull request leela-zero#2014.
From upstream a807dcf0f8623d40dc5ce9d1eb00ffd0e46150c7.
* CPUPipe: change Winograd transformation constants to an equation.

Combined with a series of strength-reduction changes,
this improves netbench by about 8%.

* Convert some std::array into individual variables (see the sketch below).

For some reason this allows gcc to optimize the code better,
improving netbench by 2%.

Pull request leela-zero#2021.
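An illustration of the scalar-replacement idea in the second bullet above (not the actual CPUPipe code): breaking a small fixed-size std::array into named locals can make it easier for gcc to keep the values in registers.

```cpp
#include <array>

// Before: a small scratch array the compiler may keep in memory.
float sum_before(const std::array<float, 4>& in) {
    std::array<float, 4> t;
    for (int i = 0; i < 4; i++) {
        t[i] = in[i] * 0.5f;
    }
    return t[0] + t[1] + t[2] + t[3];
}

// After: the same computation with individual variables, which gcc is
// empirically better at keeping in registers (the ~2% netbench gain).
float sum_after(const std::array<float, 4>& in) {
    const float t0 = in[0] * 0.5f;
    const float t1 = in[1] * 0.5f;
    const float t2 = in[2] * 0.5f;
    const float t3 = in[3] * 0.5f;
    return t0 + t1 + t2 + t3;
}
```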
Use hard-coded equations instead of matrix multiplication.

Pull request leela-zero#2023.
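A toy example of the same transformation, assuming a small fixed matrix rather than the actual Winograd matrices: the generic matrix-vector multiply is replaced by its unrolled closed-form expressions, so multiplications by 0 and 1 disappear.

```cpp
#include <array>

// Generic form: y = B * x for a fixed 2x3 matrix B = [[1, 1, 0], [0, 1, -1]].
std::array<float, 2> transform_generic(const std::array<float, 3>& x) {
    const float B[2][3] = {{1.0f, 1.0f, 0.0f}, {0.0f, 1.0f, -1.0f}};
    std::array<float, 2> y{};
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 3; j++) {
            y[i] += B[i][j] * x[j];
        }
    }
    return y;
}

// Hard-coded form: only the additions that are actually needed remain.
std::array<float, 2> transform_hardcoded(const std::array<float, 3>& x) {
    return {x[0] + x[1], x[1] - x[2]};
}
```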
Fix Validation -k option by reading its value before the parser is reused.

Pull request leela-zero#2024.
alreadydone merged commit f2a1c69 into 1t-batch-fastexit-tensor on Nov 24, 2018
alreadydone deleted the tensorcore+ branch on November 24, 2018 08:41