Update squeeze-excitation + ladder/legality/liberty to gcp/next #97

Open
wants to merge 102 commits into base: patch-31

Changes from 1 commit (102 commits in this pull request)
488de43
Use L2-norm in self check.
Ttl Aug 6, 2018
d2ad525
OpenCL tuner fixes.
Ttl Aug 9, 2018
87c95c4
Change policy vector to array.
TFiFiE Aug 10, 2018
e72496d
Fall back to single precision net on breakage.
ihavnoid Aug 14, 2018
681229a
AutoGTP: use compressed weights networks.
marcocalignano Aug 14, 2018
07c908e
Fix OpenCL buffer sizes.
Ttl Aug 14, 2018
f85a685
Script for quantizing weights.
Ttl Aug 16, 2018
ebfe51a
Network initialization restructuring.
ihavnoid Aug 20, 2018
7e889c7
Fix comments, code style.
gcp Aug 20, 2018
8bb0da6
Validation: support GTP commands for each binary.
Hersmunch Aug 20, 2018
bd36100
Don't refer to stone locations as "squares".
TFiFiE Aug 20, 2018
6eecb1e
Don't use "void" as function parameter.
TFiFiE Sep 3, 2018
0549816
Isolate and clean up text-to-vertex conversion.
TFiFiE Sep 5, 2018
73f1f93
Packaging improvements.
infinity0 Sep 5, 2018
f3fbcaa
Improve MCTS a bit.
Sep 5, 2018
1042cb6
Convert string before variadic function call.
TFiFiE Sep 7, 2018
51cba90
Always expect 2 arguments after "play" command.
TFiFiE Sep 7, 2018
b290f47
Update README with new boost dependencies.
jest Sep 7, 2018
5d4bd2f
Fix boost package reference for VS2017 build.
kuba97531 Sep 13, 2018
5bd2ef4
Added missing files to MSVC 2015 projects.
ihavnoid Sep 13, 2018
dd95cab
Make Winograd matrices global.
Ttl Sep 13, 2018
5412e66
OpenCL : Don't copy on weight construction.
ihavnoid Sep 13, 2018
7e13bf0
Winograd filter transform and CPU in transform optimization.
Ttl Sep 13, 2018
15e1bd1
"Lockless" UCTNode.
ihavnoid Sep 13, 2018
cd48427
Pass network weight as a std::shared_ptr class.
ihavnoid Sep 13, 2018
0a0d134
Fix vectorized Winograd transform.
Ttl Sep 14, 2018
c21c8a4
Remove unused lambda capture.
TFiFiE Sep 17, 2018
cff3917
Reduce network memory usage when autodetecting.
ihavnoid Sep 17, 2018
8b628ea
Make maximum memory consumption configurable.
kuba97531 Sep 17, 2018
c6999fc
Assorted style nits and minor bugfixes.
gcp Sep 17, 2018
aaf1038
Fix "NN eval" so it is never the search result.
gcp Sep 17, 2018
71c6a36
Update .gitignore to include ".vs/".
AncalagonX Sep 19, 2018
bf2e767
Add some more const correctness.
nerai Sep 24, 2018
dac5a1f
Fixes assert failure on wait_expanded().
ihavnoid Sep 24, 2018
8abf0d2
Only run assertion logic in debug mode.
gcp Sep 24, 2018
a0f60cb
Make lz-analyze output policy prior.
alreadydone Sep 24, 2018
c64dd2a
Fix up lz-analyse formats.
gcp Sep 24, 2018
142199c
Fix memory estimation for auto-detected gpu.
kuba97531 Sep 25, 2018
72431e2
Include Eigen as BLAS replacement.
gcp Sep 26, 2018
0b3ee48
Add side to move in lz-analyze command.
gcp Sep 26, 2018
04aeb54
Autoselect half mode for fp16 compute.
gcp Sep 26, 2018
720d5af
Don't let printsgf output consecutive newlines.
gcp Sep 26, 2018
8f6f830
Add Eigen include path to MSVC 2017 solution.
gcp Sep 26, 2018
e2d16fa
Remove set SDK in MSVC 2017 solution.
gcp Sep 26, 2018
408efbb
Appveyor jobs for msbuild in MSVC 2015/2017.
ChinChangYang Oct 1, 2018
a261168
Fix fp16/fp32 autodetection.
ihavnoid Oct 4, 2018
cd1de6e
Update README.md.
gcp Oct 12, 2018
6881787
Add GNUInstallDirs include.
gcp Oct 15, 2018
1c384e1
Implement more options in lz-setoption.
gcp Oct 15, 2018
4830a95
Fix assert-fail when memory is completely full.
ihavnoid Oct 15, 2018
7f5073e
Report tuner errors to stderr.
Ttl Oct 15, 2018
60f0cff
Fixes for various net initialization issues.
ihavnoid Oct 15, 2018
2e079fc
Update OpenCL headers link.
gcp Oct 22, 2018
8a57a85
Add missing GTP terminator for lz-setoption cases.
gcp Oct 22, 2018
4bd7cd4
Switch AutoGTP to HTTPS.
gcp Oct 22, 2018
a1a4af8
Remove COLAB Readme.
gcp Oct 23, 2018
ac88220
Update links and Todo in README.
gcp Oct 23, 2018
fc54323
Remove reference to Colab README.
gcp Oct 23, 2018
82d5f25
Tiny style fix.
gcp Oct 23, 2018
b2a40e4
Separate FPU-reduction variable for root.
TFiFiE Oct 29, 2018
40260b0
Link to instructions for running on the cloud.
wonderingabout Oct 29, 2018
a0baa60
Update FAQ.md.
LL145 Oct 29, 2018
2e4f3e6
Fix Windows flag check for input buffering.
gcp Oct 29, 2018
d1225db
Update AUTHORS.
gcp Oct 31, 2018
4fd6e69
Bump version numbers.
gcp Oct 31, 2018
6d16497
AutoGTP: update build dir of leelaz in README.md.
gcp Nov 2, 2018
1fe59c6
Correctly initialize board when reading SGF.
zliu1022 Nov 5, 2018
5cd4d8f
Increase memory limit for 32-bit builds.
gcp Nov 5, 2018
631b88f
Never select a CPU during OpenCL autodetection.
gcp Nov 5, 2018
6f58159
Fix tuner for heterogeneous GPUs and auto precision.
ihavnoid Nov 15, 2018
32c75e3
Optimized out and out_in kernels.
Ttl Nov 15, 2018
c72cb3a
Update OpenCL C++ headers.
gcp Nov 16, 2018
b833952
CPU-only eval performance optimization.
ihavnoid Nov 17, 2018
304f9c7
Convolve in/out performance optimization.
ihavnoid Nov 17, 2018
fc8d080
Validation: fix -k option.
Hersmunch Nov 17, 2018
6e88b95
Add link to Azure free trial instructions.
gcp Nov 19, 2018
666c0c6
Cleanup atomics and dead if.
sethtroisi Nov 20, 2018
8670a40
Const in SGFTree.
sethtroisi Nov 20, 2018
77582b9
Make the README more clear.
gcp Nov 26, 2018
8daa0cd
Refactor to allow AutoGTP to use Engine.
Hersmunch Nov 29, 2018
64097f0
Fix printf call style.
Dec 4, 2018
c157d0b
Update Khronos OpenCL C++ headers.
gcp Dec 7, 2018
bc3e750
Cleanup loop code.
sethtroisi Nov 20, 2018
d166740
AutoGTP: allow specifying an SGF as initial position.
Hersmunch Dec 19, 2018
08efb53
Support separate options for white in match games.
Hersmunch Dec 24, 2018
39be654
Add O(sqrt(log(n))) scaling to tree search.
Ttl Dec 7, 2018
21e3580
Option to get network output without writing to cache.
TFiFiE Dec 24, 2018
808bb43
Add permission to link with NVIDIA libs. Update year.
gcp Jan 4, 2019
ce41cc1
Add link to GoReviewPartner.
roy7 Jan 15, 2019
4ca0734
Reminder to install OpenCL driver if separate.
gcp Jan 15, 2019
d4c0380
Fixed leelaz_file on Android.
inclement Jan 15, 2019
f944b97
Fix 'catching polymorphic type by value' warning.
akdtg Jan 15, 2019
4f12925
Fixed converter script for minigo removing bias.
sethtroisi Jan 15, 2019
44d0e6a
Add zlib to the mac OS X build instructions.
gcp Jan 15, 2019
d192fc6
UCTNodePtr rare race condition fix.
ihavnoid Jan 15, 2019
bd0d667
Make sure analysis is printed at least once.
dbosst Jan 15, 2019
1960e93
Don't post if not requested.
gcp Jan 15, 2019
fc83ec7
AutoGTP: Allow specifying initial GTP commands.
Hersmunch Jan 15, 2019
c7feb53
Update Eigen to 3.3.7.
gcp Jan 15, 2019
085d71b
Fix lz-setoption name playouts.
gcp Jan 22, 2019
9831c96
AutoGTP: More info in SGF comments.
Hersmunch Jan 22, 2019
885b9eb
copy branch
alreadydone Jan 31, 2019
Fix tuner for heterogeneous GPUs and auto precision.
Fix full tuner for heterogeneous GPUs and auto precision detection.

--full-tuner implies --tune-only
--full-tuner requires an explicit precision

Fixes leela-zero#1973.

Pull request leela-zero#2004.
ihavnoid authored and gcp committed Nov 15, 2018
commit 6f58159a6b8166bead0968fa9d715209293197b0
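Before the per-file diffs, a condensed sketch of the two rules stated in the commit message. The free-standing function and names such as apply_tuner_options are illustrative only; the actual change lives in parse_commandline() in src/Leela.cpp below.

```cpp
#include <cstdio>
#include <cstdlib>

enum class precision_t { AUTO, SINGLE, HALF };

// Illustrative-only helper: --full-tuner implies --tune-only and
// requires an explicit precision.
void apply_tuner_options(bool full_tuner, bool& tune_only, precision_t precision) {
    if (full_tuner) {
        // The exhaustive tuner is far too slow to be followed by an
        // actual game in the same run, so force tune-only mode.
        tune_only = true;

        // Auto precision is not supported with exhaustive tuning.
        if (precision == precision_t::AUTO) {
            std::printf("Automatic precision not supported when doing exhaustive tuning\n");
            std::printf("Please add '--precision single' or '--precision half'\n");
            std::exit(EXIT_FAILURE);
        }
    }
}
```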
18 changes: 16 additions & 2 deletions src/Leela.cpp
@@ -92,8 +92,9 @@ static void parse_commandline(int argc, char *argv[]) {
("full-tuner", "Try harder to find an optimal OpenCL tuning.")
("tune-only", "Tune OpenCL only and then exit.")
#ifdef USE_HALF
("precision", po::value<std::string>(), "Floating-point precision (single/half/auto).\n"
"Default is to auto which automatically determines which one to use.")
("precision", po::value<std::string>(),
"Floating-point precision (single/half/auto).\n"
"Default is to auto which automatically determines which one to use.")
#endif
;
#endif
@@ -218,6 +219,11 @@ static void parse_commandline(int argc, char *argv[]) {

if (vm.count("full-tuner")) {
cfg_sgemm_exhaustive = true;

// --full-tuner auto-implies --tune-only. The full tuner is so slow
// that nobody will wait for it to finish before running a game.
// This simply prevents some edge cases from confusing other people.
cfg_tune_only = true;
}

if (vm.count("tune-only")) {
@@ -238,6 +244,14 @@ static void parse_commandline(int argc, char *argv[]) {
exit(EXIT_FAILURE);
}
}
if (cfg_precision == precision_t::AUTO) {
// Auto precision is not supported for full tuner cases.
if (cfg_sgemm_exhaustive) {
printf("Automatic precision not supported when doing exhaustive tuning\n");
printf("Please add '--precision single' or '--precision half'\n");
exit(EXIT_FAILURE);
}
}
#endif
#endif

10 changes: 6 additions & 4 deletions src/OpenCL.cpp
@@ -790,11 +790,13 @@ void OpenCL<net_t>::initialize(const int channels) {
auto sgemm_tuners =
t.load_sgemm_tuners(channels, WINOGRAD_P, channels, WINOGRAD_TILE);

// Exit immediately after tuning. Some NVIDIA drivers are buggy
// and will fail to compile the rest of the kernels after a tuning
// run. See #729.
// Some NVIDIA drivers are buggy and will fail to compile the rest of the
// kernels after a tuning run.
if (cfg_tune_only) {
exit(EXIT_SUCCESS);
// Originally this was an exit(), but that made the tuner tune only
// the first GPU. Return instead; exit() will be called after all
// GPUs are created.
return;
}

// Build program for these specific devices
6 changes: 6 additions & 0 deletions src/OpenCLScheduler.cpp
@@ -113,6 +113,12 @@ void OpenCLScheduler<net_t>::initialize(const int channels) {
}
gnum++;
}

// Exit immediately after tuning. We should exit here because we skipped
// initializing the rest of the kernels due to some NVIDIA drivers crashing.
if (cfg_tune_only) {
exit(EXIT_SUCCESS);
}
}

template<typename net_t>
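The OpenCL.cpp and OpenCLScheduler.cpp hunks above work as a pair: per-device initialization now returns early in tune-only mode so every GPU gets tuned, and the scheduler exits once after all devices have been processed. A minimal sketch of that control flow, using hypothetical names (initialize_device, initialize_scheduler) rather than the actual leela-zero classes:

```cpp
#include <cstdlib>
#include <vector>

struct Device { /* ... */ };

void initialize_device(Device& /*dev*/, bool tune_only) {
    // ... run the tuner for this device ...
    if (tune_only) {
        // Returning (instead of exiting) lets the caller tune the next GPU.
        return;
    }
    // ... build the remaining kernels (skipped in tune-only mode) ...
}

void initialize_scheduler(std::vector<Device>& devices, bool tune_only) {
    for (auto& dev : devices) {
        initialize_device(dev, tune_only);
    }
    if (tune_only) {
        // All GPUs have been tuned; exit before touching the partially
        // initialized kernels.
        std::exit(EXIT_SUCCESS);
    }
}
```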
22 changes: 21 additions & 1 deletion src/Tuner.cpp
@@ -40,6 +40,9 @@

const auto TUNER_FILE_LOCAL = std::string("leelaz_opencl_tuning");

template <typename net_t>
std::vector<std::string> Tuner<net_t>::tuned_devices;

#ifndef USE_BLAS
// Eigen helpers
template <typename T>
@@ -579,7 +582,24 @@ std::string Tuner<net_t>::load_sgemm_tuners(const int m, const int n, const int
const int batch_size) {
auto tuner_file = leelaz_file(TUNER_FILE_LOCAL);
auto file = std::ifstream{tuner_file};
if (!cfg_sgemm_exhaustive && file.good()) {

auto try_prior_tuning = file.good();

// If full tuning was requested, don't reuse previously tuned results
// unless the tuning was created earlier in this run by a different
// GPU instance with the same name. This prevents the tuner from
// running multiple times when the system has several identical GPUs.
if (try_prior_tuning && cfg_sgemm_exhaustive) {
auto dev = m_opencl.get_device_name();
try_prior_tuning = std::any_of(
begin(tuned_devices),
end(tuned_devices),
[&dev](const std::string & x) { return dev == x; }
);
}
tuned_devices.emplace_back(m_opencl.get_device_name());

if (try_prior_tuning) {
auto line = std::string{};
while (std::getline(file, line)) {
auto tuners = sgemm_tuners_from_line(line, m, n, k, batch_size);
4 changes: 4 additions & 0 deletions src/Tuner.h
@@ -40,6 +40,10 @@ class Tuner {
std::string load_sgemm_tuners(const int m, const int n, const int k,
const int batch_size);

// List of devices that were tuned in this run.
// This is to prevent the same device from being tuned multiple times.
static std::vector<std::string> tuned_devices;

static constexpr auto TUNER_VERSION = 0;
Tuner(OpenCL<net_t> & opencl, cl::Context context, cl::Device device) :
m_opencl(opencl), m_context(context), m_device(device) {}
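Summarizing the Tuner change as one self-contained unit: during exhaustive tuning, stored results are reused only when a GPU with the same name was already tuned earlier in the same run. A rough standalone rendering under that assumption (should_reuse_tuning is a hypothetical helper, not a real leela-zero function):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Device names tuned so far in this run.
static std::vector<std::string> tuned_devices;

bool should_reuse_tuning(const std::string& device_name,
                         bool exhaustive, bool tuning_file_ok) {
    auto reuse = tuning_file_ok;
    if (reuse && exhaustive) {
        // Only skip re-tuning if an identical device was tuned this run.
        reuse = std::any_of(tuned_devices.begin(), tuned_devices.end(),
                            [&](const std::string& x) { return x == device_name; });
    }
    tuned_devices.emplace_back(device_name);
    return reuse;
}
```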