Do Not Merge: v1.10.8+rai compares to v1.10.2+rai #218

Draft
wants to merge 300 commits into base: v1.10.2+RAI

Conversation

nickrobinson251
Member

No description provided.

KristofferC and others added 30 commits June 4, 2024 12:41
…Lang#54634)

This avoids an `error: non-private labels cannot appear between
.cfi_startproc / .cfi_endproc pairs` error.
That error was introduced in https://reviews.llvm.org/D155245#4657075;
see also llvm/llvm-project#72802.

(cherry picked from commit a4e793e)
…ang#54635)

We may use aggressive constprop in `trevc!` to eliminate branches, which
makes the return type of `eigvecs(::UpperTriangular)` concretely
inferable.
```julia
julia> @inferred eigvecs(UpperTriangular([1 0; 0 1]))
2×2 Matrix{Float32}:
 1.0  -0.0
 0.0   1.0
```

(cherry picked from commit ea8b4af)
This may introduce a correctness issue in the work-stealing termination
loop if we're using interactive threads and GC threads simultaneously.

Indeed, if we forget to add `nthreadsi` to `nthreads`, then the mark-loop
termination protocol checks a range of threads `[gc_first_tid,
gc_first_tid + jl_n_markthreads)` that is "shifted to the left" compared
to what it should be.

This implies that we will not be checking whether the GC threads with
higher TID actually have terminated the mark-loop.

(cherry picked from commit c52eee2)
…JuliaLang#54671)

The race here is that the svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to an OOB access on the svec with the index of a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if the lookup fails, i.e. we had a data race.

Fixes JuliaLang#54285

---------

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 20f03dd)
…uliaLang#54672)

If the LLVM library was built without symbol versioning, previously this
would return the `nm` output in its entirety, instead of correctly
reporting "" as the LLVM symbol version.

(cherry picked from commit debaa73)
This should remove the dynamic dispatch flagged by JET, e.g. in
```julia
julia> using JET

julia> @report_opt reinterpret(Float64, [1.0im;;])
[ Info: tracking Base
┌ Warning: skipping (::Base.var"#thrownonint#375")(S::Type, T::Type, dim) @ Base reinterpretarray.jl:68 to avoid parsing too much code
└ @ Revise ~/.julia/packages/Revise/bAgL0/src/packagedef.jl:1092
┌ Warning: skipping (::Base.var"#show_bound#661")(io::IO, b) @ Base show.jl:2777 to avoid parsing too much code
└ @ Revise ~/.julia/packages/Revise/bAgL0/src/packagedef.jl:1092
┌ Warning: skipping (::Base.var"#show_bound#661")(io::IO, b) @ Base show.jl:2777 to avoid parsing too much code
└ @ Revise ~/.julia/packages/Revise/bAgL0/src/packagedef.jl:1092
═════ 32 possible errors found ═════
┌ reinterpret(::Type{Float64}, a::Matrix{ComplexF64}) @ Base ./reinterpretarray.jl:88
│┌ (::Base.var"#thrownonint#375")(S::Type{ComplexF64}, T::Type{Float64}, dim::Int64) @ Base ./reinterpretarray.jl:70
││┌ string(::String, ::Type{ComplexF64}, ::String, ::Type{Float64}, ::String, ::Int64, ::String) @ Base ./strings/io.jl:189
│││┌ print_to_string(::String, ::Type{ComplexF64}, ::String, ::Type{Float64}, ::String, ::Int64, ::String) @ Base ./strings/io.jl:148
││││┌ print(io::IOBuffer, x::DataType) @ Base ./strings/io.jl:35
│││││┌ show(io::IOBuffer, x::DataType) @ Base ./show.jl:970
││││││┌ _show_type(io::IOBuffer, x::Type) @ Base ./show.jl:975
│││││││┌ show_typealias(io::IOBuffer, x::Type) @ Base ./show.jl:810
││││││││┌ make_typealias(x::Type) @ Base ./show.jl:620
│││││││││┌ modulesof!(s::Set{Module}, x::Type) @ Base ./show.jl:595
││││││││││ runtime dispatch detected: Base.modulesof!(s::Set{Module}, %20::Any)::Any
│││││││││└────────────────────
│││││││││┌ modulesof!(s::Set{Module}, x::Type) @ Base ./show.jl:596
││││││││││ runtime dispatch detected: Base.modulesof!(s::Set{Module}, %34::Any)::Any
│││││││││└────────────────────
│││││││││┌ modulesof!(s::Set{Module}, x::TypeVar) @ Base ./show.jl:589
││││││││││ runtime dispatch detected: Base.modulesof!(s::Set{Module}, %1::Any)::Set{Module}
│││││││││└────────────────────
│││││││┌ show_typealias(io::IOBuffer, x::Type) @ Base ./show.jl:813
││││││││┌ show_typealias(io::IOBuffer, name::GlobalRef, x::Type, env::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:760
│││││││││┌ show_typeparams(io::IOContext{IOBuffer}, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:724
││││││││││┌ show(io::IOContext{IOBuffer}, tv::TypeVar) @ Base ./show.jl:2788
│││││││││││┌ (::Base.var"#show_bound#661")(io::IOContext{IOBuffer}, b::Any) @ Base ./show.jl:2780
││││││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, b::Any)::Any
│││││││││││└────────────────────
│││││││││┌ show_typeparams(io::IOContext{IOBuffer}, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:719
││││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, %252::Any)::Any
│││││││││└────────────────────
│││││││││┌ show_typeparams(io::IOContext{IOBuffer}, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:722
││││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, %313::Any)::Any
│││││││││└────────────────────
│││││││││┌ show_typeparams(io::IOContext{IOBuffer}, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:727
││││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, %191::Any)::Any
│││││││││└────────────────────
││││││┌ _show_type(io::IOBuffer, x::Type) @ Base ./show.jl:978
│││││││┌ show_datatype(io::IOBuffer, x::DataType) @ Base ./show.jl:1094
││││││││┌ show_datatype(io::IOBuffer, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1097
│││││││││┌ maybe_kws_nt(x::DataType) @ Base ./show.jl:1085
││││││││││ runtime dispatch detected: eltype(%76::DataType)::Any
│││││││││└────────────────────
││││││││┌ show_datatype(io::IOBuffer, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1186
│││││││││┌ show_typeparams(io::IOBuffer, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:724
││││││││││┌ show(io::IOBuffer, tv::TypeVar) @ Base ./show.jl:2788
│││││││││││┌ (::Base.var"#show_bound#661")(io::IOBuffer, b::Any) @ Base ./show.jl:2780
││││││││││││ runtime dispatch detected: show(io::IOBuffer, b::Any)::Any
│││││││││││└────────────────────
│││││││││┌ show_typeparams(io::IOBuffer, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:719
││││││││││ runtime dispatch detected: show(io::IOBuffer, %250::Any)::Any
│││││││││└────────────────────
│││││││││┌ show_typeparams(io::IOBuffer, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:722
││││││││││ runtime dispatch detected: show(io::IOBuffer, %310::Any)::Any
│││││││││└────────────────────
│││││││││┌ show_typeparams(io::IOBuffer, env::Core.SimpleVector, orig::Core.SimpleVector, wheres::Vector{TypeVar}) @ Base ./show.jl:727
││││││││││ runtime dispatch detected: show(io::IOBuffer, %190::Any)::Any
│││││││││└────────────────────
││││││││┌ show_datatype(io::IOBuffer, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1157
│││││││││ runtime dispatch detected: show(io::IOBuffer, %224::Any)::Any
││││││││└────────────────────
││││││││┌ show_datatype(io::IOBuffer, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1162
│││││││││ runtime dispatch detected: show(io::IOBuffer, %54::Any)::Any
││││││││└────────────────────
││││││││┌ show_datatype(io::IOBuffer, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1148
│││││││││ runtime dispatch detected: show(io::IOBuffer, %57::Any)::Any
││││││││└────────────────────
││││││││┌ show_datatype(io::IOBuffer, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1150
│││││││││ runtime dispatch detected: show(io::IOBuffer, %54::Any)::Any
││││││││└────────────────────
││││││││┌ show_datatype(io::IOBuffer, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1172
│││││││││ runtime dispatch detected: Base.show_at_namedtuple(io::IOBuffer, %329::Tuple, %328::DataType)::Any
││││││││└────────────────────
││││││┌ _show_type(io::IOBuffer, x::Type) @ Base ./show.jl:981
│││││││┌ show_unionaliases(io::IOBuffer, x::Union) @ Base ./show.jl:901
││││││││┌ make_typealiases(x::Union) @ Base ./show.jl:822
│││││││││┌ modulesof!(s::Set{Module}, x::Union) @ Base ./show.jl:595
││││││││││ runtime dispatch detected: Base.modulesof!(s::Set{Module}, %3::Any)::Any
│││││││││└────────────────────
│││││││││┌ modulesof!(s::Set{Module}, x::Union) @ Base ./show.jl:596
││││││││││ runtime dispatch detected: Base.modulesof!(s::Set{Module}, %17::Any)::Any
│││││││││└────────────────────
│││││││┌ show_unionaliases(io::IOBuffer, x::Union) @ Base ./show.jl:914
││││││││ runtime dispatch detected: show(io::IOBuffer, %89::Any)::Any
│││││││└────────────────────
│││││││┌ show_unionaliases(io::IOBuffer, x::Union) @ Base ./show.jl:920
││││││││ runtime dispatch detected: Base.show_typealias(io::IOBuffer, %206::Any, x::Union, %204::Core.SimpleVector, %205::Vector{TypeVar})::Any
│││││││└────────────────────
│││││││┌ show_unionaliases(io::IOBuffer, x::Union) @ Base ./show.jl:928
││││││││ runtime dispatch detected: Base.show_typealias(io::IOBuffer, %269::Any, x::Union, %267::Core.SimpleVector, %268::Vector{TypeVar})::Any
│││││││└────────────────────
││││││┌ _show_type(io::IOBuffer, x::Type) @ Base ./show.jl:985
│││││││┌ show_delim_array(io::IOBuffer, itr::Vector{Any}, op::Char, delim::Char, cl::Char, delim_one::Bool) @ Base ./show.jl:1392
││││││││┌ show_delim_array(io::IOBuffer, itr::Vector{Any}, op::Char, delim::Char, cl::Char, delim_one::Bool, i1::Int64, l::Int64) @ Base ./show.jl:1403
│││││││││ runtime dispatch detected: show(%3::IOContext{IOBuffer}, %52::Any)::Any
││││││││└────────────────────
││││││┌ _show_type(io::IOBuffer, x::Type) @ Base ./show.jl:1012
│││││││┌ show_datatype(io::IOContext{IOBuffer}, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1185
││││││││┌ show_type_name(io::IOContext{IOBuffer}, tn::Core.TypeName) @ Base ./show.jl:1059
│││││││││ runtime dispatch detected: Base.isvisible(%29::Symbol, %86::Module, %80::Any)::Bool
││││││││└────────────────────
│││││││┌ show_datatype(io::IOContext{IOBuffer}, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1157
││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, %227::Any)::Any
│││││││└────────────────────
│││││││┌ show_datatype(io::IOContext{IOBuffer}, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1162
││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, %55::Any)::Any
│││││││└────────────────────
│││││││┌ show_datatype(io::IOContext{IOBuffer}, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1148
││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, %58::Any)::Any
│││││││└────────────────────
│││││││┌ show_datatype(io::IOContext{IOBuffer}, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1150
││││││││ runtime dispatch detected: show(io::IOContext{IOBuffer}, %55::Any)::Any
│││││││└────────────────────
│││││││┌ show_datatype(io::IOContext{IOBuffer}, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1172
││││││││ runtime dispatch detected: Base.show_at_namedtuple(io::IOContext{IOBuffer}, %338::Tuple, %337::DataType)::Any
│││││││└────────────────────
│││││││┌ show_datatype(io::IOContext{IOBuffer}, x::DataType, wheres::Vector{TypeVar}) @ Base ./show.jl:1180
││││││││ runtime dispatch detected: Base.show_at_namedtuple(io::IOContext{IOBuffer}, %387::Tuple, %391::DataType)::Any
│││││││└────────────────────
││││││┌ _show_type(io::IOBuffer, x::Type) @ Base ./show.jl:1014
│││││││ runtime dispatch detected: show(%98::IOContext{IOBuffer}, %99::Any)::Any
││││││└────────────────────
│││││┌ show(io::IOBuffer, x::DataType) @ Base ./show.jl:970
││││││ runtime dispatch detected: Base._show_type(io::IOBuffer, %1::Any)::Nothing
│││││└────────────────────
```

(cherry picked from commit b54c688)
Apparently on some distributions `nm --with-symbol-versions` does not
report symbol versions, despite the flag.

`readelf` should be a more reliable alternative which is also part of
binutils.

See spack/spack#44534 (comment)
for more information
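
For illustration, a minimal sketch of extracting a versioned-symbol suffix with `readelf` (the library path, the regex, and the `LLVM_` prefix are assumptions for this example, not the build system's actual logic):

```julia
# Hedged sketch: list dynamic symbols with readelf (part of binutils) and pull
# out the first version suffix of the form LLVM_<x>. Returns "" when the
# library was built without symbol versioning.
function llvm_symbol_version(libpath::AbstractString)
    for line in eachline(`readelf --wide --dyn-syms $libpath`)
        m = match(r"@@?(LLVM_[0-9.]+)", line)
        m === nothing || return String(m.captures[1])
    end
    return ""
end
```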

(cherry picked from commit d0f165f)
…JuliaLang#54781)

When testing a new version of `libblastrampoline` that may have flags
and values that we don't know about, raise a nice warning rather than a
hard error.
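
A minimal sketch of that pattern (the flag names and bit values here are made up for illustration; this is not LinearAlgebra's actual LBT interface code):

```julia
# Hedged sketch: decode known capability bits and warn (rather than error) on
# anything left over, so a newer libblastrampoline doesn't break older Julia.
const KNOWN_LBT_FLAGS = Dict{UInt32,Symbol}(0x01 => :lp64, 0x02 => :ilp64)  # illustrative

function decode_lbt_flags(flags::UInt32)
    names = Symbol[]
    remaining = flags
    for (bit, name) in KNOWN_LBT_FLAGS
        if remaining & bit != 0
            push!(names, name)
            remaining &= ~bit
        end
    end
    if remaining != 0
        @warn "Unknown libblastrampoline flag bits ignored" unknown_bits = remaining
    end
    return names
end
```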

(cherry picked from commit 5044506)
The fact that `@async` causes the task that it was scheduled from to
also become sticky is well documented in the warning in [`@async`
docs](https://docs.julialang.org/en/v1/base/parallel/#Base.@async), but
it's not clear that creating a task and scheduling it also has the same
effect, by default.
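
A small REPL sketch of the behaviour being documented (exact results depend on the Julia version and thread configuration; this is an illustration, not the doc's wording):

```julia
# A plain Task is sticky by default; scheduling it also marks the scheduling
# (current) task as sticky, the same as @async does.
t = Task(() -> nothing)
@show t.sticky                  # true by default
@show current_task().sticky     # possibly false before scheduling
schedule(t); wait(t)
@show current_task().sticky     # true afterwards: the parent task was made sticky too
```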

(cherry picked from commit a9b4869)
…ions (JuliaLang#54837)

Enzyme.jl hit an issue where, for a dynamically typed allocation of size
`GC_MAX_SZCLASS`, we mistakenly added the type tag size to the
allocation, so the runtime disagreed on whether this was a pool
allocation or a big allocation, causing a crash in the GC.

(cherry picked from commit ded0b28)
…JuliaLang#55143)

After investigating JuliaLang/julia#54090, I found that the issue was
not caused by the effects of `checksquare`, but by the use of the
`@simd` macro within `tr(::Matrix)`:

https://github.com/JuliaLang/julia/blob/0945b9d7740855c82a09fed42fbf6bc561e02c77/stdlib/LinearAlgebra/src/dense.jl#L373-L380

While simply removing the `@simd` macro was considered, the strict
left-to-right summation we would get without `@simd` is not necessarily
more accurate, so I concluded that the problem lies in the test code,
which tests strict equality of two different `tr` execution
results. I have modified the test code to use `≈` instead of `==`.

- fixes #54090
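
A quick illustration of why `==` is too strict here (a generic floating-point example, not the actual test from the issue):

```julia
# @simd lets the compiler reassociate the reduction, so the rounding differs
# from strict left-to-right summation; compare with ≈ rather than ==.
x = randn(10_000)
left_to_right = foldl(+, x)   # strict left-to-right accumulation
pairwise      = sum(x)        # Base's pairwise summation
@show left_to_right == pairwise   # frequently false
@show left_to_right ≈ pairwise    # true
```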
…53961)

This reduces dynamic dispatch and makes JET happier.
Testing on v1.11:
```julia
julia> import LinearAlgebra: checksquare

julia> using JET

julia> @report_opt checksquare(rand(2,2), rand(2,2))
═════ 4 possible errors found ═════
┌ checksquare(::Matrix{Float64}, ::Matrix{Float64}) @ LinearAlgebra /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/LinearAlgebra/src/LinearAlgebra.jl:307
│┌ string(::String, ::Tuple{Int64, Int64}) @ Base ./strings/io.jl:189
││┌ print_to_string(::String, ::Tuple{Int64, Int64}) @ Base ./strings/io.jl:150
│││┌ _unsafe_take!(io::IOBuffer) @ Base ./iobuffer.jl:494
││││┌ wrap(::Type{Array}, m::MemoryRef{UInt8}, l::Int64) @ Base ./array.jl:3101
│││││ failed to optimize due to recursion: wrap(::Type{Array}, ::MemoryRef{UInt8}, ::Int64)
││││└────────────────────
│││┌ print_to_string(::String, ::Vararg{Any}) @ Base ./strings/io.jl:143
││││ runtime dispatch detected: Base._str_sizehint(%17::Any)::Int64
│││└────────────────────
│││┌ print_to_string(::String, ::Vararg{Any}) @ Base ./strings/io.jl:148
││││ runtime dispatch detected: print(%59::IOBuffer, %97::Any)::Any
│││└────────────────────
│││┌ string(::String, ::Int64, ::String, ::Tuple{Int64}, ::String, ::Int64, ::String, ::Int64, ::String) @ Base ./strings/io.jl:189
││││ failed to optimize due to recursion: string(::String, ::Int64, ::String, ::Tuple{Int64}, ::String, ::Int64, ::String, ::Int64, ::String)
│││└────────────────────

julia> function checksquare(A...) # This PR
                  sizes = Int[]
                  for a in A
                      size(a,1)==size(a,2) || throw(DimensionMismatch(lazy"matrix is not square: dimensions are $(size(a))"))
                      push!(sizes, size(a,1))
                  end
                  return sizes
              end
checksquare (generic function with 2 methods)

julia> @report_opt checksquare(rand(2,2), rand(2,2))
No errors detected

```
As discussed in
https://discourse.julialang.org/t/pinv-not-type-stable/103885/14, the
`tol` variable is captured, which leads to it being boxed. As suggested
there, this can be fixed by using two separate variables.
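
A minimal sketch of the capture-and-reassign pattern and the two-variable fix (generic code, not the actual `pinv` implementation):

```julia
# Reassigning a variable that a closure captures forces it into a Core.Box,
# defeating type inference inside the closure.
function count_above_boxed(A, tol)
    tol = tol * maximum(A)          # `tol` is reassigned and captured -> boxed
    return count(x -> x > tol, A)
end

# Fix: give the adjusted value a fresh name so the captured binding is
# assigned exactly once.
function count_above(A, tol)
    tol2 = tol * maximum(A)
    return count(x -> x > tol2, A)
end

# @code_warntype count_above_boxed(rand(10), 0.5)  # shows a Core.Box
# @code_warntype count_above(rand(10), 0.5)        # concretely typed
```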

(cherry picked from commit 8e21b21)
JuliaLang#55141)

The devdocs here reflect a time when aarch64 was much less well
supported; they also reference Cudadrv, which has been archived for years.

(cherry picked from commit 2efcfd9)
This macro was added in v1.10 but was missing a compat notice.

(cherry picked from commit 3290904)
`a[begin]` indexing was added by JuliaLang#35779 in Julia 1.6, so this feature
needs a compat notice in the docstring.

(cherry picked from commit 43df7fb)
Correction to JuliaLang#55197: `a[begin]` indexing was added in Julia 1.4
(JuliaLang#33946), not in Julia 1.6 (which just changed the implementation in
JuliaLang#35779). My bad.

(cherry picked from commit 06467eb)
This avoids returning false positives where only the indices are shared.
Since the indices are not mutated by array assignments (and mutating
them is explicitly warned against in general), we can ignore the case
where _only_ the indices are aliasing.
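
A small sketch of the case in question (illustrative only; `Base.mightalias` is the internal query used by the unaliasing machinery):

```julia
# Two views with different parents that share only their index vector should
# not be reported as (possibly) aliasing each other.
inds = [1, 2, 3]
a = view(rand(5), inds)
b = view(rand(5), inds)        # distinct parent array, same index vector
@show Base.mightalias(a, b)    # false with this change (was a false positive)
```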

Fix JuliaLang#54621

(cherry picked from commit 97bf148)
…uliaLang#54690)

AllocOpt probably needs to handle that in other places more smartly but
this seems to at least stop it crashing. Fixes issue found in
JuliaLang#54604 (comment) by
@topolarity.

(cherry picked from commit 5cb1107)
…ng#55084)

closes JuliaLang#55083

Should this also check for `\r`?

---------

Co-authored-by: Alex Arslan <[email protected]>
(cherry picked from commit e732706)
This release fixes issues with complex valued returns from functions
such as `cdotc` on Windows x64. See this discussion [0] for initial
diagnosis, and this PR [1] for the relevant fixes.

[0] JuliaLinearAlgebra/BLISBLAS.jl#15
[1] JuliaLinearAlgebra/libblastrampoline#129

(cherry picked from commit 3054c00)
d-netto and others added 28 commits February 26, 2025 12:16
…ng#55326) (#174)

The contents of strings can contain user data, which may be proprietary;
emitting them in the heap snapshot makes the snapshot a potential
vulnerability rather than a useful debugging artifact.

There are likely other tweaks necessary to make heap snapshots "safe",
but this is one less.

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
* Add heartbeat pause/resume capability

* Add check to avoid negative sleep duration

* Disable heartbeats in `jl_print_task_backtraces()`

`jl_print_task_backtraces()` can take long enough that there can
be heartbeat loss, which can trigger printing task backtraces
again, unless it is called from the heartbeat thread which takes
care of that possible problem.

* Pause heartbeats for GC

* Address review comment

* Address review comment
…g#55826) (#189)

Additional GC observability tool.

This will help us to diagnose why some of our servers are triggering so
many full GCs in certain circumstances.
Similar to `--trace-compile`, emit the `precompile` statement for a method
once, but only when it is dynamically dispatched.

For this, we rename the `precompiled` field in `jl_method_instance_t` to
`flags` and use bit 0 as `precompiled` and bit 1 as `dispatched`.

When the method is dispatched, the `dispatched` bit is set to 1 and the
precompile statement is emitted. This check is done in
`jl_gf_invoke_by_method` and in the slow path (cache miss) of
`jl_apply_generic`.
…#192)

There was a missing re-assignment of `old = -1;` at the end of that loop.
This means that in the ABA case we accidentally acquire the lock on the
thread despite not actually having stopped it; or, in the counter-case,
we try to run through this logic with `old == -1` on the next iteration,
which isn't valid either (`jl_thread_suspend_and_get_state` should
return failure and the loop will abort too early).

Fix JuliaLang#56046

Co-authored-by: Jameson Nash <[email protected]>
One limitation of sampling CPU/thread profiles, as is currently done in
Julia, is that they primarily capture samples from CPU-intensive tasks.

If many tasks are performing IO or contending for concurrency primitives
like semaphores, these tasks won’t appear in the profile, as they aren't
scheduled on OS threads sampled by the profiler.

A wall-time profiler, like the one implemented in this PR, samples tasks
regardless of OS thread scheduling. This enables profiling of IO-heavy
tasks and detecting areas of heavy contention in the system.

Co-developed with @nickrobinson251.
Instead of always updating it. This should speed up loading only
method specializations.
* Optionally disallow defining new methods and drop backedges
… counter -- per (module, method name) pair (JuliaLang#53719) (#179)

As mentioned in JuliaLang#53716, we've
been noticing that `precompile` statement lists from one version of our
codebase often don't apply cleanly in a slightly different version.

That's because a lot of nested and anonymous function names have a
global numeric suffix which is incremented every time a new name is
generated, and these numeric suffixes are not very stable across
codebase changes.

To solve this, this PR makes the numeric suffixes a bit more fine
grained: every pair of (module, top-level/outermost function name) will
have its own counter, which should make nested function names a bit more
stable across different versions.

This PR applies @JeffBezanson's idea of making the symbol name changes
directly in `current-julia-module-counter`.

Here is an example:

```Julia
julia> function foo(x)
           function bar(y)
               return x + y
           end
       end
foo (generic function with 1 method)

julia> f = foo(42)
(::var"#bar#foo##0"{Int64}) (generic function with 1 method)
```

Co-authored-by: Diogo Netto <[email protected]>
* Add per-task metrics (JuliaLang#56320)

Close JuliaLang#47351 (builds on top of
JuliaLang#48416)

Adds two per-task metrics:
- running time = amount of time the task was actually running (according
to our scheduler). Note: currently inclusive of GC time, but would be
good to be able to separate that out (in a future PR)
- wall time = amount of time between the scheduler becoming aware of
this task and the task entering a terminal state (i.e. done or failed).

We record running time in `wait()`, where the scheduler stops running
the task, as well as in `yield(t)`, `yieldto(t)` and `throwto(t)`, which
bypass the scheduler. The other places where a task stops running
(`Channel`, `ReentrantLock`, `Event`, `Timer` and `Semaphore`) are all
implemented in terms of `wait(Condition)`, which in turn calls `wait()`.
`LibuvStream` similarly calls `wait()`.

This should capture everything (albeit, slightly over-counting task CPU
time by including any enqueuing work done before we hit `wait()`).

The various metrics counters could be a separate inlined struct if we
think that's a useful abstraction, but for now i've just put them
directly in `jl_task_t`. They are all atomic, except the
`metrics_enabled` flag itself (which we now have to check on task
start/switch/done even if metrics are not enabled) which is set on task
construction and marked `const` on the julia side.

In future PRs we could add more per-task metrics, e.g. compilation time,
GC time, allocations, potentially a wait-time breakdown (time waiting on
locks, channels, in the scheduler run queue, etc.), potentially the
number of yields.

Perhaps in future there could be ways to enable this on a per-thread and
per-task basis. And potentially in future these same timings could be
used by `@time` (e.g. writing this same timing data to a ScopedValue
like in JuliaLang#55103 but only for tasks
lexically scoped to inside the `@time` block).

Timings are off by default but can be turned on globally via starting
Julia with `--task-metrics=yes` or calling
`Base.Experimental.task_metrics(true)`. Metrics are collected for all
tasks created when metrics are enabled. In other words,
enabling/disabling timings via `Base.Experimental.task_metrics` does not
affect existing `Task`s, only new `Task`s.

The other new APIs are `Base.Experimental.task_running_time_ns(::Task)`
and `Base.Experimental.task_wall_time_ns(::Task)` for retrieving the new
metrics. These are safe to call on any task (including the current task,
or a task running on another thread). All these are in
`Base.Experimental` to give us room to change up the APIs as we add more
metrics in future PRs (without worrying about release timelines).
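
A short usage sketch using only the APIs named above (results are illustrative; metrics must be enabled before the task is created):

```julia
# Enable task metrics for newly created tasks, run a task, and read its timings.
Base.Experimental.task_metrics(true)
t = Threads.@spawn sum(rand(10^7))
wait(t)
@show Base.Experimental.task_running_time_ns(t)
@show Base.Experimental.task_wall_time_ns(t)
```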

cc @NHDaly @kpamnany @d-netto

---------

Co-authored-by: Pete Vilter <[email protected]>
Co-authored-by: K Pamnany <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>

* Address review comments

---------

Co-authored-by: Pete Vilter <[email protected]>
Co-authored-by: K Pamnany <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
…uliaLang#56814) (#200)

I propose a change in the implementation of the `ReentrantLock` to
improve its overall throughput for short critical sections and fix the
quadratic wake-up behavior where each unlock schedules **all** waiting
tasks on the lock's wait queue.

This implementation follows the same principles of the `Mutex` in the
[parking_lot](https://github.com/Amanieu/parking_lot/tree/master) Rust
crate which is based on the Webkit
[WTF::ParkingLot](https://webkit.org/blog/6161/locking-in-webkit/)
class. Only the basic working principle is implemented here, further
improvements such as eventual fairness will be proposed separately.

The gist of the change is that we add one extra state to the lock,
essentially going from:
```
0x0 => The lock is not locked
0x1 => The lock is locked by exactly one task. No other task is waiting for it.
0x2 => The lock is locked and some other task tried to lock but failed (conflict)
```
To:
```
```

In the current implementation we must schedule all tasks to cause a
conflict (state 0x2) because on unlock we only notify any task if the
lock is in the conflict state. This behavior means that with high
contention and a short critical section the tasks will be effectively
spinning in the scheduler queue.

With the extra state the proposed implementation has enough information
to know if there are other tasks to be notified or not, which means we
can always notify one task at a time while preserving the optimized path
of not notifying if there are no tasks waiting. To improve throughput
for short critical sections we also introduce a bounded amount of
spinning before attempting to park.

Not spinning on the scheduler queue greatly reduces the CPU utilization
of the following example:

```julia
function example()
    lock = ReentrantLock()
    @sync begin
        for i in 1:10000
            Threads.@spawn begin
                @lock lock begin
                    sleep(0.001)
                end
            end
        end
    end
end

@time example()
```

Current:
```
28.890623 seconds (101.65 k allocations: 7.646 MiB, 0.25% compilation time)
```

![image](https://github.com/user-attachments/assets/dbd6ce57-c760-4f5a-b68a-27df6a97a46e)

Proposed:
```
22.806669 seconds (101.65 k allocations: 7.814 MiB, 0.35% compilation time)
```

![image](https://github.com/user-attachments/assets/b0254180-658d-4493-86d3-dea4c500b5ac)

In a micro-benchmark where 8 threads contend for a single lock with a
very short critical section we see a ~2x improvement.

Current:
```
8-element Vector{Int64}:
 6258688
 5373952
 6651904
 6389760
 6586368
 3899392
 5177344
 5505024
Total iterations: 45842432
```

Proposed:
```
8-element Vector{Int64}:
 12320768
 12976128
 10354688
 12845056
  7503872
 13598720
 13860864
 11993088
Total iterations: 95453184
```
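
For reference, a micro-benchmark of roughly this shape can be written as below (a hedged sketch with assumed parameters, not the exact benchmark that produced the numbers above):

```julia
# N tasks hammer a shared ReentrantLock with a very short critical section for
# a fixed duration; the per-task iteration counts indicate lock throughput.
function contended_lock_benchmark(ntasks::Int = 8; duration_s::Float64 = 1.0)
    lk = ReentrantLock()
    counts = zeros(Int, ntasks)
    stop = Threads.Atomic{Bool}(false)
    tasks = map(1:ntasks) do i
        Threads.@spawn while !stop[]
            lock(lk) do
                counts[i] += 1     # short critical section
            end
        end
    end
    sleep(duration_s)
    stop[] = true
    foreach(wait, tasks)
    println("Total iterations: ", sum(counts))
    return counts
end
```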

~~In the uncontended scenario the extra bookkeeping causes a 10%
throughput reduction:~~
EDIT: I reverted `_trylock` to the simple case to recover the uncontended
throughput, and now both implementations are in the same ballpark
(without hurting the above numbers).

In the uncontended scenario:

Current:
```
Total iterations: 236748800
```

Proposed:
```
Total iterations: 237699072
```

Closes JuliaLang#56182

Co-authored-by: André Guedes <[email protected]>
…JuliaLang#57004) (#204)

Fixes JuliaLang#56889.

Before this PR, an exception thrown while constructing the objects to
log (the `msg`) would be caught and logged. However, an exception thrown
while _printing_ the msg to an IO would _not_ be caught, and can abort
the program. This breaks the promise that enabling verbose debug logging
shouldn't introduce new crashes.

After this PR, an exception thrown during handle_message is caught and
logged, just like an exception during `msg` construction:

```julia
julia> struct Foo end

julia> Base.show(::IO, ::Foo) = error("oh no")

julia> begin
           # Unexpectedly, the exception thrown while printing `Foo()` escapes
           @info Foo()
           # So we never reach this line! :'(
           println("~~~~~ ALL DONE ~~~~~~~~")
       end
┌ Error: Exception while generating log record in module Main at REPL[10]:3
│   exception =
│    oh no
│    Stacktrace:
│      [1] error(s::String)
│        @ Base ./error.jl:44
│      [2] show(::IOBuffer, ::Foo)
│        @ Main ./REPL[9]:1
...
│     [30] repl_main
│        @ ./client.jl:593 [inlined]
│     [31] _start()
│        @ Base ./client.jl:568
└ @ Main REPL[10]:3
~~~~~ ALL DONE ~~~~~~~~
```

This PR respects the change made in
JuliaLang#36600 to keep the codegen as
small as possible, by putting the new try/catch into a non-inlined
function, so that we don't have to introduce a new try/catch in the
macro-generated code body.

---------

Co-authored-by: Jameson Nash <[email protected]>
---------

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Nick Robinson <[email protected]>
…7045) (#208)

This is still a work in progress, but it should help determine what a
straggler thread was doing during the stop-the-world phase and why it
failed to reach a safepoint in a timely manner.

We've encountered long TTSP issues in production, and this tool should
provide a valuable means to accurately diagnose them.
…215)

Minor tweak to the error message: embed the exit code of the Julia child
process that failed to compile the package.
* inference: avoid inferring unreachable code methods (JuliaLang#51317)

(cherry picked from commit 0a82b71)

* inference: ensure inferring reachable code methods (JuliaLang#57088)

PR JuliaLang#51317 was a bit over-eager about inferring unreachable
code methods. Filter out the Vararg case, since that can be handled by
simply removing it instead of discarding the whole call.

Fixes JuliaLang#56628

(cherry picked from commit eb9f24c)

---------

Co-authored-by: Jameson Nash <[email protected]>