-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Advance cursors through stores in strength reduction #105267
Conversation
This allows us to look through CSE'd locals when an IV was CSE'd.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
FYI @AndyAyersMS ... not sure if this makes preview 7 in time. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
@EgorBot -intel using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
namespace Loops
{
public class StrengthReduction
{
private int[] _arrayInts;
private int[] _dest;
[GlobalSetup]
public void Setup()
{
_arrayInts = Enumerable.Range(0, 10000).Select(i => i).ToArray();
_dest = new int[_arrayInts.Length];
}
[Benchmark]
public void CopyInts()
{
Copy(_dest, _arrayInts);
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Copy(int[] dest, int[] src)
{
if (src.Length != dest.Length)
return;
for (int i = 0; i < dest.Length; i++)
{
dest[i] = src[i];
}
}
}
} |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
Benchmark results on Intel
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok taking this. You have 45 mins to merge...
Benchmark results on Intel
|
Benchmark results on Arm64
|
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
cc @dotnet/jit-contrib PTAL @AndyAyersMS (you already approved, but this has changed a bit since then). Diffs. In some cases a size regression, but almost always a perfscore improvement. For example, here's a canonical size regression: |
while ((cur != nullptr) && !cur->OperIs(GT_ARR_ADDR)) | ||
{ | ||
cur = cur->gtGetParent(nullptr); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole syntactic treatment of GT_ARR_ADDR
to figure out what address ranges are inside arrays is unfortunate. It's pretty fragile and leads e.g. to #108706 not just working out properly. Not sure how to improve this much however -- maybe we can keep some breadcrumbs about known managed ranges during the cursor advancement and query it any time we need to prove something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For advanced loop opts (parallelization, vectorization) it's not uncommon to create a "raised up" representation that describes the array element as an affine function of the IVs, so maybe we can look into something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like that could be a direction to go, although I guess it would mean that we go from INDEX_ADDR
to expanded nodes in morph and then back to some equivalent of INDEX_ADDR
as part of these opts. But maybe that's fine (and perhaps the structure could be recovered on VNs, similar to the comment on ParseArrayAddress
suggests).
Ping @AndyAyersMS for a re-look |
while ((cur != nullptr) && !cur->OperIs(GT_ARR_ADDR)) | ||
{ | ||
cur = cur->gtGetParent(nullptr); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For advanced loop opts (parallelization, vectorization) it's not uncommon to create a "raised up" representation that describes the array element as an affine function of the IVs, so maybe we can look into something like this?
CursorInfo& cursor = m_intermediateIVStores.BottomRef(i); | ||
GenTreeLclVarCommon* store = cursor.Tree->AsLclVarCommon(); | ||
JITDUMP(" Replacing [%06u] with a zero constant\n", Compiler::dspTreeID(store->Data())); | ||
// We cannot remove these stores entirely as that will break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could set the ssa def trees to null, as that is a valid state (see eg #108548), though it might require some tweaking if a lot of code assumes a null def tree means initial value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps in some way remove the def from the SSA information, though I suppose that requires updating SSA numbers which requires finding all uses, so not really tractable with our representation.
This seems to work fine for now though, so I'll probably keep it as is, but it definitely falls into our SSA updating woes...
This allows us to look through CSE'd locals when an IV was CSE'd.
Example diff:
arm64 especially benefits from strength reduction, e.g. a benchmark for the above: