
Multigpu hcal #498

Conversation

vkhristenko

PR description:

This PR allows the HCAL reconstruction to run on a node with multiple GPUs.
All the modules have been updated accordingly, so essentially no protection for the CUDA service is needed anymore.

The only caveat in this PR is that the newly added condition's Record should eventually be moved to CondFormats/DataRecord. For now it sits inside the HCAL MAHI package.

PR validation:

As usual, using the provided executable for HCAL.

@fwyzard fwyzard added the HCAL HCAL-related developments label Jul 4, 2020
@fwyzard

fwyzard commented Jul 6, 2020

@vkhristenko why do we move the pulseOffsets to a new "condition"?
@mariadalfonso how are they handled in the CPU code, as a configuration parameter or as a condition?

@vkhristenko
Author

@fwyzard
I moved it because I see no other way to handle a configuration parameter that is an array/vector generically for the multi-GPU case.

@mariadalfonso

@fwyzard pulseOffsets is an algorithm configuration parameter. We use it to decide whether to fit with 1, 3, or 8 templates.
It's not a detector condition.

My understanding is that the GPU fit is not configurable, so in principle there is no need for all these gymnastics.

@vkhristenko
Author

@mariadalfonso

This is actually not the right way to think about it. This array gives the distances from the sample of interest, and that can be configurable. For instance, if the sample of interest moves from 4 to 3 for whatever reason, all you need to change is the contents of this array, not its length. As for the length, true, it is currently a static parameter, but that could be changed in principle (by removing Eigen even further and using more mapping-style code). See the sketch after this comment for an illustration.

Furthermore, this is not specific to HCAL; I will need the same for ECAL. Whenever you want an array of parameters to be allocated and handled generically for the GPU, I found that going through an ESProducer is the easiest way to make it generic, although you have to write a fair amount of code. I'm happy to change that if there are suggestions on what I should use instead.
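
For illustration, a minimal sketch of the offsets-vs-SOI relationship described above (the helper name and the eight-sample example are assumptions, not code from this PR): the offsets are just sample indices measured relative to the SOI, so moving the SOI changes only the array's contents, not its length.

```cpp
#include <vector>

// Hypothetical helper: build pulse offsets as distances from the sample of interest (SOI).
std::vector<int> makePulseOffsets(int nSamples, int soi) {
  std::vector<int> offsets(nSamples);
  for (int i = 0; i < nSamples; ++i)
    offsets[i] = i - soi;  // distance of sample i from the SOI
  return offsets;
}

// makePulseOffsets(8, 4) yields {-4, -3, -2, -1, 0, 1, 2, 3}
// makePulseOffsets(8, 3) yields {-3, -2, -1, 0, 1, 2, 3, 4}
```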

@fwyzard

fwyzard commented Jul 6, 2020

If it's not configurable but constant, I guess a simple approach is to use a __constant__ C array on the GPU.
If it's configurable, the simplest approach could be to copy it to the GPU on each event (using the caching allocator, the stream from the context, etc.).
@makortel, do you have other suggestions for this?
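
A minimal CUDA sketch of the first option, assuming a fixed set of eight offsets (the array contents, names, and kernel below are illustrative only, not code from this PR):

```cpp
#include <cuda_runtime.h>

constexpr int kNOffsets = 8;
__constant__ int pulseOffsets[kNOffsets];  // lives in constant memory, read-only for kernels

__global__ void applyOffsets(int const* soi, int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    out[i] = soi[i] + pulseOffsets[i % kNOffsets];
}

void initPulseOffsets() {
  int const host[kNOffsets] = {-3, -2, -1, 0, 1, 2, 3, 4};
  // one-time host-to-device copy into the constant-memory array
  cudaMemcpyToSymbol(pulseOffsets, host, sizeof(host));
}
```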

@vkhristenko
Author

vkhristenko commented Jul 6, 2020 via email

@fwyzard

fwyzard commented Jul 6, 2020 via email

@mariadalfonso

> @mariadalfonso
>
> This is actually not the right way to think about it. This array gives the distances from the sample of interest, and that can be configurable. For instance, if the sample of interest moves from 4 to 3 for whatever reason, all you need to change is the contents of this array, not its length. As for the length, true, it is currently a static parameter, but that could be changed in principle (by removing Eigen even further and using more mapping-style code).

The SOI is stored in the frame and actually has a dedicated condition for it:
https://github.com/cms-sw/cmssw/blob/7a01de257e0495120f3568fe326659909281c9f6/RecoLocalCalo/HcalRecProducers/src/HBHEPhase1Reconstructor.cc#L472

The pulseOffsets array is a parameter of the multifit templates, not of the HCAL frame, so it's not a detector condition.
Of course you can configure all the parameters you want.
Once this goes into the release and we do a meaningful review, we will harmonise everything.

> Furthermore, this is not specific to HCAL; I will need the same for ECAL. Whenever you want an array of parameters to be allocated and handled generically for the GPU, I found that going through an ESProducer is the easiest way to make it generic, although you have to write a fair amount of code. I'm happy to change that if there are suggestions on what I should use instead.

@makortel

makortel commented Jul 6, 2020

> The problem with the per-event approach is how to make sure things have already been transferred for that device... I guess I could keep track of that somehow.

What I meant was to transfer them for every event.

> Previously I was advised to use the same mechanism as for conditions...

OK, let's keep it like this.

(I still need to look at the code, but) in general, configurable "constants" are currently most effectively treated as conditions data, to avoid transferring them on each event.

In principle, transferring them from the EDProducer is possible as well, e.g. at beginJob() time, but currently there are no helpers for that. And even then the mode of operation would need to resemble ESProducts (transfer to all devices, then somehow check and cache on the first events whether that transfer has completed).
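
For comparison, a minimal sketch of the per-event alternative mentioned above, using plain CUDA calls rather than the framework helpers (the function and parameter names are assumptions): each event re-uploads the offsets on the CUDA stream obtained from that event's context, so the copy is ordered with the rest of the event's work.

```cpp
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-event upload: called from acquire() with the stream from the context.
void uploadPulseOffsets(std::vector<int> const& hostOffsets, int* deviceOffsets, cudaStream_t stream) {
  // asynchronous copy, queued on the same stream as the event's kernels
  cudaMemcpyAsync(deviceOffsets,
                  hostOffsets.data(),
                  hostOffsets.size() * sizeof(int),
                  cudaMemcpyHostToDevice,
                  stream);
}
```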


@makortel makortel left a comment


Overall looks good to me.

edm::ValidityInterval&) override;

private:
edm::ParameterSet const& pset_;

Storing the std::vector<int> would be better.

@@ -239,6 +215,10 @@ void HBHERecHitProducerGPU::acquire(edm::Event const& event,
setup.get<HcalSiPMCharacteristicsRcd>().get(sipmCharacteristicsHandle);
auto const& sipmCharacteristicsProduct = sipmCharacteristicsHandle->getProduct(ctx.stream());

edm::ESHandle<HcalMahiPulseOffsetsGPU> pulseOffsetsHandle;
setup.get<HcalMahiPulseOffsetsGPURecord>().get(pulseOffsetsHandle);
auto const& pulseOffsetsProduct = pulseOffsetsHandle->getProduct(ctx.stream());

I guess we need to do it once we move to CMSSW_11_2_X?


> I guess we need to do it once we move to CMSSW_11_2_X?

The "new code must use ESGetToken" is rather a policy than technical requirement (so "need" is before making a PR to CMSSW master). Technically the migration can be done at any time, the ESGetToken API has been there already for over a year.


OK - I was wondering whether it is technically enforced in 11.2.x.
If it isn't, "before making the PR for master" (or rather "while the PR for master is being reviewed") seems like a good moment :-)

@makortel makortel Jul 8, 2020

It's not enforced (yet) for EDModules because of ~5500 existing calls that need to be migrated first :)

Comment on lines +7 to +12
HcalMahiPulseOffsetsGPU::HcalMahiPulseOffsetsGPU(edm::ParameterSet const& ps)
{
  auto const& values = ps.getParameter<std::vector<int>>("pulseOffsets");
  values_.resize(values.size());
  std::copy(values.begin(), values.end(), values_.begin());
}

Suggested change
-HcalMahiPulseOffsetsGPU::HcalMahiPulseOffsetsGPU(edm::ParameterSet const& ps)
-{
-  auto const& values = ps.getParameter<std::vector<int>>("pulseOffsets");
-  values_.resize(values.size());
-  std::copy(values.begin(), values.end(), values_.begin());
-}
+HcalMahiPulseOffsetsGPU::HcalMahiPulseOffsetsGPU(edm::ParameterSet const& ps) :
+  values_(ps.getParameter<std::vector<int>>("pulseOffsets"))
+{
+}

But I do agree with @makortel: it would be better to pass the std::vector<int> directly rather than the edm::ParameterSet.
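
A sketch of the direction both comments point to, assuming the ESProducer parses its configuration once and keeps the vector itself (the producer class name and the exact code below are illustrative assumptions, not the PR's implementation):

```cpp
// Condition class stores the vector directly instead of referencing the ParameterSet.
HcalMahiPulseOffsetsGPU::HcalMahiPulseOffsetsGPU(std::vector<int> values)
    : values_{std::move(values)} {}

// Hypothetical ESProducer: parse the configuration once, keep a std::vector<int> member.
class HcalMahiPulseOffsetsGPUESProducer : public edm::ESProducer {
public:
  explicit HcalMahiPulseOffsetsGPUESProducer(edm::ParameterSet const& ps)
      : pulseOffsets_{ps.getParameter<std::vector<int>>("pulseOffsets")} {
    setWhatProduced(this);
  }

  std::unique_ptr<HcalMahiPulseOffsetsGPU> produce(HcalMahiPulseOffsetsGPURecord const&) {
    return std::make_unique<HcalMahiPulseOffsetsGPU>(pulseOffsets_);
  }

private:
  std::vector<int> pulseOffsets_;
};
```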

auto const& product =
    product_.dataForCurrentDeviceAsync(cudaStream, [this](HcalMahiPulseOffsetsGPU::Product& product, cudaStream_t cudaStream) {
      // malloc
      cudaCheck(cudaMalloc((void**)&product.values, this->values_.size() * sizeof(int)));

@makortel is there any reason we shouldn't use the caching allocator here as well?


The current caching allocator API does not suit the ESProduct case well, because the lifetime of the memory is not tied to the processing queued on the argument cudaStream_t.

I do have an idea (and an old prototype, if I can still find it) on how to improve on that on top of #412, although given #487 I'd do it a bit differently.
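
In other words, the device memory held by an ESProduct follows the lifetime of the conditions data rather than of any CUDA stream. A hedged sketch of how such a per-device payload might release it (the field name follows the excerpt above; the destructor shown is an assumption, not the PR's actual code):

```cpp
// Per-device payload of the ESProduct: freed when the conditions product itself
// goes away (end of IOV / end of job), not when a particular stream's work finishes.
struct Product {
  ~Product() { cudaFree(values); }
  int* values = nullptr;
};
```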


Maybe I should interpret the silence on the RFCs as "no strong objections" and just go ahead with further prototyping.

@fwyzard fwyzard Jul 8, 2020

I'll do a re-read to remind myself about it once 11.2.0-pre2 Patatrack is out...

@vkhristenko
Author

Superseded by #502.

@vkhristenko vkhristenko closed this Jul 8, 2020
Labels
HCAL HCAL-related developments
4 participants