
Add DLRM Model Computational Graph #1532

Open

wants to merge 12 commits into master

Conversation

Collaborator

@easyeasydev commented Oct 31, 2024

Description of changes:

This PR adds the computational graph of the DLRM model.

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #


Collaborator Author

@easyeasydev left a comment

Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/models/src/models/dlrm/dlrm.cc line 44 at r1 (raw file):

}

tensor_guid_t create_dlrm_mlp(ComputationGraphBuilder &cgb,

This MLP implementation follows the original FlexFlow implementation at https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc#L44. I'm not sure whether there was a specific reason for doing it this way in the past; we can decide what to do here. TorchRec doesn't have this logic for setting up the initializers.
@lockshaw


lib/models/src/models/dlrm/dlrm.cc line 145 at r1 (raw file):

      /*mlp_layers=*/config.dense_arch_layer_sizes);

  std::vector<tensor_guid_t> emb_outputs;

As suggested, I tried this:

  std::vector<tensor_guid_t> emb_outputs = transform(
      zip(config.embedding_size, sparse_inputs),
      [&](std::pair<int, tensor_guid_t> &combined_pair) -> tensor_guid_t {
        return create_dlrm_sparse_embedding_network(
            /*cgb=*/cgb,
            /*config=*/config,
            /*input=*/combined_pair.second,
            /*input_dim=*/combined_pair.first,
            /*output_dim=*/config.embedding_dim);
      });

But I couldn't get it to compile using transform and zip...

Collaborator Author

@easyeasydev left a comment

Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/models/src/models/dlrm/dlrm.cc line 128 at r1 (raw file):

  // Create input tensors
  std::vector<tensor_guid_t> sparse_inputs(

This had the following review comment before:

Quote:
I think this current implementation makes sparse_inputs a vector of the same tensor_guid_t over and over again, rather than being a vector of identically-shaped tensors. Was that intentional? If not, then maybe the below code would be better?

std::vector<tensor_guid_t> sparse_inputs =
    repeat(config.embedding_size.size(), [&]() {
      return create_input_tensor({config.batch_size, config.embedding_bag_size},
                                 DataType::INT64);
    });

The intention was to create multiple tensors as the inputs. I'm not sure whether this vector-construction syntax makes all the inputs refer to a single tensor? I would have thought create_input_tensor returns a separate tensor for each item in the vector?


codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 97.77778% with 2 lines in your changes missing coverage. Please review.

Project coverage is 60.80%. Comparing base (3ee5e48) to head (b81629c).

Files with missing lines Patch % Lines
lib/models/src/models/dlrm/dlrm.cc 97.40% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1532      +/-   ##
==========================================
+ Coverage   60.47%   60.80%   +0.32%     
==========================================
  Files         606      607       +1     
  Lines       14725    14808      +83     
==========================================
+ Hits         8905     9004      +99     
+ Misses       5820     5804      -16     
Flag Coverage Δ
unittests 60.80% <97.77%> (+0.32%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
lib/pcg/src/pcg/computation_graph_builder.cc 79.72% <100.00%> (+4.54%) ⬆️
lib/models/src/models/dlrm/dlrm.cc 97.40% <97.40%> (ø)

Collaborator Author

@easyeasydev left a comment

Check FB, TorchRec, and the paper for the DLRM initializer implementation to decide whether we want to keep the special logic.
Try to generate the dot representation.

Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)

Collaborator

@lockshaw left a comment

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @easyeasydev, @lockshaw, and @reyna-abhyankar)


lib/models/include/models/dlrm/dlrm_config.struct.toml line 29 at r1 (raw file):

[[fields]]
name = "embedding_bag_size"
type = "size_t"

Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)


lib/models/include/models/dlrm/dlrm_config.struct.toml line 37 at r1 (raw file):

[[fields]]
name = "dense_arch_layer_sizes"
type = "std::vector<size_t>"

Prefer int to avoid confusing failures if negative values are used (the long-term solution is to use the nonnegative_int getting added in #1533, but for now this is the best solution)

Suggestion:

type = "std::vector<int>

lib/models/include/models/dlrm/dlrm_config.struct.toml line 41 at r1 (raw file):

[[fields]]
name = "over_arch_layer_sizes"
type = "std::vector<size_t>"

Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)


lib/models/include/models/dlrm/dlrm_config.struct.toml line 45 at r1 (raw file):

[[fields]]
name = "arch_interaction_op"
type = "std::string"

Would this make more sense as an enum? Is there a list of the allowed arch interaction ops somewhere? Based on https://github.com/facebookresearch/dlrm/blob/64063a359596c72a29c670b4fcc9450bb342e764/dlrm_s_pytorch.py#L483-L515 it seems that the only possible values are dot and cat--is that the implementation you were referring to?


lib/models/include/models/dlrm/dlrm_config.struct.toml line 49 at r1 (raw file):

[[fields]]
name = "batch_size"
type = "size_t"

Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)


lib/models/src/models/dlrm/dlrm.cc line 10 at r1 (raw file):

/**
 * @brief Get the default DLRM config.

Why have this in the .cc file rather than the .h file? Does it make any difference in the rendered doxygen docs? (recall that you can run proj doxygen --browser to see the docs)


lib/models/src/models/dlrm/dlrm.cc line 13 at r1 (raw file):

 *
 * @details The configs here refer to the example at
 * https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc.

This link is great--always welcome to add more of these! 🙂


lib/models/src/models/dlrm/dlrm.cc line 13 at r1 (raw file):

 *
 * @details The configs here refer to the example at
 * https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc.

Can you change this to a permalink? Right now, if inference moves or deletes the dlrm.cc file, this URL is going to 404; it could also just go out-of-date and become confusing. For this type of thing I'd prefer permalinks wherever possible.


lib/models/src/models/dlrm/dlrm.cc line 44 at r1 (raw file):

Previously, easyeasydev wrote…

This MLP implementation follows the original FlexFlow implementation at https://github.com/flexflow/FlexFlow/blob/inference/examples/cpp/DLRM/dlrm.cc#L44. I'm not sure whether there was a specific reason for doing it this way in the past; we can decide what to do here. TorchRec doesn't have this logic for setting up the initializers.
@lockshaw

This at least looks similar? https://github.com/facebookresearch/dlrm/blob/64063a359596c72a29c670b4fcc9450bb342e764/dlrm_s_pytorch.py#L218-L228


lib/models/src/models/dlrm/dlrm.cc line 49 at r1 (raw file):

                              std::vector<size_t> const &mlp_layers) {
  tensor_guid_t t = input;
  for (size_t i = 0; i < mlp_layers.size() - 1; i++) {

You could also just zip mlp_layers and mlp_layers[1:] if you want to avoid all the careful index calculation

Code quote:

 for (size_t i = 0; i < mlp_layers.size() - 1; i++) {
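
For illustration, a minimal sketch of the zip-based pairing suggested here, assuming mlp_layers is std::vector<int> per the review change above and that the utils zip(a, b) pairs elements up to the shorter input (the in_dims/out_dims names are illustrative, not from the PR diff):

    // Pair each layer size with its successor instead of indexing by i and i+1.
    std::vector<int> in_dims(mlp_layers.begin(), mlp_layers.end() - 1);
    std::vector<int> out_dims(mlp_layers.begin() + 1, mlp_layers.end());
    for (auto const &[in_dim, out_dim] : zip(in_dims, out_dims)) {
      float std_dev = sqrt(2.0f / (in_dim + out_dim));
      // ... build the dense layer for this (in_dim, out_dim) pair ...
    }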

lib/models/src/models/dlrm/dlrm.cc line 50 at r1 (raw file):

  tensor_guid_t t = input;
  for (size_t i = 0; i < mlp_layers.size() - 1; i++) {
    float std_dev = sqrt(2.0f / (mlp_layers[i + 1] + mlp_layers[i]));

Suggestion:

    float std_dev = sqrt(2.0f / (mlp_layers.at(i + 1) + mlp_layers.at(i)));

lib/models/src/models/dlrm/dlrm.cc line 58 at r1 (raw file):

        }};

    std_dev = sqrt(2.0f / mlp_layers[i + 1]);

Suggestion:

    std_dev = sqrt(2.0f / mlp_layers.at(i + 1));
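
Side note (an observation about the code being reviewed, not a claim made in the thread): the first std_dev is the classic Glorot/Xavier-normal weight standard deviation, and the second, presumably feeding the bias initializer, scales by the output dimension alone:

    \sigma_W = \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}, \qquad
    \sigma_b = \sqrt{\frac{2}{\text{fan\_out}}}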

lib/models/src/models/dlrm/dlrm.cc line 66 at r1 (raw file):

    t = cgb.dense(/*input=*/t,
                  /*outDim=*/mlp_layers[i + 1],

Suggestion:

                  /*outDim=*/mlp_layers.at(i + 1),

lib/models/src/models/dlrm/dlrm.cc line 102 at r1 (raw file):

    tensor_guid_t const &bottom_mlp_output,
    std::vector<tensor_guid_t> const &emb_outputs) {
  if (config.arch_interaction_op != "cat") {

Considering that there are only two interaction ops, it might be cleaner to just use an enum?


lib/models/src/models/dlrm/dlrm.cc line 128 at r1 (raw file):

Previously, easyeasydev wrote…

This had the following review comment before:

Quote:
I think this current implementation makes sparse_inputs a vector of the same tensor_guid_t over and over again, rather than being a vector of identically-shaped tensors. Was that intentional? If not, then maybe the below code would be better?

std::vector<tensor_guid_t> sparse_inputs =
    repeat(config.embedding_size.size(), [&]() {
      return create_input_tensor({config.batch_size, config.embedding_bag_size},
                                 DataType::INT64);
    });

The intention was to create multiple tensors as the inputs. I'm not sure whether this vector-construction syntax makes all the inputs refer to a single tensor? I would have thought create_input_tensor returns a separate tensor for each item in the vector?

Using a lambda would fix it. I'm actually a bit confused about how this current code compiles--based on the source code of repeat (https://github.com/flexflow/FlexFlow/blob/1d5140d5e98c18e73ce576673aa34022ad6d804f/lib/utils/include/utils/containers/repeat.h#L16), doesn't F need to be callable?
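
To spell out the difference (a sketch assuming the repeat(int, F) signature linked above; n and shape stand in for config.embedding_size.size() and {config.batch_size, config.embedding_bag_size}):

    // Fill constructor: evaluates create_input_tensor(...) once, then copies
    // that one tensor_guid_t n times, so every element aliases the same tensor.
    std::vector<tensor_guid_t> same_guid(n, create_input_tensor(shape, DataType::INT64));

    // repeat: invokes the lambda n times, producing n distinct input tensors.
    std::vector<tensor_guid_t> distinct_guids = repeat(n, [&]() {
      return create_input_tensor(shape, DataType::INT64);
    });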


lib/models/src/models/dlrm/dlrm.cc line 135 at r1 (raw file):

  tensor_guid_t dense_input = create_input_tensor(
      {config.batch_size, config.dense_arch_layer_sizes.front()},
      DataType::HALF); // TODO: change this to DataType::FLOAT

Leaving a comment so we don't accidentally merge this PR before this gets addressed


lib/models/src/models/dlrm/dlrm.cc line 145 at r1 (raw file):

Previously, easyeasydev wrote…

As suggested, I tried this:

  std::vector<tensor_guid_t> emb_outputs = transform(
      zip(config.embedding_size, sparse_inputs),
      [&](std::pair<int, tensor_guid_t> &combined_pair) -> tensor_guid_t {
        return create_dlrm_sparse_embedding_network(
            /*cgb=*/cgb,
            /*config=*/config,
            /*input=*/combined_pair.second,
            /*input_dim=*/combined_pair.first,
            /*output_dim=*/config.embedding_dim);
      });

But I couldn't get it to compile using transform and zip...

Can you give me an error message? It's a bit hard to debug without any information.

Also, if you'd like a slightly syntactically-cleaner solution (I'd probably recommend this if you don't mind the little bit of extra work), feel free to just add the zip_with.h implementation from my current branch into this PR, along with the associated tests: https://github.com/lockshaw/FlexFlow/blob/e635b1fe6a1c8a2e8ecb14fe301a4e3843be65fb/lib/utils/include/utils/containers/zip_with.h


lib/pcg/src/pcg/computation_graph_builder.cc line 168 at r1 (raw file):

                                  std::optional<std::string> const &name) {
  // NOT_IMPLEMENTED()
  return input;

Will do.

Collaborator

@lockshaw left a comment

Reviewed 3 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @easyeasydev and @reyna-abhyankar)


lib/models/src/models/dlrm/dlrm.cc line 135 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Leaving a comment so we don't accidentally merge this PR before this gets addressed

Done.


lib/pcg/src/pcg/computation_graph_builder.cc line 168 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Will do.

Done.

Collaborator Author

@easyeasydev left a comment

Reviewable status: 4 of 7 files reviewed, 15 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/models/include/models/dlrm/dlrm_config.struct.toml line 29 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)

Done.


lib/models/include/models/dlrm/dlrm_config.struct.toml line 37 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer int to avoid confusing failures if negative values are used (the long-term solution is to use the nonnegative_int getting added in #1533, but for now this is the best solution)

Done.


lib/models/include/models/dlrm/dlrm_config.struct.toml line 41 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)

Done.


lib/models/include/models/dlrm/dlrm_config.struct.toml line 45 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would this make more sense as an enum? Is there a list of the allowed arch interaction ops somewhere? Based on https://github.com/facebookresearch/dlrm/blob/64063a359596c72a29c670b4fcc9450bb342e764/dlrm_s_pytorch.py#L483-L515 it seems that the only possible values are dot and cat--is that the implementation you were referring to?

Agreed, an enum may make more sense. Do you have an example of using an enum in dtgen? I can change this to an enum following that.


lib/models/include/models/dlrm/dlrm_config.struct.toml line 49 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer int (see https://reviewable.io/reviews/flexflow/FlexFlow/1532#-OB3KCVzBHRvW3WmKCxD for explanation)

Done.


lib/models/src/models/dlrm/dlrm.cc line 10 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why have this in the .cc file rather than the .h file? Does it make any difference in the rendered doxygen docs? (recall that you can run proj doxygen --browser to see the docs)

No particular reason. I've moved it to the .h file.


lib/models/src/models/dlrm/dlrm.cc line 13 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Can you change this to a permalink? Right now if inference moves or deletes the dlrm.cc file this url is going to 404, or could also just end up going out-of-date and being confusing. For this type of thing I'd prefer permalinks wherever possible

I've updated it to the permalink. Thanks for the suggestion; I wasn't aware of permalinks until you pointed them out.


lib/models/src/models/dlrm/dlrm.cc line 44 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This at least looks similar? https://github.com/facebookresearch/dlrm/blob/64063a359596c72a29c670b4fcc9450bb342e764/dlrm_s_pytorch.py#L218-L228

Good point. I've added the permalink to the code.


lib/models/src/models/dlrm/dlrm.cc line 49 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

You could also just zip mlp_layers and mlp_layers[1:] if you want to avoid all the careful index calculation

Yes, I agree that would work as well, but the current version should (hopefully, lol) also be clear to readers.


lib/models/src/models/dlrm/dlrm.cc line 102 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Considering that there are only two interaction ops, it might be cleaner to just use an enum?

Agreed. I can switch to an enum if we have an example in dtgen.


lib/models/src/models/dlrm/dlrm.cc line 128 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Using a lambda would fix it. I'm actually a bit confused about how this current code compiles--based on the source code of repeat (https://github.com/flexflow/FlexFlow/blob/1d5140d5e98c18e73ce576673aa34022ad6d804f/lib/utils/include/utils/containers/repeat.h#L16), doesn't F need to be callable?

Hmm, yeah, I actually meant that I couldn't switch to repeat after a few attempts. The quote above was your original suggestion from the lost PR.

I guess the main question here is whether we want to create "multiple copies" of tensor_guid_t in this vector.


lib/models/src/models/dlrm/dlrm.cc line 145 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Can you give me an error message? It's a bit hard to debug without any information.

Also, if you'd like a slightly syntactically-cleaner solution (I'd probably recommend this if you don't mind the little bit of extra work), feel free to just add the zip_with.h implementation from my current branch into this PR, along with the associated tests: https://github.com/lockshaw/FlexFlow/blob/e635b1fe6a1c8a2e8ecb14fe301a4e3843be65fb/lib/utils/include/utils/containers/zip_with.h

Hmm, the compilation error looks like this:

/home/eric/ff3/FlexFlow/lib/models/src/models/dlrm/dlrm.cc: In function ‘FlexFlow::ComputationGraph FlexFlow::get_dlrm_computation_graph(const FlexFlow::DLRMConfig&)’:
/home/eric/ff3/FlexFlow/lib/models/src/models/dlrm/dlrm.cc:155:51: error: no matching function for call to ‘transform(std::vector<std::pair<int, FlexFlow::tensor_guid_t>, std::allocator<std::pair<int, FlexFlow::tensor_guid_t> > >, FlexFlow::get_dlrm_computation_graph(const FlexFlow::DLRMConfig&)::<lambda(std::pair<int, FlexFlow::tensor_guid_t>&)>)’
  155 | std::vector<tensor_guid_t> emb_outputs = transform(
      |                                          ~~~~~~~~~^
  156 |       zip(config.embedding_size, sparse_inputs),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   
  157 |       [&](std::pair<int, tensor_guid_t> &combined_pair) -> tensor_guid_t {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  158 |         return create_dlrm_sparse_embedding_network(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  159 |             /*cgb=*/cgb,
      |             ~~~~~~~~~~~~                           
  160 |             /*config=*/config,
      |             ~~~~~~~~~~~~~~~~~~                     
  161 |             /*input=*/combined_pair.second,
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~        
  162 |             /*input_dim=*/combined_pair.first,
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     
  163 |             /*output_dim=*/config.embedding_dim);
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
  164 |       });
      |       ~~ 

I can check the zip_with.h implementation.


lib/models/src/models/dlrm/dlrm.cc line 50 at r1 (raw file):

  tensor_guid_t t = input;
  for (size_t i = 0; i < mlp_layers.size() - 1; i++) {
    float std_dev = sqrt(2.0f / (mlp_layers[i + 1] + mlp_layers[i]));

Done.


lib/models/src/models/dlrm/dlrm.cc line 58 at r1 (raw file):

        }};

    std_dev = sqrt(2.0f / mlp_layers[i + 1]);

Done.


lib/models/src/models/dlrm/dlrm.cc line 66 at r1 (raw file):

    t = cgb.dense(/*input=*/t,
                  /*outDim=*/mlp_layers[i + 1],

Done.

@lockshaw changed the base branch from repo-refactor to master on December 16, 2024 08:36
Collaborator

@lockshaw left a comment

Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @easyeasydev and @reyna-abhyankar)


lib/models/include/models/dlrm/dlrm_config.struct.toml line 45 at r1 (raw file):

Previously, easyeasydev wrote…

Agree enum may make more sense. Do you have an example of using enum in dtgen? I can change to enum following that.

https://github.com/flexflow/flexflow-train/blob/master/lib/op-attrs/include/op-attrs/activation.enum.toml
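
For a sense of what the conversion might look like, here is a hypothetical dlrm_arch_interaction_op.enum.toml modeled on the linked activation.enum.toml; the exact field names and feature list are guesses by analogy with the struct.toml format shown above, not the file from this PR:

    namespace = "FlexFlow"
    name = "DLRMArchInteractionOp"
    features = [
      "hash",
      "json",
      "rapidcheck",
      "fmt",
    ]

    [[values]]
    name = "DOT"

    [[values]]
    name = "CAT"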


lib/models/src/models/dlrm/dlrm.cc line 49 at r1 (raw file):

Previously, easyeasydev wrote…

Yes, I agree that would work as well, but the current version should (hopefully, lol) also be clear to readers.

Fair enough--in the future I'd probably prefer the zip approach (hand-coded arithmetic is just easier to accidentally mess up), but it's not a big issue here.


lib/models/src/models/dlrm/dlrm.cc line 128 at r1 (raw file):

Previously, easyeasydev wrote…

Hmm, yeah, I actually meant that I couldn't switch to repeat after a few attempts. The quote above was your original suggestion from the lost PR.

I guess the main question here is whether we want to create "multiple copies" of tensor_guid_t in this vector.

Based on

std::vector<Tensor> sparse_inputs;
for (size_t i = 0; i < dlrmConfig.embedding_size.size(); i++) {
  int const dims[] = {ffConfig.batchSize, dlrmConfig.embedding_bag_size};
  Tensor input = ff.create_tensor<2>(dims, DT_INT64);
  sparse_inputs.push_back(input);
}

I'd say you want multiple different tensor_guid_ts, in which case you should use the repeat code snippet you included above (and if you get any build errors using it, send me the error message so I can help debug)


lib/models/src/models/dlrm/dlrm.cc line 145 at r1 (raw file):

Previously, easyeasydev wrote…

Hmm, the compilation error looks like this:

/home/eric/ff3/FlexFlow/lib/models/src/models/dlrm/dlrm.cc: In function ‘FlexFlow::ComputationGraph FlexFlow::get_dlrm_computation_graph(const FlexFlow::DLRMConfig&)’:
/home/eric/ff3/FlexFlow/lib/models/src/models/dlrm/dlrm.cc:155:51: error: no matching function for call to ‘transform(std::vector<std::pair<int, FlexFlow::tensor_guid_t>, std::allocator<std::pair<int, FlexFlow::tensor_guid_t> > >, FlexFlow::get_dlrm_computation_graph(const FlexFlow::DLRMConfig&)::<lambda(std::pair<int, FlexFlow::tensor_guid_t>&)>)’
  155 | std::vector<tensor_guid_t> emb_outputs = transform(
      |                                          ~~~~~~~~~^
  156 |       zip(config.embedding_size, sparse_inputs),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   
  157 |       [&](std::pair<int, tensor_guid_t> &combined_pair) -> tensor_guid_t {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  158 |         return create_dlrm_sparse_embedding_network(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  159 |             /*cgb=*/cgb,
      |             ~~~~~~~~~~~~                           
  160 |             /*config=*/config,
      |             ~~~~~~~~~~~~~~~~~~                     
  161 |             /*input=*/combined_pair.second,
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~        
  162 |             /*input_dim=*/combined_pair.first,
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     
  163 |             /*output_dim=*/config.embedding_dim);
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
  164 |       });
      |       ~~ 

I can check the zip_with.h implementation.

In the future a little more of the error message would be helpful (you can also just include a link to a paste of the full error output somewhere if you'd prefer), but I think in this case it's because the lambda should take the pair as const &, not as mutable &

Collaborator Author

@easyeasydev left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @lockshaw and @reyna-abhyankar)


lib/models/include/models/dlrm/dlrm_config.struct.toml line 45 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

https://github.com/flexflow/flexflow-train/blob/master/lib/op-attrs/include/op-attrs/activation.enum.toml

Converted it to an enum type.


lib/models/src/models/dlrm/dlrm.cc line 128 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Based on

std::vector<Tensor> sparse_inputs;
for (size_t i = 0; i < dlrmConfig.embedding_size.size(); i++) {
  int const dims[] = {ffConfig.batchSize, dlrmConfig.embedding_bag_size};
  Tensor input = ff.create_tensor<2>(dims, DT_INT64);
  sparse_inputs.push_back(input);
}

I'd say you want multiple different tensor_guid_ts, in which case you should use the repeat code snippet you included above (and if you get any build errors using it, send me the error message so I can help debug)

Updated to use repeat.


lib/models/src/models/dlrm/dlrm.cc line 145 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

In the future a little more of the error message would be helpful (you can also just include a link to a paste of the full error output somewhere if you'd prefer), but I think in this case it's because the lambda should take the pair as const &, not as mutable &

Thanks! This was indeed because of the missing const. I've updated it.
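
For reference, a sketch of the call with that fix applied (taking the pair by const reference; this mirrors the snippet quoted earlier in the thread, not the actual PR diff):

    std::vector<tensor_guid_t> emb_outputs = transform(
        zip(config.embedding_size, sparse_inputs),
        [&](std::pair<int, tensor_guid_t> const &combined_pair) -> tensor_guid_t {
          return create_dlrm_sparse_embedding_network(
              /*cgb=*/cgb,
              /*config=*/config,
              /*input=*/combined_pair.second,
              /*input_dim=*/combined_pair.first,
              /*output_dim=*/config.embedding_dim);
        });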
