Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor duplicate code #7

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

probinso
Copy link

@probinso probinso commented Mar 26, 2021

I am working with Grammatech to use the DIRE model, and found a fair amount of duplicate code. I am working to refactor this project so that duplicate code is removed and instead the shared modules are used as libraries. In standardizing the duplicate code, I found a few discrepancies in the code and was hoping you could help me resolve them before I refactor further.

I also used isort and black to standardize code formatting.

I simplified all differences that were not operational, however there are some operational differences in the model and utils directories under prediction-plugin and neural-model.

  • model/attentional_recurrent_subtoken_decocder.py:AttentionalRecurrentSubtokenDecoder.predict() differs by
                # Take `len(beam)` more candidates to account for possible duplicate
                k = min(beam_cont_cand_hyp_scores.numel(), cont_beam_size + len(beam))

and the use of a beam_cnt variable

                beam_cnt = 0
                for i in range(len(beam_new_hyp_positions)):
                    ...
                    beam_cnt += 1
                    ...
                    if beam_cnt >= cont_beam_size:
                        break
  • model/graph_encoder.py:GraphASTEncoder.unpack_encoding() differs by the specified layer being of bool() or byte() type
            (1.0 - batch_tree_node_masks).bool().unsqueeze(-1), 0.0
  • utils/evalution.py:Evaluator.average() differs by value set to avg_results["corpus_cer"]
        avg_results["corpus_cer"] = sum(agg_results["edit_distance"]) / sum(
            agg_results["ref_len"]
        )
  • utils/preprocess.py:is_valid_example() differs by a more complicated is_valid check
        is_valid = (
            example.ast.size < 300
            and len(example.variable_name_map) > 0
            and any(k != v for k, v in example.variable_name_map.items())
        )

If you have any input on these discrepancies, then I would be very appreciative, and can move forward with further refactoring :). I am very excited to be working with this code

@edmcman
Copy link
Collaborator

edmcman commented Mar 26, 2021

Hi @probinso,

That's great to hear someone is working with the code. I've done quite a bit of refactoring on the ed-wip branch. I think we should probably merge this into master and you should probably start from there to avoid duplicating the effort. Plus, I think it will resolve (most?) of the discrepancies you noted.

So, I would think you can start by re-running your code formatters on ed-wip and see if there are any unresolved questions.

I'll make a PR for ed-wip.

@probinso
Copy link
Author

probinso commented Apr 2, 2021

Sorry for the slow follow up. We had a project come up that needed attention. I will be back on this next week. I'm excited to look at the new branch :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants