integration puzzles for model #533

Merged
merged 6 commits into from
Feb 2, 2024
26 changes: 26 additions & 0 deletions .github/workflows/plantuml.yml
@@ -0,0 +1,26 @@
name: plantuml
on:
  push:
    paths:
      - '**.puml'
    branches:
      - master
permissions:
  contents: write
jobs:
  plantuml:
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout Source
        uses: actions/checkout@v4
      - name: Generate SVG Diagrams
        uses: holowinski/plantuml-github-action@main
        with:
          args: -v -tsvg doc/*.puml
      - name: Commit changes
        uses: EndBug/add-and-commit@v9
        with:
          author_name: ${{ github.actor }}
          author_email: ${{ github.event.pusher.email }}
          message: 'Diagram generated'
          add: 'doc/*'
14 changes: 14 additions & 0 deletions doc/integration.puml
@@ -0,0 +1,14 @@
@startuml
title LinearModel 0pdd Integration
participant "Git Repo" as repo
participant 0pdd
participant LinearModel as lm

0pdd -> repo
repo --> 0pdd: .0pdd.yml
alt model: true
0pdd -> lm: Puzzles
lm --> 0pdd: Ranked puzzles
0pdd --> repo: Ranked puzzles
end
@enduml
23 changes: 14 additions & 9 deletions model/README.md
@@ -1,18 +1,18 @@
Puzzle Ranking (Linear ML Model)

<<<<<<< Updated upstream
###### Note: This is an opt-in feature
=======
The data for puzzles is pre-processed and available in `~/data/proper_pdd_data_regression.csv`. In the data, the first row is the column index, the first column is the id of the repo the puzzle belongs to, and the last column is the output variable (`y`).
>>>>>>> Stashed changes

### Internals

The ML model is a linear model with a PSO optimizer. The optimizer is used to train the model on puzzle data; the weights are stored and used to predict future puzzles.
The ML model is a linear model with a PSO optimizer.
The optimizer is used to train the model on puzzle data;
the weights are stored and used to predict future puzzles.

Because training takes time, it runs as a non-blocking process; until stored weights are available, puzzle prioritization uses a naive ranking approach based on the puzzle estimate. Subsequent events use the linear model for prioritization.
Because training takes time, it runs as a non-blocking process;
until stored weights are available, puzzle prioritization uses a naive ranking approach based on the puzzle estimate.
Subsequent events use the linear model for prioritization.
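
As a rough illustration of the naive fallback, the sketch below orders puzzles by estimate alone. The helper name `naive_rank_sketch`, the plain array of estimates in minutes, and the shortest-first order are all assumptions made for this example; the actual `naive_rank` in `model/linear.rb` is not shown in this diff and may differ.

```ruby
# Hypothetical helper, not the project's `naive_rank`.
# Takes puzzle estimates (assumed to be minutes) and returns the positional
# indices of the inputs, shortest estimate first. Ties keep input order.
def naive_rank_sketch(estimates)
  estimates
    .each_with_index
    .sort_by { |estimate, index| [estimate, index] }
    .map { |_estimate, index| index }
end

order = naive_rank_sketch([60, 15, 30])
# order is [1, 2, 0]: the 15-minute puzzle first, then 30, then 60.
```

Sorting on the `[estimate, index]` pair keeps the result deterministic when two puzzles share the same estimate.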

The linear model is the external API for the model. It has one method, `predict(...)`, which accepts an array of puzzles in XML. The output of this model is an array of positional indices of the input puzzles:
The linear model is the external API for the model.
It has one method, `predict(...)`, which accepts an array of puzzles in XML.
The output of this model is an array of positional indices of the input puzzles:

```ruby
# usage
@@ -25,3 +25,8 @@ rank = LinearModel.new(repo_name, storage).predict(puzzles)
#
# rank -> array of positional index of ranked puzzles
```
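
One way to consume the result, assuming each element of `rank` is the input position of the puzzle that belongs at that place in the ranked order, is to reorder the input with `Array#values_at`. The puzzle ids below are made-up placeholders, and `rank` is hard-coded rather than produced by the model:

```ruby
# Placeholder puzzle ids; real input would be XML puzzles.
puzzles = %w[532-a 532-b 532-c]

# Assumed shape of the `predict` output: input positions in ranked order.
rank = [2, 0, 1]

# Reorder the input so that the highest-ranked puzzle comes first.
ranked = puzzles.values_at(*rank)
```

If the indices turn out to mean something else (for example, the rank assigned to each input position), the reordering step would need to be inverted accordingly.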

### Integration

This diagram shows how this model can be integrated into the 0pdd workflow:
![integration.svg](../doc/integration.svg)
19 changes: 18 additions & 1 deletion model/linear.rb
@@ -27,6 +27,9 @@

#
# Linear Model
# @todo #532:60min Add unit-tests.
# We should add unit-tests for this class that check puzzle ranking.
# For now it's untested, don't forget to remove this puzzle.
#
class LinearModel
  def initialize(repo, storage)
@@ -46,9 +49,23 @@ def initialize(repo, storage)
    end
  end

  # Ranks the puzzles using machine learning
  # @param puzzles XML puzzles
  # @return array of positional indices of the input puzzles
  # @todo #532:60min Implement ranked puzzles.
  # Let's implement a class that will use `LinearModel` to rank puzzles.
  # This class is needed in order to integrate the original 0pdd
  # and the model modules. It could probably be a decorator for `Puzzles`
  # that ranks XML puzzles, and then submits them into `Puzzles`.
  # Don't forget to remove this puzzle.
  def predict(puzzles)
    weights = @storage.load # load weights for repo from s3
    clf = Predictor.new(layers: [{ name: 'w1', shape: [10, 1] }, { name: 'w2', shape: [1, 1] }])
    clf = Predictor.new(
      layers: [
        { name: 'w1', shape: [10, 1] },
        { name: 'w2', shape: [1, 1] }
      ]
    )
    if weights.nil?
      train(clf) # find weights for repo backlog of puzzles
      ranks = naive_rank(puzzles) # naive rank of puzzles in each repo
5 changes: 5 additions & 0 deletions objects/puzzles.rb
@@ -25,6 +25,11 @@

#
# Puzzles in XML/S3
# @todo #532:60min Implement a decorator for optional model configuration load.
# Let's implement a class that decorates `Puzzles` and,
# based on the presence of the `model: true` attribute in the YAML config, decides
# whether the puzzles should be ranked or not.
# Don't forget to remove this puzzle.
#
class Puzzles
  def initialize(repo, storage)
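
The opt-in behaviour described in the puzzle above could look roughly like the sketch below. It is not part of this PR: the names `model_enabled?`, `OptionalRanking`, `apply`, and `ReversingModel` are hypothetical, the config is passed in as a YAML string rather than read from `.0pdd.yml`, and the wrapper works on a plain array instead of decorating the real `Puzzles` class. It also assumes that `predict` returns input positions in ranked order.

```ruby
require 'yaml'

# Hypothetical check: true only when the YAML config contains `model: true`.
def model_enabled?(config_yaml)
  config = YAML.safe_load(config_yaml)
  config.is_a?(Hash) && config['model'] == true
end

# Hypothetical wrapper: reorders puzzles with the model only when opted in.
class OptionalRanking
  def initialize(model, config_yaml)
    @model = model
    @enabled = model_enabled?(config_yaml)
  end

  # Returns the puzzles in ranked order, or unchanged when ranking is off.
  def apply(puzzles)
    return puzzles unless @enabled
    puzzles.values_at(*@model.predict(puzzles))
  end
end

# Stand-in for `LinearModel`, used only to exercise the sketch:
# it "ranks" the puzzles in reverse input order.
class ReversingModel
  def predict(puzzles)
    (0...puzzles.size).to_a.reverse
  end
end

ranked = OptionalRanking.new(ReversingModel.new, 'model: true').apply(%w[a b c])
untouched = OptionalRanking.new(ReversingModel.new, '').apply(%w[a b c])
```

Keeping the opt-in check separate from the ranking step means repositories without `model: true` never touch the model at all, which matches the `alt model: true` branch in the integration diagram.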