MPNet implementation #1

mehmetcanay · 2024-02-14T13:54:34Z

No description provided.

tiadams

Good job on the implementations, just some minor comments and suggestions if you want to take a look.

tiadams · 2024-02-14T14:13:50Z

index/mapping.py

@@ -2,16 +2,19 @@
 import numpy as np

 from index.embedding import EmbeddingModel
-from index.db.model import Terminology, Mapping, Concept, Variable


There might be an issue here: the model file was moved to the db package. Weird that this does not throw an error in the parser test or the Python package though; I will have to take a look at that.

Could you adjust this?

tiadams · 2024-02-14T14:15:19Z

index/embedding.py

@@ -28,19 +27,42 @@ def get_embedding(self, text: str, model="text-embedding-ada-002"):
                return None
            if isinstance(text, str):
                text = text.replace("\n", " ")
-            return openai.Embedding.create(input=[text], model=model)['data'][0]['embedding']
+            return openai.Embedding.create(input=[text], model=model)["data"][0][


Not really a problem syntax-wise, but I am wondering about this formatting choice.

I'm using the Black formatter in VS Code, and it automatically formats on save. I will try to disable it.

tiadams · 2024-02-14T14:21:55Z

index/visualisation.py

+    save_plot=False,
+    save_dir="resources/results/plots",
+):
+    if (


Could be done a bit shorter ;)

if not len(A) == len(B) == len(C): do stuff
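A minimal sketch of the chained comparison suggested above. The function name and arguments are hypothetical stand-ins, not the actual names used in index/visualisation.py; the point is only that Python evaluates `x == y == z` as `x == y and y == z`, and that `not` negates the whole chain.

```python
def check_equal_lengths(a, b, c):
    """Raise ValueError unless the three sequences have equal length.

    Hypothetical helper; the names are illustrative, not from the PR.
    """
    # Chained comparison: equivalent to len(a) == len(b) and len(b) == len(c).
    # `not` binds more loosely than the comparison, so it negates the whole chain.
    if not len(a) == len(b) == len(c):
        raise ValueError(f"Length mismatch: {len(a)}, {len(b)}, {len(c)}")


# Equal lengths: passes silently.
check_equal_lengths([1, 2], ["x", "y"], [0.1, 0.2])
```

This replaces a multi-line `if (... or ... or ...)` check with a single condition while keeping the same behaviour.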

tiadams · 2024-02-14T14:23:13Z

index/visualisation.py

-    descriptions_table2 = np.concatenate([table.joined_mapping_table["description"] for table in tables2_cleaned])
-    boundaries1 = np.array([table.joined_mapping_table["embedding"].index.size for table in tables1_cleaned])
-    boundaries2 = np.array([table.joined_mapping_table["embedding"].index.size for table in tables2_cleaned])
+        table1.joined_mapping_table.dropna(


Personally I like to format on one line if possible, but this is fine too

tiadams · 2024-02-14T14:39:02Z

index/evaluation.py

+                raise NotImplementedError(
+                    "Specified matching method is not implemented!"
+                )
+        min_distance_indices = np.argsort(np.array(distances))[


formatting error?

tiadams · 2024-02-14T14:44:18Z

index/evaluation.py

@@ -177,3 +271,69 @@ def score_mappings(matches: pd.DataFrame) -> float:
    matches = matches[matches["target_concept_label"].notnull()]
    accuracy = matches["correct"].sum() / len(matches)
    return accuracy
+
+
+def evaluate(


Definitely made sense to move this here instead

tiadams · 2024-02-14T14:45:05Z

index/evaluation.py

+                    source, target, matching_method=MatchingMethod.FUZZY_STRING_MATCHING
+                )
+                if target == "jadni":
+                    print("check")


What is this check for? :D

It was there in the original code, so I just left it :D

tiadams · 2024-02-14T14:46:45Z

index/evaluation.py

+        fuzzy = pd.DataFrame(data_fuzzy, index=labels).T
+        return gpt, fuzzy
+
+    elif model == "mpnet":


You could probably move the shared code out of each if clause to a common higher level to reduce code duplication.
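One possible way to do this, sketched under assumptions: the per-model branches seem to differ only in how a single source/target pair is scored, so the loop and the DataFrame construction could live in one shared helper. `accuracy_table` and `score_fn` are hypothetical names, not functions from index/evaluation.py.

```python
import pandas as pd


def accuracy_table(labels, score_fn):
    """Build a source-by-target accuracy table.

    `score_fn(source, target)` is a hypothetical callable returning a float;
    each model branch would pass in its own scoring function.
    """
    data = {}
    for source in labels:
        # One list of accuracies per source, ordered like `labels`.
        data[source] = [score_fn(source, target) for target in labels]
    # transpose to have from -> to | row -> column
    return pd.DataFrame(data, index=labels).T


# Toy scorer for illustration only: perfect score on the diagonal.
table = accuracy_table(["a", "b"], lambda s, t: 1.0 if s == t else 0.0)
```

With a helper like this, each `if`/`elif` branch shrinks to choosing the scoring function, and the table-building logic exists only once.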

tiadams · 2024-02-14T14:53:29Z

index/evaluation.py

+            data_mpnet[labels[idx]] = acc_mpnet
+        # transpose to have from -> to | row -> column like in the paper
+        mpnet = pd.DataFrame(data_mpnet, index=labels).T
+        return mpnet


Having a function return a different number of values for different arguments is not a good idea, since it makes the behaviour somewhat less predictable to the user.

A better solution here would be to have it return either the gpt, fuzzy, or mpnet results (since you already have a model parameter).
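A rough sketch of what a single, predictable return shape could look like. Everything here is an assumption for illustration: the `SCORERS` registry, the placeholder scoring functions, and the nested-dict result are not the actual code in index/evaluation.py, where the result would presumably be a DataFrame.

```python
from typing import Callable, Dict, List

Scorer = Callable[[str, str], float]


# Placeholder scorers; the real ones would compute matching accuracy.
def _score_gpt(source: str, target: str) -> float:
    return 1.0 if source == target else 0.0


def _score_fuzzy(source: str, target: str) -> float:
    return 1.0 if source == target else 0.0


def _score_mpnet(source: str, target: str) -> float:
    return 1.0 if source == target else 0.0


# Hypothetical registry mapping the `model` argument to a scorer.
SCORERS: Dict[str, Scorer] = {
    "gpt": _score_gpt,
    "fuzzy": _score_fuzzy,
    "mpnet": _score_mpnet,
}


def evaluate(labels: List[str], model: str) -> Dict[str, Dict[str, float]]:
    """Return exactly one result table, whichever model is requested."""
    if model not in SCORERS:
        raise NotImplementedError(f"Model {model!r} is not implemented!")
    scorer = SCORERS[model]
    # Same shape for every model: result[source][target] -> accuracy.
    return {s: {t: scorer(s, t) for t in labels} for s in labels}


mpnet_result = evaluate(["a", "b"], model="mpnet")
```

A caller who wants both the gpt and fuzzy tables would then call `evaluate` twice, once per model, instead of unpacking a tuple that only exists for some arguments.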

update: MPNet implementation

9cdf7f6

mehmetcanay changed the title ~~update: MPNet implementation~~ MPNet implementation Feb 14, 2024

tiadams requested changes Feb 14, 2024

mehmetcanay added 2 commits February 14, 2024 16:44

fix: formatting

041c4ca

add: model

3456fea

tiadams merged commit b1f500a into main Feb 15, 2024
4 checks passed

tiadams deleted the mpnet-implementation branch February 15, 2024 08:01
