Hi
I created this diagram to show the general classification flow in this project, from raw data to the output label. It certainly needs a lot of adjustments, so please let me know what you think. NOTE: It is not the class diagram, but the two overlap somewhat.
Hi @hamedwaezi01, this is awesome. Thank you.
Everything seems clear to me. Just the following notes:
text data -> raw data?
Not sure the CSV format is necessary when we can read directly from the XML?
dataset -> input
Why can't the non-recurrent models have embeddings in the input?
activation -> output
Also, it would be great if you added hints to the code file paths so the blocks can be found easily in the codebase too.
Btw, we need an experiment on early detection, i.e., how much of a conversation is needed to detect a predatory one. Remind me to discuss it more if it's not clear.
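To make the early-detection question concrete, here is a minimal sketch that assumes a hypothetical `classify` function. The function takes the messages seen so far and returns True when it flags the conversation. The idea is to feed it growing prefixes of the conversation and record the first prefix length at which it fires. The names `detection_point` and `fraction_needed` are illustrative only and are not part of the project code.

```python
from typing import Callable, List, Optional

# Hypothetical classifier: receives the messages seen so far (a prefix of the
# conversation) and returns True if it flags the conversation as predatory.
Classifier = Callable[[List[str]], bool]


def detection_point(messages: List[str], classify: Classifier) -> Optional[int]:
    """Number of messages read before the classifier first flags the
    conversation, or None if it never does."""
    for k in range(1, len(messages) + 1):
        if classify(messages[:k]):
            return k
    return None


def fraction_needed(messages: List[str], classify: Classifier) -> Optional[float]:
    """Fraction of the conversation needed for detection (None if missed)."""
    k = detection_point(messages, classify)
    return None if k is None else k / len(messages)


# Toy stand-in for a real model, for illustration only
def keyword_classifier(prefix: List[str]) -> bool:
    return any("secret" in m.lower() for m in prefix)


conv = ["hi", "how old are you?", "keep this a secret", "ok"]
print(detection_point(conv, keyword_classifier))  # 3
print(fraction_needed(conv, keyword_classifier))  # 0.75
```

Averaging `fraction_needed` over the predatory conversations that were detected, and separately counting the ones that were missed, gives a first rough view of how early a model can decide.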
Hi @hosseinfani. Thanks. Sorry again for the late reply.
You're right. Raw data is more accurate.
About using the XML file
Since we use a pandas DataFrame in the preprocessing steps, it is better to state that we are going to convert the XML to a DataFrame without any loss of data and save it as CSV.
Also, in our MVP baseline we converted the XML to CSV as well.
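One way to keep the XML-to-CSV step lossless is to emit one row per message and turn every attribute and child element into its own column. Below is a minimal sketch that uses only the standard library. It assumes a hypothetical schema: `<conversations>` contains `<conversation>` elements with an `id` attribute, and each of those contains `<message>` elements with a `line` attribute and `<author>`, `<time>` and `<text>` children. The real files may differ, in which case `FIELDS` and the extraction would need to match them.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, TextIO

# Column order of the output CSV (assumed schema, see above)
FIELDS = ["conv_id", "line", "author", "time", "text"]


def xml_to_rows(xml_text: str) -> List[Dict[str, str]]:
    """Flatten <conversation>/<message> elements into one row per message."""
    root = ET.fromstring(xml_text)
    rows = []
    for conv in root.iter("conversation"):
        conv_id = conv.get("id", "")
        for msg in conv.iter("message"):
            rows.append({
                "conv_id": conv_id,
                "line": msg.get("line", ""),
                # findtext returns "" for an empty element, the default if missing
                "author": msg.findtext("author", default=""),
                "time": msg.findtext("time", default=""),
                "text": msg.findtext("text", default=""),
            })
    return rows


def rows_to_csv(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write the rows with a header; csv quoting keeps commas and quotes intact."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# With pandas installed, pd.DataFrame(xml_to_rows(xml_text)) gives the same
# table as a DataFrame, and DataFrame.to_csv(...) writes it out.
sample = (
    '<conversations><conversation id="c1">'
    '<message line="1"><author>a1</author><time>03:20</time>'
    '<text>hi, "there"</text></message>'
    "</conversation></conversations>"
)
buf = io.StringIO()
rows_to_csv(xml_to_rows(sample), buf)
print(buf.getvalue())
```

If you write to a real file, open it with `newline=""`, as the `csv` module documentation recommends. Only the columns in `FIELDS` are preserved, so any extra attributes in the real data would need their own columns.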
dataset -> input
What about "Input Features"?
Why can't the non-recurrent models have embeddings in the input?
Actually, I should add it too; I think I missed it. Additionally, there should be a separate box for fine-tuned BERT models and their respective datasets.
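Non-recurrent models can take embeddings too, provided the variable-length token sequence is first reduced to a fixed-length vector. Here is a minimal sketch of one common option, mean pooling. It assumes a hypothetical toy embedding table `toy_emb`; in practice the vectors would come from pretrained word embeddings or a BERT encoder. The pooled vector could then be fed to a linear model or a feed-forward network.

```python
from typing import Dict, List


def mean_pool(tokens: List[str], emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of known tokens into one fixed-length vector.

    Tokens missing from `emb` are skipped; an all-zero vector is returned
    when none of the tokens is known.
    """
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    # zip(*vecs) walks the embedding dimensions column by column
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Hypothetical 2-d embedding table, for illustration only
toy_emb = {"hi": [1.0, 0.0], "there": [0.0, 1.0]}
features = mean_pool(["hi", "there", "unknown"], toy_emb, dim=2)
print(features)  # [0.5, 0.5]
```

Mean pooling discards word order. That trade-off is exactly what the recurrent and BERT-based models avoid, so it can serve as a simple point of comparison.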
activation -> output
Good idea. Previously I had doubts, since "output" might be confused with the number of outputs or their configuration.
early detection
Yes, there are a couple of papers on it. We need to list a few metrics that measure it and then proceed.
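One candidate metric is ERDE (Early Risk Detection Error), which was used in the CLEF eRisk early-risk-detection evaluations. ERDE charges a fixed cost for false positives and false negatives. A true positive costs `lc_o(k) * c_tp`, where `lc_o(k) = 1 - 1/(1 + e^(k - o))` grows as the decision is delayed past `o` messages. Below is a minimal sketch, assuming each conversation yields a `(decision, truth, k)` triple, with `k` being the number of messages read before deciding. The cost values are left as parameters rather than fixed to any particular setting.

```python
import math
from typing import List, Tuple


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + e^(k - o)), i.e. the logistic sigmoid of (k - o).

    Close to 0 when the decision comes well before `o` messages,
    close to 1 when it comes well after.
    """
    x = k - o
    # Numerically stable sigmoid, avoiding overflow in math.exp for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def erde(decision: bool, truth: bool, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Per-conversation ERDE_o cost for a decision taken after k messages."""
    if decision and not truth:
        return c_fp                        # false positive
    if not decision and truth:
        return c_fn                        # false negative (missed predator)
    if decision and truth:
        return latency_cost(k, o) * c_tp   # true positive, penalised for delay
    return 0.0                             # true negative


def mean_erde(results: List[Tuple[bool, bool, int]], o: int, c_fp: float) -> float:
    """Average ERDE_o over (decision, truth, k) triples; lower is better."""
    if not results:
        raise ValueError("need at least one result")
    costs = [erde(d, t, k, o, c_fp) for d, t, k in results]
    return sum(costs) / len(costs)
```

Reporting ERDE for a couple of values of `o` (for example, a short and a long deadline) alongside plain precision/recall would show both how accurate and how early a model is.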
I created this issue, labeled as an epic, to hold the general planning for this project. All other tasks, subtasks, etc. will be linked to it as separate issues. Let's ask Reza (@rezaBarzgar) to lead this project. @rezaBarzgar, please dispatch the tasks, monitor progress, merge code, etc. Thanks.
PS. Thanks to Alice (@impedaka), who created a nice web demo here and has also run experiments using linear models. Thanks also to @EhsanSl, who is working on the pipeline and bringing new insights to it.
Here are the main to-do tasks. Please feel free to comment or revise.
(1) Problem Definition
(2) Proposed Method
(3) Experimentation
Dataset
Metrics
Baselines + Literature Review
(4) Paper Write Up
(5) Demo website (#16) ==> by @impedaka