Online Predatory Conversation Detection (Epic) #19

hosseinfani · 2023-02-04T03:01:02Z

@rezaBarzgar
@hamedwaezi01
@impedaka
@EhsanSl

I created this issue, labeled as an epic, for the general planning of this project. All other tasks, subtasks, etc. will be linked to it as separate issue pages. Let's ask Reza (@rezaBarzgar) to lead this project. @rezaBarzgar, please dispatch the tasks and monitor the progress, code merges, etc. Thanks.

P.S. Thanks to Alice (@impedaka), who created a nice web demo here and has also run experiments using linear models. Thanks as well to @EhsanSl, who is working on the pipeline and bringing new insights to it.

Here are the main to-do tasks. Please feel free to comment or revise.

(1) Problem Definition

Classification:
- Text Classification
- Conversation Classification
- Author Classification
Outlier Detection

(2) Proposed Method

Model Architecture:
- Non-neural Models ==> Refactor the current codebase and make it ready to add new models!
- Neural Models: (these models can be configured by changing the settings file)
  - Feedforward Network
  - A naïve CNN
  - RNN
  - LSTM
  - GRU
Training Strategy
- Text preprocessing (informal to formal)
- Sampling to handle the imbalance
- Curriculum Learning
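
A minimal sketch of the "sampling to handle the imbalance" item, assuming binary labels and random undersampling of the majority class. The function name `undersample`, the `label` key, and the `ratio` parameter are hypothetical and not taken from the codebase; oversampling or class weighting would be equally valid choices.

```python
import random
from collections import defaultdict


def undersample(records, label_key="label", ratio=1.0, seed=0):
    """Randomly drop majority-class records.

    Keeps every minority record and at most ratio * len(minority)
    majority records. Assumes exactly two distinct labels.
    """
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)
    if len(by_label) != 2:
        raise ValueError("expected exactly two classes")

    # Shorter list is the minority class, longer one the majority.
    minority, majority = sorted(by_label.values(), key=len)
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    keep = min(len(majority), int(ratio * len(minority)))
    sampled = minority + rng.sample(majority, keep)
    rng.shuffle(sampled)
    return sampled


if __name__ == "__main__":
    # Toy data: 3 positive and 20 negative conversations.
    data = [{"id": i, "label": 1} for i in range(3)]
    data += [{"id": i, "label": 0} for i in range(3, 23)]
    balanced = undersample(data, ratio=1.0)
    print(len(balanced), sum(r["label"] for r in balanced))
```

Sampling should only be applied to the training split, so that the test split keeps the original class distribution.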

(3) Experimentation

Dataset
- PAN12: Stats (PAN Dataset Files Structure #3 )
- PANC: Still waiting to get access to the required datasets
Metrics
- Area Under ROC
- Area Under Precision-Recall Curve
- F2 score
- Recall, Precision, Accuracy
Baselines + Literature Review
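
For reference, the metrics listed above can be sketched in plain Python as follows. This assumes binary labels with 1 as the predatory class; the function names are illustrative only and do not come from the codebase. F2 is the F-beta score with beta = 2, which weights recall more heavily than precision. Area under the precision-recall curve is left out of this sketch.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_fbeta(y_true, y_pred, beta=2.0):
    """Precision, recall and F-beta; beta=2 gives the F2 score."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def roc_auc(y_true, scores):
    """Area under ROC as the probability that a random positive is
    scored above a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise `roc_auc` is quadratic in the number of samples, which is fine for a sanity check but too slow for a full evaluation run; a library implementation is the better choice there.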

(4) Paper Write Up

Target Conference/Journal: ECIR 2024
- Full paper abstract submission: September 20, 2023, 11:59 pm (AoE)
- Full paper submission: September 27, 2023, 11:59 pm (AoE)
- Full paper notification: December 14, 2023
- Main conference: March 25-27, 2024
The Google Docs draft link

(5) Demo website (#16 ) ==> By @impedaka

hamedwaezi01 · 2023-06-29T17:36:32Z

Hi
I created this diagram to show the general classification flow in this project, from raw data to the output label. It surely needs a lot of adjustments, so please let me know what you think about it.
NOTE: It is not the class diagram, but the two overlap to some extent.

hosseinfani · 2023-06-30T07:40:24Z

Hi @hamedwaezi01, this is awesome. Thank you.
Everything seems clear to me. Just the following notes:

- text data -> raw data?
- not sure if the csv format is necessary when we can read from xml?
- dataset -> input
- why the non-recurrent models cannot have embeddings in the input?
- activation -> output

Also, it would be great if you added hints to the code file paths so the blocks can be found easily in the codebase too.

Btw, we need an experiment on early detection, meaning how much of a conversation is needed to detect a predatory one. Remind me to discuss it more if it is not clear.
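
One possible way to frame the early-detection experiment, as a rough sketch only: feed growing prefixes of a conversation to a scorer and record the smallest fraction of messages at which the score crosses a threshold. `earliest_detection_fraction`, `score_fn`, and the toy scorer below are hypothetical placeholders, not part of the codebase; any trained model that maps a list of messages to a score could take the scorer's place.

```python
from typing import Callable, Optional, Sequence


def earliest_detection_fraction(
    messages: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    threshold: float = 0.5,
) -> Optional[float]:
    """Smallest fraction of the conversation (by message count) at
    which score_fn(prefix) >= threshold, or None if never reached."""
    total = len(messages)
    for end in range(1, total + 1):
        if score_fn(messages[:end]) >= threshold:
            return end / total
    return None


def toy_score(prefix: Sequence[str]) -> float:
    """Placeholder scorer: saturates after two messages with 'FLAG'."""
    hits = sum("FLAG" in m for m in prefix)
    return min(1.0, hits / 2)


if __name__ == "__main__":
    conv = ["hi", "FLAG a", "ok", "FLAG b", "bye"]
    print(earliest_detection_fraction(conv, toy_score, threshold=1.0))
```

Averaging this fraction over the conversations that are eventually detected gives one candidate summary number. It would have to be reported next to the usual metrics, since a scorer can be early and still wrong.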

hamedwaezi01 · 2023-07-03T15:54:28Z

Hi @hosseinfani. Thanks. Sorry again for the late reply.
You're right. "Raw data" is more accurate.

About using the XML file
Since we use pandas DataFrame in preprocessing steps, it is better to mention that we are gonna convert the XML to a DataFrame without loss of data and save it as CSV.
Also in our MVP baseline, we converted the XML to CSV too.
dataset -> input
what about "Input Features"
why the non-recurrent models cannot have embeddings in the input?
Actually, I have to add it too. I think I missed it. Additionally, there should be a separate box for fine-tuned BERT models and the respective datasets.
activation -> output
Good idea. Previously I had doubts, since "output" might be confused with the number of outputs or its configuration.
early detection
Yes, there were a couple of papers about it. We need to list a couple of metrics that measure it and then proceed.
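
As an illustration of the XML-to-CSV step discussed above, here is a minimal sketch using only the standard library. It assumes a layout of `<conversation id="...">` elements containing `<message line="...">` children with `<author>`, `<time>`, and `<text>` fields; that layout, the column names, and the function names are assumptions to be checked against the actual dataset files and the existing conversion code.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column names; adjust to whatever the pipeline expects.
FIELDS = ["conversation_id", "line", "author", "time", "text"]


def xml_to_rows(xml_source):
    """Flatten conversation/message elements into one dict per message.

    xml_source may be a file path or a file-like object.
    """
    root = ET.parse(xml_source).getroot()
    rows = []
    for conv in root.iter("conversation"):
        for msg in conv.iter("message"):
            rows.append({
                "conversation_id": conv.get("id"),
                "line": msg.get("line"),
                "author": (msg.findtext("author") or "").strip(),
                "time": (msg.findtext("time") or "").strip(),
                # Message text is kept verbatim, including empty messages.
                "text": msg.findtext("text") or "",
            })
    return rows


def write_csv(rows, out_file):
    """Write the rows with a header line to an open text file."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = (
        b'<conversations><conversation id="c1">'
        b'<message line="1"><author>a1</author><time>10:00</time>'
        b"<text>hello, world</text></message>"
        b"</conversation></conversations>"
    )
    buf = io.StringIO()
    write_csv(xml_to_rows(io.BytesIO(sample)), buf)
    print(buf.getvalue())
```

The resulting file can then be loaded into a DataFrame with `pandas.read_csv`. When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.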

hosseinfani added the Epic label Feb 4, 2023

hosseinfani assigned rezaBarzgar Feb 4, 2023

hamedwaezi01 assigned hamedwaezi01 and unassigned rezaBarzgar Jun 8, 2023
