Data_Hackathon_2023

Compete in this hackathon to win, practice, learn, and build your data science portfolio. Hackathons like this one let you compete with leading data scientists and machine learning experts from around the world.

Task 🎯

Sentiment analysis remains one of the key problems to which natural language processing has been extensively applied. This time around, given tweets from customers about various tech firms that manufacture and sell mobiles, computers, laptops, etc., the task is to identify whether each tweet expresses negative sentiment towards such companies or products.

Duration ⏰

The hackathon runs for 12 hours, from 1:30 pm on 15/12/2023 to 1:30 am on 16/12/2023 (WAT).

Evaluation Metric

The classification model is evaluated using the weighted F1-score.
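As a rough illustration of how this metric behaves, the sketch below computes a weighted F1-score in plain Python: the F1 of each class is averaged with weights proportional to that class's share of the true labels. The function name `weighted_f1` and the 0/1 label encoding (1 = negative sentiment) are assumptions made for the example; the organisers' actual scoring script is not described in this README.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Hypothetical labels: 1 = negative sentiment, 0 = otherwise (assumed encoding).
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a model that does well on the majority class but poorly on the minority class can still post a fairly high score, so it is worth inspecting per-class F1 as well.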

Data ⛓

  • train.csv - A labeled dataset of 7920 tweets for training the models. It is a CSV file in which each line stores a tweet ID, its label, and the tweet text.

  • test.csv - The test data file. Each line contains only a tweet ID and the tweet text.

  • sample_submission.csv - The exact format for a valid submission.
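The files above could be read and a submission written along the following lines. This is a minimal sketch using only the standard library; the column names `id`, `label`, and `tweet`, the helper names, and the inline sample rows are all assumptions for illustration, so check the real headers in train.csv and sample_submission.csv before relying on them.

```python
import csv
import io

# Tiny in-memory stand-ins for train.csv and test.csv so the sketch runs as-is.
# The column names (id, label, tweet) are assumptions; check the real headers.
TRAIN_CSV = "id,label,tweet\n1,0,Loving my new phone\n2,1,This laptop keeps crashing\n"
TEST_CSV = "id,tweet\n3,Battery died after a week\n4,Great camera\n"


def read_rows(handle):
    """Return the CSV rows as a list of dicts keyed by the header names."""
    return list(csv.DictReader(handle))


def write_submission(handle, ids, labels):
    """Write a header row, then one (id, label) row per test tweet."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "label"])
    writer.writerows(zip(ids, labels))


train = read_rows(io.StringIO(TRAIN_CSV))
test = read_rows(io.StringIO(TEST_CSV))

# Placeholder predictions: label every test tweet 0 (not a real model).
predictions = [0 for _ in test]

out = io.StringIO()
write_submission(out, [row["id"] for row in test], predictions)
print(out.getvalue())
```

With the real files, the `io.StringIO` objects would be replaced by file handles, for example `open("train.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption.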

Most profane and vulgar terms in the tweets have been replaced with “$&@*#”. However, please note that the dataset still might contain text that may be considered profane, vulgar, or offensive.
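One way to handle the mask during preprocessing is to map it to a single word-like token, so that a tokenizer does not split it into punctuation or drop it. The sketch below assumes the mask appears in the data exactly as quoted above; the placeholder name `maskedprofanity` and the function `normalize_mask` are hypothetical.

```python
MASK = "$&@*#"                    # mask string quoted in the data description
PLACEHOLDER = "maskedprofanity"   # hypothetical replacement token


def normalize_mask(text):
    """Replace each profanity mask with a placeholder word and tidy spacing."""
    replaced = text.replace(MASK, f" {PLACEHOLDER} ")
    # Collapse the extra whitespace introduced by the padded replacement.
    return " ".join(replaced.split())


print(normalize_mask("this phone is $&@*#!"))
```

Keeping the mask as a token rather than deleting it may preserve a useful signal, since masked profanity plausibly correlates with negative sentiment; that is a hypothesis to test on the training data, not a documented property of the dataset.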

Rules 👮🏼‍♂️👮🏽‍♀️

  • This hackathon is for individual participants only.
  • You can use any programming language or statistical software.
  • You are free to use any tool and machine you have rightful access to.
  • Upload your solution to GitHub and send the link to this email: [email protected]
  • Download and use only the dataset in this repository. The sample_submission.csv file shows the exact format for a valid submission.
  • Submission time matters: the first to submit scores higher.

About Practice Problem: Identify the Sentiments 📑

Sentiment analysis is the contextual mining of text: it identifies and extracts subjective information in the source material and helps a business understand the social sentiment around its brand, product, or service while monitoring online conversations. Brands can use this data to measure the success of their products in an objective manner. In this challenge, you are provided with tweet data and asked to predict netizens' sentiment towards electronic products.

Resources 🧰

  • Get started with NLP and text classification with our latest offering, the ‘Natural Language Processing (NLP) using Python’ course.
  • Refer to this comprehensive guide that exhaustively covers text classification techniques using different libraries and their implementation in Python.
  • You can also refer to this guide, which covers multiple techniques, including TF-IDF and Word2Vec, for tackling problems related to sentiment analysis.
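For readers new to TF-IDF, one of the techniques mentioned above, the sketch below computes the textbook variant (term frequency times log(N / document frequency)) in plain Python. It is an illustration only: the whitespace tokenizer, the function names, and the sample tweets are made up for the example, and library implementations often use smoothed or normalized variants that give different numbers.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up tweets, not taken from the dataset.
tweets = ["my phone is great", "my laptop is broken", "broken screen again"]
for vec in tfidf(tweets):
    print({term: round(w, 3) for term, w in vec.items()})
```

Terms that appear in fewer tweets receive larger weights, and a term present in every tweet gets a weight of zero, which is why TF-IDF features tend to emphasise distinctive words over common ones.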