satilmiskabasakal0/YapayGazeteci-Teknofest2024-v2

News Generation

Introduction

This project is part of the Teknofest 2024 Türkçe Doğal Dil İşleme (Turkish Natural Language Processing) competition. Its aim is to generate a news title and news content from a given image.

Dataset

The dataset was collected from the Sabah news website. It consists of news titles, news content, and images. All text is in Turkish.

Data-Preprocessing

  • Sample Data: image


      title = "Balıkesir’de tarihi bina yangında küle döndü"
      word_index = {'Balıkesir’de': 9, 'tarihi': 5, 'bina': 3, 'yangında': 7, 'küle': 5, 'döndü': 6 }
      tokens: [start_token, 9, 5, 3, 7, 5, 6, end_token]
    Input                                       | Output
    Image + start_token                         | 9
    Image + start_token + 9                     | 5
    Image + start_token + 9 + 5                 | 3
    Image + start_token + 9 + 5 + 3             | 7
    Image + start_token + 9 + 5 + 3 + 7         | 5
    Image + start_token + 9 + 5 + 3 + 7 + 5     | 6
    Image + start_token + 9 + 5 + 3 + 7 + 5 + 6 | end_token
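
The input/output rows above can be produced with a short loop. The following is a minimal sketch, not the repository's actual code: the function name `make_training_pairs` and the integer values chosen for `START_TOKEN` and `END_TOKEN` are assumptions made for illustration, and the word ids are taken as given in the sample. Each pair is meant to be combined with the same image at training time.

```python
# Hypothetical ids for the special tokens; the README does not state them.
START_TOKEN = 1
END_TOKEN = 2


def make_training_pairs(token_ids, start_token=START_TOKEN, end_token=END_TOKEN):
    """Expand one tokenized title into (input prefix, next token) pairs."""
    sequence = [start_token] + list(token_ids) + [end_token]
    pairs = []
    # The prefix sequence[:i] is the text input; sequence[i] is the target.
    for i in range(1, len(sequence)):
        pairs.append((sequence[:i], sequence[i]))
    return pairs


if __name__ == "__main__":
    # Word ids of the sample title, as listed in the README.
    title_ids = [9, 5, 3, 7, 5, 6]
    for prefix, target in make_training_pairs(title_ids):
        print("Image +", prefix, "->", target)
```

A title of n words yields n + 1 pairs, because the final pair teaches the model to emit `end_token`.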

Model

The model combines a CNN and an LSTM: the image is fed to the encoder (CNN), and the encoder's output is fed to the decoder (LSTM) together with the input text.
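
One way such an encoder-decoder could be wired up in Keras is sketched below. This is an illustration under stated assumptions, not the repository's architecture: the README does not specify the CNN, the layer sizes, or how the image features reach the LSTM. Here a small untrained CNN stands in for the encoder, its output is projected to the LSTM's initial hidden and cell states, and all names (`build_caption_model`, `embed_dim`, `units`, the 64x64 input size) are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_caption_model(vocab_size, max_len, image_shape=(64, 64, 3),
                        embed_dim=32, units=64):
    """CNN encoder + LSTM decoder that predicts the next token id."""
    # Encoder: a tiny CNN (assumption; a pretrained network could be used instead).
    image_in = layers.Input(shape=image_shape, name="image")
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    features = layers.GlobalAveragePooling2D()(x)

    # Project image features to the decoder's initial hidden and cell states.
    h0 = layers.Dense(units, activation="tanh")(features)
    c0 = layers.Dense(units, activation="tanh")(features)

    # Decoder: embed the token prefix and run it through an LSTM.
    text_in = layers.Input(shape=(max_len,), dtype="int32", name="text")
    embedded = layers.Embedding(vocab_size, embed_dim)(text_in)
    decoded = layers.LSTM(units)(embedded, initial_state=[h0, c0])

    # Probability distribution over the vocabulary for the next token.
    next_token = layers.Dense(vocab_size, activation="softmax")(decoded)
    return keras.Model(inputs=[image_in, text_in], outputs=next_token)


if __name__ == "__main__":
    model = build_caption_model(vocab_size=50, max_len=8)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()
```

With `sparse_categorical_crossentropy`, the integer target from each training pair can be used directly as the label, while the prefix is padded to `max_len` before being fed to the `text` input.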
