From fd976b8232331217c1a61ddca1da4a91680ebc39 Mon Sep 17 00:00:00 2001
From: Philipp Kessling
Date: Mon, 22 Jan 2024 17:41:25 +0100
Subject: [PATCH] add data structure docs

---
 notebooks/01-introduction.ipynb | 124 +++++++++++++++++++++++++++++---
 1 file changed, 116 insertions(+), 8 deletions(-)

diff --git a/notebooks/01-introduction.ipynb b/notebooks/01-introduction.ipynb
index 67cefe9..be64f1b 100644
--- a/notebooks/01-introduction.ipynb
+++ b/notebooks/01-introduction.ipynb
@@ -4,13 +4,66 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Wie man Daten aus Telegram bekommt.\n",
+    "# How to get data from Telegram.\n",
+    "\n",
+    "## Data Structure\n",
+    "\n",
+    "Telegram is a messenger app that offers not only one-to-one chats but also group chats, channels, bots and more. Its data structure is fairly complex because a few base objects are reused for all of the different chat and group types.\n",
+    "\n",
+    "### Base Objects\n",
+    "\n",
+    "The base objects are the following:\n",
+    "\n",
+    "**Chats**: A chat is a conversation between one or more users. A chat can be a private chat, a group or a channel.\n",
+    "\n",
+    "- **User**: A user is a person who uses Telegram. A user can be a member of a group or channel.\n",
+    "- **Group**: A group is a chat with multiple users. A group can be public or private.\n",
+    "- **Channel**: A channel is a chat in which administrators broadcast messages to subscribers. A channel can be public or private.\n",
+    "\n",
+    "However, there are a few complexities to consider[^1]. By default, a channel is a one-to-many communication [channel](https://telegram.org/tour/channels) with an unlimited number of subscribers. A group can be upgraded to a [supergroup](https://telegram.org/tour/groups), which holds up to 200,000 members and is technically also a channel.\n",
+    "\n",
+    "[^1]: [Telegram Documentation](https://core.telegram.org/api/channel#channels)\n",
+    "\n",
+    "**Messages**: A message is a piece of text sent in a chat. It can also carry media such as images, videos or documents. Messages can include or take the following forms:\n",
+    "\n",
+    "- **Media**: Media is a file attached to a message, such as an image, video or document.\n",
+    "- **Sticker**: A sticker is a special type of media: an image sent in a special format.\n",
+    "- **Location**: A location is a special type of media consisting of a latitude and a longitude.\n",
+    "- **Contact**: A contact is a special type of media that shares a person from the sender's contact list.\n",
+    "- **Poll**: A poll is a special type of media consisting of a question and several answer options.\n",
+    "- **Action**: An action is a service message that Telegram posts in a chat, for example when a user joins or leaves a group or channel.\n",
+    "- **Reply**: A reply is a message sent in response to another message.\n",
+    "- **Forward**: A forward is a copy of another message, possibly from a different chat, that keeps information about its original sender.\n",
+    "- **Edit**: An edit changes a message that has already been sent; the message keeps its ID instead of being replaced by a new one.\n",
+    "\n",
+    "**Bots**: A bot is a special type of user that is controlled by software to automate tasks. A bot can be added to a group or channel.\n",
+    "\n",
+    "\n",
+    "```mermaid\n",
+    "graph TD\n",
+    "A[Chat] --> B[User]\n",
+    "A --> C[Group]\n",
+    "A --> D[Channel]\n",
+    "A --> E[Bot]\n",
+    "```\n",
+    "\n",
+    "\n",
+    "## API Access\n",
+    "\n",
+    "Telegram offers a [Telegram API](https://core.telegram.org/api) to access the data available to your account. To use it, you need to register a developer app and obtain API credentials. The API is not very well documented, so you will need to figure out a lot of things by yourself.\n",
     "\n",
-    "![Telegram Logo](https://telegram.org/img/t_logo.png)\n",
     "\n",
-    "---\n",
     "\n",
-    "## Voraussetzungen\n",
+    "### Requirements\n",
+    "\n",
+    "- **Telegram account**: Install Telegram on your phone and create an account.\n",
+    "- **Developer app**: Create a developer app on [Telegram](https://my.telegram.org/apps) and retrieve the following information:\n",
+    "  - Telegram API ID (`api_id`)\n",
+    "  - Telegram API hash (`api_hash`)\n",
+    "- ...\n",
+    "- Profit!\n",
+    "\n",
+    "---\n",
     "\n",
     "![Telegram device overview](../images/telegram-device-overview-small.PNG)\n",
     "\n",
@@ -56,7 +109,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Um `tegracli` zu konfigurieren, führe den folgenden Befehl in einem Terminal[^1] aus:\n",
+    "To configure `tegracli`, run the following command in a terminal[^1]:\n",
     "\n",
     "```bash\n",
     "tegracli configure\n",
@@ -71,7 +124,7 @@
     "- Deine Telefonnummer,\n",
     "- den Code, den Du per Telegram-Nachricht erhältst.\n",
     "\n",
-    "Beispielsweise könnte die Konfiguration so aussehen:\n",
+    "For example, the configuration could look like this:\n",
     "\n",
     "```bash\n",
     "\n",
@@ -83,12 +136,21 @@
     "Enter 2FA code: 12345\n",
     "```\n",
     "\n",
-    "[^1]: Eine Eingabe über JupyterLab ist nicht möglich, da `tegracli` an dieser Stelle interaktiv arbeitet."
+    "[^1]: Running this command in JupyterLab or a Jupyter notebook is not possible, because `tegracli` prompts for input interactively at this step."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Web Interface\n",
+    "\n",
+    "https://t.me/s/reitschusterde\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [
     {
@@ -107,6 +169,52 @@
    "source": [
     "! tegracli configure --help"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```json\n",
+    "{\n",
+    "  \"post_id\": \"reitschusterde/8920\",\n",
+    "  \"views\": 14400,\n",
+    "  \"datetime\": 1705571988000,\n",
+    "  \"user\": \"reitschuster.de\",\n",
+    "  \"from_author\": null,\n",
+    "  \"text\": \"Ampel will „Regenbogen-Familien“ stärken – auf Kosten der Kinder?Anpassung „an soziale Wirklichkeit“.Die Bundesregierung plant weitreichende Reformen bei Adoption und Sorgerecht. Dazu sollen die Mindeststrafen bei Kinderpornografie wieder gesenkt werden. Einige der dabei verwendeten Wörter und Formulierungen müssen aufhorchen lassen. Von Kai Rebmann. https://reitschuster.de/post/ampel-will-regenbogen-familien-staerken-auf-kosten-der-kinder/\",\n",
+    "  \"link\": [\n",
+    "    \"https://reitschuster.de/post/ampel-will-regenbogen-familien-staerken-auf-kosten-der-kinder/\",\n",
+    "    \"https://reitschuster.de/post/ampel-will-regenbogen-familien-staerken-auf-kosten-der-kinder/\",\n",
+    "    \"https://reitschuster.de/post/ampel-will-regenbogen-familien-staerken-auf-kosten-der-kinder/\",\n",
+    "    \"https://reitschuster.de/post/ampel-will-regenbogen-familien-staerken-auf-kosten-der-kinder/\"\n",
+    "  ],\n",
+    "  \"reply_to_user\": null,\n",
+    "  \"reply_to_text\": null,\n",
+    "  \"reply_to_link\": null,\n",
+    "  \"image_url\": [],\n",
+    "  \"forwarded_message_url\": null,\n",
+    "  \"forwarded_message_user\": null,\n",
+    "  \"video_url\": [],\n",
+    "  \"video_duration\": null,\n",
+    "  \"handle\": \"reitschusterde\",\n",
+    "  \"post_number\": \"8920\"\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "```json\n",
+    "{\n",
+    "  \"name\": \"reitschusterde\",\n",
+    "  \"fullname\": \"reitschuster.de\",\n",
+    "  \"url\": \"https://t.me/reitschusterde\",\n",
+    "  \"description\": \"Offizieller Kanal von Boris Reitschuster\",\n",
+    "  \"subscriber_count\": 235000,\n",
+    "  \"photos_count\": 754,\n",
+    "  \"videos_count\": 86,\n",
+    "  \"files_count\": 9,\n",
+    "  \"links_count\": 7440\n",
+    "}\n",
+    "```"
+   ]
   }
  ],
  "metadata": {