fix: correctly filter system messages #147

Open
lucasrodes opened this issue Mar 18, 2024 · 2 comments
Assignees
Labels
enhancement system-mesages Chats have system messages, we deal with them

Comments

@lucasrodes
Owner

We are currently working on parsing system messages. However, there seem to be some particularities depending on the OS (Android, iOS), the device (mobile, desktop), etc.

Multiple environments

iOS

System messages are formatted like regular user messages, but with the chat name in place of the username.
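
A minimal sketch of how this could be used, assuming the chat has already been parsed into (username, message) pairs and that the chat name is known. The `Message` type and the `split_ios_system_messages` helper are hypothetical names for illustration, not part of the library:

```python
from typing import NamedTuple


class Message(NamedTuple):
    # Hypothetical parsed row; the real parser may use a different structure.
    username: str
    message: str


def split_ios_system_messages(messages, chat_name):
    """Split messages into (user, system) lists.

    A message is treated as a system message when its username field
    equals the chat name, as described above for iOS exports.
    """
    user, system = [], []
    for msg in messages:
        if msg.username == chat_name:
            system.append(msg)
        else:
            user.append(msg)
    return user, system
```

One caveat with this approach: if a participant's name can coincide with the chat name (which may happen in one-to-one chats), their messages would be flagged as system messages too.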

Android

System messages are sent without using the field allocated for "username".

See the example from #139:

04/03/2024, 22:29 - Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.
12/12/2023, 10:46 - ~ +1-xx3-yyy4 created group "Group1"
04/03/2024, 22:29 - +1-xx4-yyy3 added you to a group in the community: Community 1
07/03/2024, 12:12 - +1-xx9-yyy5 joined from the community
07/03/2024, 12:12 - +1-xx8-yyy3 joined from the community
07/03/2024, 12:12 - +1-xx4-yyy2 joined from the community
07/03/2024, 12:12 - +1-xx1-yyy3 joined from the community
09/03/2024, 17:32 - +1-xx7-yyy3 joined from the community
11/03/2024, 20:54 - +1-xx6-yyy5 joined using this group's invite link
11/03/2024, 20:54 - +1-xx8-yyy4 joined from the community
11/03/2024, 20:54 - +1-xx1-yyy8 joined using this group's invite link
16/03/2024, 09:10 - +1-xxx-yyy3: Hey everyone! We're looking for volunteers to act as facilitators for a "Event 1"  First Aid Workshop organized by <Message redacted>.
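
As a rough sketch of what the example above implies, Android lines could be split into user and system messages by checking whether a `username:` field follows the timestamp. Note that the "added you to a group in the community: Community 1" line contains ": " and would be misread as a user message by a naive check, so the sketch also looks for known system phrases in the would-be username. The regular expressions, the phrase list (taken only from the example above), and the `split_android_line` helper are assumptions for illustration, not the library's implementation:

```python
import re

# Assumed Android header layout, taken from the example above:
# "DD/MM/YYYY, HH:MM - <rest>"
HEADER = re.compile(r"^(?P<date>\d{2}/\d{2}/\d{4}, \d{2}:\d{2}) - (?P<rest>.*)$")
# A user message has "<username>: <text>" after the header.
USER = re.compile(r"^(?P<username>[^:]+): (?P<message>.*)$")
# System phrases seen in the example; this list is certainly incomplete.
SYSTEM_PHRASES = (
    "created group",
    "added you to a group in the community",
    "joined from the community",
    "joined using this group's invite link",
)


def split_android_line(line):
    """Return (date, username, message); username is None for system messages.

    Returns None when the line has no header, for example when it is a
    continuation of a multi-line message.
    """
    header = HEADER.match(line)
    if header is None:
        return None
    rest = header["rest"]
    user = USER.match(rest)
    # No "username:" field, or the "username" is really a system phrase.
    if user is None or any(p in user["username"] for p in SYSTEM_PHRASES):
        return header["date"], None, rest
    return header["date"], user["username"], user["message"]
```

Checking the phrases only against the text before the first ": " keeps a user message that merely quotes one of these phrases from being classified as a system message.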

Mobile vs desktop

There are also some differences between mobile and desktop exports: the desktop export seems to filter out some system messages.

Communities

How the format in community chat history differs from groups' is unclear. We should explore this further (and potentially create a separate tracking issue for that).

Solutions

iOS/Android

We are almost finished with a solution supporting system messages for iOS devices:

We should investigate a similar solution for chats exported from Android devices.

mobile/desktop

My intuition here is to go with full mobile support and skip desktop support for now. We should signal this somewhere in the docs.

Communities

Same as for desktop: do not support this for now.

@JoshEe00

JoshEe00 commented Mar 21, 2024

Hi @lucasrodes.

I managed to overcome the issues for desktop by reformatting the headers:
"[04/03/2024, 10:29 pm] +1-374-8523:"
converted into
"04/03/2024, 10:29 pm - +1-374-8523:"
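
The header conversion above could be sketched roughly as follows. The regular expression and the `desktop_to_mobile_headers` helper are assumptions based only on the headers quoted in this thread; in particular, dropping an optional seconds field is a choice made here for illustration:

```python
import re

# Assumed desktop header: "[DD/MM/YYYY, H:MM am/pm] ", optionally with ":SS".
DESKTOP_HEADER = re.compile(
    r"^\[(\d{2}/\d{2}/\d{4}), (\d{1,2}:\d{2})(?::\d{2})? ([ap]m)\] ",
    flags=re.MULTILINE | re.IGNORECASE,
)


def desktop_to_mobile_headers(chat_data: str) -> str:
    """Rewrite "[date, time] " headers as "date, time - " on every line."""
    return DESKTOP_HEADER.sub(r"\1, \2 \3 - ", chat_data)
```

Lines that do not start with a bracketed header (such as continuations of multi-line messages) are left unchanged.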

And I've also overcome the Community issue by simply finding the substring "added you to a group in the community" and removing that line completely:

import re

def _remove_community_message(self, chat_data, file_location) -> bool:
    """Remove the "added you to a group in the community" line from the chat.

    Returns True if the line was found and the file was overwritten.
    """
    text_to_remove = "added you to a group in the community"
    if text_to_remove not in chat_data:
        return False

    print("This chat is from a community chat. Removing community message...")
    # Match the whole line, including its newline, so no blank line is left behind
    pattern = r"^.*added you to a group in the community: .*\n?"
    new_text = re.sub(pattern, "", chat_data, flags=re.MULTILINE)

    # Overwrite the existing file (change file_location to write a new file instead)
    with open(file_location, "w", encoding="utf-8") as new_file:
        new_file.write(new_text)
    print("Community message removed. Existing chat overwritten.")

    return True

My latest issue is when the export comes from both a community and desktop. I remove the following line:
"[04/03/2024, 10:29:00 pm] +1-374-8523 added you to a group in the community: Community 1"

But the following call then returns None:
extract_header = extract_header_from_text(chat_data)

If the text originates from mobile, and I remove the "added you to a group in the community" line, the extract_header_from_text() method still works.

Similarly, if the text is converted from desktop to mobile format, I am able to get it to work. But when it is a combination of both, it fails.

Is there any way I can overcome this?

@JoshEe00

I've found a temporary workaround:

In desktop exports, an additional username or phone number appears before the first message.

extract_header_from_text() works when it is a username, but breaks when it is a phone number.

Works:
" [04/03/2024, 10:29 pm] Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more. "
" [04/03/2024, 10:29 pm] testingUser: Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more. "

Doesn't work:
" [04/03/2024, 10:29 pm] +1-374-8523: Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more. "

The temporary approach is to check whether it is a phone number and, if so, remove it.

It is very ad hoc, but it temporarily solves my problem, and it may help you when fixing community and desktop support.
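
For reference, the phone-number check could look roughly like this. It is only a sketch: the `strip_phone_sender` helper, the assumed first-line layout, and the definition of what counts as a phone number are all assumptions for illustration, not the code actually used:

```python
import re

# Assumed shape of the first desktop line: "[date, time] <sender>: <text>".
FIRST_LINE = re.compile(
    r"^(?P<header>\s*\[[^\]]+\] )(?P<sender>[^:]+): (?P<text>.*)$"
)
# Assumption: a phone number is an optional "+", then digits, spaces,
# dashes and parentheses, with at least one digit.
PHONE = re.compile(r"^\+?[\d\s()\-]*\d[\d\s()\-]*$")


def strip_phone_sender(first_line: str) -> str:
    """Drop the sender from the first line if it looks like a phone number."""
    match = FIRST_LINE.match(first_line)
    if match is None or PHONE.match(match["sender"]) is None:
        # No sender field, or the sender is a regular username: keep as-is.
        return first_line
    return match["header"] + match["text"]
```

Only the first line of the export would be passed through this helper, so later messages sent from phone numbers keep their sender.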
