Language/locale differences from en-US will raise an exception at various points #2

Open
eddyharrington opened this issue Feb 28, 2021 · 1 comment
eddyharrington commented Feb 28, 2021

Issue

WhatSoup raises exceptions at several points when WhatsApp is set to any language other than English, because several parts of the code match English words or characters. Date/time formats likely differ under non-English settings as well, so the date handling also needs to be revised with a more flexible solution such as dateutil.

Temporary workaround

Set the WhatsApp language to English on the phone before running the script. The setting can be changed back after scraping/exporting a chat.

Issue details

WhatSoup areas that depend on English language/locale:

  1. Identifying the 'Search results' element after searching for a specific chat
  2. Loading all messages in a selected chat: the XPath contains 'Message list'
  3. Finding the sender when a message does not contain text: there is a condition for 'Voice message'
  4. Determining whether vCard/VCF media is in a message: there are conditions for 'Message' and 'Add to a group'
  5. Date/time string handling: every call expects the format MM/DD/YYYY HH:MM AM/PM, but there are variations such as YYYY-MM-DD, A.M. / P.M., etc.
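
One possible direction for item 5, sketched with the standard library only: try a short list of candidate formats instead of a single hard-coded one. The function name `parse_chat_datetime` and the candidate list are assumptions for illustration, not WhatSoup code. Only the first format is the one this issue says is currently expected. dateutil, as suggested above, would be a more general alternative.

```python
from datetime import datetime

# Candidate formats are illustrative assumptions. Only the first one
# (MM/DD/YYYY HH:MM AM/PM) is the format this issue says is expected today.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %I:%M %p",  # e.g. 02/28/2021 9:05 PM
    "%Y-%m-%d %I:%M %p",  # e.g. 2021-02-28 9:05 PM
    "%Y-%m-%d %H:%M",     # e.g. 2021-02-28 21:05
)


def parse_chat_datetime(raw: str) -> datetime:
    """Return the first successful parse of raw against CANDIDATE_FORMATS."""
    # Normalise dotted meridiems such as 'p.m.' or 'A.M.' to 'PM' / 'AM'.
    # Note: %p matching depends on the process locale ('AM'/'PM' in the C locale).
    cleaned = raw.strip().upper().replace("A.M.", "AM").replace("P.M.", "PM")
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date/time string: {raw!r}")
```

A fixed list keeps ambiguous inputs predictable (for example, whether 03/04/2021 is March 4 or April 3 is decided by the order of the formats), at the cost of having to add each new locale variant by hand.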

Identifying search results

# Look for the unique class that holds 'Search results.'
WebDriverWait(driver, 5).until(expected_conditions.presence_of_element_located(
       (By.XPATH, "//*[@id='pane-side']/div[1]/div/div[contains(@aria-label,'Search results.')]")))

Loading all messages

# Set focus to chat window (xpath == div element w/ aria-label set to 'Message list. Press right arrow key...')
message_list_element = driver.find_element_by_xpath(
  "//*[@id='main']/div[3]/div/div/div[contains(@aria-label,'Message list')]")

Finding sender when a message does not contain text

# The last character of the aria-label is always a colon after the sender's name
if span.get('aria-label') != 'Voice message':
  return span.get('aria-label')[:-1]
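
The same check could be sketched against a set of labels instead of one English string. Everything here is an assumption for illustration: the function name `sender_from_aria_label`, the set `VOICE_MESSAGE_LABELS`, and in particular the non-English entry, which is an unverified placeholder rather than a string taken from WhatsApp Web.

```python
from typing import AbstractSet, Optional

# 'Voice message' comes from the snippet above. The second entry is a
# hypothetical placeholder and must be checked against the real UI.
VOICE_MESSAGE_LABELS = frozenset({"Voice message", "Mensaje de voz"})


def sender_from_aria_label(
    aria_label: Optional[str],
    voice_labels: AbstractSet[str] = VOICE_MESSAGE_LABELS,
) -> Optional[str]:
    """Return the sender name from an aria-label, or None if it is not a sender."""
    if aria_label is None or aria_label in voice_labels:
        return None
    # Strip the trailing colon only when it is actually present.
    return aria_label[:-1] if aria_label.endswith(":") else aria_label
```

Unlike the original slice, this version leaves a label without a trailing colon untouched, which avoids silently cutting the last letter off a name if a locale formats the label differently.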

Determining if vCard/VCF media is in a message

# Check if 'Message' is in the title (full title would be for example 'Message Bob Ross')
if 'Message' in button.get('title'):
  # Next sibling should always be the 'Add to a group' button
  if button.nextSibling:
    if button.nextSibling.get('title') == 'Add to a group':
      return True
@eddyharrington eddyharrington added the bug Something isn't working label Feb 28, 2021
@eddyharrington eddyharrington self-assigned this Feb 28, 2021
@amitvyas17

This project is not pasting the full chat in any format. Please help.
