Fix missing update of start_token while backfill prev messages #284

Open · wants to merge 1 commit into master
Conversation

mattthias

Hello,

while writing a script that should parse a channel's whole history, I wondered why I always got the same messages whenever I called backfill_previous_messages(). Please see the example code, which I ran against a room containing just 30 messages (Message 1 ... Message 30).

>>> from matrix_client.client import MatrixClient
>>> client = MatrixClient("https://synapse-server")
>>> token = client.login_with_password(username="mattthias", password="guesswhat")
>>> room = client.join_room("#matrixtest:synapse-server")
>>> print(room.events)
[]
>>> room.backfill_previous_messages()
>>> for event in room.events:
...   print(event['content'])
... 
{'body': 'Message 21', 'msgtype': 'm.text'}
{'body': 'Message 22', 'msgtype': 'm.text'}
{'body': 'Message 25', 'msgtype': 'm.text'}
{'body': 'Message 26', 'msgtype': 'm.text'}
{'body': 'Message 23', 'msgtype': 'm.text'}
{'body': 'Message 27', 'msgtype': 'm.text'}
{'body': 'Message 24', 'msgtype': 'm.text'}
{'body': 'Message 28', 'msgtype': 'm.text'}
{'body': 'Message 29', 'msgtype': 'm.text'}
{'body': 'Message 30', 'msgtype': 'm.text'}
>>> room.backfill_previous_messages()
>>> len(room.events)
20
>>> for event in room.events:
...     print(event['content'])
... 
{'body': 'Message 21', 'msgtype': 'm.text'}
{'body': 'Message 22', 'msgtype': 'm.text'}
{'body': 'Message 25', 'msgtype': 'm.text'}
{'body': 'Message 26', 'msgtype': 'm.text'}
{'body': 'Message 23', 'msgtype': 'm.text'}
{'body': 'Message 27', 'msgtype': 'm.text'}
{'body': 'Message 24', 'msgtype': 'm.text'}
{'body': 'Message 28', 'msgtype': 'm.text'}
{'body': 'Message 29', 'msgtype': 'm.text'}
{'body': 'Message 30', 'msgtype': 'm.text'}
{'body': 'Message 21', 'msgtype': 'm.text'}
{'body': 'Message 22', 'msgtype': 'm.text'}
{'body': 'Message 25', 'msgtype': 'm.text'}
{'body': 'Message 26', 'msgtype': 'm.text'}
{'body': 'Message 23', 'msgtype': 'm.text'}
{'body': 'Message 27', 'msgtype': 'm.text'}
{'body': 'Message 24', 'msgtype': 'm.text'}
{'body': 'Message 28', 'msgtype': 'm.text'}
{'body': 'Message 29', 'msgtype': 'm.text'}
{'body': 'Message 30', 'msgtype': 'm.text'}
>>> 

The function backfill_previous_messages() updates the room's event list
(room.events) using the API function client.api.get_room_messages. The
latter takes a start token as a parameter so it knows where in the
room's event history to start.

backfill_previous_messages() hands over self.prev_batch as the start
token but never updates it, so repeated calls to
backfill_previous_messages() always return the same chunk of events
from the room's event history.

This commit fixes the wrong behavior by updating self.prev_batch on
every call.

Signed-off-by: Matthias Schmitz [email protected]
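The token-advancing fix can be sketched offline. The snippet below is a minimal illustration, not the SDK's actual code: `StubAPI` is an invented stand-in for `client.api`, and it assumes (as the Matrix /messages endpoint does) that each response carries an `end` pagination token pointing further back in history.

```python
class StubAPI:
    """Invented stand-in for client.api; get_room_messages mimics the
    Matrix /messages call, returning a page of events plus pagination
    tokens ("start"/"end")."""

    def get_room_messages(self, room_id, token, direction="b", limit=10):
        newest = int(token.lstrip("t"))
        oldest = max(newest - limit, 0)
        chunk = [{"type": "m.room.message",
                  "content": {"body": "Message %d" % n, "msgtype": "m.text"}}
                 for n in range(newest, oldest, -1)]  # newest first
        return {"start": token, "end": "t%d" % oldest, "chunk": chunk}


class Room:
    def __init__(self, api, room_id, prev_batch):
        self.api = api
        self.room_id = room_id
        self.prev_batch = prev_batch
        self.events = []

    def backfill_previous_messages(self, limit=10):
        res = self.api.get_room_messages(self.room_id, self.prev_batch,
                                         direction="b", limit=limit)
        self.prev_batch = res["end"]  # the fix: advance the token
        for event in res["chunk"]:    # page is newest first, so prepend
            self.events.insert(0, event)


room = Room(StubAPI(), "!demo:example.org", "t30")
room.backfill_previous_messages()
room.backfill_previous_messages()
bodies = [e["content"]["body"] for e in room.events]
print(bodies[0], bodies[-1], len(bodies))  # → Message 11 Message 30 20
```

Without the `self.prev_batch = res["end"]` line, the second call would reuse the old token and fetch Messages 21-30 again, exactly as shown in the transcript above.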

Zil0 (Contributor) commented Oct 8, 2018

Thanks for looking at this! It already came up that this behavior needs fixing.

This is trickier than it seems, though: the update of prev_batch done here doesn't look consistent with the one done in client.py. Before, prev_batch was the point just before the last messages received in the last sync. With this change, prev_batch can be either that or any earlier token, depending on how many times backfilling was done.

There is a very basic case where an issue arises in an obvious way:

  • The client is listening for events in a thread T2 (e.g. start_listener_thread was called).
  • We operate in the main thread T1, and we backfill; prev_batch gets updated.
  • A new event comes in, and prev_batch gets overwritten by sync.
  • We backfill again. Backfilling still exhibits the broken behavior you underlined.

I'm not 100% sure what the proper fix is. My intuition is that the prev_batch returned when backfilling shouldn't end up in the room state but be handed back to the user, and that backfill_previous_messages should offer a way to pass it back in, in order to retrieve even earlier messages. Possibly via an optional argument, which would default to the token obtained via sync if not present.

Feel free to try different approaches, I'm really interested in seeing this part of the SDK improved :)
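The interface Zil0 sketches could look something like the following. This is a hypothetical design sketch, not the SDK's actual API: `StubAPI` is invented for illustration (a real homeserver would serve the /messages endpoint), and the function hands the pagination token back to the caller instead of mutating room state.

```python
class StubAPI:
    """Invented stand-in for client.api.get_room_messages."""

    def get_room_messages(self, room_id, token, direction="b", limit=10):
        newest = int(token.lstrip("t"))
        oldest = max(newest - limit, 0)
        chunk = [{"content": {"body": "Message %d" % n}}
                 for n in range(newest, oldest, -1)]  # newest first
        return {"start": token, "end": "t%d" % oldest, "chunk": chunk}


def backfill_previous_messages(api, room_id, prev_batch, token=None, limit=10):
    """Fetch older events and return (events, next_token).

    `token` defaults to the prev_batch obtained via sync, so sync's
    bookkeeping is never overwritten by pagination."""
    res = api.get_room_messages(room_id, token or prev_batch,
                                direction="b", limit=limit)
    return res["chunk"], res["end"]


api = StubAPI()
page1, token = backfill_previous_messages(api, "!demo:example.org", "t30")
page2, token = backfill_previous_messages(api, "!demo:example.org", "t30",
                                          token=token)  # goes further back
```

Because the token lives only in the caller's hands, a concurrent sync thread updating the room's prev_batch can no longer interleave with pagination, which avoids the T1/T2 race described above.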

@remram44

I agree that backfilling should be completely independent. It shouldn't affect the sync state, but it probably shouldn't be calling the listeners either, since otherwise the listeners get events out of order. Worse, there is no way to distinguish a new message from the ones you get from the backfill request.

Currently, I avoid using backfill_previous_messages entirely, and call get_room_messages manually so I don't mess anything up.
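Manual pagination of the kind remram44 describes can be sketched as a loop that walks backwards until the server has nothing older. Again a hedged illustration: `StubAPI` and `fetch_full_history` are invented names, standing in for `client.api.get_room_messages` against a real homeserver.

```python
class StubAPI:
    """Invented stand-in for client.api; serves 30 numbered messages."""

    def get_room_messages(self, room_id, token, direction="b", limit=10):
        newest = int(token.lstrip("t"))
        oldest = max(newest - limit, 0)
        chunk = [{"content": {"body": "Message %d" % n}}
                 for n in range(newest, oldest, -1)]  # newest first
        return {"start": token, "end": "t%d" % oldest, "chunk": chunk}


def fetch_full_history(api, room_id, from_token, limit=10):
    """Walk backwards page by page until the server has nothing older."""
    token, history = from_token, []
    while True:
        res = api.get_room_messages(room_id, token, direction="b", limit=limit)
        if not res["chunk"]:          # empty page: no older events left
            return history
        history.extend(res["chunk"])  # newest first within each page
        token = res["end"]            # continue from where this page ended


history = fetch_full_history(StubAPI(), "!demo:example.org", "t30")
```

Keeping the token local to this loop means nothing in the room's sync state is touched and no listeners fire, which is exactly the separation both commenters are asking for.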

mirukana added a commit to mirukana/matrix-python-sdk that referenced this pull request Jan 9, 2019
When loading new events, Client resets Room.prev_batch
matrix-org#284 (comment)
3 participants