Fix missing update of start_token while backfill prev messages #284

Open · wants to merge 1 commit into master
Conversation

mattthias

Hello,

while writing a script that should parse a channel's whole history, I wondered why I always got the same messages whenever I called backfill_previous_messages(). Please see the example code, which I ran against a room containing just 30 messages (Message 1 ... Message 30).

>>> from matrix_client.client import MatrixClient
>>> client = MatrixClient("https://synapse-server")
>>> token = client.login_with_password(username="mattthias", password="guesswhat")
>>> room = client.join_room("#matrixtest:synapse-server")
>>> print(room.events)
[]
>>> room.backfill_previous_messages()
>>> for event in room.events:
...   print(event['content'])
... 
{'body': 'Message 21', 'msgtype': 'm.text'}
{'body': 'Message 22', 'msgtype': 'm.text'}
{'body': 'Message 25', 'msgtype': 'm.text'}
{'body': 'Message 26', 'msgtype': 'm.text'}
{'body': 'Message 23', 'msgtype': 'm.text'}
{'body': 'Message 27', 'msgtype': 'm.text'}
{'body': 'Message 24', 'msgtype': 'm.text'}
{'body': 'Message 28', 'msgtype': 'm.text'}
{'body': 'Message 29', 'msgtype': 'm.text'}
{'body': 'Message 30', 'msgtype': 'm.text'}
>>> room.backfill_previous_messages()
>>> len(room.events)
20
>>> for event in room.events:
...     print(event['content'])
... 
{'body': 'Message 21', 'msgtype': 'm.text'}
{'body': 'Message 22', 'msgtype': 'm.text'}
{'body': 'Message 25', 'msgtype': 'm.text'}
{'body': 'Message 26', 'msgtype': 'm.text'}
{'body': 'Message 23', 'msgtype': 'm.text'}
{'body': 'Message 27', 'msgtype': 'm.text'}
{'body': 'Message 24', 'msgtype': 'm.text'}
{'body': 'Message 28', 'msgtype': 'm.text'}
{'body': 'Message 29', 'msgtype': 'm.text'}
{'body': 'Message 30', 'msgtype': 'm.text'}
{'body': 'Message 21', 'msgtype': 'm.text'}
{'body': 'Message 22', 'msgtype': 'm.text'}
{'body': 'Message 25', 'msgtype': 'm.text'}
{'body': 'Message 26', 'msgtype': 'm.text'}
{'body': 'Message 23', 'msgtype': 'm.text'}
{'body': 'Message 27', 'msgtype': 'm.text'}
{'body': 'Message 24', 'msgtype': 'm.text'}
{'body': 'Message 28', 'msgtype': 'm.text'}
{'body': 'Message 29', 'msgtype': 'm.text'}
{'body': 'Message 30', 'msgtype': 'm.text'}
>>> 

The function backfill_previous_messages() updates the room's event list
(room.events) using the API function client.api.get_room_messages. The
latter takes a start token as a parameter so it knows where in the
room's event history to start.

backfill_previous_messages() hands over self.prev_batch as the start
token but never updates it, so repeated calls to
backfill_previous_messages() always return the same chunk of events
from the room's event history.

This commit fixes the wrong behavior by updating self.prev_batch on
every call.

Signed-off-by: Matthias Schmitz [email protected]
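The token-advancing fix can be sketched offline. The snippet below is a minimal illustration, not the SDK's actual code: `StubAPI` is an invented stand-in for `client.api`, and it assumes (as the Matrix /messages endpoint does) that each response carries an `end` pagination token pointing further back in history.

```python
class StubAPI:
    """Invented stand-in for client.api; get_room_messages mimics the
    Matrix /messages call, returning a page of events plus pagination
    tokens ("start"/"end")."""

    def get_room_messages(self, room_id, token, direction="b", limit=10):
        newest = int(token.lstrip("t"))
        oldest = max(newest - limit, 0)
        chunk = [{"type": "m.room.message",
                  "content": {"body": "Message %d" % n, "msgtype": "m.text"}}
                 for n in range(newest, oldest, -1)]  # newest first
        return {"start": token, "end": "t%d" % oldest, "chunk": chunk}


class Room:
    def __init__(self, api, room_id, prev_batch):
        self.api = api
        self.room_id = room_id
        self.prev_batch = prev_batch
        self.events = []

    def backfill_previous_messages(self, limit=10):
        res = self.api.get_room_messages(self.room_id, self.prev_batch,
                                         direction="b", limit=limit)
        self.prev_batch = res["end"]  # the fix: advance the token
        for event in res["chunk"]:    # page is newest first, so prepend
            self.events.insert(0, event)


room = Room(StubAPI(), "!demo:example.org", "t30")
room.backfill_previous_messages()
room.backfill_previous_messages()
bodies = [e["content"]["body"] for e in room.events]
print(bodies[0], bodies[-1], len(bodies))  # → Message 11 Message 30 20
```

Without the `self.prev_batch = res["end"]` line, the second call would reuse the old token and fetch Messages 21-30 again, exactly as shown in the transcript above.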

Zil0 (Contributor) commented Oct 8, 2018

Thanks for looking at this! It already came up that this behavior needs fixing.

This is trickier than it seems, though: the update of prev_batch done here doesn't look consistent with the one done in client.py. Before, prev_batch was the point just before the last messages received in the last sync. With this change, prev_batch can be either that or any earlier token, depending on how many times backfilling was done.

There is a very basic case where an issue arises in an obvious way:

  • The client is listening for events in a thread T2 (e.g. start_listener_thread was called).
  • We operate in the main thread T1, and we backfill; prev_batch gets updated.
  • A new event comes in, and prev_batch gets overwritten by sync.
  • We backfill again. Backfilling still exhibits the broken behavior you underlined.

I'm not 100% sure what the proper fix is. My intuition is that the prev_batch returned when backfilling shouldn't end up in the room state but be handed back to the user, and that backfill_previous_messages should offer a way to pass it back in, in order to retrieve even earlier messages. Possibly via an optional argument, which would default to the token obtained via sync if not present.

Feel free to try different approaches, I'm really interested in seeing this part of the SDK improved :)
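The interface Zil0 sketches could look something like the following. This is a hypothetical design sketch, not the SDK's actual API: `StubAPI` is invented for illustration (a real homeserver would serve the /messages endpoint), and the function hands the pagination token back to the caller instead of mutating room state.

```python
class StubAPI:
    """Invented stand-in for client.api.get_room_messages."""

    def get_room_messages(self, room_id, token, direction="b", limit=10):
        newest = int(token.lstrip("t"))
        oldest = max(newest - limit, 0)
        chunk = [{"content": {"body": "Message %d" % n}}
                 for n in range(newest, oldest, -1)]  # newest first
        return {"start": token, "end": "t%d" % oldest, "chunk": chunk}


def backfill_previous_messages(api, room_id, prev_batch, token=None, limit=10):
    """Fetch older events and return (events, next_token).

    `token` defaults to the prev_batch obtained via sync, so sync's
    bookkeeping is never overwritten by pagination."""
    res = api.get_room_messages(room_id, token or prev_batch,
                                direction="b", limit=limit)
    return res["chunk"], res["end"]


api = StubAPI()
page1, token = backfill_previous_messages(api, "!demo:example.org", "t30")
page2, token = backfill_previous_messages(api, "!demo:example.org", "t30",
                                          token=token)  # goes further back
```

Because the token lives only in the caller's hands, a concurrent sync thread updating the room's prev_batch can no longer interleave with pagination, which avoids the T1/T2 race described above.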

@remram44

I agree that backfilling should be completely independent. It shouldn't affect the sync state, but it probably shouldn't be calling the listeners either, since otherwise the listeners get events out of order. Worse, there is no way to distinguish a new message from the ones you get from the backfill request.

Currently, I avoid using backfill_previous_messages entirely, and call get_room_messages manually so I don't mess anything up.
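Manual pagination of the kind remram44 describes can be sketched as a loop that walks backwards until the server has nothing older. Again a hedged illustration: `StubAPI` and `fetch_full_history` are invented names, standing in for `client.api.get_room_messages` against a real homeserver.

```python
class StubAPI:
    """Invented stand-in for client.api; serves 30 numbered messages."""

    def get_room_messages(self, room_id, token, direction="b", limit=10):
        newest = int(token.lstrip("t"))
        oldest = max(newest - limit, 0)
        chunk = [{"content": {"body": "Message %d" % n}}
                 for n in range(newest, oldest, -1)]  # newest first
        return {"start": token, "end": "t%d" % oldest, "chunk": chunk}


def fetch_full_history(api, room_id, from_token, limit=10):
    """Walk backwards page by page until the server has nothing older."""
    token, history = from_token, []
    while True:
        res = api.get_room_messages(room_id, token, direction="b", limit=limit)
        if not res["chunk"]:          # empty page: no older events left
            return history
        history.extend(res["chunk"])  # newest first within each page
        token = res["end"]            # continue from where this page ended


history = fetch_full_history(StubAPI(), "!demo:example.org", "t30")
```

Keeping the token local to this loop means nothing in the room's sync state is touched and no listeners fire, which is exactly the separation both commenters are asking for.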

mirukana added a commit to mirukana/matrix-python-sdk that referenced this pull request Jan 9, 2019
When loading new events, Client resets Room.prev_batch
matrix-org#284 (comment)
3 participants