Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made TorrentDef.load async and threaded #7666

Merged
merged 1 commit into from
Nov 14, 2023

Conversation

qstokkink
Copy link
Contributor

@qstokkink qstokkink commented Nov 3, 2023

Related to #5399 (PR 2/4)

This PR moves TorrentDef.load to a thread. The fallout from this - essentially 12-line - change consists of 245 additions and 309 deletions across 21 files.

Our codecov check uncovered some dead code, which I removed (instead of writing tests for it):

  • tribler/core/components/metadata_store/tests/gen_test_data.py
  • tribler/core/components/metadata_store/restapi/tests/test_channels_endpoint.py (test_add_torrent_from_magnet)

@qstokkink qstokkink force-pushed the upd_async_tdef_load branch from fe50618 to e41f3b6 Compare November 3, 2023 10:40
@qstokkink qstokkink marked this pull request as ready for review November 3, 2023 10:48
@qstokkink qstokkink requested review from a team and kozlovsky and removed request for a team November 3, 2023 10:48
@qstokkink qstokkink changed the title WIP: Made TorrentDef.load async and threaded READY: Made TorrentDef.load async and threaded Nov 3, 2023
Copy link
Contributor

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, some bugs need to be addressed

@@ -293,7 +294,7 @@ def updated_my_channel(self, tdef):
dcfg = DownloadConfig(state_dir=self.state_dir)
dcfg.set_dest_dir(self.mds.channels_dir)
dcfg.set_channel_download(True)
return self.download_manager.start_download(tdef=tdef, config=dcfg)
return await self.download_manager.start_download(tdef=tdef, config=dcfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an indentation bug. When the if condition is not satisfied, the dcfg local variable is not defined, and we can't refer to it in the function call.

Suggested change
return await self.download_manager.start_download(tdef=tdef, config=dcfg)
return await self.download_manager.start_download(tdef=tdef, config=dcfg)

@@ -1,22 +1,10 @@
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like other functions from this file are also not used, and the entire file can be deleted. (not just the gen_sample_channel function)

@@ -617,10 +617,9 @@ def start_download(self, torrent_file=None, tdef=None, config: DownloadConfig =
if infohash not in self.metainfo_requests or self.metainfo_requests[infohash][0] == download:
self.downloads[infohash] = download
if not self.dummy_mode:
self.start_handle(download, atp)
await self.start_handle(download, atp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the start_handle method is an async method that returns a Future object as a result. Here, we wait only for the start_handle method itself and not for the future it returns. If we want to wait for the future returned from the method, we should write await (await self.start_handle(download, atp)).

As I understand it, it is not necessary to wait for the future result here. But then, the question is, should we return the future from the start_handle method at all? Currently, we never await for the future result returned from the start_handle method, so it may be cleaner to just return None instead of the future.

@@ -662,7 +661,7 @@ async def start_handle(self, download, atp):
except asyncio.TimeoutError:
self._logger.warning("Timeout waiting for libtorrent DHT getting enough peers")
ltsession.async_add_torrent(encode_atp(atp))
return await download.future_added
return download.future_added
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the previous comment, we never await for the future returned from the start_handle method, not even in tests. It may be unclear for the reader of the code what the await self.start_handle(...) call actually waits for. I suggest that we remove this return statement completely to make the code a bit easier to understand.

@@ -212,7 +211,7 @@ def rec_gen(dir_):
for chunk in chunks(torrents_list, 100):
for f in chunk:
try:
self.add_torrent_to_channel(TorrentDef.load(f))
self.add_torrent_to_channel(await TorrentDef.load(f))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this code was inside db_session. PonyORM ignores nested db session in the add_torrent_to_channel method, and, as a result, each batch of 100 torrents was processed as a single transaction.

Now, the top-level db_session is removed from the current method, and each individual add_torrent_to_channel call is processed as a separate transaction. It can slow down adding a large number of torrents. Basically, torrents are not grouped into batches anymore, and the orm.commit() call does nothing, as there is no active transaction at this moment.

I suggest changing the code a bit: we can load all TorrentDef instances first and then store them in the database in a single db_session with batches:

            torrent_defs = []
            for filename in torrents_list:
                try:
                    torrent_defs.append(await TorrentDef.load(filename))
                except Exception:  # pylint: disable=W0703
                    # Have to use the broad exception clause because Py3 versions of libtorrent
                    # generate generic Exceptions
                    errors_list.append(filename)

            with db_session:
                for chunk in chunks(torrent_defs, 100):
                    for tdef in chunk:
                        self.add_torrent_to_channel(tdef)
                    orm.commit()

@qstokkink qstokkink changed the title READY: Made TorrentDef.load async and threaded WIP: Made TorrentDef.load async and threaded Nov 14, 2023
@qstokkink qstokkink changed the title WIP: Made TorrentDef.load async and threaded READY: Made TorrentDef.load async and threaded Nov 14, 2023
@qstokkink qstokkink requested a review from kozlovsky November 14, 2023 10:59
@qstokkink
Copy link
Contributor Author

@kozlovsky Thanks for your review. I addressed your points pretty much one-to-one according to your suggestions. Regarding start_handle, I adopted your suggestion of returning None.

I requested your re-review: please ensure that I indeed adopted your suggestions as you intended.

Copy link
Contributor

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good now!

@qstokkink qstokkink changed the title READY: Made TorrentDef.load async and threaded Made TorrentDef.load async and threaded Nov 14, 2023
@qstokkink qstokkink merged commit f610c82 into Tribler:main Nov 14, 2023
16 checks passed
@qstokkink qstokkink deleted the upd_async_tdef_load branch November 14, 2023 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants