-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Made TorrentDef.load async and threaded #7666
Conversation
fe50618
to
e41f3b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, some bugs need to be addressed
@@ -293,7 +294,7 @@ def updated_my_channel(self, tdef): | |||
dcfg = DownloadConfig(state_dir=self.state_dir) | |||
dcfg.set_dest_dir(self.mds.channels_dir) | |||
dcfg.set_channel_download(True) | |||
return self.download_manager.start_download(tdef=tdef, config=dcfg) | |||
return await self.download_manager.start_download(tdef=tdef, config=dcfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an indentation bug. When the if condition is not satisfied, the dcfg
local variable is not defined, and we can't refer to it in the function call.
return await self.download_manager.start_download(tdef=tdef, config=dcfg) | |
return await self.download_manager.start_download(tdef=tdef, config=dcfg) |
@@ -1,22 +1,10 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like other functions from this file are also not used, and the entire file can be deleted. (not just the gen_sample_channel
function)
@@ -617,10 +617,9 @@ def start_download(self, torrent_file=None, tdef=None, config: DownloadConfig = | |||
if infohash not in self.metainfo_requests or self.metainfo_requests[infohash][0] == download: | |||
self.downloads[infohash] = download | |||
if not self.dummy_mode: | |||
self.start_handle(download, atp) | |||
await self.start_handle(download, atp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the start_handle
method is an async method that returns a Future
object as a result. Here, we wait only for the start_handle
method itself and not for the future it returns. If we want to wait for the future returned from the method, we should write await (await self.start_handle(download, atp))
.
As I understand it, it is not necessary to wait for the future result here. But then, the question is, should we return the future from the start_handle
method at all? Currently, we never await for the future result returned from the start_handle
method, so it may be cleaner to just return None
instead of the future.
@@ -662,7 +661,7 @@ async def start_handle(self, download, atp): | |||
except asyncio.TimeoutError: | |||
self._logger.warning("Timeout waiting for libtorrent DHT getting enough peers") | |||
ltsession.async_add_torrent(encode_atp(atp)) | |||
return await download.future_added | |||
return download.future_added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in the previous comment, we never await for the future returned from the start_handle
method, not even in tests. It may be unclear for the reader of the code what the await self.start_handle(...)
call actually waits for. I suggest that we remove this return
statement completely to make the code a bit easier to understand.
@@ -212,7 +211,7 @@ def rec_gen(dir_): | |||
for chunk in chunks(torrents_list, 100): | |||
for f in chunk: | |||
try: | |||
self.add_torrent_to_channel(TorrentDef.load(f)) | |||
self.add_torrent_to_channel(await TorrentDef.load(f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this code was inside db_session
. PonyORM ignores nested db session in the add_torrent_to_channel
method, and, as a result, each batch of 100 torrents was processed as a single transaction.
Now, the top-level db_session
is removed from the current method, and each individual add_torrent_to_channel
call is processed as a separate transaction. It can slow down adding a large number of torrents. Basically, torrents are not grouped into batches anymore, and the orm.commit()
call does nothing, as there is no active transaction at this moment.
I suggest changing the code a bit: we can load all TorrentDef instances first and then store them in the database in a single db_session with batches:
torrent_defs = []
for filename in torrents_list:
try:
torrent_defs.append(await TorrentDef.load(filename))
except Exception: # pylint: disable=W0703
# Have to use the broad exception clause because Py3 versions of libtorrent
# generate generic Exceptions
errors_list.append(filename)
with db_session:
for chunk in chunks(torrent_defs, 100):
for tdef in chunk:
self.add_torrent_to_channel(tdef)
orm.commit()
e41f3b6
to
82c708f
Compare
@kozlovsky Thanks for your review. I addressed your points pretty much one-to-one according to your suggestions. Regarding I requested your re-review: please ensure that I indeed adopted your suggestions as you intended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good now!
Related to #5399 (PR 2/4)
This PR moves
TorrentDef.load
to a thread. The fallout from this - essentially 12-line - change consists of 245 additions and 309 deletions across 21 files.Our codecov check uncovered some dead code, which I removed (instead of writing tests for it):
tribler/core/components/metadata_store/tests/gen_test_data.py
tribler/core/components/metadata_store/restapi/tests/test_channels_endpoint.py (test_add_torrent_from_magnet)