scanning very slow #170
There should be logs in

Hello, 19-Jan-2022 20:29:41.728 INFO [MessageBroker-4] org.springframework.web.socket.config.WebSocketMessageBrokerStats.lambda$initLoggingTask$0 WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 5 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5], sockJsScheduler[pool size = 8, active threads = 1, queued tasks = 4, completed tasks = 275440]

One thought that came to my mind:

The database should already have quite a few indexes on relevant columns; you could check using a tool like DBeaver to connect to the database, although if your tables are missing indexes, I would expect that to drastically slow down searching, not scanning new files. In fact, lacking indexes would actually make inserting new records faster, since it can simply append them to the table without updating the indexes.

Scanning should mostly be limited by disk read speed, since it has to read in the entire file, and to a lesser degree by CPU speed, since it has to generate a hash to identify the bag file. This could be an issue if you're reading very large bags over a slow network connection, or potentially if you're reading large bag files from slow HDDs, especially if the bag files themselves are unindexed and there's a lot of disk thrashing going on.
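For a sense of the work involved: identifying a bag means streaming the whole file through a hash, so scan time scales with file size and read throughput. A minimal sketch of that step (Python and MD5 are assumptions for illustration only; the bag database itself is a Java application):

```python
import hashlib

def hash_bag_file(path, chunk_size=1024 * 1024):
    """Stream a file through MD5 in 1 MB chunks; time spent here is
    dominated by disk (or network) read throughput."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# e.g. hash_bag_file("/bags/example.bag")  # hypothetical path
```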
Are your bags large by any chance, like more than 2GB?
Hello,
@ptulpen There's no limit to my knowledge. I mostly wanted to make sure you weren't uploading 100GB bag files that might be causing network issues.
How much free RAM does your server have? Is it possible that it's hitting swap space while trying to read the bags?
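One quick way to check (a hypothetical snippet using Python's psutil; nothing here is part of the bag database):

```python
import psutil

# Snapshot RAM and swap usage; heavy swap use while scanning would
# point to memory pressure rather than disk speed.
mem = psutil.virtual_memory()
swap = psutil.swap_memory()

print(f"RAM:  {mem.available / 1e9:.1f} GB free of {mem.total / 1e9:.1f} GB ({mem.percent}% used)")
print(f"Swap: {swap.used / 1e9:.1f} GB used of {swap.total / 1e9:.1f} GB ({swap.percent}% used)")
```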
I have 32 GB of RAM (and 8 CPUs), and it is not fully used.
I also see some interesting errors like these in the scanning process:
The description is part of the folder structure, but not even a complete folder name.
That's interesting; that definitely isn't a normal error... That exception looks like it's being thrown from code that is trying to parse tags in the bag file. If you've configured metadata topics, then it expects every message on those topics to have a
I suspect you've got a bag file with metadata that is formatted in a way it doesn't expect; do you have an example of anything in your files that might be formatted differently from that?
I've submitted a PR at #196 that will make it handle invalid metadata more gracefully when scanning bag files. I don't know if that will fix the speed issue you're having, but it may fix some other issues people have seen with it failing to recognize certain bag files...
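The gist of "more gracefully" is to skip malformed metadata instead of letting the exception abort the scan. A simplified sketch of the idea (Python and the YAML key/value format are assumptions for illustration; the actual fix in #196 is in the Java application):

```python
import yaml  # PyYAML

def parse_metadata(message_text):
    """Parse tags from one metadata message, skipping anything malformed
    instead of failing the whole bag file."""
    try:
        tags = yaml.safe_load(message_text)
        if isinstance(tags, dict):
            return tags
        # Parsed, but not the key/value mapping we expected; ignore it.
        return {}
    except yaml.YAMLError:
        # Malformed metadata: skip this message and keep scanning.
        return {}
```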
Hello, I rebuilt it now. I tested it with a small subset, and it looks much faster. The more graceful metadata scanning also sounds good; issues like that could happen in other scenarios as well.
Sure, I've pushed an image containing my build to
v3.5.1 has been released with this fix. |
The patch regarding the metadata is great. EDIT: the metadata also appears to be in the database once everything is done (at least according to grepping through a pg_dump).
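(For reference, a hypothetical way to do that grep in Python, assuming a plain-text dump was written to /tmp/bagdb.sql:)

```python
# Search a plain-text pg_dump output for a tag value to confirm it was stored.
needle = "my_tag_value"  # hypothetical tag to look for
with open("/tmp/bagdb.sql", encoding="utf-8", errors="replace") as dump:
    for line_no, line in enumerate(dump, 1):
        if needle in line:
            print(line_no, line.rstrip())
```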
Another thought regarding this: we saw that we have many images and quite big videos inside the bags.
For us, analysis is CPU-bound. It seems to be single-threaded, which is a waste of the 16 threads available 🙁
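If scanning really is single-threaded and CPU-bound, the per-file work could in principle be fanned out across cores. A rough sketch of the idea (Python with a process pool, purely illustrative; the application itself is Java, and the bag directory here is an assumption):

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def scan_one(path):
    """The per-bag work: stream the file through a hash."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(1024 * 1024):
            digest.update(chunk)
    return path, digest.hexdigest()

if __name__ == "__main__":
    bags = sorted(Path("/bags").rglob("*.bag"))  # illustrative bag directory
    # One worker process per core; each bag is scanned independently,
    # so CPU-bound work no longer serializes on a single thread.
    with ProcessPoolExecutor() as pool:
        for path, digest in pool.map(scan_one, bags):
            print(path, digest)
```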
Hello,
I have the issue that scanning is very slow; it has now been running for over a week for around 3000 files.
Is there a way to restart and/or troubleshoot the scanning process?
(Restarting the container and even the server does not help.)
Version is 3.4.2