
Stuck at 'Stage 1 - Document filtering' #68

Closed
peteruithoven opened this issue Aug 2, 2017 · 6 comments

@peteruithoven (Contributor)

I'm using couchdb-dump version: 1.1.7

I have a database, which is successfully downloaded to a file (39MB), but it gets stuck at Stage 1 - Document filtering.

... INFO: Output file bob.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.2M    0 38.2M    0     0  10.0M      0 --:--:--  0:00:03 --:--:-- 10.0M
... INFO: File may contain Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering

Since it's below 250MB, the parsing isn't multi-threaded.

I'm assuming it's stuck at the sed line:

$sed_cmd ${sed_edit_in_place} 's/.*,"doc"://g'

Could someone explain the purpose of removing .*,"doc":? Is this the Database Compaction or the Purge Historic and Deleted Data logic?

Looking at the JSON file, this sed removes the following part from each line.

{"id":"...","key":"...","value":{"rev":"..."},"doc":

I think a comment above that code would be welcome.

I'm assuming my issue is caused by binary attachments in all the docs.

I don't think #31 helps in my case, since I do want this stripping to happen.

@peteruithoven (Contributor, Author)

Testing this with just that doc, it seems we can make the regular expression more performant by making the start more specific, using s/{"id".*,"doc"://g.
Does that make sense?
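
A rough way to compare the two patterns is to time them against a synthetic long line (the payload below is made up; it just mimics a doc carrying a large base64 attachment):

payload=$(head -c 1000000 /dev/zero | base64 | tr -d '\n')
printf '{"id":"a","key":"a","value":{"rev":"1-x"},"doc":{"_id":"a","data":"%s"}},\n' "$payload" > line.json
time sed 's/.*,"doc"://g' line.json > /dev/null          # current pattern
time sed 's/{"id".*,"doc"://g' line.json > /dev/null     # proposed anchored pattern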

peteruithoven added a commit to peteruithoven/couchdb-dump that referenced this issue Aug 2, 2017
@dalgibbard (Collaborator)

Hey @peteruithoven - the initial raw export JSON from CouchDB contains an encapsulating id/key/rev/doc section for each individual document within the database. To make the documents importable back into CouchDB, we need to strip this off; the stage 2 sed then removes the leftover closing curly brace from this 'wrapping' section, and the stage 3 and 4 seds fix the header and footer of the JSON.
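
For context, a rough sketch of what those stages do (these are paraphrases, not the script's exact commands, and ${file_name} is just a stand-in for the dump file):

$sed_cmd ${sed_edit_in_place} 's/.*,"doc"://g' ${file_name}   # stage 1: strip the id/key/rev wrapper
$sed_cmd ${sed_edit_in_place} 's/}},$/},/' ${file_name}       # stage 2: drop the wrapper's leftover closing brace
# stages 3 and 4 then rewrite the _all_docs header/footer into the {"docs":[...]} shape that _bulk_docs expects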
I think your proposed suggestion to make the sed statement more specific is valid and looks sane.

I don't have a means to test it right now - would you be able to confirm that the exported file is importable again with this change? And for the record, any details on the speed improvement/time reduction as a result?

@dalgibbard dalgibbard self-assigned this Aug 2, 2017
@dalgibbard dalgibbard added the bug label Aug 2, 2017
@peteruithoven (Contributor, Author)

Thanks for the clarifications.

I've exported all my databases with the altered script, removed them and then reimported them. I haven't found any issues so far.

Regarding speed: I let the old version run for more than 10 minutes on that 39MB file with no progress; I haven't seen it finish at all. With my alteration, it takes maybe a few seconds.

dalgibbard pushed a commit that referenced this issue Aug 3, 2017
* Optimized stage 1 reg-exp

See:  #68

* Version bump to 1.1.8
@dalgibbard (Collaborator)

Merged and closed; thanks!

@peteruithoven (Contributor, Author)

Thanks for checking and merging

@epos-eu (Collaborator)

epos-eu commented Aug 3, 2017 via email
