Checksum verification process

Jayanth Dungavath edited this page Nov 6, 2017 · 1 revision
  1. We exported the stored checksums of all the FileSets along with the files themselves, and also computed checksums of all the files after exporting them.
  2. After ingesting all the Version 0 files, we extracted the checksums of all the ingested files using the following recipe:
    # Write one "sufia:<id> <checksum>" line per FileSet.
    File.open('tmp/console.out', 'w') do |file|
        FileSet.all.each do |file_set|
            file.puts "sufia:#{file_set.id} #{file_set.original_checksum.first}"
        end
    end
  3. We wrote a short shell script that collected, for each PID, the new prod checksum (exported using the recipe above), the old computed checksum, and the old extracted checksum into a CSV file, then checked whether the new prod checksum matched the old computed checksum.

  4. Of the 6119 FileSets, about 93 did not match. These turned out to be the FileSets that have more than one version: we were comparing the new prod Version 0 checksum against the old computed checksum of the latest version.

  5. We ingested the additional versions using the following recipe:

    # Find the FileSet and the user whose email is stored as its creator.
    file_set = FileSet.find '05741w48b'
    user = User.find_by_email(file_set.creator.first)
    actor = Hyrax::Actors::FileSetActor.new(file_set, user)
    # Upload the version file and attach it to the FileSet as new content.
    uploaded_file = Hyrax::UploadedFile.create(file: File.open('/tmp/versions/sufia:05741w48b/1/bain1887.pdf'))
    actor.update_content(uploaded_file.file)
  6. After this ingest, we could no longer extract the checksums using the recipe from the Version 0 extraction step. Instead, we downloaded all the FileSets from new prod, computed their checksums locally, and compared them against the old computed checksums.
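
The final comparison could be sketched in plain Ruby roughly as follows. This is only an illustration, not the script we actually used (that was a shell script). The helper names, the file names in the usage comment, and the choice of SHA-1 are all assumptions; use whichever digest algorithm the old computed checksums were produced with. The input format is the `sufia:<id> <checksum>` layout written by the extraction recipe above.

```ruby
require 'digest'

# Parse lines of the form "sufia:<id> <checksum>" into a Hash of PID => checksum.
# Blank or incomplete lines are skipped.
def parse_checksum_list(text)
  text.each_line.each_with_object({}) do |line, checksums|
    pid, checksum = line.split
    checksums[pid] = checksum if pid && checksum
  end
end

# Compute the hex digest of a local file.
# SHA-1 is an assumption; pass another Digest class if needed.
def file_checksum(path, digest_class = Digest::SHA1)
  digest_class.file(path).hexdigest
end

# Return [pid, old_checksum, new_checksum] for every PID in old_checksums
# whose new checksum differs or is missing (missing shows up as nil).
def checksum_mismatches(old_checksums, new_checksums)
  old_checksums.each_with_object([]) do |(pid, old_cs), mismatches|
    new_cs = new_checksums[pid]
    mismatches << [pid, old_cs, new_cs] unless old_cs == new_cs
  end
end

# Hypothetical usage (file and directory names are placeholders):
#   old = parse_checksum_list(File.read('old_computed.txt'))
#   new = Dir.glob('downloads/*').to_h do |path|
#     ["sufia:#{File.basename(path)}", file_checksum(path)]
#   end
#   checksum_mismatches(old, new).each { |row| puts row.join(',') }
```

An empty result from `checksum_mismatches` means every PID in the old list has an identical checksum in the new list.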