Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix data_file in SigMFArchiveReader() #42

Merged
merged 6 commits into from
Jun 6, 2024
Merged

Conversation

liambeguin
Copy link
Contributor

When opening a sigmf archive with SigMFArchiveReader(), the data file is
currently set to the full archive (including metadata).

This causes issues when writing the archive back to disk and invalidates
the metadata hash since the data_file is now a tar archive and not just
the set of samples.

To work around the issue, carry around a BytesIO buffer with the content
of the data_file and write it to the tmpdir just before saving a new
archive to disk.

Copy link
Collaborator

@Teque5 Teque5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR; used SigMFFileError and even included a test for the issue.

sigmf/utils.py Outdated
@@ -18,7 +18,7 @@

def get_sigmf_iso8601_datetime_now() -> str:
"""Get current UTC time as iso8601 string"""
return datetime.isoformat(datetime.utcnow()) + "Z"
return datetime.isoformat(datetime.now(UTC)) + "Z"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why but ok.

Copy link
Collaborator

@Teque5 Teque5 Dec 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this part is failing in tests. datetime.UTC was added in 3.11, so if you revert this part it will work in 3.6-3.10 at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Teque5,

Nice PR; used SigMFFileError and even included a test for the issue.

Thanks!

we can drop the datetime change, it's pretty irrelevant. I was just getting a warning and thought I'd address it while I was at it.

@liambeguin liambeguin marked this pull request as draft December 27, 2023 18:41
@liambeguin
Copy link
Contributor Author

I put this as a draft because I noticed the following:

file = 'QPSK_4SPS.sigmf-data'

s = sigmf.sigmffile.fromfile(file.replace('-data', '-meta'))
d1 = s.read_samples()

s = sigmf.sigmffile.fromarchive(file.replace('-data', ''))
d2 = s.read_samples()
d3 = s[:]

print(np.array_equal(d1, d2)) # False!
print(np.array_equal(d1, d3)) # True

This is also true on master, I'll try to get ready shortly.

@Teque5
Copy link
Collaborator

Teque5 commented Feb 8, 2024

Any updates on this? It was close to being ready.

@liambeguin
Copy link
Contributor Author

Hi @Teque5 apologies for the delay! I haven't had time to look at this more closely. Since this was also seen on master, maybe we should open another issue to keep track of it separately. I'll rebase and change the status of the MR.

@liambeguin liambeguin marked this pull request as ready for review April 11, 2024 21:30
Copy link
Collaborator

@Teque5 Teque5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a closer look at the PR and I found a case where these changes break the implementation. Use the sigmf_logo from the SigMF spec repository:

meta = sigmf.sigmffile.fromfile('sigmf_logo')
meta.archive('logo_archive')
# identical to using `fromarchive`
arc = sigmf.sigmffile.fromfile('logo_archive.sigmf')

# reading from memory map works correctly
np.array_equal(meta[:], arc[:])
True

# reading from the archive seems to be broken on this branch:
arc.read_samples()
ValueError: <doesn't detect 2 channels>

# ideally this should also pass True
np.array_equal(meta.read_samples(), arc.read_samples())

FYI when reading from a memory map, the samples retrieved will be in the format inside the file (like uint8) but otherwise we expect the reader to return either np.float32 or np.complex64.

Please take a moment to address this (and maybe add a test) and I'll be happy to merge. thx

@liambeguin
Copy link
Contributor Author

Hi @Teque5, thanks for pointing that out! this last update should fix the issue. here's a short range-diff:

$ git range-diff origin/main lvb/archive-fix archive-fix 
1:  c17754ca7415 = 1:  f12fea276be6 sigmffile: propagate skip_checksum when opening archives
2:  385804da7dae = 2:  676d69cb0f90 tests: archive: make sure we don't change samples
3:  4087e628f708 = 3:  72f5fe595867 sigmffile: be more specific on except statement
4:  bc7375138e62 = 4:  2b5ba795d743 sigmffile: separate data_offset and data_size
5:  747e162aac47 ! 5:  5c4c33d67a72 archive: use BytesIO to store data file
    @@ sigmf/archivereader.py
      import tempfile
     +from pathlib import Path
      
    - from . import __version__  #, schema, sigmf_hash, validate
    - from .sigmffile import SigMFFile
    + from . import __version__
    + from .archive import SIGMF_ARCHIVE_EXT, SIGMF_DATASET_EXT, SIGMF_METADATA_EXT, SigMFArchive
     @@ sigmf/archivereader.py: class SigMFArchiveReader():
                      elif memb.name.endswith(SIGMF_DATASET_EXT):
                          data_offset = memb.offset_data
    @@ sigmf/archivereader.py: class SigMFArchiveReader():
              )
     
      ## sigmf/sigmffile.py ##
    -@@ sigmf/sigmffile.py: import tempfile
    +@@ sigmf/sigmffile.py: from collections import OrderedDict
      from os import path
    - import warnings
    + 
      import numpy as np
     +import io
      
    - from . import __version__, schema, sigmf_hash, validate
    - from .archive import SigMFArchive, SIGMF_DATASET_EXT, SIGMF_METADATA_EXT, SIGMF_ARCHIVE_EXT, SIGMF_COLLECTION_EXT
    + from . import __specification__, __version__, schema, sigmf_hash, validate
    + from .archive import SIGMF_ARCHIVE_EXT, SIGMF_COLLECTION_EXT, SIGMF_DATASET_EXT, SIGMF_METADATA_EXT, SigMFArchive
     @@ sigmf/sigmffile.py: class SigMFFile(SigMFMetafile):
                  # check for any non-zero `header_bytes` fields in captures segments
                  if capture.get(self.HEADER_BYTES_KEY, 0):
    @@ sigmf/sigmffile.py: class SigMFFile(SigMFMetafile):
     +            fp = open(self.data_file, "rb")
     +            fp.seek(first_byte, 0)
     +            data = np.fromfile(fp, dtype=data_type_in, count=nitems)
    ++        elif self.data_buffer is not None:
    ++            data = np.frombuffer(self.data_buffer.getbuffer(), dtype=data_type_in, count=nitems)
     +        else:
     +            data = self._memmap
     +

I feel like this is a little complicated for what we're trying to do, but unfortunately I don't have a lot of time to put into simplifying this. I'll try to get back to it as soon as possible.

@liambeguin liambeguin requested a review from Teque5 June 6, 2024 16:04
@Teque5
Copy link
Collaborator

Teque5 commented Jun 6, 2024

I believe you need to add import sigmf to test_archivereader to make your test pass checks.

Propagate skip_checksum down when opening a sigmf archive just as it's
done for other formats.

Signed-off-by: Liam Beguin <[email protected]>
Make sure opening and closing a sigmf archive doesn't alter samples.

Right now this test case fails because SigMFArchiveReader sets the
data_file to the tarfile on open.

Signed-off-by: Liam Beguin <[email protected]>
This tries to catch the ValueError raised one line above. Be more
specifispecific about the exception type so as to not mask other error.

This is a preparatory change for the archive data file fix.

Signed-off-by: Liam Beguin <[email protected]>
These elements are usually checked independently, separate them into two
arguments to facilitate checks.

Signed-off-by: Liam Beguin <[email protected]>
When opening a sigmf archive with SigMFArchiveReader(), the data file is
currently set to the full archive (including metadata).

This causes issues when writing the archive back to disk and invalidates
the metadata hash since the data_file is now a tar archive and not just
the set of samples.

To work around the issue, carry around a BytesIO buffer with the content
of the data_file and write it to the tmpdir just before saving a new
archive to disk.

Signed-off-by: Liam Beguin <[email protected]>
@liambeguin
Copy link
Contributor Author

I believe you need to add import sigmf to test_archivereader to make your test pass checks.

oh.. apologies for that! somehow this was passing on the branch I had locally. just pushed a fix.

@Teque5
Copy link
Collaborator

Teque5 commented Jun 6, 2024

Sorry the PySimpleGUI devs just yanked their package from pip so I'm fixing that bug on your branch as part of this PR.

* sort imports & black formatting
* remove import * from gui
* disable testing of gui (PySimpleGUI issue)
* increment to v1.2.2
Copy link
Collaborator

@Teque5 Teque5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@Teque5 Teque5 merged commit f1a3f47 into sigmf:main Jun 6, 2024
3 checks passed
@liambeguin liambeguin deleted the archive-fix branch June 6, 2024 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants