Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CGMES loading from zipped profiles inside a folder #3309

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

quinarygio
Copy link
Contributor

@quinarygio quinarygio commented Feb 5, 2025

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

Does this PR already have an issue describing the problem?

#3259

Does this PR introduce a breaking change or deprecate an API?

  • Yes
  • No

Other information:

@quinarygio quinarygio marked this pull request as draft February 5, 2025 16:58
@alicecaron alicecaron linked an issue Feb 6, 2025 that may be closed by this pull request
Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
@alicecaron alicecaron requested a review from So-Fras February 13, 2025 09:08
@quinarygio quinarygio marked this pull request as ready for review February 13, 2025 13:14
@alicecaron alicecaron requested a review from olperr1 February 13, 2025 15:41
* Mutualize code to extract InputStream
* try-with-resources through namespaceGetter function

Signed-off-by: alicecaron <[email protected]>
Co-authored-by: Florian Dupuy <[email protected]>
@flo-dup flo-dup force-pushed the cgmes-load-zipped-profiles branch from 959f6f4 to 987c38f Compare February 19, 2025 14:29
Signed-off-by: Giovanni Ferrari <[email protected]>
Fix sonar issues

Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
Comment on lines 61 to 68
// copy and compress each of the profile to the file system
Path workDir = fileSystem.getPath("/work");
for (String profile : profiles) {
try (var is = testDataSource.newInputStream(profile);
var os = new ZipOutputStream(Files.newOutputStream(workDir.resolve(profile + ".zip")))) {
os.closeEntry();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// copy and compress each of the profile to the file system
Path workDir = fileSystem.getPath("/work");
for (String profile : profiles) {
try (var is = testDataSource.newInputStream(profile);
var os = new ZipOutputStream(Files.newOutputStream(workDir.resolve(profile + ".zip")))) {
os.closeEntry();
}
}
Path workDir = fileSystem.getPath("/work");
for (String profile : profiles) {
try (var os = new ZipOutputStream(Files.newOutputStream(workDir.resolve(profile + ".zip")))) {
os.closeEntry();
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In emptyZipErrorTest(), there's no need to create an input stream for each profile since they are not read.

try (InputStream in = dataSource.newInputStream(n)) {
String fileExtension = n.substring(n.lastIndexOf('.') + 1);
if (fileExtension.equals(CompressionFormat.ZIP.getExtension())) {
ZipSecurityHelper.checkIfZipExtractionIsSafe(dataSource, n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a performance issue here. We're reading the whole unzipped file to detect if the zip extraction is safe but we only need the first tag (for namespaces definition or the base attribute). I think we need to do something smarter to get safely the first characters only, not unzipping the complete file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to check just first zip entry safety, checking only the compression ratio

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simpler way is to use a SafeZipInputStream class wrapping the ZipInputStream instead of the ZipInputStream itself:

public class SafeZipInputStream extends ForwardingInputStream<ZipInputStream> {
    @Override
    public int read() throws IOException {
        int byteRead = super.read();
        if (byteRead != -1 && this.bytesRead++ > this.maxBytesToRead) {
          throw new IOException();
        }
        return byteRead;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        ... // similar way
    }
}

That way you'll only read the first lines, not unzipping the full file, which might be more than 1GB uncompressed. Besides we could restrict maxBytesRead a lot in this PR (a few kB, or 1MB?), as we only read the first tag anyway (we stop at first START_ELEMENT in the 3 use cases).

@@ -5,6 +5,8 @@ The CIM-CGMES importer reads and converts a CIM-CGMES model to the PowSyBl grid
- Convert CIM-CGMES data retrieved by SPARQL requests from the created triplestore to PowSyBl grid model

The data in input CIM/XML files uses RDF (Resource Description Framework) syntax. In RDF, data is described making statements about resources using triplet expressions: (subject, predicate, object).
The CIM-CGMES importer supports also ZIP compressed input files. The importer will decompress the ZIP files to read CIM/XML data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so clear here, I think we should enumerate the 3 ways (or more?) to import a CGMES file:

  • a folder containing all uncompressed profile files
  • a folder containing the zipped profile files, each one being in a separate zip file
  • a zipped file containing all profile files

Fix documentation

Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
@quinarygio
Copy link
Contributor Author

Using the new SafeZipInputStream class to read from the ZipInputStream, the ZipSecurityHelper class is not used anymore. Should we keep it or not ?

Signed-off-by: Giovanni Ferrari <[email protected]>
Comment on lines 18 to 19
private int bytesRead;
private int maxBytesToRead;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to go further than 2GB, so an int is not enough.

Suggested change
private int bytesRead;
private int maxBytesToRead;
private long bytesRead;
private long maxBytesToRead;

super(in);
this.maxBytesToRead = maxBytesToRead;
for (int i = 0; i < entryNumber; i++) {
ZipEntry zipEntry = in.getNextEntry();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to reuse this for other usecases where several ZipInputStream entries have to be read. This is not really easy to use that way, I think you'd need to create a new SafeZipInputStream(zin, 1, max) for each entry?

Adding a SafeZipInputStream::getNextEntry method would do the trick, at the cost of adding a protected T getDelegate() in ForwardingInputStream.

if (byteRead != -1 && (this.bytesRead + byteRead) > this.maxBytesToRead) {
throw new IOException("Max bytes to read exceeded");
}
this.bytesRead += byteRead;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid doing twice the addition include this in an if (byteRead + -1) before comparing to the max

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed as it's not needed anymore (in this PR at least!) we should not keep it

Remove unused ZipSecurityHelper class.

Signed-off-by: Giovanni Ferrari <[email protected]>
Copy link

sonarqubecloud bot commented Mar 6, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
2 Security Hotspots

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Waiting for review
Development

Successfully merging this pull request may close these issues.

CGMES loading from zipped profiles inside a folder
5 participants