CGMES loading from zipped profiles inside a folder #3309
base: main
Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
cgmes/cgmes-model/src/main/java/com/powsybl/cgmes/model/CgmesOnDataSource.java
Outdated
commons/src/test/java/com/powsybl/commons/compress/ZipSecurityHelperTest.java
Outdated
cgmes/cgmes-model/src/main/java/com/powsybl/cgmes/model/CgmesOnDataSource.java
Outdated
* Mutualize code to extract InputStream
* try-with-resources through namespaceGetter function
Signed-off-by: alicecaron <[email protected]>
Co-authored-by: Florian Dupuy <[email protected]>
959f6f4 to 987c38f
commons/src/main/java/com/powsybl/commons/compress/ZipSecurityHelper.java
Outdated
Signed-off-by: Giovanni Ferrari <[email protected]>
Fix Sonar issues
Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
commons/src/main/java/com/powsybl/commons/compress/ZipSecurityHelper.java
Outdated
...cgmes-conversion/src/test/java/com/powsybl/cgmes/conversion/test/LoadZippedProfilesTest.java
Outdated
Signed-off-by: Giovanni Ferrari <[email protected]>
// copy and compress each profile to the file system
Path workDir = fileSystem.getPath("/work");
for (String profile : profiles) {
    try (var is = testDataSource.newInputStream(profile);
         var os = new ZipOutputStream(Files.newOutputStream(workDir.resolve(profile + ".zip")))) {
        os.closeEntry();
    }
}
Suggested change:
Path workDir = fileSystem.getPath("/work");
for (String profile : profiles) {
    try (var os = new ZipOutputStream(Files.newOutputStream(workDir.resolve(profile + ".zip")))) {
        os.closeEntry();
    }
}
In emptyZipErrorTest(), there's no need to open an input stream for each profile, since the profiles are never read.
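A self-contained sketch of the suggested test setup, assuming a temporary directory on the default file system instead of the test's in-memory fileSystem; the class EmptyProfileZips and its method are hypothetical, not the PR's code. It writes one empty archive per profile without touching the profile contents:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipOutputStream;

class EmptyProfileZips {
    // Hypothetical helper: writes one empty "<profile>.zip" archive per profile into workDir.
    // No input stream is opened on the profiles, since their content is never read.
    static void writeEmptyZips(Path workDir, List<String> profiles) throws IOException {
        Files.createDirectories(workDir);
        for (String profile : profiles) {
            try (var os = new ZipOutputStream(Files.newOutputStream(workDir.resolve(profile + ".zip")))) {
                os.finish(); // no entry added: the archive is valid but contains zero entries
            }
        }
    }
}
```

Each resulting file is a valid zip with no entries, which is exactly the input an "empty zip" error test needs.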
commons/src/main/java/com/powsybl/commons/compress/ZipSecurityHelper.java
Outdated
commons/src/main/java/com/powsybl/commons/compress/ZipSecurityHelper.java
Outdated
commons/src/main/java/com/powsybl/commons/compress/ZipSecurityHelper.java
Outdated
...cgmes-conversion/src/test/java/com/powsybl/cgmes/conversion/test/LoadZippedProfilesTest.java
try (InputStream in = dataSource.newInputStream(n)) {
    String fileExtension = n.substring(n.lastIndexOf('.') + 1);
    if (fileExtension.equals(CompressionFormat.ZIP.getExtension())) {
        ZipSecurityHelper.checkIfZipExtractionIsSafe(dataSource, n);
There's a performance issue here: we read the whole unzipped file to check that the zip extraction is safe, but we only need the first tag (for the namespace definitions or the base attribute). I think we need something smarter to safely get only the first characters, without unzipping the complete file.
I propose checking only the safety of the first zip entry, based on its compression ratio alone.
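A minimal sketch of that proposal, assuming the profile is available as a file so that java.util.zip.ZipFile can read the sizes from the central directory; the class name and the 100x threshold are hypothetical. Keep in mind that these sizes come from the archive's own headers, which a crafted file can misreport, so this check trusts the archive metadata:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class FirstEntryRatioCheck {
    // Hypothetical threshold: reject a first entry that expands more than 100x.
    static final double MAX_COMPRESSION_RATIO = 100.0;

    static boolean isFirstEntrySafe(Path zip) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            var entries = zipFile.entries();
            if (!entries.hasMoreElements()) {
                return false; // empty archive: nothing to read
            }
            ZipEntry first = entries.nextElement();
            long size = first.getSize();                 // declared uncompressed size
            long compressed = first.getCompressedSize(); // declared compressed size
            if (size < 0 || compressed <= 0) {
                return false; // sizes unknown: treat as unsafe
            }
            return (double) size / compressed <= MAX_COMPRESSION_RATIO;
        }
    }
}
```

Only the first entry's metadata is inspected; no data is decompressed.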
I think a simpler way is to use a SafeZipInputStream class wrapping the ZipInputStream, instead of the ZipInputStream itself:
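public class SafeZipInputStream extends ForwardingInputStream<ZipInputStream> {
    private final long maxBytesToRead;
    private long bytesRead;
    public SafeZipInputStream(ZipInputStream in, long maxBytesToRead) {
        super(in);
        this.maxBytesToRead = maxBytesToRead;
    }
    @Override
    public int read() throws IOException {
        int byteRead = super.read();
        if (byteRead != -1 && ++bytesRead > maxBytesToRead) {
            throw new IOException("Max bytes to read exceeded");
        }
        return byteRead;
    }
    @Override
    public int read(byte[] b) throws IOException {
        return read(b, 0, b.length); // route through the counting overload below
    }
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0 && (bytesRead += n) > maxBytesToRead) { // n is -1 at end of stream
            throw new IOException("Max bytes to read exceeded");
        }
        return n;
    }
}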
That way you only read the first lines instead of unzipping the full file, which may be more than 1 GB uncompressed. Besides, we could restrict maxBytesToRead a lot in this PR (a few kB, or 1 MB?), since we only read the first tag anyway (we stop at the first START_ELEMENT in all 3 use cases).
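A minimal sketch of the "stop at the first START_ELEMENT" idea with the standard StAX API (javax.xml.stream); the class and method names are hypothetical and this is not the PR's code. Because a parser may buffer some input ahead of what it has reported, the byte budget should leave headroom beyond the size of the root tag itself:

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RootNamespaces {
    // Returns the namespace declarations of the root element, then stops parsing.
    static Map<String, String> read(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // untrusted input: no DTD processing
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            Map<String, String> namespaces = new LinkedHashMap<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    for (int i = 0; i < reader.getNamespaceCount(); i++) {
                        String prefix = reader.getNamespacePrefix(i);
                        // The default namespace has a null or empty prefix: store it under "".
                        namespaces.put(prefix == null ? "" : prefix, reader.getNamespaceURI(i));
                    }
                    break; // first START_ELEMENT reached: the rest of the document is never parsed
                }
            }
            return namespaces;
        } finally {
            reader.close(); // does not close the underlying InputStream
        }
    }
}
```

Wrapping the input in a byte-limited stream then bounds how much of a zip entry can ever be inflated, even for a malicious archive.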
@@ -5,6 +5,8 @@ The CIM-CGMES importer reads and converts a CIM-CGMES model to the PowSyBl grid
- Convert CIM-CGMES data retrieved by SPARQL requests from the created triplestore to PowSyBl grid model

The data in input CIM/XML files uses RDF (Resource Description Framework) syntax. In RDF, data is described by making statements about resources using triple expressions: (subject, predicate, object).
The CIM-CGMES importer also supports ZIP-compressed input files: it decompresses them to read the CIM/XML data.
This is not very clear. I think we should enumerate the 3 (or more?) ways to import a CGMES file:
- a folder containing all uncompressed profile files
- a folder containing the zipped profile files, each one being in a separate zip file
- a zipped file containing all profile files
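To make the three layouts listed above concrete, here is a hypothetical classifier built only on java.nio.file; the enum values and the extension-based rules are assumptions for illustration, not how the PowSyBl importer actually resolves its data source:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CgmesInputLayouts {
    // Hypothetical names for the three layouts enumerated in the review comment.
    enum Layout { FOLDER_OF_XML_PROFILES, FOLDER_OF_ZIPPED_PROFILES, SINGLE_ZIP_ARCHIVE, UNKNOWN }

    static Layout detect(Path input) throws IOException {
        if (Files.isRegularFile(input)) {
            // A single archive expected to contain all profile files
            return input.getFileName().toString().endsWith(".zip") ? Layout.SINGLE_ZIP_ARCHIVE : Layout.UNKNOWN;
        }
        if (!Files.isDirectory(input)) {
            return Layout.UNKNOWN;
        }
        List<String> names;
        try (Stream<Path> files = Files.list(input)) {
            names = files.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        }
        if (names.stream().anyMatch(n -> n.endsWith(".xml"))) {
            return Layout.FOLDER_OF_XML_PROFILES;    // uncompressed profile files
        }
        if (names.stream().anyMatch(n -> n.endsWith(".zip"))) {
            return Layout.FOLDER_OF_ZIPPED_PROFILES; // one zip file per profile
        }
        return Layout.UNKNOWN;
    }
}
```

A folder mixing .xml and .zip files is classified as a folder of uncompressed profiles here; a real importer would need an explicit rule for that case.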
Fix documentation
Signed-off-by: Giovanni Ferrari <[email protected]>
Signed-off-by: Giovanni Ferrari <[email protected]>
Now that the new SafeZipInputStream class is used to read from the ZipInputStream, the ZipSecurityHelper class is no longer used. Should we keep it or not?
Signed-off-by: Giovanni Ferrari <[email protected]>
private int bytesRead;
private int maxBytesToRead;
We might want to go beyond 2 GB, so an int is not enough.
Suggested change:
private long bytesRead;
private long maxBytesToRead;
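The reason is plain integer overflow: a Java int holds at most 2^31 - 1 (just under 2 GiB), and addition past that value wraps silently to a negative number. The hypothetical helpers below only contrast the two counter widths:

```java
class CounterWidth {
    // int arithmetic wraps silently once the sum passes Integer.MAX_VALUE
    static int addAsInt(int bytesRead, int n) {
        return bytesRead + n;
    }

    // long arithmetic stays exact for any realistic byte count
    static long addAsLong(long bytesRead, int n) {
        return bytesRead + n;
    }
}
```

addAsInt(Integer.MAX_VALUE, 1) returns Integer.MIN_VALUE (-2147483648), which is below any positive limit, so a bytesRead > maxBytesToRead check would stop firing; an int maxBytesToRead also cannot express a limit above that value.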
super(in);
this.maxBytesToRead = maxBytesToRead;
for (int i = 0; i < entryNumber; i++) {
    ZipEntry zipEntry = in.getNextEntry();
We might want to reuse this for other use cases where several ZipInputStream entries have to be read. It is not easy to use that way: I think you'd need to create a new SafeZipInputStream(zin, 1, max) for each entry?
Adding a SafeZipInputStream::getNextEntry method would do the trick, at the cost of adding a protected T getDelegate() method to ForwardingInputStream.
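A minimal sketch of the getNextEntry idea, assuming a standalone java.io.FilterInputStream subclass instead of PowSyBl's ForwardingInputStream: keeping a typed reference to the ZipInputStream avoids needing a getDelegate() accessor. The class name LimitedZipInputStream and its per-entry budget are hypothetical. Each chunk is counted once, and only when data was actually read:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class LimitedZipInputStream extends FilterInputStream {
    private final ZipInputStream zip; // typed reference, so getNextEntry() can be forwarded
    private final long maxBytesPerEntry;
    private long bytesRead;

    LimitedZipInputStream(ZipInputStream zip, long maxBytesPerEntry) {
        super(zip);
        this.zip = zip;
        this.maxBytesPerEntry = maxBytesPerEntry;
    }

    // Moves to the next entry and resets the per-entry byte budget.
    ZipEntry getNextEntry() throws IOException {
        bytesRead = 0;
        return zip.getNextEntry();
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so both array overloads are counted.
    // Bytes discarded through skip() are not counted in this sketch.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    private void count(int n) throws IOException {
        bytesRead += n; // single addition, done only for bytes actually read
        if (bytesRead > maxBytesPerEntry) {
            throw new IOException("Max bytes to read exceeded");
        }
    }
}
```

Usage: call getNextEntry() in a loop and read each entry from the same wrapper; the limit then applies to every entry separately instead of to the whole archive.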
if (byteRead != -1 && (this.bytesRead + byteRead) > this.maxBytesToRead) {
    throw new IOException("Max bytes to read exceeded");
}
this.bytesRead += byteRead;
To avoid doing the addition twice, put it inside an if (byteRead != -1) block, before comparing to the max.
Indeed, since it's no longer needed (in this PR at least!), we should not keep it.
Remove unused ZipSecurityHelper class.
Signed-off-by: Giovanni Ferrari <[email protected]>
Please check if the PR fulfills these requirements
Does this PR already have an issue describing the problem?
#3259
Does this PR introduce a breaking change or deprecate an API?
Other information: