-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hashInHttpHeaders option for Coursier resolver #383
base: master
Are you sure you want to change the base?
Add hashInHttpHeaders option for Coursier resolver #383
Conversation
…f checksumming from network and the disk cache)
Also note that this is the first bit of serious Scala that I've written, so please feel free to suggest anything stylistic also :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really exciting.
Thanks for sending the PR.
I had a few requests for changes or comments before merging.
case e: java.nio.file.FileAlreadyExistsException => () | ||
} | ||
} else { | ||
// println(s"not caching error for $artifact") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove the commented code or use a logging api here (I think other files are using slf4j if I remember correctly).
Task.schedule(CoursierResolver.downloadPool) { | ||
// Since we use atomic moves, we can guarantee that if the header file exists, it is complete. | ||
if (!Files.exists(headersPath)) { | ||
val tmp = CachePath.temporaryFile(headersPath.toFile).toPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we factor lines 197 - 246 into a method such as fetchHeadersToPath(...)
or something.
Failure(Recoverable(new RuntimeException(s"failed to parse headers file $headersPath", error))) | ||
case Right(obj) => Success(obj) | ||
}) | ||
.map((headerMap) => HttpHeaders.of(headerMap.map { case (k, v) => (k, v.asJava) }.asJava, { (_, _) => true })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You don't need (headerMap)
it is the same as .map(headerMap => HttpHeaders...
} | ||
.map { case (sha, length) => (artifact, ShaValue(sha, digestType), length) } | ||
)) | ||
} else Task.fail(Recoverable(new RuntimeException("skipped HEAD request"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this if for this else is very far some here... maybe it will get closer if we take the body of the if and put it in a function or method.
.flatMap { | ||
case Right(r) => Task.point(r) | ||
case Left(e: Recoverable) => | ||
// println(s"falling back to downloading the whole file: $e") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove the commented code please?
"hashInHttpHeaders", | ||
hashInHttpHeaders.map(b => Doc.text(s"$b")) | ||
), | ||
).sortBy(_._1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this was copied, but we don't need to sort a list of length 1. but you can leave it if you add a comment that this is here so we don't forget if we add more options.
|
||
val items = List( | ||
( | ||
"hashInHttpHeaders", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure why we wouldn't always want to enable this if it works. Can you think of a reason? Can you document the reason here so someone reading the code can remember.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it doesn't work with all Maven repository servers. E.g., https://repo1.maven.org/maven2/ only supplies SHA-1 checksums, so it would be a waste to enable this option if that's the repo that's getting used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense.
) | ||
case g: ResolverType.Gradle => | ||
val ec = scala.concurrent.ExecutionContext.Implicits.global | ||
import scala.concurrent.duration._ | ||
|
||
lazy val coursierResolver = | ||
new CoursierResolver(model.getOptions.getResolvers, ec, 3600.seconds, resolverCachePath) | ||
new CoursierResolver(model.getOptions.getResolvers, false, ec, 3600.seconds, resolverCachePath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above: if this works, shouldn't true be the default? Why would you set false? Maybe if you know with certainty your http service doesn't give out the headers?
.firstValue(digestType match { | ||
// See also https://maven.apache.org/resolver/expected-checksums.html#non-standard-x-headers | ||
case DigestType.Sha1 => "x-checksum-sha1" | ||
case DigestType.Sha256 => "x-checksum-sha256" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be moved to a method on DigestType
? so this function is: headers.firstValue(digestType.mavenHeader)
or something?
Also what about x-goog-meta-checksum-sha1
and x-goog-meta-checksum-md5
?
Maybe in fact we need to something like:
sealed trait DigestType {
def getDigestInstance: MessageDigest
def expectedHexLength: Int
def name: String
def headerKeys: List[String]
}
then do:
Try {
val digest = digestType
.headerKeys
.flatMap(key => headers.firstValue(key).asScala)
.headOption
.getOrElse(throw new Recoverable(new RuntimeException(s"no ${digestType} found in headers in $headersPath"))
val len = headers
.firstValueAsLong("Content-Length")
.orElseThrow(() =>
Recoverable(new RuntimeException(s"no Content-Length found in headers $headersPath"))
)
(digest, len)
}
also it looks like the tests fail to build:
ideally we could generate Coursier with and without the option set to test the parsing and formatting code. |
Add a new resolver option
hashInHttpHeaders
for Coursier, which tells bazel-deps to use checksums in HTTP headers if possible, instead of downloading the jar artifact and computing hash digests locally. The option is read from theresolverOptions
object in the input YAML file.If this option is true, when computing checksums, bazel-deps will:
HEAD
request to the artifact.HEAD
request was successful:HEAD
request was unsuccessful:.pom
file exists for this artifact, assume that the artifact will never be published in the future. Cache the error status in the Coursier cache directory to avoid downloading this file in the future. This is the same heuristic that Coursier itself uses.This change reduces the fully-cached runtime of bazel-deps on our internal repo (which uses Artifactory, which supplies both SHA-1 and SHA-256 hashes via headers) from 80s to 7s, and reduces the size of the Coursier cache directory by 99%.
Co-authored-by: Keith Lea [email protected]
cc @keithl-stripe