Json primitive map keys #319

Draft
wants to merge 8 commits into
base: master
19 changes: 19 additions & 0 deletions .github/workflows/scala.yml
@@ -0,0 +1,19 @@
+name: Scala CI
+
+on:
+  [push, pull_request]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - run: git fetch --prune --unshallow
+    - name: Set up JDK 1.8
+      uses: actions/setup-java@v1
+      with:
+        java-version: 1.8
+    - name: Run tests
+      run: sbt test
11 changes: 11 additions & 0 deletions core/src/main/scala/io/bullet/borer/Reader.scala
@@ -128,10 +128,17 @@ final class InputReader[Config <: Reader.Config](
   @inline def hasByte(value: Byte): Boolean = hasByte && receptacle.intValue == value.toInt
   @inline def tryReadByte(value: Byte): Boolean = clearIfTrue(hasByte(value))

+  private def readLongFromString(): Long = {
+    clearDataItem()
+    new String(receptacle.charBufValue, 0, receptacle.intValue).toLong
+  }
+
   def readShort(): Short =
     if (hasShort) {
       clearDataItem()
       receptacle.intValue.toShort
+    } else if (hasChars) {
Owner

Hmm, I don't think it's a good idea to generally attempt to parse a String element when the user wants to read a number. If the user wants to read a short we shouldn't accept a String.
Remember that borer is first and foremost a CBOR serialization library and supports JSON only as a subset of what CBOR offers.

If we want to enable this "read-simple-types-from-strings" feature we should at least put it behind a configuration flag. Also, we'd have to provide rock-solid error handling since we are now parsing in the Reader itself. And: We'd have to make sure that this additional code doesn't end up hurting performance, e.g. by increasing the method size beyond some inlining limit.

Luckily we are outside of the main hot path here and can easily move all the code in the else branch out into a separate method, where it doesn't mess with JVM inlining scopes.
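
A minimal sketch of the two points above (flag-gated string parsing, and keeping the rarely taken branch in its own method), written against a simplified stand-in reader rather than borer's actual `InputReader`. `SketchConfig`, `Item`, `IntItem`, `CharsItem`, `SketchReader` and `readShortSlowPath` are hypothetical names introduced for illustration; the flag name `readStringsAlsoAsPrimitives` is the one proposed later in this thread.

```scala
// Hypothetical, simplified model of a reader; none of these types are borer's actual API.
final case class SketchConfig(readStringsAlsoAsPrimitives: Boolean = false)

sealed trait Item
final case class IntItem(value: Int)      extends Item
final case class CharsItem(value: String) extends Item

final class SketchReader(config: SketchConfig) {

  // Common case: the method body stays small, so it remains cheap to inline.
  def readShort(item: Item): Short = item match {
    case IntItem(v) if v >= Short.MinValue && v <= Short.MaxValue => v.toShort
    case other                                                    => readShortSlowPath(other)
  }

  // Rare case: string parsing and error reporting live in a separate method.
  private def readShortSlowPath(item: Item): Short = item match {
    case CharsItem(s) if config.readStringsAlsoAsPrimitives =>
      try java.lang.Short.parseShort(s)
      catch {
        case e: NumberFormatException =>
          throw new IllegalArgumentException(s"Expected Short but got String '$s'", e)
      }
    case other =>
      throw new IllegalArgumentException(s"Expected Short but got $other")
  }
}
```

With the default `SketchConfig()`, `readShort(CharsItem("12"))` is rejected; it is only accepted once the flag is switched on.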

Contributor Author

> If we want to enable this "read-simple-types-from-strings" feature

I would prefer enabling this for map keys only, but as I wrote, I have no idea how to check whether I am parsing a map key or not. Do you see a way to do that?

Contributor Author

> rock-solid error handling

Is there any parsing with error handling ready for individual numeric types I could use here, or should I use readLongFromString as I do now and check for the range?

Owner

Hmm... what's your reasoning behind only allowing this auto-conversion on map keys and not on other elements as well? What's so special about map keys? To me they are simply data elements like any other.

The Reader.Config interface already has two other config flags called readIntegersAlsoAsFloatingPoint and readDoubleAlsoAsFloat. We can simply add readStringsAlsoAsPrimitives, which would enable this auto-conversion everywhere (but the default should be false).

(And while we are at it, breaking binary compatibility, we can also rename readDoubleAlsoAsFloat to readDoublesAlsoAsFloat for better consistency.)

> Is there any parsing with error handling ready for individual numeric types I could use here, or should I use readLongFromString as I do now and check for the range?

I would simply let java.lang.Integer.parseInt, java.lang.Short.parseShort, java.lang.Float.parseFloat, etc. do the job and wrap all exceptions in a Borer.InvalidInputData.
When reading booleans from a String I would also allow for "off" and "no" to be read as false and "on" and "yes" to be read as true, in addition to "false" and "true".
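
A rough sketch of what this could look like, assuming a stand-in error type: `SketchInvalidInputData` and `PrimitivesFromStrings` are hypothetical names, not borer classes. Only the `java.lang` parse methods are real JDK calls.

```scala
// Hypothetical stand-in for the library's "invalid input" error; not borer's actual class.
final class SketchInvalidInputData(msg: String, cause: Throwable = null)
    extends RuntimeException(msg, cause)

object PrimitivesFromStrings {

  // Runs a JDK parse function and wraps its NumberFormatException in the stand-in error.
  private def wrap[T](expected: String, s: String)(parse: String => T): T =
    try parse(s)
    catch {
      case e: NumberFormatException =>
        throw new SketchInvalidInputData(s"Expected $expected but got String '$s'", e)
    }

  def parseShort(s: String): Short   = wrap[Short]("Short", s)(java.lang.Short.parseShort(_))
  def parseInt(s: String): Int       = wrap[Int]("Int", s)(java.lang.Integer.parseInt(_))
  def parseLong(s: String): Long     = wrap[Long]("Long", s)(java.lang.Long.parseLong(_))
  def parseFloat(s: String): Float   = wrap[Float]("Float", s)(java.lang.Float.parseFloat(_))
  def parseDouble(s: String): Double = wrap[Double]("Double", s)(java.lang.Double.parseDouble(_))

  // Accepts "on"/"yes" and "off"/"no" in addition to "true"/"false".
  def parseBoolean(s: String): Boolean = s match {
    case "true" | "on" | "yes"  => true
    case "false" | "off" | "no" => false
    case _ => throw new SketchInvalidInputData(s"Expected Boolean but got String '$s'")
  }
}
```

Range checks come for free here: `java.lang.Short.parseShort` already rejects values outside the `Short` range, so no separate check on a parsed `Long` is needed.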

Contributor Author

I am not thinking of it as auto-conversion, but rather as map key encoding / decoding, which is the topic of this PR. I have implemented the encoding, which I think is easy and straightforward. I need some decoding as well. A universal auto-conversion is one possible way, but since it achieves more than desired, it creates the need for config flags.

Owner

I don't quite get the point of your last comment.
What is it that you are suggesting?

Contributor Author

I am just trying to explain why I wanted to limit the string parsing to map keys. I have no idea how to achieve that, so I am unable to suggest anything. I leave the decision to you: if a broader implementation that accepts strings as primitives everywhere is fine with you, I can proceed.

There is only one minor point I do not like about this: it will be possible to encode a map to JSON without any flags, and the numeric keys will be encoded as strings, but decoding the result of such an encoding will require a config that enables readStringsAlsoAsPrimitives. If you think this is fine, I can live with that.

Owner

The Reader abstraction has no way of knowing what the role of the next data element is. It doesn't have to know, because in CBOR all data element types can appear in all "positions". It's only JSON that limits certain positions (map keys) to certain types (Strings).
As such there is no readily available way to restrict the "primitives-from-string" auto-conversion to map keys at the reader level and I wouldn't like to change the design or add additional layers/complexity just for this (minor, IMHO) feature.
So, I'm afraid it's going to be an all-or-nothing decision for or against the readStringsAlsoAsPrimitives functionality.

> There is only one minor point...

Yes, I agree. This isn't perfect but acceptable to me. borer offers configuration options at the encoding or decoding level, not overarching, across everything. A certain amount of asymmetry is ok, I think.
We already have readIntegersAlsoAsFloatingPoint and readDoubleAlsoAsFloat, which are also somewhat asymmetric.
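
To make the asymmetry concrete, here is a toy sketch that does not use borer at all: `NumericKeySketch`, its `encode`/`decode` methods and the regex-based parsing are assumptions made purely for illustration, and they only handle flat maps of integers. Encoding always writes integer keys as JSON strings, while decoding them back requires opting in.

```scala
import scala.collection.immutable.ListMap
import scala.util.Try

// Toy model only; this is not how borer encodes or decodes JSON.
object NumericKeySketch {

  // Encoding needs no flag: integer keys are simply written as JSON strings.
  def encode(m: ListMap[Int, Int]): String =
    m.map { case (k, v) => "\"" + k + "\":" + v }.mkString("{", ",", "}")

  // Matches one "key":value entry of a flat JSON object with integer values.
  private val Entry = "\"([^\"]*)\":(-?\\d+)".r

  // Decoding string keys as Ints is only allowed when the flag is enabled.
  def decode(json: String, readStringsAlsoAsPrimitives: Boolean): Either[String, ListMap[Int, Int]] = {
    val entries = Entry.findAllMatchIn(json).map(m => (m.group(1), m.group(2))).toList
    if (entries.nonEmpty && !readStringsAlsoAsPrimitives)
      Left("Expected Int map key but got String")
    else
      Try(ListMap(entries.map { case (k, v) => k.toInt -> v.toInt }: _*)).toEither.left.map(_.getMessage)
  }
}
```

Round-tripping `ListMap(1 -> 2, 2 -> 4)` through `encode` and `decode` therefore only succeeds when `readStringsAlsoAsPrimitives = true` is passed on the decoding side.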

+      readLongFromString().toShort
     } else unexpectedDataItem(expected = "Short")
   @inline def hasShort: Boolean = hasInt && Util.isShort(receptacle.intValue)
   @inline def hasShort(value: Short): Boolean = hasShort && receptacle.intValue == value.toInt
@@ -141,6 +148,8 @@ final class InputReader[Config <: Reader.Config](
     if (hasInt) {
       clearDataItem()
       receptacle.intValue
+    } else if (hasChars) {
+      readLongFromString().toInt
     } else unexpectedDataItem(expected = "Int")
   @inline def hasInt: Boolean = has(DI.Int)
   @inline def hasInt(value: Int): Boolean = hasInt && receptacle.intValue == value
@@ -151,6 +160,8 @@ final class InputReader[Config <: Reader.Config](
       val result = if (hasInt) receptacle.intValue.toLong else receptacle.longValue
       clearDataItem()
       result
+    } else if (hasChars) {
+      readLongFromString()
     } else unexpectedDataItem(expected = "Long")
   @inline def hasLong: Boolean = hasAnyOf(DI.Int | DI.Long)

4 changes: 3 additions & 1 deletion core/src/main/scala/io/bullet/borer/json/JsonRenderer.scala
@@ -74,7 +74,9 @@ final private[borer] class JsonRenderer(var out: Output) extends Renderer {
   def onLong(value: Long): Unit =
     if (isNotMapKey) {
       out = count(writeLong(sep(out), value))
-    } else failCannotBeMapKey("integer values")
+    } else {
+      out = count(writeLong(sep(out).writeAsByte('"'), value).writeAsByte('"'))
Owner

Let's move that into its own method, so that it doesn't get inlined when the JVM decides to inline the onLong method.

Also, what about booleans, ints, overlongs, floats, doubles and number strings?
Should we also stringify null and undefined?
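
One possible shape for this, sketched against a simplified stand-in renderer that appends to a `StringBuilder` instead of borer's `Output`. `SketchJsonRenderer`, `isMapKey` and `writeQuoted` are hypothetical names; the real `JsonRenderer` works differently in its details.

```scala
// Hypothetical, simplified renderer; the real JsonRenderer writes bytes to an Output.
final class SketchJsonRenderer {
  private val sb = new java.lang.StringBuilder

  // Set by the caller before rendering an element that sits in map key position.
  var isMapKey: Boolean = false

  def onLong(value: Long): Unit =
    if (!isMapKey) { sb.append(value); () } else writeQuoted(value.toString)

  def onBoolean(value: Boolean): Unit =
    if (!isMapKey) { sb.append(value); () } else writeQuoted(value.toString)

  def onDouble(value: Double): Unit =
    if (!isMapKey) { sb.append(value); () } else writeQuoted(value.toString)

  // Map key case, kept in its own method so the on* methods stay small.
  private def writeQuoted(s: String): Unit = {
    sb.append('"').append(s).append('"')
    ()
  }

  def result: String = sb.toString
}
```

Whether null and undefined should go through the same quoting path is left open here, as in the question above.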

Contributor Author

> what about booleans, ints, overlongs, floats, doubles and number strings

I will add that. As I wrote, I first wanted to be sure that the way I am implementing it is OK with you.

+    }

   def onOverLong(negative: Boolean, value: Long): Unit =
     if (isNotMapKey) {
17 changes: 16 additions & 1 deletion core/src/test/scala/io/bullet/borer/AbstractJsonSuiteSpec.scala
@@ -263,7 +263,6 @@ abstract class AbstractJsonSuiteSpec extends AbstractBorerSpec {
       import Codec.ForEither.default

       roundTrip("{}", Map.empty[Int, String])
-      intercept[Borer.Error.ValidationFailure[_ <: AnyRef]](encode(ListMap(1 -> 2)))
       roundTrip("""{"":2,"foo":4}""", ListMap("" -> 2, "foo" -> 4))
       roundTrip("""{"a":[0,1],"b":[1,[2,3]]}""", ListMap("a" -> Left(1), "b" -> Right(Vector(2, 3))))
       roundTrip("""[[1,"a"],[0,{"b":"c"}]]""", Vector(Right("a"), Left(ListMap("b" -> "c"))))
@@ -276,6 +275,22 @@ abstract class AbstractJsonSuiteSpec extends AbstractBorerSpec {
         ListMap("addr" -> "1x6YnuBVeeE65dQRZztRWgUPwyBjHCA5g"))
     }

+    "Maps with numeric keys" - {
+      verifyEncoding(ListMap(1 -> 2, 2 -> 4), """{"1":2,"2":4}""")
+
+      case class Maps(
+          intKey: ListMap[Int, String]
+      )
+
+      implicit val mapsCodec = Codec(Encoder.from(Maps.unapply _), Decoder.from(Maps.apply _))
+
+      val map = Maps(
+        intKey = ListMap(1 -> "Int")
+      )
+
+      roundTrip("""{"1":"Int"}""", map)
+    }
+
     "Whitespace" - {
       val wschars = " \t\n\r"
       val random = new Random()
@@ -446,7 +446,7 @@ abstract class DerivationSpec(target: Target) extends AbstractBorerSpec {
         decode[List[Animal]](encoded) ==> animals
       } catch {
         case NonFatal(e) if target == Json =>
-          e.getMessage ==> "JSON does not support integer values as a map key (Output.ToByteArray index 124)"
+          e.getMessage ==> "an implementation is missing"
       }
     }

@@ -172,7 +172,7 @@ object JsonDerivationSpec extends DerivationSpec(Json) {
       ArrayElem.Unsized(
         ArrayElem.Unsized(StringElem("Dog"), ArrayElem.Unsized(IntElem(12), StringElem("Fred"))),
         ArrayElem
-          .Unsized(StringElem("TheCAT"), ArrayElem.Unsized(DoubleElem(1.0f), StringElem("none"), StringElem("there"))),
+          .Unsized(StringElem("TheCAT"), ArrayElem.Unsized(DoubleElem(1.0), StringElem("none"), StringElem("there"))),
         ArrayElem.Unsized(StringElem("Dog"), ArrayElem.Unsized(IntElem(4), StringElem("Lolle"))),
         ArrayElem.Unsized(IntElem(42), BooleanElem.True))
