Protocol Message description

gbarton edited this page May 2, 2023 · 22 revisions

Premise

The goals of this messaging protocol are to:

  • Focus very clearly on the movement of primitive datatypes to enable communication
  • Allow for flexible server implementations
  • Be purposefully agnostic to security implementations (to allow best of breed techniques to develop independently of message spec)
  • Support identification of client (I am who I say I am)

Below is a description of the data carried by each message type. The protocol is transport-independent; any transport API that implements it must adhere to it.

Identification-only shared message history

  • Central data structure is a timestamped history of key-signed messages from users
  • Messages can be fairly arbitrary data, but idea is text-like with markup
  • Users aren’t authenticated, but they are identified by their public key (plus short name for readability), and all messages must be signed
  • Clients can request history of messages and list of public keys (so anyone can stand up a server of their own without losing chat history)
  • Servers have their own key-pairs, and can be configured to trust other servers’ keys and forward messages from clients
  • Signatures are always of the JSON data excluding the signature item. No unnecessary whitespace or headers/footers/encapsulation.
  • Need to look into encoding media as part of messages (it CAN be done! Data URI scheme)
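
As a rough illustration of the data URI idea in the last bullet, the sketch below base64-encodes raw bytes into a data URI string that could be embedded in message text. The function name to_data_uri and the sample payload are hypothetical; nothing here is part of the protocol itself.

```python
import base64


def to_data_uri(media_bytes: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# A tiny text payload stands in for real image or video bytes.
uri = to_data_uri(b"hello", "text/plain")
```

Base64 grows the payload by roughly a third, which is worth keeping in mind when deciding on message size limits.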

Protocol notes

  • Client timestamps are not trusted; they are used as nonces.
  • Signatures are always of the entire JSON message object excluding signature (order matters).
  • I’ve been using \n to indicate end of message, but this might need a rethink, primarily because it makes Java’s BufferedReader.readLine() block until a complete message arrives without knowing the maximum message length. Either we keep a terminator character, or we rework the message history message and define a maximum length.
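
One way to combine a \n terminator with a length cap is sketched below. It splits a received byte buffer into complete messages plus an unfinished remainder, and refuses anything longer than a limit. The names split_messages, MessageTooLong and MAX_MESSAGE_BYTES, and the 64 KiB value, are assumptions for illustration; the protocol does not currently define a maximum length.

```python
MAX_MESSAGE_BYTES = 64 * 1024  # hypothetical limit, not defined by the protocol


class MessageTooLong(Exception):
    """Raised when a message (or unfinished remainder) exceeds the limit."""


def split_messages(buffer: bytes, max_len: int = MAX_MESSAGE_BYTES):
    """Return (complete_messages, remainder) for a newline-delimited buffer."""
    # Everything before the final b"\n" is complete; the tail is still arriving.
    *complete, remainder = buffer.split(b"\n")
    for chunk in complete + [remainder]:
        if len(chunk) > max_len:
            raise MessageTooLong(f"message exceeds {max_len} bytes")
    return complete, remainder
```

This relies on messages being serialized without raw newlines, which holds for compact JSON because newlines inside JSON strings are escaped.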

Sequence of operations

Typical sequence of operations:

  1. Server starts, reads/loads history and key-nick associations, begins accepting connections
  2. Client starts, loads keys (if they exist) or generates keys (if they don’t)
  3. Client initiates connection to server, sends “hello”
  4. Server records key-nick association
  5. Client subscribes to receive (subset of?) messages
  6. Server adds client to list of clients that are listening
  7. Clients send messages to server which relays them to listening clients
  8. Client disconnects with “disconnect,” server removes from listening list and closes connection
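
The server-side state implied by these steps can be modelled in a few lines. This is only an in-memory sketch: the class ChatServer and its method names are hypothetical, signature verification and server responses are left out, and clients are keyed by public key as an assumption.

```python
class ChatServer:
    """In-memory model of the server state in the sequence above (sketch)."""

    def __init__(self):
        self.nick_by_key = {}  # public key -> nick (step 4)
        self.listeners = {}    # public key -> delivery callback (step 6)
        self.history = []      # messages in the order received

    def hello(self, public_key, nick):
        # Record the key-nick association.
        self.nick_by_key[public_key] = nick

    def subscribe(self, public_key, deliver):
        # Add the client to the list of listening clients.
        self.listeners[public_key] = deliver

    def post(self, message):
        # Store the message, then relay it to every listening client (step 7).
        self.history.append(message)
        for deliver in self.listeners.values():
            deliver(message)

    def disconnect(self, public_key):
        # Remove the client from the listening list (step 8).
        self.listeners.pop(public_key, None)
```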

Wire Protocol

This message format is purposely agnostic to transport mechanisms to allow a multitude of implementations to be used. Please see the Transport APIs page for more details.

Message format - Protocol Version 1

All messages are JSON format. Most components are strings denoted as <str>. The notation "{str1, str2, ..., str3}" indicates that only a specific set of strings is allowed. For example, response: "{accept, reject}" represents a key response whose value is a string datatype which can be either the string "accept" or the string "reject" but cannot be any other string.

Supported Primitive DataTypes:

  • <str> strings
  • <timestamp> integer, milliseconds since the Unix epoch
  • <Base64> string in base64 format
  • <hex> string in lowercase hexadecimal format
  • <messageType> one of the following values: hello, subscribe, unsubscribe, post, reply, users, history, disconnect, response, unknown
    • the last 2 are special responses only for servers to use.
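
The primitive datatypes above lend themselves to simple validators. The sketch below shows one possible set of checks; the function names are hypothetical, and treating timestamps as non-negative integers is an assumption.

```python
import base64
import binascii
import re

MESSAGE_TYPES = {"hello", "subscribe", "unsubscribe", "post", "reply",
                 "users", "history", "disconnect", "response", "unknown"}
HEX_RE = re.compile(r"[0-9a-f]+")


def is_timestamp(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_base64(value) -> bool:
    if not isinstance(value, str):
        return False
    try:
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def is_hex(value) -> bool:
    # Lowercase hexadecimal only, at least one digit.
    return isinstance(value, str) and HEX_RE.fullmatch(value) is not None


def is_message_type(value) -> bool:
    return isinstance(value, str) and value in MESSAGE_TYPES
```

Note that the example messages on this page carry PEM text (with header, footer and newlines) in publicKey fields, which a strict base64 check like is_base64 would reject.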

General message format:

{
  message: <messageData>,
  sig: <Base64>,
  protocolVersion: <number>,
  keyHash: <Base64>
}

The message wrapper encases all messages sent from clients and servers. It lets a recipient verify message integrity and confirm that the message was sent by a particular client.

sig: The signature of the message, used to verify that the message came from the client and was not tampered with. It is created by converting the JSON message object to a string (i.e. JSON.stringify(message) in JavaScript), hashing that string using 'SHA512', signing with the private key, and base64-encoding the result.

protocolVersion: Contains the version of the protocol that this message was created with, e.g. 1

keyHash: MD5 hash (base64 formatted) of the public key that pairs with the private key used to create the message signature. This allows other clients to determine which public key to use when a nick collision occurs.
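
A sketch of how the wrapper fields could be assembled follows. Several things here are assumptions rather than protocol facts: that keyHash is computed over the UTF-8 bytes of the PEM text, that compact json.dumps output is close enough to JSON.stringify for the messages involved (it keeps key order and drops whitespace, but edge cases such as floats can differ), and that the actual hash-and-sign step is supplied by the caller as a function named sign, since RSA signing is not in the Python standard library.

```python
import base64
import hashlib
import json


def canonical_json(message: dict) -> str:
    """Serialize without extra whitespace, keeping key insertion order."""
    return json.dumps(message, separators=(",", ":"), ensure_ascii=False)


def key_hash(public_key_pem: str) -> str:
    """Base64 MD5 digest of the key text (assumed: UTF-8 bytes of the PEM)."""
    digest = hashlib.md5(public_key_pem.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def wrap(message: dict, public_key_pem: str, sign) -> dict:
    """Build the wrapper; `sign` is a caller-supplied bytes -> bytes function."""
    signature = sign(canonical_json(message).encode("utf-8"))
    return {
        "message": message,
        "sig": base64.b64encode(signature).decode("ascii"),
        "protocolVersion": 1,
        "keyHash": key_hash(public_key_pem),
    }
```

Because the signature covers the serialized string, sender and verifier must produce byte-identical output, which is why key order matters.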

messageData format:

{ 
  type: <type>,
  nick: <str>,
  time: <timestamp>,
  content: <object>
}

Server response messages:

{
  type: "response",
  responseToType: <messageType>,
  origSig: <Base64>,
  response: "{accept, reject}",
  reason: "{none, format, signature, access, exception}",
  time: <timestamp>,
  content: <object>
}

Server responds to every message the client sends, even if there is no server response content (if there is response content, it's described below). Server response messages are wrapped in the same general message format and signed with the server's private key.

  • Server de-serializes all messages. On failure to de-serialize, the server must respond with response: "reject", reason: "format", responseToType: "unknown". Content is empty, and origSig is left as the empty string "" because the JSON could not be parsed.
  • type is always the messageType value "response"; server response messages are the only place that value is used.
  • Server verifies the signature on every message and responds with “reject” and reason “signature” for any message that fails verification
  • Server must include the original signature (origSig) of the message being responded to, as that is the only way for the client to know which message the server is replying to.
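
The rules above could translate into server logic along these lines. This is a sketch under assumptions: build_response is a hypothetical name, verify is a caller-supplied signature check, the result is the unsigned response messageData (wrapping and signing are omitted), and the response values come from the enumerated set {accept, reject}.

```python
import json
import time


def build_response(raw: str, verify) -> dict:
    """Return the unsigned response messageData for one raw client message.

    `verify` is a caller-supplied function(wrapper_dict) -> bool.
    """
    now = int(time.time() * 1000)  # milliseconds since the epoch
    try:
        wrapper = json.loads(raw)
        msg_type = wrapper["message"]["type"]
        orig_sig = wrapper["sig"]
    except (ValueError, KeyError, TypeError):
        # Could not parse: no type or signature is available to echo back.
        return {"type": "response", "responseToType": "unknown", "origSig": "",
                "response": "reject", "reason": "format", "time": now,
                "content": {}}
    ok = verify(wrapper)
    return {"type": "response", "responseToType": msg_type, "origSig": orig_sig,
            "response": "accept" if ok else "reject",
            "reason": "none" if ok else "signature", "time": now,
            "content": {}}
```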

Example Message:

{
  message: {
    type: 'response',
    responseToType: 'hello',
    response: 'accept',
    time: 1681658799934,
    origSig: 'M1im....F9qk7o=',
    content: {
      serverKey: '-----BEGIN PUBLIC KEY-----\n' +
        'MII....Q==\n' +
        '-----END PUBLIC KEY-----\n'
    }
  },
  sig: 'FTTy....McE=',
  keyHash: 'l9O4....DVZq8CIg==',
  protocolVersion: 1
}

Object descriptions for specific messages

The following are the object definitions for the standard "client-to-server" messages

"hello" messages

Content: { publicKey: <Base64> }

Response-content: { serverKey: <Base64> }

For initial connections to the server, announces user presence.

Should the server reject if the nick is already present on the server? No, multiple clients can present with the same nick but differing public keys. Clients should always display client key information as part of presenting messages to a user. This enables keeping the same nick across multiple clients without having to worry about transferring keys, but also indicates that nick information is not, cannot, and should not be authoritative or authenticated. Clients can "prove" they sent a specific message (via signatures), but connecting identity to keys is intentionally beyond the scope of this protocol.

The public key is sent in the message so that recipients can verify signatures. The keyHash in the wrapper message is the hash of this key.

Example Message:

{
  message: {
    type: 'hello',
    nick: 'exampleUser',
    time: 1681658623017,
    content: {
      publicKey: '-----BEGIN PUBLIC KEY-----\n' +
        'MIICI...Q==\n' +
        '-----END PUBLIC KEY-----\n'
    }
  },
  sig: 'Uyk4....6OKzg=',
  keyHash: 'l9O4....6DVZq8CIg==',
  protocolVersion: 1
}

"subscribe" messages

Content: { publicKey: <Base64>, lastClientTime: <timestamp> }

response-content: { oldestMessageTime: <timestamp>, latestMessageTime: <timestamp> }

The response is metadata informing the client of the times of the oldest and latest messages the server holds (useful for seeding history). If the server has no messages, return 0 for both timestamps (this shouldn't be possible though: there has to be at least a login message, or this subscribe message, saved, right?).

Server should add this host to the list of listening clients and broadcast all post messages to it.

Example Message:

{
  message: {
    type: 'subscribe',
    nick: 'exampleUser',
    time: 1681875608676,
    content: {
      publicKey: '-----BEGIN PUBLIC KEY-----\n' +
        'MIICIjANBgkqh....EAAQ==\n' +
        '-----END PUBLIC KEY-----\n',
      lastClientTime: 0
    }
  },
  sig: 'qwj/socjo/ahuCnk6....RD0TbeY=',
  keyHash: 'l9O4o....q8CIg==',
  protocolVersion: 1
}

"users" messages

Content: { since: <timestamp> }

response-content: { userList: [{ publicKey: <Base64>, nick: <str>, joinDate: <timestamp>, status: "{ online, offline }" }]}

since is optional. It supports a client that was disconnected and wants to 'catch up' rather than download all names. If since is present, the response includes only users whose joinDate is after the given timestamp.

  • (nb: optional versus explicit null value? I have a hunch that optional might cause more problems.)
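
A minimal sketch of the since filter is shown below. It sidesteps the optional-versus-null question by treating a missing value and an explicit null (None in Python) the same way; that choice, and the name filter_users, are assumptions.

```python
def filter_users(user_list, since=None):
    """Return users whose joinDate is after `since`, or all users if absent."""
    if since is None:
        return list(user_list)
    return [user for user in user_list if user["joinDate"] > since]


users = [
    {"nick": "a", "joinDate": 100, "status": "online"},
    {"nick": "b", "joinDate": 200, "status": "offline"},
]
recent = filter_users(users, since=100)
```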

"post" messages

Content: {postContent: <str>}

response-content:{}

postContent is a string with HTML. Media is encoded using the data URI scheme. Servers may impose a byte limit on post messages, or on media within post messages, and reject a post message with the reason "format" if this is violated.
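
A server-side size check might look like the sketch below. The 4096-byte limit is only a placeholder, since the protocol leaves the limit to each server; check_post and MAX_POST_BYTES are hypothetical names, and measuring the UTF-8 encoded length is an assumption.

```python
MAX_POST_BYTES = 4096  # placeholder; each server chooses its own limit


def check_post(content: dict, max_bytes: int = MAX_POST_BYTES):
    """Return (response, reason) for the content object of a post message."""
    post_content = content.get("postContent")
    if not isinstance(post_content, str):
        return "reject", "format"
    # Measure encoded bytes, not characters, so non-ASCII text counts fully.
    if len(post_content.encode("utf-8")) > max_bytes:
        return "reject", "format"
    return "accept", "none"
```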

Example Message:

{
  message: {
    type: 'post',
    nick: 'chatUser',
    time: 1681245291330,
    content: {
      postContent: 'Hello!\n',
    },
  },
  sig: 'abc...',
  keyHash: 'l9O4o....DVZq8CIg==',
  protocolVersion: 1
}

"history" messages

Content: {start:<timestamp>, end:<timestamp>}

response-content: {msgList: [<message1>, <message2>,...,<messageN>]}

Should order not matter in msgList? No, order should represent the order in which the messages were received by the server. Sending clients are free to report whatever timestamp they wish on their messages, but it's up to the receiving client whether to believe this or not, and how to order these messages when displaying to the user. Servers may reject, with the reason "format", messages whose timestamp is significantly out of sync. Servers may omit some of the messages received between the given timestamps, or only respect timestamps within given ranges, for efficiency or anti-DoS reasons. Whether the server returns messages of all types or only a subset is left up to the server. The maximum length of the msgList array is not currently defined. By convention, a client requesting all messages should use 0 as the start timestamp and the current clock time as the end timestamp.

To ensure the history is complete, call history repeatedly, each time using the time of the last message received as the new start time and keeping the same end time. When you receive an empty array back, you have pulled all the messages the server holds.
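
The paging loop can be sketched as follows. request_history is a hypothetical function standing in for sending a history message and reading msgList from the response. The sketch assumes the server returns wrapper messages in ascending time order with time strictly greater than start and at most end. If start were inclusive, the last message would be returned again on every call, so a guard stops the loop when no progress is made.

```python
def fetch_all_history(request_history, end):
    """Call request_history(start, end) until it returns an empty list."""
    start = 0
    collected = []
    while True:
        page = request_history(start, end)
        if not page:
            return collected
        collected.extend(page)
        next_start = page[-1]["message"]["time"]
        if next_start <= start:
            # No forward progress; stop rather than loop forever.
            return collected
        start = next_start
```

Messages that share the timestamp of the last one in a page can be missed under these assumptions, so a real client may want to deduplicate by sig and overlap its ranges slightly.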

"reply" messages

Content: { postTime: <timestamp>, origSig: <Base64>, replyContent: <str> }

response-content: {}

For threaded replies. origSig disambiguates messages with identical timestamps (though timestamps are probably easier to search for, even unsorted). Note that the server does not actually do anything special to handle reply messages. It is up to the receiving client to determine how these are displayed (in-line, reference, etc.).
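
On the client side, threading could start from a grouping like the one sketched here, which indexes reply messages by the origSig they reference. group_replies is a hypothetical name, and the input is assumed to be a list of wrapper messages as returned in a history response.

```python
from collections import defaultdict


def group_replies(messages):
    """Map each referenced origSig to the reply wrappers pointing at it."""
    threads = defaultdict(list)
    for wrapper in messages:
        body = wrapper["message"]
        if body["type"] == "reply":
            threads[body["content"]["origSig"]].append(wrapper)
    return dict(threads)
```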

"disconnect" messages

Content: {}

response-content: {}

Unsubscribe listener, close socket. Reject (and don’t unsub/close) if sig does not match.

Possible messages

The following are candidates for message types that haven't been tested or completely approved yet, either because there is disagreement about whether all servers should support them, or because the complexity to the server or potential security issues haven't been fully worked out yet. It is likely these will be included in the future, perhaps in a modified form.

"edit" messages

content: {postTime: <timestamp>, origSig: <Base64>, postContent: <str>}

response-content: {}

Similar to reply, but the server possibly rejects if the public key associated with the original message does not verify the signature of the post to be edited. Alternatively, the server does not verify and leaves verification up to clients. It is unclear how difficult this verification would be, and what the security issues are if clients also perform verification.

Server-to-server messages

As mentioned elsewhere, one goal of this project is to encourage peer servers and redundancy (and discourage single points of failure). To that end, several message types are being considered for server-to-server connections. The security and configuration of peer servers is mostly beyond the scope of this protocol, but it should follow the general guidelines and approaches already described: namely, low-trust and ease of setup.

Client forwarded messages

If two servers are peers, they should transparently share with each other the messages received from their individually connected clients. This allows clients to trust specific hosts.

Format: {type: "client-forward", clientMsg: <client message>, time: <timestamp>, sig: <Base64>}

response-content: {serverMsg: <response-message>}

It's unclear whether the 1-to-1 relationship between client messages and server responses, together with the single input/output stream, makes this an efficient approach, which is why this isn't part of the current protocol.

Server-to-server history

The primary difference between a server-to-server connection and client-server is that servers are expected to transparently keep synchronized with each other, so there needs to be some mechanism for sharing a complete history of all messages received. Full history messages are intended for that purpose, and so are slightly different than client history messages because they should not be filtered by type or timestamp.

Format: { type: "history-full", time: <timestamp>, sig: <Base64> }

response-content: {msgList: [<message1>, <message2>,...,<messageN>]}

It is also possible that rather than encapsulating the entire message history as a single object, a more fragmented approach would be more robust, efficient, or secure. The specifics of server-to-server connections need to be worked out before this can be determined though.

Thoughts about the protocol

Desired properties

  • Should be encrypted, as long as we can do that easily
  • Users are identified by public key
  • The server shouldn't "own" the data

Consequences of current protocol (possibly NOT desired)

  • Protocol is streaming (not RESTful)
  • Server is a single topic/group

Keep talking about

  • Post contents - The Text Portions
    • Option #1: Markdown format (github format?)
      • Pro: Many built text editors for it
      • Con: less popular
    • Option #2: Straight Unicode (what twitter does)
      • Pro: Fair few libs to deal with Twitter's choice (like how to bold text using Unicode only)
      • Con: Fugly as heck
    • Option #3: HTML Subset (attempted removal of scripts)
      • Pro: Supported by all the things
      • Con: Vulnerable to so much, wicked hard to remove all scripts from code blocks
  • Post Length:
    • 4k?
  • Media Handling (Pics/Vids)
    • To embed in the post
      • Pro: Atomic posts
      • Con: massive message sizes, no choice for client to dl media or skip it
    • Separate media message
      • Pro: Clients (e.g. cli's, or bandwidth sensitive ones) can pull media out of band/async/parallel from text
      • Con: Non-atomic post. If a post is expected to always have everything, the client must pull the media to really have it all. Must work out how to embed a link back into the post message
  • key negotiation for signatures
    • An api that expresses what appropriate algs are ok for the public keys to be generated from
    • How exactly does message signing work, is it meant for the servers and clients to be able to verify the message (encrypted hash of the message)?? Or is it just a smaller identifier of the client (hash of the key?). Hash of key does not give any guarantee!
    • Maybe move message signing to the wrapper message? Makes cleaner code to not have to deal with extra fields in objects to omit before hashing.
  • alg rotation for historical support
  • Adding a server timestamp to the wrapper
    • Client clicks send, no internet, but message was created/hashed/signed/etc. Message sends an hour later. Do we care?
  • Server response
    • Doesn't need the response type afaict since it's static
    • Does need the message sig or something similar included to know what was acknowledged, since the protocol doesn't have synchronous semantics