Skip to content

emc-mongoose/mongoose-storage-driver-pulsar

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gitter chat Issue Tracker CI status Maven metadata URL Docker Pulls

Apache Pulsar extension for Mongoose

Content

  1. Introduction
  2. Features
  3. Deployment
      3.1. Basic
      3.2. Docker
        3.2.1. Standalone
        3.2.2. Distributed
          3.2.2.1. Additional Node
          3.2.2.2. Entry Node
  4. Configuration
      4.1. Specific Options
      4.2. Tuning
  5. Usage
      5.1. Message Operations
        5.1.1. Create
        5.1.2. Read
          5.1.2.1. Basic
          5.1.2.2. Tail
          5.1.2.3. End-to-end Latency
      5.2. Topic Operations
        5.2.1. Create
        5.2.2. Read
        5.2.3. Update
        5.2.4. Delete
  6. Open Issues
  7. Development
      7.1. Build
      7.2. Test
        7.2.1. Automated
          7.2.1.1. Unit
          7.2.1.2. Integration
          7.2.1.3. Functional
        7.2.2. Manual

1. Introduction

Pulsar Mongoose
Message Data Item
Topic Item Path or Data Item

2. Features

3. Deployment

3.1. Basic

Java 11+ is required to build/run.

  1. Get the latest mongoose-base jar from the maven repo and put it to your working directory. Note the particular version, which is referred as BASE_VERSION below.

  2. Get the latest mongoose-storage-driver-coop jar from the maven repo and put it to the ~/.mongoose/<BASE_VERSION>/ext directory.

  3. Get the latest mongoose-storage-driver-pulsar jar from the maven repo and put it to the ~/.mongoose/<BASE_VERSION>/ext directory.

java -jar mongoose-base-<BASE_VERSION>.jar \
    --storage-driver-type=pulsar \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --storage-net-node-port=6650 \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

3.2. Docker

3.2.1. Standalone

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

3.2.2. Distributed

3.2.2.1. Additional Node

docker run \
    --network host \
    --expose 1099 \
    emcmongoose/mongoose-storage-driver-pulsar \
    --run-node

3.2.2.2. Entry Node

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --load-step-node-addrs=<ADDR1,ADDR2,...> \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

4. Configuration

Reference

4.1. Specific Options

Name Type Default Value Description
storage-driver-create-batch-enabled boolean true Producer batching
storage-driver-create-batch-delayMicros long integer 1000 Maximum publish latency (microseconds)
storage-driver-create-compression enum none Should compress data upon messages create or not (default). The available compression types are defined by the Pulsar client
storage-driver-create-timestamp boolean false If enabled, will record the message creation timestamp into the message's metadata. Required for end-to-end latency measurement
storage-driver-read-tail boolean false Should read the latest message of the topic or read from the topic beginning (default). Should be true for end-to-end latency measurement
storage-net-tcpNoDelay boolean true The option specifies whether the server disables the delay of sending successive small packets on the network.
storage-net-timeoutMilliSec integer 0 Connection timeout. 0 means no timeout
storage-net-node-addrs list of strings 127.0.0.1 The list of the storage node IPs or hostnames to use for HTTP load. May include port numbers.
storage-net-node-port integer 6650 The common port number to access the storage nodes, may be overriden adding the port number to the storage-driver-addrs, for example: "127.0.0.1:6650,127.0.0.1:6651,..."

4.2. Tuning

  • load-batch-size Determines how many operations the driver will try to submit at once. Large values may increase both the throughput and the memory consumption.

  • storage-driver-concurrency-limit Determines how many operations may be in flight at every moment of the time. Specifying 0 will cause the "burst mode", when the driver submits as many operations as possible regardless their completion. The unlimited concurrency is useful for the sustained rate measurement but may cause either Pulsar or Mongoose inconsistent state.

  • storage-driver-threads Determines the Pulsar client IO worker threads count.

5. Usage

5.1. Message Operations

5.1.1. Create

Example, write 1KB messages to the topic "topic1" in the Pulsar instance w/ address 12.34.56.78:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    --item-data-size=1KB \
    --item-output-path=topic1

5.1.2. Read

Notes:

  1. Generally, load-op-recycle option should be set to true to make the messages reading working.
  2. Mongoose couldn't determine the end of the topic(s), so it's mandatory to specify the count/time limit.

5.1.2.1. Basic

Example, read 1M messages from the beginning of the topic "topic1":

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=100 \
    --storage-driver-limit-concurrency=100 \
    --read \ 
    --item-input-path=topic1 \
    --load-op-recycle \
    --load-op-limit-count=1000000

5.1.2.2. Tail

Example, read all new messages from the topic "topic1" during the 1 minute:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=100 \
    --storage-driver-limit-concurrency=100 \
    --read \ 
    --item-input-path=topic1 \
    --storage-driver-read-tail \
    --load-op-recycle \
    --load-step-limit-time=1m

5.1.2.3. End-To-End Latency

  1. Start writing the messages to some topic with enabled timestamps recording. Example command:
docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --item-data-size=1KB \
    --item-output-path=topic1 \
    --load-batch-size=1000 \
    --storage-driver-create-timestamp
  1. Start the tail read from the same topic:
docker run \
    --network host \
    --volume $PWD/log:/root/.mongoose/4.2.16/log
    emcmongoose/mongoose-storage-driver-pulsar \
    --load-batch-size=1000 \
    --read \
    --item-input-path=topic1 \
    --storage-driver-read-tail \
    --load-op-recycle \
    --load-step-id=e2e_test
  1. Check the end-to-end time data in the log/e2e_test/op.trace.csv log file. The data is in the CSV format with 3 columns:
  • internal message id
  • message payload size
  • end-to-end time in milliseconds

Note: the end-to-end time data will not be aggregated in the distributed mode.

5.2. Topic Operations

TODO: https://mongoose-issues.atlassian.net/browse/PULSAR-2

5.2.1. Create

TODO

5.2.2. Read

TODO

5.2.3. Update

TODO

5.2.4. Delete

TODO

6. Open Issues

Please refer to the issue tracker

7. Development

7.1. Build

./gradlew clean jar

7.2. Test

7.2.1. Automated

7.2.1.1. Unit

./gradlew clean test

7.2.1.2. Integration

docker run -it \
    -p 6650:6650 \
    -p 8080:8080 \
    apachepulsar/pulsar:<PULSAR_VERSION> \
    bin/pulsar standalone
./gradlew integrationTest

7.2.1.3. Functional

./gradlew jar
export SUITE=api.storage
TEST=<TODO_TEST_NAME> ./gradlew robotest

7.2.2. Manual

  1. Build the storage driver
  2. Copy the storage driver's jar file into the mongoose's ext directory:
cp -f build/libs/mongoose-storage-driver-pulsar-*.jar ~/.mongoose/<MONGOOSE_BASE_VERSION>/ext/

Note that the Pulsar storage driver depends on the Coop Storage Driver extension so it should be also put into the ext directory 3. Install and run the Apache Pulsar

docker run -it \
    -p 6650:6650 \
    -p 8080:8080 \
    apachepulsar/pulsar:<PULSAR_VERSION> \
    bin/pulsar standalone
  1. Run Mongoose's default scenario with some specific command-line arguments:
java -jar mongoose-<MONGOOSE_BASE_VERSION>.jar \
    --storage-driver-type=pulsar \
    --storage-net-node-port=6650 \
    --storage-driver-limit-concurrency=10 \
    --item-output-path=topic-0