mrkv

A simple, fast and lightweight (zero-dependency) Markov chain library for browser and backend JavaScript environments. It uses modern JavaScript conventions and a Promise-based API, and provides simple utilities for reading from files in Node.js environments.

Useful for creating silly texts from a pattern of sentences, e.g. exported from a chat platform!

Caveats

With datasets of over 1 million sentences, significant memory issues have been observed. The library predicts words using long arrays of words mapped to other words, and this structure has not been optimised much for memory usage. In exchange, the library is the fastest according to the benchmarks below. If memory usage is not a concern for you, this library is a good fit.
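To make the memory trade-off concrete, here is a minimal sketch of a "words mapped to arrays of following words" structure. It is an illustration only: `buildTransitions` is a hypothetical name, and mrkv's real internal layout may differ.

```javascript
// Illustrative only: a hypothetical helper, not mrkv's actual implementation.
// Maps each word to an array of every word that followed it in the input.
function buildTransitions(sentences) {
  const transitions = new Map();
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length - 1; i++) {
      const followers = transitions.get(words[i]) ?? [];
      // Duplicates are kept, so frequent pairs are picked more often,
      // but every word occurrence also costs one array entry.
      followers.push(words[i + 1]);
      transitions.set(words[i], followers);
    }
  }
  return transitions;
}

const transitions = buildTransitions(["i like hamburgers", "i like cats"]);
console.log(transitions.get("i"));    // [ 'like', 'like' ]
console.log(transitions.get("like")); // [ 'hamburgers', 'cats' ]
```

In a layout like this, lookups are cheap, but the total number of stored entries grows with the total word count of the dataset, which is consistent with the memory behaviour described above.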

Usage

  • Loading an array

    const map = await loadArray(["i like hamburgers", "i like cats"]);
  • Loading a file (newline-separated sentences)

    const map = await loadFile("data.txt");
  • Generating a sentence

    generateFromMap(map); // i like hamburgers or i like cats
  • Completing a sentence

    generateFromMap(map, {
      start: "i like",
    });
    
    // -> i like hamburgers or i like cats

Installation

From npm:

npm install mrkv
# or, with Yarn:
yarn add mrkv

Documentation

loadArray(sentences: Array<string>): Promise<Corpus>

Takes an array of sentences and returns a data map representing the consumed sentences' patterns. This data structure is designed to maximise speed over RAM usage, so in some cases (>500k sentences) it can be very resource intensive to generate and store the result of this function.

const map = await loadArray([
  "i like apples",
  "apples are my favourite fruit",
  "i like fruit but chocolate bars are better",
]);

console.log(generateFromMap(map));

For more information, see generateFromMap.

loadFile(name: string): Promise<Corpus>

Node.js only

Reads a file, then uses loadArray to generate a data map and returns the result. The same caveat as for loadArray applies, and is possibly worse here: the whole file must first be read and held in memory before the data structure for the chain is generated.

# data.txt
i like apples
apples are my favourite fruit
i like fruit but chocolate bars are better

const map = await loadFile("data.txt");

console.log(generateFromMap(map));
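The "read a file, then build the map" flow described above could be sketched as follows. Both `splitSentences` and `readSentences` are hypothetical helpers, not part of mrkv's API, and the handling of blank lines and Windows line endings is an assumption about what a loader for newline-separated sentences might reasonably do.

```javascript
// Hypothetical helper, not part of mrkv: turns newline-separated text into
// an array of sentences, tolerating \r\n endings and skipping blank lines.
function splitSentences(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Node.js only. Reads the whole file into memory before splitting, which is
// why large files make the memory caveat worse.
async function readSentences(path) {
  const { readFile } = await import("node:fs/promises");
  return splitSentences(await readFile(path, "utf8"));
}
```

The resulting array is the kind of input loadArray expects, so the sentences only need to be read once and the map can then be reused.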

generateFromMap(corpus: Corpus, options?): string

Generates a string value from the data structure. The underlying logic has gone through many iterations and is now optimised for both speed and RAM usage. Call this function each time you want a new value from an existing map, rather than generating a new map for every value.

const map = await loadArray([
  "i like apples",
  "apples are my favourite fruit",
  "i like fruit but chocolate bars are objectively better",
  "chocolate bars are delicious",
]);

console.log(generateFromMap(map));

// the result could be different, e.g. "apples are objectively better" or
// "chocolate bars are my favourite fruit", etc.

// the options control the sentence's start and maximum length
generateFromMap(map, {
  start: "i like",
  limit: 50,
});
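For readers curious how `start` and `limit` might interact with the chain, here is a minimal random-walk sketch. It is not mrkv's implementation: `generateSketch` is a hypothetical name, the map layout is assumed, `limit` is treated as a maximum word count (the library may measure it differently), and the `random` parameter exists only to make the sketch testable.

```javascript
// Hypothetical sketch, not mrkv's actual code. `transitions` is assumed to be
// a Map from a word to an array of words that may follow it.
function generateSketch(transitions, { start = "", limit = 50, random = Math.random } = {}) {
  const words = start.split(/\s+/).filter(Boolean);
  if (words.length === 0) {
    // No start given: begin from a random known word (an assumption).
    const keys = [...transitions.keys()];
    if (keys.length === 0) return "";
    words.push(keys[Math.floor(random() * keys.length)]);
  }
  // Walk the chain until there is no continuation or the limit is reached.
  while (words.length < limit) {
    const followers = transitions.get(words[words.length - 1]);
    if (!followers || followers.length === 0) break;
    words.push(followers[Math.floor(random() * followers.length)]);
  }
  return words.join(" ");
}

const transitions = new Map([
  ["i", ["like", "like"]],
  ["like", ["apples", "fruit"]],
  ["apples", ["are"]],
  ["are", ["delicious"]],
]);
// Prints either "i like apples are delicious" or "i like fruit".
console.log(generateSketch(transitions, { start: "i like" }));
```

A limit matters in a walk like this because chains can contain cycles; without one, a loop between two words would never terminate.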

generateFromArray(array: Array<string>, options?): Promise<string>

Generates a string value from the array of sentences by calling loadArray and then generateFromMap. If you need to generate more than one string from the same sentence array, do not use this method. Instead, call loadArray once and pass the resulting map to generateFromMap each time, as shown in the examples for those two functions.

console.log(
  await generateFromArray([
    "i like apples",
    "apples are my favourite fruit",
    "i like fruit but chocolate bars are objectively better",
    "chocolate bars are delicious",
  ])
);

generateFile(name: string, options?): Promise<string>

Node.js only

This method should only be used in exceptional circumstances or in scripting. It is a one-off function: if you need to reuse the file, load it once with loadFile and refer to generateFromMap instead.

Calls loadFile and then generateFromMap to simplify one-time operations.

# data.txt
i like apples
apples are delicious
i like fruit but chocolate bars are better
chocolate bars are unhealthy.

console.log(await generateFile("data.txt"));
// e.g. chocolate bars are delicious

Features

Feature mrkv kurwov markov-typescript markov-generator markov-strings markov-chains
Dependency-free ✔️ ✔️ ✔️
Typings ✔️ ✔️ ✔️
Generating sentences ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
Completing sentences ✔️ ✔️

Benchmarks

Benchmarks were run on an Apple M2 chip with 16 GB of RAM.

| Benchmark | mrkv | kurwov | markov-typescript | markov-generator | markov-strings | markov-chains |
| --- | --- | --- | --- | --- | --- | --- |
| Generating a set from 10k sentences | 30.051ms | 50.53ms | 419.66ms | 346.16ms | 1834.32ms | Errored |
| Generating a set from 100k sentences | 403.806ms | 572.49ms | 6221.28ms | 28329.17ms | Couldn't finish in over 10 minutes | Errored |

Buy me a coffee

ko-fi