Skip to content

Commit

Permalink
Add JSON-defined input format
Browse files Browse the repository at this point in the history
Allows creating input formats using JSON files rather than Java code.
  • Loading branch information
apmasell authored and avarsava committed Feb 12, 2024
1 parent 79373f6 commit 274c407
Show file tree
Hide file tree
Showing 14 changed files with 999 additions and 619 deletions.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ the provenance data.
- [Shesmu Glossary](glossary.md)
- [Shesmu FAQ](faq.md)
- [Ask your doctor if Shesmu is right for you](ask-your-doctor.md)
- [JSON-Defined Input Formats](json-defined-input-formats.md)
- [Plugin Implementation Guide](implementation.md)
- [Compiler Hacking](compiler-hacking.md)

Expand Down
1 change: 1 addition & 0 deletions changes/add_json_input_defs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
* Allow defining input formats using JSON files
87 changes: 87 additions & 0 deletions json-defined-input-formats.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# JSON-Defined Input Formats
Shesmu olives can access user-defined input formats. Normally, data is provided from Java plugins,
so the input format itself is also defined in Java. If that is the case,
consult [Plugin Implementation Guide](implementation.md) to see how to do that. If a plugin is not
required, it is possible to define an input format without Java. This format can consume input from
JSON, either using a local file or from a remote endpoint.

Unlike other Shesmu configuration files, JSON-defined input formats are read _only on startup_.
Adding a new input format will require a restart and Shesmu will scan the `SHESMU_DATA`
for `.shesmuschema` files. The format of this file is:

```
{
"timeFormat": "ISO8660_STRING",
"variables": {
"x": {
"gangs": [
{
"dropIfDefault": false,
"gang": "hello",
"order": 0
}
],
"signable": true,
"type": "i"
},
"y": {
"gangs": [
{
"dropIfDefault": false,
"gang": "hello",
"order": 1
}
],
"signable": true,
"type": "s"
},
"z": {
"gangs": [],
"signable": false,
"type": "as"
}
}
}
```

First, a `"timeFormat"` must be specified to indicate how dates will be encoded, even if no dates
are used in the format. The supported formats are:

- `MILLIS_NUMERIC` - number of milliseconds since the UNIX epoch
- `SECONDS_NUMERIC` - number of seconds since the UNIX epoch; since Shesmu allows milliseconds,
precision is lost
- `ISO8660_STRING` - store the date as an ISO-8660 string with milliseconds

The `"variables"` property lists every variable available to the olive. Each one must specify:

- `"type"`: the type for this column using the standard Shesmu type descriptor. See [types in the
language description](language.md#types).
- `"signable"`: a Boolean value indicating whether this field should be included in signatures or
not
- `"gangs"`: is a list of gangs this variable belongs to. Each one specifies:
- `"gang"`: the name of the gang
- `"order"`: the position in the gang
- `"dropIfDefault"`: a Boolean value indicating that if this variable is its _default_ value (0,
empty string, epoch), then it should be omitted from the gang

Note that gang ordering is taken as relative; that is, for a gang with `x` and `y`,
specifying `"order"` to be 1 and 2, respectively, is the same as 0 and 1.

Once a _name_`.shesmuschema` file is found on startup, it will be possible to create file names
with `.`_name_`-input` that contains a JSON representation of the input format or `.`_name_`-remote`
containing a JSON object with two attributes `url` indicating where to download the JSON
representation and `ttl` indicating the number of minutes to cache the input. Additionally, once a
Shesmu server is active, it will provide the input in the JSON format at `/input/` followed by the
input format name.

The `.shesmuschema` are read exactly once during startup and will never be read
again nor will additional schema files be scanned. If a schema changes or a
new schema is introduced, the server must be restarted. The reason for this is
that the olive compiler bakes a bunch of input format data into global state,
so if it were to change, olives using the schema would break in unpredictable
ways and olives coupled to other olives through `Export Define` would have
consistency problems. Single loading was fine with the Java input formats,
because there isn't a way to add a new class after start up. Unfortunately,
JSON schemas have to be treated the same way.


113 changes: 93 additions & 20 deletions shesmu-server/src/main/java/ca/on/oicr/gsi/shesmu/Server.java
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,20 @@

import ca.on.oicr.gsi.Pair;
import ca.on.oicr.gsi.prometheus.LatencyHistogram;
import ca.on.oicr.gsi.shesmu.compiler.*;
import ca.on.oicr.gsi.shesmu.compiler.CallableDefinition;
import ca.on.oicr.gsi.shesmu.compiler.CallableDefinitionRenderer;
import ca.on.oicr.gsi.shesmu.compiler.Compiler;
import ca.on.oicr.gsi.shesmu.compiler.OliveClauseNodeGroupWithGrouper;
import ca.on.oicr.gsi.shesmu.compiler.RefillerDefinition;
import ca.on.oicr.gsi.shesmu.compiler.Target;
import ca.on.oicr.gsi.shesmu.compiler.Target.Flavour;
import ca.on.oicr.gsi.shesmu.compiler.definitions.*;
import ca.on.oicr.gsi.shesmu.compiler.definitions.ActionDefinition;
import ca.on.oicr.gsi.shesmu.compiler.definitions.ConstantDefinition;
import ca.on.oicr.gsi.shesmu.compiler.definitions.ConstantDefinition.ConstantLoader;
import ca.on.oicr.gsi.shesmu.compiler.definitions.DefinitionRepository;
import ca.on.oicr.gsi.shesmu.compiler.definitions.DefinitionRepository.CallableOliveDefinition;
import ca.on.oicr.gsi.shesmu.compiler.definitions.FunctionDefinition;
import ca.on.oicr.gsi.shesmu.compiler.definitions.InputFormatDefinition;
import ca.on.oicr.gsi.shesmu.compiler.description.FileTable;
import ca.on.oicr.gsi.shesmu.compiler.description.OliveTable;
import ca.on.oicr.gsi.shesmu.compiler.description.Produces;
Expand All @@ -24,7 +32,11 @@
import ca.on.oicr.gsi.shesmu.plugin.dumper.Dumper;
import ca.on.oicr.gsi.shesmu.plugin.files.AutoUpdatingDirectory;
import ca.on.oicr.gsi.shesmu.plugin.files.FileWatcher;
import ca.on.oicr.gsi.shesmu.plugin.filter.*;
import ca.on.oicr.gsi.shesmu.plugin.filter.ActionFilter;
import ca.on.oicr.gsi.shesmu.plugin.filter.ActionFilterBuilder;
import ca.on.oicr.gsi.shesmu.plugin.filter.AlertFilter;
import ca.on.oicr.gsi.shesmu.plugin.filter.ExportSearch;
import ca.on.oicr.gsi.shesmu.plugin.filter.SourceOliveLocation;
import ca.on.oicr.gsi.shesmu.plugin.grouper.GrouperDefinition;
import ca.on.oicr.gsi.shesmu.plugin.json.PackJsonObject;
import ca.on.oicr.gsi.shesmu.plugin.types.Imyhat;
Expand All @@ -33,14 +45,42 @@
import ca.on.oicr.gsi.shesmu.runtime.CompiledGenerator;
import ca.on.oicr.gsi.shesmu.runtime.OliveServices;
import ca.on.oicr.gsi.shesmu.runtime.RuntimeSupport;
import ca.on.oicr.gsi.shesmu.server.*;
import ca.on.oicr.gsi.shesmu.server.ActionProcessor;
import ca.on.oicr.gsi.shesmu.server.ActionProcessor.Filter;
import ca.on.oicr.gsi.shesmu.server.BaseHotloadingCompiler;
import ca.on.oicr.gsi.shesmu.server.CommandRequest;
import ca.on.oicr.gsi.shesmu.server.FunctionRequest;
import ca.on.oicr.gsi.shesmu.server.FunctionRunner;
import ca.on.oicr.gsi.shesmu.server.FunctionRunnerCompiler;
import ca.on.oicr.gsi.shesmu.server.GuidedMeditation;
import ca.on.oicr.gsi.shesmu.server.InputSource;
import ca.on.oicr.gsi.shesmu.server.MasterRunner;
import ca.on.oicr.gsi.shesmu.server.MeditationCompilationRequest;
import ca.on.oicr.gsi.shesmu.server.MetroDiagram;
import ca.on.oicr.gsi.shesmu.server.Query;
import ca.on.oicr.gsi.shesmu.server.SavedSearch;
import ca.on.oicr.gsi.shesmu.server.ShesmuThreadFactory;
import ca.on.oicr.gsi.shesmu.server.SimulateExistingRequest;
import ca.on.oicr.gsi.shesmu.server.SimulateRequest;
import ca.on.oicr.gsi.shesmu.server.StaticActions;
import ca.on.oicr.gsi.shesmu.server.TypeParseRequest;
import ca.on.oicr.gsi.shesmu.server.TypeParseResponse;
import ca.on.oicr.gsi.shesmu.server.plugins.AnnotatedInputFormatDefinition;
import ca.on.oicr.gsi.shesmu.server.plugins.BaseHumanTypeParser;
import ca.on.oicr.gsi.shesmu.server.plugins.BaseInputFormatDefinition;
import ca.on.oicr.gsi.shesmu.server.plugins.JarHashRepository;
import ca.on.oicr.gsi.shesmu.server.plugins.JsonInputFormatDefinition;
import ca.on.oicr.gsi.shesmu.server.plugins.PluginManager;
import ca.on.oicr.gsi.shesmu.util.NameLoader;
import ca.on.oicr.gsi.status.*;
import ca.on.oicr.gsi.status.BasePage;
import ca.on.oicr.gsi.status.ConfigurationSection;
import ca.on.oicr.gsi.status.Header;
import ca.on.oicr.gsi.status.NavigationMenu;
import ca.on.oicr.gsi.status.SectionRenderer;
import ca.on.oicr.gsi.status.ServerConfig;
import ca.on.oicr.gsi.status.StatusPage;
import ca.on.oicr.gsi.status.TablePage;
import ca.on.oicr.gsi.status.TableRowWriter;
import com.fasterxml.jackson.core.JsonEncoding;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
Expand All @@ -56,9 +96,17 @@
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.common.TextFormat;
import io.prometheus.client.hotspot.DefaultExports;
import java.io.*;
import java.io.IOException;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.Writer;
import java.lang.management.ManagementFactory;
import java.net.*;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.UnknownHostException;
import java.net.http.HttpClient;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
Expand All @@ -68,10 +116,30 @@
import java.text.SimpleDateFormat;
import java.time.Duration;
import java.time.Instant;
import java.util.*;
import java.util.Base64;
import java.util.concurrent.*;
import java.util.function.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Properties;
import java.util.Scanner;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
Expand Down Expand Up @@ -284,7 +352,9 @@ public Server(int port, FileWatcher fileWatcher) throws IOException, ParseExcept
final InputSource inputSource =
(format, readStale) ->
ErrorableStream.concatWithErrors(
AnnotatedInputFormatDefinition.formats(), Stream.of(pluginManager, processor))
AnnotatedInputFormatDefinition.formats(),
ErrorableStream.concatWithErrors(
JsonInputFormatDefinition.formats(), Stream.of(pluginManager, processor)))
.flatMap(source -> source.fetch(format, readStale));
master =
new MasterRunner(
Expand Down Expand Up @@ -485,7 +555,7 @@ public Imyhat typeForName(String name) {
}
});

AnnotatedInputFormatDefinition.formats()
Stream.concat(AnnotatedInputFormatDefinition.formats(), JsonInputFormatDefinition.formats())
.sorted(Comparator.comparing(InputFormatDefinition::name))
.forEach(
format ->
Expand Down Expand Up @@ -552,7 +622,9 @@ public Stream<ConfigurationSection> sections() {
staticActions.listConfiguration(),
guidedMeditations.stream().map(GuidedMeditation::configuration),
AnnotatedInputFormatDefinition.formats()
.flatMap(AnnotatedInputFormatDefinition::configuration))
.flatMap(AnnotatedInputFormatDefinition::configuration),
JsonInputFormatDefinition.formats()
.flatMap(JsonInputFormatDefinition::configuration))
.flatMap(Function.identity());
}
}.renderPage(os);
Expand Down Expand Up @@ -905,7 +977,9 @@ public String activeUrl() {

@Override
protected void renderContent(XMLStreamWriter writer) {
AnnotatedInputFormatDefinition.formats()
Stream.concat(
AnnotatedInputFormatDefinition.formats(),
JsonInputFormatDefinition.formats())
.sorted(Comparator.comparing(InputFormatDefinition::name))
.forEach(
format -> {
Expand Down Expand Up @@ -1185,6 +1259,7 @@ protected void writeRows(TableRowWriter row) {
pluginManager.dumpPluginConfig(row);
AnnotatedInputFormatDefinition.formats()
.forEach(format -> format.dumpPluginConfig(row));
JsonInputFormatDefinition.formats().forEach(format -> format.dumpPluginConfig(row));
}
}.renderPage(os);
}
Expand Down Expand Up @@ -1935,7 +2010,8 @@ protected void renderContent(XMLStreamWriter writer) throws XMLStreamException {
"/variables",
(mapper, query) -> {
final var node = mapper.createObjectNode();
AnnotatedInputFormatDefinition.formats()
Stream.concat(
AnnotatedInputFormatDefinition.formats(), JsonInputFormatDefinition.formats())
.forEach(
source -> {
final var sourceNode = node.putObject(source.name());
Expand Down Expand Up @@ -1966,7 +2042,7 @@ protected void renderContent(XMLStreamWriter writer) throws XMLStreamException {
return node;
});

AnnotatedInputFormatDefinition.formats()
Stream.concat(AnnotatedInputFormatDefinition.formats(), JsonInputFormatDefinition.formats())
.forEach(
format -> {
add(
Expand Down Expand Up @@ -2635,10 +2711,7 @@ public void constantDefsJson(ArrayNode array) {
}

public void downloadInputData(
HttpExchange t,
InputSource inputSource,
AnnotatedInputFormatDefinition format,
boolean readStale)
HttpExchange t, InputSource inputSource, BaseInputFormatDefinition format, boolean readStale)
throws IOException {

if (!readStale
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
import ca.on.oicr.gsi.shesmu.server.ShesmuThreadFactory;
import ca.on.oicr.gsi.shesmu.server.plugins.AnnotatedInputFormatDefinition;
import ca.on.oicr.gsi.shesmu.server.plugins.CallSiteRegistry;
import ca.on.oicr.gsi.shesmu.server.plugins.JsonInputFormatDefinition;
import ca.on.oicr.gsi.shesmu.util.NameLoader;
import ca.on.oicr.gsi.status.ConfigurationSection;
import ca.on.oicr.gsi.status.SectionRenderer;
Expand Down Expand Up @@ -776,7 +777,10 @@ public void function(
.register();
private static final CallSiteRegistry<String> SCRIPT_REGISTRY = new CallSiteRegistry<>();
public static final NameLoader<InputFormatDefinition> SOURCES =
new NameLoader<>(AnnotatedInputFormatDefinition.formats(), InputFormatDefinition::name);
new NameLoader<>(
Stream.concat(
AnnotatedInputFormatDefinition.formats(), JsonInputFormatDefinition.formats()),
InputFormatDefinition::name);
private static final Gauge compileTime =
Gauge.build(
"shesmu_source_compile_time",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
import ca.on.oicr.gsi.shesmu.plugin.json.PackJsonObject;
import ca.on.oicr.gsi.shesmu.plugin.json.UnpackJson;
import ca.on.oicr.gsi.shesmu.plugin.types.Imyhat;
import ca.on.oicr.gsi.shesmu.server.plugins.AnnotatedInputFormatDefinition;
import ca.on.oicr.gsi.shesmu.server.plugins.BaseInputFormatDefinition;
import ca.on.oicr.gsi.shesmu.server.plugins.InvokeDynamicActionParameterDescriptor;
import ca.on.oicr.gsi.shesmu.server.plugins.InvokeDynamicRefillerParameterDescriptor;
import ca.on.oicr.gsi.shesmu.server.plugins.PluginManager;
Expand All @@ -33,11 +33,32 @@
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.time.*;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.*;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.function.ToIntFunction;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
Expand Down Expand Up @@ -216,8 +237,7 @@ public static JsonNode getJson(JsonNode node, String name) {
public static CallSite inputBootstrap(
Lookup lookup, String variableName, MethodType methodType, String inputFormatName) {
// This is redirects to the input format manager; it's here to limit our export interface
return AnnotatedInputFormatDefinition.bootstrap(
lookup, variableName, methodType, inputFormatName);
return BaseInputFormatDefinition.bootstrap(lookup, variableName, methodType, inputFormatName);
}

@RuntimeInterop
Expand Down Expand Up @@ -578,8 +598,8 @@ public static CallSite refillerParameterBootstrap(
}

/**
* This is a boot-strap method for <code>INVOKE DYNAMIC</code> to match a regular expression
* (which is the method name).
* This is a boot-strap method for <tt>INVOKE DYNAMIC</tt> to match a regular expression (which is
* the method name).
*/
@RuntimeInterop
public static CallSite regexBootstrap(
Expand Down
Loading

0 comments on commit 274c407

Please sign in to comment.