draft 3
pothiers committed Jan 24, 2025
1 parent 2d923ef commit d482d41
Showing 2 changed files with 304 additions and 10 deletions.
17 changes: 11 additions & 6 deletions docs/nightly-digest-fresh-start.org
@@ -1,5 +1,13 @@
This file was last modified:
Time-stamp: <2025-01-24 09:55:12 pothiers>
Time-stamp: <2025-01-24 10:55:11 pothiers>
# TODO!!!
# After the following looks good under GitHub, move it to the logrep repo
# https://github.com/pothiers/notes/blob/master/noirlab/rubin/unified-time-log.org

# MORE: Not being on Times Square means it might be accessible on the Summit. So
# observers could benefit from an "Observers View"
#
# Time Lost?

* Overview
In Aug 2024 we began working on what is now called the Nightly Digest.
@@ -15,10 +23,6 @@ It is now (<2025-01-23 Thu>) and time to take stock of what we have
learned, where we want to go, and how we might get there. This
document is an attempt to do that.

# TODO!!!
# After the following looks good under GitHub, move it to the logrep repo
# https://github.com/pothiers/notes/blob/master/noirlab/rubin/unified-time-log.org

* About this document
This was written by Steve Pothier. All errors and opinions are
mine. While I've attempted to incorporate ideas from stakeholders
@@ -170,8 +174,9 @@ created our own.


*** Anti-Requirements (we explicitly REJECT having to do these)
- Support print of report (print of web page possible but may give
- Do not support print of report (print of web page possible but may give
poor results)
- Do not support real-time diagnosing of "what went wrong"



297 changes: 293 additions & 4 deletions docs/unified-time-log.org
@@ -1,4 +1,293 @@
# This is a placeholder.
#
# Soon ~/sandbox/notes/noirlab/rubin/unified-time-log.org will be
# moved here,
* COMMENT PRESCRIPT
\setlength{\parindent}{0em}
\parskip 7.2pt
* About this Document
# *DRAFT: This will probably ALWAYS be a DRAFT!*
Time-stamp: <2025-01-24 12:29:08 pothiers>

This white-paper describes a partially implemented vision/design
created [2024-11-28 Thu]--[2024-12-02 Mon]. I wrote it after I found
myself excitedly yammering incoherently to a few people about the
concepts. These ideas can be grouped under what I have come to call a
*Single Unified Time Log* /(SUTL="subtle")/. SUTL is currently being
used (in DRAFT form) in the /Night Summary/ being created for /TSSW Logging and
Reporting/.

* Abstract
Create a *Single Unified Time Log* using a sequence of steps that
turns a list of records from each data source into a combined report.
The steps are formalized and generalized into: /Retrieve, Merge, Compact, Reduce,
Render/. The /Rendering/ step partitions the combined data from a
single Data-Frame into a structure that is rendered into HTML through a
Jinja template. The use of templates admits the possibility of
generating alternate reports for different use-cases.

* Executive Summary
The /Night Summary/ being developed under Logging & Reporting is
intended to provide *upper and middle management* with a *single unified
report* they can read to "/find out what happened last night/".
/Night Summary/ is intended to be used during Commissioning but should be usable during
Operations with minimal code changes. Everything is evolving. It's
impossible to predict what will be important in the future.

This application is expected to be maintained for years but maintenance cost
must be low. The current Night Summary is hosted as a Notebook on Times Square. Our
intent is to move away from Notebooks to a platform that is
more amenable to Software Engineering best practices (e.g. regression
tests).

#+Begin_Latex
\pagebreak
\tableofcontents
#+End_Latex


* Challenges
1. Every data *source is accessed in its own way*.
2. Some fields are tiny, some are over 5,000 characters.
3. Much content is entered manually. What was common before is
not common now. We *cannot predict what will be common next week*.
4. *We Need It Now.* No. "Now" is too late.
5. The data available depends on the instrument (at least). This means data
retrieval that works with one instrument may not work with others.
/Instruments have personalities./
6. Sometimes we get requests like: +"Show me /everything/ that happened last night"+
That is *beyond our scope*, and the report would be too long to be useful anyhow.

* Essential Elements

+ *Retrieve:* Source Adapters are the data-facing components to read records from APIs and databases.
+ *Merge:* Combine all sources into one time-based structure.
+ *Compact:* Remove redundant info to shrink the merged structure.
+ *Reduce:* Group and aggregate by time-period ("4 hours"). Aids
rendering clarity.
+ *Render:* Create a report from a processed, unified container of
source data.

** Source Adapters
Every data source that we use is wrapped into a subclass that lets
each of them be treated in (mostly) the same way. This allows code
that processes one source to be reused for others. Each adapter is
responsible for retrieving the data we need (or more). Almost always
the adapter reads its data from a web-service API.
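
A minimal sketch of what such an adapter might look like (the class,
method, and endpoint names here are illustrative assumptions, not the
actual implementation):

#+begin_src python
from abc import ABC, abstractmethod

import pandas as pd
import requests


class SourceAdapter(ABC):
    """Hypothetical base class; one subclass wraps each data source."""

    timestamp_field = "timestamp"  # column used later for merging

    @abstractmethod
    def fetch_records(self, day_obs: str) -> list[dict]:
        """Read raw records for one observing night from the source."""

    def get_dataframe(self, day_obs: str) -> pd.DataFrame:
        """Return the night's records as a timestamp-indexed DataFrame."""
        df = pd.DataFrame(self.fetch_records(day_obs))
        df[self.timestamp_field] = pd.to_datetime(df[self.timestamp_field])
        return df.set_index(self.timestamp_field).sort_index()


class NightReportAdapter(SourceAdapter):
    """Illustrative subclass; the URL and parameter are made up."""

    def fetch_records(self, day_obs: str) -> list[dict]:
        resp = requests.get("https://example.org/nightreport",
                            params={"day_obs": day_obs})
        resp.raise_for_status()
        return resp.json()
#+end_src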

** Retrieve, Merge, Compact, Reduce, Render
All data sources are *retrieved* for the application by /Source Adapters/.
They are *merged* into a /Single Unified Time Log/ data-frame according
to a timestamp associated with each source record. The data-frame is
*compacted* by removing unused rows and columns. The data-frame is *reduced* by
grouping by time-period and aggregating over the period. Finally, the
result is *rendered* into HTML via a view that splits the data-frame into
parts that are passed into an /HTML template/. Typically, splitting
partitions the columns into /dense/ (for table rendering) and /sparse/ (for list
rendering) portions.
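
As a rough sketch in pandas (assuming each Source Adapter yields a
timestamp-indexed DataFrame as in the adapter sketch above; none of
this is the actual implementation):

#+begin_src python
import pandas as pd


def merge(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Merge (lossless): stack every source's records into one
    Single Unified Time Log data-frame, ordered by timestamp."""
    return pd.concat(frames).sort_index()


def compact(df: pd.DataFrame) -> pd.DataFrame:
    """Compact (lossless): drop rows and columns holding no values."""
    return df.dropna(axis="index", how="all").dropna(axis="columns", how="all")


# Hypothetical driver; Reduce and Render are sketched in the Details
# section below.
# adapters = [NightReportAdapter(), ...]
# full_df = compact(merge([a.get_dataframe("2025-01-23") for a in adapters]))
#+end_src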

* Details (TL;DR)
** Retrieve, Merge, Compact, Reduce, Render
+ Retrieve: :: Source Adapters read records from APIs and
databases. They isolate details of retrieving data from our main
task of creating a summary report.
+ Merge: :: lossless.
+ Compact: :: lossless (optional column density filter)
+ Reduce: :: Group and aggregate by time-period ("4 hours"). Gives up
time resolution.
+ Render: :: For each report, use analysis of data to be rendered to
determine which parts are informationally dense and which are sparse
so they can be rendered differently.

** Sources
Sources without a timestamp per record cannot currently be
processed unless a timestamp is artificially created.
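
One made-up example of such a workaround (not current behavior):

#+begin_src python
import pandas as pd

# Hypothetical source whose records carry no timestamp.
records = pd.DataFrame({"note": ["dome opened late", "seeing poor"]})

# Stamp every record with an artificial time (here, the night's start)
# so the source can participate in the timestamp-based merge.
records["timestamp"] = pd.Timestamp("2025-01-23T18:00:00")
records = records.set_index("timestamp")
#+end_src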

** Merge
Source records are merged by timestamp into a single Data-Frame (DF).

** Compact
The DF is Compacted by removing columns and rows that are not
used.

Optionally, a "density threshold" can be
provided. When the ratio of Values/Rows for a column is below the
threshold, the column is removed. This is common for fields provided
by APIs but only sporadically used in the field. This is dynamic,
data-dependent filtering. A field might not be used for a while (so
its column is removed), then start being used (so the column is kept).
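
As a sketch, this filter could be as simple as (the threshold value
is an assumption):

#+begin_src python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Remove columns whose ratio of values to rows falls below the
    density threshold. Purely data-dependent: a column dropped this
    week may be kept next week, and vice versa."""
    density = df.notna().mean()  # per-column fraction of rows with a value
    return df.loc[:, density >= threshold]
#+end_src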

** Reduce
The data frame is reduced by grouping by time-period (e.g. "2
hours") and aggregating the values over the period.

** Render
The naive approach to presenting data is in spreadsheet-like table
format. This works great for data that fits in a small cell but not
for wide data (such as text descriptions or lists of elements).

Our sources contain a wide diversity of data. Some fields are simple
scalar values, and might not be populated at all for many days.
Other fields are text fields that may be 5,000 characters or more
long. It's challenging to render data that is common and rare, short
and long, general and specific. We don't know what the data diversity
looks like since it may change radically from week to week.

After many unsuccessful attempts at rendering in this changing data
landscape, we realized that a static solution is doomed to failure.
Instead, we must adapt to the data diversity for every report. This
has led to partitioning data values for a night into a few "type buckets"
and rendering each bucket in a different way. For instance, we render
"common, short scalars" into a table, but we render "rare, short
scalars" as item lists (below the table, in the same period).

At various times, the target user has been seen as:
1. Upper Management: "What happened last night?"
This is our focus.

2. Operating Specialist: "What did we do a couple days ago? Is it
similar to our current problem? Same solution?"
# Ignore
This is beyond our scope. It might be possible to provide something
relatively easily (as a new page), but only if detailed content is
provided.

3. Engineers: "What broke? What are the details that will help us fix
it?" (diagnosing)
Lynne may be doing something useful in this area.

We cannot create a single summary that will serve all potential
users. We cannot predict who the users will be. Therefore, we must
be able to *generate different reports*. We don't want every report
to require a new application.

Solution: The back-end (Retrieve, Merge, Compact, Reduce) creates a
common data structure that can be used by all reports. A different
rendering is created for each user type.
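
A sketch of template-driven rendering on top of the shared back-end
(the template file names and context keys are hypothetical):

#+begin_src python
import jinja2


def render_report(reduced_df, user_type: str = "management") -> str:
    """Same back-end data-frame, a different Jinja template per user type."""
    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader("templates"),
        autoescape=True,
    )
    # Hypothetical template files: management.html, specialist.html, ...
    template = env.get_template(f"{user_type}.html")
    dense, sparse = partition_columns(reduced_df)  # from the sketch above
    return template.render(
        table=dense.to_html(),
        lists=sparse.to_dict(orient="index"),
    )
#+end_src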

* Assumptions
1. *The screen real estate available for Night Summary is limited.*
I use an iPad (2360 x 1640 pixels) to view it. If Night Summary is
not usable in that amount of space, I consider it a failure. Users
may have big screens but should not need them for the Night
Summary.

2. *Not for diagnosing problems.*
Diagnostics need more interactivity and the ability to drill down
to fine-grained details. Neither is appropriate for a night
*summary* report.

3. We *cannot predict how the distribution of values will change* in the
data sources over the next weeks or months.

4. We will *not know who the real users are unless we see people using* the
app.

5. One report *cannot satisfy the diversity of all possible users*.
Different use-cases imply different reporting and different content.

6. /"Throw it against the all and see what sticks."/


* Future
** Beyond a Night Summary
The same technique used to summarize a night into periods could be used
to summarize a week into periods (such as nights). The differences
would be in:
- the data density threshold that determines what fields are removed
- the style of rendering
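
In terms of the earlier sketches, that might amount to no more than
different parameter values (the values here are illustrative):

#+begin_src python
# Same pipeline, coarser grouping and a (hypothetically) stricter
# density threshold: summarize a week into nights.
weekly = reduce_by_period(drop_sparse_columns(full_df, threshold=0.20),
                          period="1D")
#+end_src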

** Beyond Notebooks
We chose implementation via Notebooks so development (prototyping)
could be fast. Notebooks might have been a non-starter except that
Times Square allows them to be presented as a parameterized web page
to end users. Our hope was to factor out the back-end code from the
notebooks so that we could (somehow) later replace the notebook with a "real
GUI" that could offer greater interaction.

With the addition of the template-based rendering of HTML, we have
markedly decreased the gap between what we have and what typical web
frameworks (such as Django) need. By storing the pre-rendered data in
a (small) database, we can collect multiple lower-level data-frames to
be summarized into higher-level data-frames (nights to week, weeks to
month). From the stored data-frames, we can provide GUI applications
such as LOVE with web-service access to the pre-rendered data.
Through different HTML templates, we can serve customized reports to
various types of users (provided the data they need is already
somewhere in our sources).

* Cutting room floor (TL;DR) :noexport:

** Design elements
Merge sources by date-time column into a single wide and long
data-frame. The intent is to use this combined full_df for everything
else. A variant of the full_df would be the logical choice for a
small summary-oriented database held by the back-end and served to the
GUI.

The full_df is compacted, reduced, and rendered for use.

** cut
Insight into the Night Summary problem: Pure tables are not great for our
data because some fields are simple scalars, but some are lists or
large chunks of text. This creates uneven usage of white space when
rendering as a table (e.g. the text gets squeezed into a column so
narrow that it takes up more vertical space). But even
if we remove the text, there are some scalars that are rare. A column
that contains mostly nothing wastes horizontal space. I now have a
way to dynamically move fields from the table to a list below the table,
leaving the table just one row per period (e.g. a 4-hour block). It can
detect that a column is 95% empty, remove the column, and put the few
values in a list. It does this in a data-dependent, dynamic way using
a template system to generate the HTML.

* POSTSCRIPT :noexport:
/(this section here to keep Document Comments out of the way)/
source: /home/pothiers/orgfiles/designs.org

Something like this can be inserted into the doc by invoking the export
dispatcher and selecting "insert template" (C-c C-e #).


#+TITLE: Night Summary
#+SUBTITLE: using a Single Unified Time Log (SUTL)
#+AUTHOR: Steve Pothier
#+EMAIL: [email protected]
#+DESCRIPTION: Personal design notes
#+KEYWORDS:
#+LANGUAGE: en
#+OPTIONS: H:3 num:1 toc:nil \n:nil @:t ::t |:t ^:nil -:t f:t *:t <:t
#+OPTIONS: TeX:t LaTeX:t skip:nil d:nil todo:t pri:nil tags:nil
#+INFOJS_OPT: view:nil toc:t ltoc:t mouse:underline buttons:0 path:http://orgmode.org/org-info.js
#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+LINK_UP:
#+LINK_HOME:
#+XSLT:

#+LATEX_HEADER: \setlength{\parindent}{0em}\parskip 7.2pt


+LATEX_HEADER: \usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
+CAPTION: CCML Model Creation Workflow


#! +LaTeX_HEADER: \usepackage{fancyhdr}
#! +LaTeX_HEADER: \pagestyle{fancy}
#! +LaTeX_HEADER: \fancyhf{}
#! +LaTeX_HEADER: \fancyhead[OC,EC]{DRAFT\\NOIRLab Proprietary}
#! +LaTeX_HEADER: \fancyfoot[OC,EC]{NOIRLab Proprietary\\DRAFT}
#! +LaTeX_HEADER: \fancyfoot[RO, LE] {\thepage}
#! +LaTeX_HEADER: \renewcommand{\headrulewidth}{0.4pt}
#! +LaTeX_HEADER: \renewcommand{\footrulewidth}{0.4pt}
#! #+LaTeX_HEADER: \usepackage{draftwatermark}
#! #+LaTeX_HEADER: \SetWatermarkText{DRAFT}
#! #+LaTeX_HEADER: \SetWatermarkScale{1.5}
#! #+LATEX_HEADER: \usepackage[margin=2.5cm]{geometry}
#!
#! +TEXT: This white paper includes data that shall not be disclosed outside of
#! +TEXT: NOIRLab or NSF and shall not be duplicated, used, or disclosed,
#! +TEXT: in whole or in part, for any purpose other than to evaluate this
#! +TEXT: white paper.

#+TEXT: \newpage
#+TEXT: [TABLE-OF-CONTENTS]
#+TEXT: \newpage
#+LaTeX_HEADER: \newpage
#+LATEX_HEADER: \usepackage[margin=0.5in]{geometry}
