draft 3
pothiers committed Jan 24, 2025
1 parent 2d923ef commit d482d41
Showing 2 changed files with 304 additions and 10 deletions.
17 changes: 11 additions & 6 deletions docs/nightly-digest-fresh-start.org
@@ -1,5 +1,13 @@
This file was last modified:
Time-stamp: <2025-01-24 09:55:12 pothiers>
Time-stamp: <2025-01-24 10:55:11 pothiers>
# TODO!!!
# After the following looks good under GitHub, move it to the logrep repo
# https://github.com/pothiers/notes/blob/master/noirlab/rubin/unified-time-log.org

# MORE: Not being on Times Square means it might be accessible on the Summit. So
# observers could benefit from an "Observers View"
#
# Time Lost?

* Overview
In Aug 2024 we began working on what is now called the Nightly Digest.
@@ -15,10 +23,6 @@ It is now (<2025-01-23 Thu>) and time to take stock of what we have
learned, where we want to go, and how we might get there. This
document is an attempt to do that.

# TODO!!!
# After the following looks good under GitHub, move it to the logrep repo
# https://github.com/pothiers/notes/blob/master/noirlab/rubin/unified-time-log.org

* About this document
This was written by Steve Pothier. All errors and opinions are
mine. While I've attempted to incorporate ideas from stakeholders
@@ -170,8 +174,9 @@ created our own.


*** Anti-Requirements (we explicitly REJECT having to do these)
- Support print of report (print of web page possible but may give
- Do not support print of report (print of web page possible but may give
poor results)
- Do not support real-time diagnosing of "what went wrong"



297 changes: 293 additions & 4 deletions docs/unified-time-log.org
@@ -1,4 +1,293 @@
# This is a placeholder.
#
# Soon ~/sandbox/notes/noirlab/rubin/unified-time-log.org will be
# moved here,
* COMMENT PRESCRIPT
\setlength{\parindent}{0em}
\parskip 7.2pt
* About this Document
# *DRAFT: This will probably ALWAYS be a DRAFT!*
Time-stamp: <2025-01-24 12:29:08 pothiers>

This white-paper describes a partially implemented vision/design
created [2024-11-28 Thu]--[2024-12-02 Mon]. I wrote it after I found
myself excitedly yammering incoherently to a few people about the
concepts. These ideas can be grouped under what I have come to call a
*Single Unified Time Log* /(SUTL="subtle")/. SUTL is currently being
used (in DRAFT form) in the /Night Summary/ being created for /TSSW Logging and
Reporting/.

* Abstract
Create a *Single Unified Time Log* using a sequence of steps that
turns a list of records from each data source into a combined report.
The steps are formalized and generalized into: /Retrieve, Merge, Compact, Reduce,
Render/. The /Rendering/ step partitions the combined data from a
single Data-Frame into a structure that is rendered into HTML through a
Jinja template. The use of templates admits the possibility of
generating alternate reports for different use-cases.

* Executive Summary
The /Night Summary/ being developed under Logging & Reporting is
intended to provide *upper and middle management* with a *single unified
report* they can read to "/find out what happened last night/".
/Night Summary/ is intended to be used during Commissioning but should be usable during
Operations with minimal code changes. Everything is evolving. It's
impossible to predict what will be important in the future.

This application is expected to be maintained for years but maintenance cost
must be low. The current Night Summary is hosted as a Notebook on Times Square. Our
intent is to move away from Notebooks to a platform that is
more amenable to Software Engineering best practices (e.g. regression
tests).

#+Begin_Latex
\pagebreak
\tableofcontents
#+End_Latex


* Challenges
1. Every data *source is accessed in its own way*.
2. Some fields are tiny, some are over 5,000 characters.
3. Much content is entered manually. What was common before is
not common now. We *cannot predict what will be common next week*.
4. *We Need It Now.* No. "Now" is too late.
5. The data available depends on the instrument (at least). This means data
retrieval that works with one instrument may not work with others.
/Instruments have personalities./
6. Sometimes we get requests like: +"Show me /everything/ that happened last night"+
That is *beyond our scope*, and the report would be too long to be useful anyhow.

* Essential Elements

+ *Retrieve:* Source Adapters are the data-facing components to read records from APIs and databases.
+ *Merge:* Combine all sources into one time-based structure.
+ *Compact:* Remove redundant info to shrink the merged structure.
+ *Reduce:* Group and aggregate by time-period ("4 hours"). Aids
rendering clarity.
+ *Render:* Create a report from a processed, unified container of
source data.

** Source Adapters
Every data source that we use is wrapped into a subclass that lets
each of them be treated in (mostly) the same way. This allows code
that processes one source to be reused for others. Each adapter is
responsible for retrieving the data we need (or more). Almost always
the adapter reads its data from a web-service API.
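
A minimal sketch of what such an adapter might look like (the class,
method, and endpoint names here are illustrative assumptions, not the
actual implementation):

#+begin_src python
from abc import ABC, abstractmethod

import pandas as pd
import requests


class SourceAdapter(ABC):
    """Hypothetical base class; one subclass wraps each data source."""

    timestamp_field = "timestamp"  # column used later for merging

    @abstractmethod
    def fetch_records(self, day_obs: str) -> list[dict]:
        """Read raw records for one observing night from the source."""

    def get_dataframe(self, day_obs: str) -> pd.DataFrame:
        """Return the night's records as a timestamp-indexed DataFrame."""
        df = pd.DataFrame(self.fetch_records(day_obs))
        df[self.timestamp_field] = pd.to_datetime(df[self.timestamp_field])
        return df.set_index(self.timestamp_field).sort_index()


class NightReportAdapter(SourceAdapter):
    """Illustrative subclass; the URL and parameter are made up."""

    def fetch_records(self, day_obs: str) -> list[dict]:
        resp = requests.get("https://example.org/nightreport",
                            params={"day_obs": day_obs})
        resp.raise_for_status()
        return resp.json()
#+end_src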

** Retrieve, Merge, Compact, Reduce, Render
All data sources are *retrieved* for the application by /Source Adapters/.
They are *merged* into a /Single Unified Time Log/ data-frame according
to a timestamp associated with each source record. The data-frame is
*compacted* by removing unused rows and columns. The data-frame is *reduced* by
grouping by time-period and aggregating over the period. Finally, the
result is *rendered* into HTML via a view that splits the data-frame into
parts that are passed into an /HTML template/. Typically, splitting
partitions the columns into /dense/ (for table rendering) and /sparse/ (for list
rendering) portions.
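
As a rough sketch in pandas (assuming each Source Adapter yields a
timestamp-indexed DataFrame as in the adapter sketch above; none of
this is the actual implementation):

#+begin_src python
import pandas as pd


def merge(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Merge (lossless): stack every source's records into one
    Single Unified Time Log data-frame, ordered by timestamp."""
    return pd.concat(frames).sort_index()


def compact(df: pd.DataFrame) -> pd.DataFrame:
    """Compact (lossless): drop rows and columns holding no values."""
    return df.dropna(axis="index", how="all").dropna(axis="columns", how="all")


# Hypothetical driver; Reduce and Render are sketched in the Details
# section below.
# adapters = [NightReportAdapter(), ...]
# full_df = compact(merge([a.get_dataframe("2025-01-23") for a in adapters]))
#+end_src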

* Details (TL;DR)
** Retrieve, Merge, Compact, Reduce, Render
+ Retrieve: :: Source Adapters read records from APIs and
databases. They isolate details of retrieving data from our main
task of creating a summary report.
+ Merge: :: lossless.
+ Compact: :: lossless (optional column density filter)
+ Reduce: :: Group and aggregate by time-period ("4 hours"). Gives up
time resolution.
+ Render: :: For each report, use analysis of data to be rendered to
determine which parts are informationally dense and which are sparse
so they can be rendered differently.

** Sources
Sources without a timestamp per record cannot currently be
processed unless a timestamp is artificially created.
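
One made-up example of such a workaround (not current behavior):

#+begin_src python
import pandas as pd

# Hypothetical source whose records carry no timestamp.
records = pd.DataFrame({"note": ["dome opened late", "seeing poor"]})

# Stamp every record with an artificial time (here, the night's start)
# so the source can participate in the timestamp-based merge.
records["timestamp"] = pd.Timestamp("2025-01-23T18:00:00")
records = records.set_index("timestamp")
#+end_src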

** Merge
Source records are merged by timestamp into a single Data-Frame (DF).

** Compact
The DF is Compacted by removing columns and rows that are not
used.

Optionally, a "density threshold" can be
provided. When the ratio of Values/Rows for a column is below the
threshold, the column is removed. This is common for fields provided
by APIs but only sporadically used in the field. This is dynamic,
data-dependent filtering. A field might not be used for a while (so
its column is removed), then start being used (so the column is kept).
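
As a sketch, this filter could be as simple as (the threshold value
is an assumption):

#+begin_src python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Remove columns whose ratio of values to rows falls below the
    density threshold. Purely data-dependent: a column dropped this
    week may be kept next week, and vice versa."""
    density = df.notna().mean()  # per-column fraction of rows with a value
    return df.loc[:, density >= threshold]
#+end_src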

** Reduce
The data frame is reduced by grouping by time-period (e.g. "2
hours") and aggregating the values over the period.

** Render
The naive approach to presenting data is in spreadsheet-like table
format. This works great for data that fits in a small cell but not
for wide data (such as text descriptions or lists of elements).

Our sources contain a wide diversity of data. Some fields are simple
scalar values, and might not be populated at all for many days.
Other fields are text fields that may be 5,000 characters or more
long. It's challenging to render data that is common and rare, short
and long, general and specific. We don't know what the data diversity
looks like since it may change radically from week to week.

After many unsuccessful attempts at rendering in this changing data
landscape, we realized that a static solution is doomed to failure.
Instead, we must adapt to the data diversity for every report. This
has led to partitioning data values for a night into a few "type buckets"
and rendering each bucket in a different way. For instance, we render
"common, short scalars" into a table, but we render "rare, short
scalars" as item lists (below the table, in the same period).

At various times, the target user has been seen as:
1. Upper Management: "What happened last night?"
This is our focus.

2. Operating Specialist: "What did we do a couple days ago? Is it
similar to our current problem? Same solution?"
# Ignore
This is beyond our scope. It might be possible to provide something
relatively easily (as a new page), but only if detailed content is
provided.

3. Engineers: "What broke? What are the details that will help us fix
it?" (diagnosing)
Lynne may be doing something useful in this area.

We cannot create a single summary that will serve all potential
users. We cannot predict who the users will be. Therefore, we must
be able to *generate different reports*. We don't want every report
to require a new application.

Solution: The back-end (Retrieve, Merge, Compact, Reduce) creates a
common data structure that can be used by all reports. A different
rendering is created for each user type.
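
A sketch of template-driven rendering on top of the shared back-end
(the template file names and context keys are hypothetical):

#+begin_src python
import jinja2


def render_report(reduced_df, user_type: str = "management") -> str:
    """Same back-end data-frame, a different Jinja template per user type."""
    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader("templates"),
        autoescape=True,
    )
    # Hypothetical template files: management.html, specialist.html, ...
    template = env.get_template(f"{user_type}.html")
    dense, sparse = partition_columns(reduced_df)  # from the sketch above
    return template.render(
        table=dense.to_html(),
        lists=sparse.to_dict(orient="index"),
    )
#+end_src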

* Assumptions
1. *The screen real estate available for Night Summary is limited.*
I use an iPad (2360 x 1640 pixels) to view it. If Night Summary is
not usable in that amount of space, I consider it a failure. Users
may have big screens but should not need them for the Night
Summary.

2. *Not for diagnosing problems.*
Diagnostics need more interactivity and the ability to drill down
to fine-grained details. Neither is appropriate for a night
*summary* report.

3. We *cannot predict how the distribution of values will change* in the
data sources over the next weeks or months.

4. We will *not know who the real users are unless we see people using* the
app.

5. One report *cannot satisfy the diversity of all possible users*.
Different use-cases imply different reporting and different content.

6. /"Throw it against the all and see what sticks."/


* Future
** Beyond a Night Summary
The same technique used to summarize a night into periods could be used
to summarize a week into periods (such as nights). The differences
would be in:
- the data density threshold that determines what fields are removed
- the style of rendering
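
In terms of the earlier sketches, that might amount to no more than
different parameter values (the values here are illustrative):

#+begin_src python
# Same pipeline, coarser grouping and a (hypothetically) stricter
# density threshold: summarize a week into nights.
weekly = reduce_by_period(drop_sparse_columns(full_df, threshold=0.20),
                          period="1D")
#+end_src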

** Beyond Notebooks
We chose implementation via Notebooks so development (prototyping)
could be fast. Notebooks might have been a non-starter except that
Times Square allows them to be presented as a parameterized web page
to end users. Our hope was to factor out the back-end code from the
notebooks so that we could (somehow) later replace the notebook with a "real
GUI" that could offer greater interaction.

With the addition of the template-based rendering of HTML, we have
markedly decreased the gap between what we have and what typical web
frameworks (such as Django) need. By storing the pre-rendered data in
a (small) database, we can collect multiple lower-level data-frames to
be summarized into higher-level data-frames (nights to week, weeks to
month). From the stored data-frames, we can provide GUI applications
such as LOVE with web-service access to the pre-rendered data.
Through different HTML templates, we can serve customized reports to
various types of users (provided the data they need is already
somewhere in our sources).

* Cutting room floor (TL;DR) :noexport:

** Design elements
Merge sources by date-time column into a single wide and long
data-frame. The intent is to use this combined full_df for everything
else. A variant of the full_df would be the logical choice for a
small summary-oriented database held by the back-end and served to the
GUI.

The full_df is compacted, reduced, and rendered for use.

** cut
Insight into the Night Summary problem: Pure tables are not great for our
data because some fields are simple scalars, but some are lists or
large chunks of text. This creates uneven usage of white space when
rendering as a table (e.g. the text gets squeezed into a column so
narrow that it takes up more vertical space). But even
if we remove the text, there are some scalars that are rare. A column
that contains mostly nothing wastes horizontal space. I now have a
way to dynamically move fields from the table to a list below the table,
leaving the table just one row per period (e.g. a 4-hour block). It can
detect that a column is 95% empty, remove the column, and put the few
values in a list. It does this in a data-dependent, dynamic way using
a template system to generate the HTML.

* POSTSCRIPT :noexport:
/(this section here to keep Document Comments out of the way)/
source: /home/pothiers/orgfiles/designs.org

Something like this can be inserted into the doc by invoking the export
dispatcher and selecting "insert template" (C-c C-e #).


#+TITLE: Night Summary
#+SUBTITLE: using a Single Unified Time Log (SUTL)
#+AUTHOR: Steve Pothier
#+EMAIL: [email protected]
#+DESCRIPTION: Personal design notes
#+KEYWORDS:
#+LANGUAGE: en
#+OPTIONS: H:3 num:1 toc:nil \n:nil @:t ::t |:t ^:nil -:t f:t *:t <:t
#+OPTIONS: TeX:t LaTeX:t skip:nil d:nil todo:t pri:nil tags:nil
#+INFOJS_OPT: view:nil toc:t ltoc:t mouse:underline buttons:0 path:http://orgmode.org/org-info.js
#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+LINK_UP:
#+LINK_HOME:
#+XSLT:

#+LATEX_HEADER: \setlength{\parindent}{0em}\parskip 7.2pt


+LATEX_HEADER: \usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
+CAPTION: CCML Model Creation Workflow


#! +LaTeX_HEADER: \usepackage{fancyhdr}
#! +LaTeX_HEADER: \pagestyle{fancy}
#! +LaTeX_HEADER: \fancyhf{}
#! +LaTeX_HEADER: \fancyhead[OC,EC]{DRAFT\\NOIRLab Proprietary}
#! +LaTeX_HEADER: \fancyfoot[OC,EC]{NOIRLab Proprietary\\DRAFT}
#! +LaTeX_HEADER: \fancyfoot[RO, LE] {\thepage}
#! +LaTeX_HEADER: \renewcommand{\headrulewidth}{0.4pt}
#! +LaTeX_HEADER: \renewcommand{\footrulewidth}{0.4pt}
#! #+LaTeX_HEADER: \usepackage{draftwatermark}
#! #+LaTeX_HEADER: \SetWatermarkText{DRAFT}
#! #+LaTeX_HEADER: \SetWatermarkScale{1.5}
#! #+LATEX_HEADER: \usepackage[margin=2.5cm]{geometry}
#!
#! +TEXT: This white paper includes data that shall not be disclosed outside of
#! +TEXT: NOIRLab or NSF and shall not be duplicated, used, or disclosed,
#! +TEXT: in whole or in part, for any purpose other than to evaluate this
#! +TEXT: white paper.

#+TEXT: \newpage
#+TEXT: [TABLE-OF-CONTENTS]
#+TEXT: \newpage
#+LaTeX_HEADER: \newpage
#+LATEX_HEADER: \usepackage[margin=0.5in]{geometry}
