* COMMENT PRESCRIPT
\setlength{\parindent}{0em}
\parskip 7.2pt
* About this Document
# *DRAFT: This will probably ALWAYS be a DRAFT!*
Time-stamp: <2025-01-24 12:29:08 pothiers>

This white-paper describes a partially implemented vision/design
created [2024-11-28 Thu]--[2024-12-02 Mon]. I wrote it after I found
myself excitedly yammering incoherently to a few people about the
concepts. These ideas can be grouped under what I have come to call a
*Single Unified Time Log* /(SUTL="subtle")/. SUTL is currently being
used (in DRAFT form) in the /Night Summary/ being created for /TSSW
Logging and Reporting/.

* Abstract
Create a *Single Unified Time Log* using a sequence of steps that
turns a list of records from each data source into a combined report.
The steps are formalized and generalized into: /Retrieve, Merge,
Compact, Reduce, Render/. The /Rendering/ step partitions the combined
data from a single Data-Frame into a structure that is rendered into
HTML through a jinja template. The use of templates admits the
possibility of generating alternate reports for different use-cases.

* Executive Summary
The /Night Summary/ being developed under Logging & Reporting is
intended to provide *upper and middle management* with a *single
unified report* they can read to "/find out what happened last
night/". /Night Summary/ is intended to be used during Commissioning
but should be usable during Operations with minimal code changes.
Everything is evolving. It's impossible to predict what will be
important in the future.

This application is expected to be maintained for years, but
maintenance cost must be low. The current Night Summary is hosted as a
Notebook on Times Square. Our intent is to move away from Notebooks to
a platform that is more amenable to Software Engineering best
practices (e.g. regression tests).

#+Begin_Latex
\pagebreak
\tableofcontents
#+End_Latex

* Challenges
1. Every data *source is accessed in its own way*.
2. Some fields are tiny, some over 5,000 characters.
3. Much content is entered manually. What was common before is
   not common now. We *cannot predict what will be common next week*.
4. *We Need It Now.* No. "Now" is too late.
5. The data available depends on instrument (at least). This means data
   retrieval that works with one instrument may not work with others.
   /Instruments have personalities./
6. Sometimes we get requests like: +"Show me /everything/ that happened last night"+
   That is *beyond our scope* and the report would be too long to be useful anyhow.

* Essential Elements

+ *Retrieve:* Source Adapters are the data-facing components that read records from APIs and databases.
+ *Merge:* Combine all sources into one time-based structure.
+ *Compact:* Remove redundant info to shrink the merged structure.
+ *Reduce:* Group and aggregate by time-period ("4 hours"). Aids
  rendering clarity.
+ *Render:* Create a report from a processed, unified container of
  source data.

** Source Adapters
Every data source that we use is wrapped into a subclass that lets
each of them be treated in (mostly) the same way. This allows code
that processes one source to be reused for others. Each adapter is
responsible for retrieving the data we need (or more). Almost always
the adapter reads its data from a web-service API.

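A minimal sketch of the idea, assuming pandas; the class, method, and
field names (=SourceAdapter=, =records=, =timestamp=) are illustrative
assumptions, not the actual API:

#+begin_src python
  # Sketch of a Source Adapter; all names here are hypothetical.
  from abc import ABC, abstractmethod
  import pandas as pd

  class SourceAdapter(ABC):
      """Wrap one data source so all sources look alike downstream."""

      @abstractmethod
      def records(self, day_obs: str) -> list[dict]:
          """Fetch the raw records for one night (usually via a web-service API)."""

      def dataframe(self, day_obs: str) -> pd.DataFrame:
          """Return the records as a DataFrame indexed by timestamp."""
          df = pd.DataFrame(self.records(day_obs))
          df["timestamp"] = pd.to_datetime(df["timestamp"])
          return df.set_index("timestamp").sort_index()

  class ExampleLogAdapter(SourceAdapter):
      """Stand-in for a real source; a real adapter would call its API."""
      def records(self, day_obs: str) -> list[dict]:
          return [
              {"timestamp": f"{day_obs} 23:10:00", "message": "Dome opened"},
              {"timestamp": f"{day_obs} 23:42:00", "message": "Guider fault"},
          ]
#+end_src
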
** Retrieve, Merge, Compact, Reduce, Render
All data sources are *retrieved* for the application by /Source Adapters/.
They are *merged* into a /Single Unified Time Log/ data-frame according
to a timestamp associated with each source record. The data-frame is
*compacted* by removing unused rows and columns. The data-frame is
*reduced* by grouping by time-period and aggregating over the period.
Finally, the result is *rendered* into HTML via a view that splits the
data-frame into parts that are passed into an /HTML template/.
Typically, splitting partitions the columns into /dense/ (for table
rendering) and /sparse/ (for list rendering) portions.

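Condensed into a few pandas operations, the whole chain might look
something like the sketch below; the function name and defaults are
assumptions, and each step is expanded in the /Details/ section.

#+begin_src python
  # Hypothetical end-to-end chain; the real steps are richer than this.
  import pandas as pd

  def night_summary_frame(adapters, day_obs: str, period: str = "4h") -> pd.DataFrame:
      # Retrieve + Merge: one time-ordered DataFrame covering the night.
      full_df = pd.concat(a.dataframe(day_obs) for a in adapters).sort_index()
      # Compact: drop rows/columns that carry no values at all.
      compact_df = full_df.dropna(axis="index", how="all").dropna(axis="columns", how="all")
      # Reduce: collect each column's values into one list per time period.
      reduced_df = compact_df.groupby(pd.Grouper(freq=period)).agg(list)
      # Render (not shown): split reduced_df into dense/sparse parts and
      # pass them to an HTML template.
      return reduced_df
#+end_src
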
* Details (TL;DR)
** Retrieve, Merge, Compact, Reduce, Render
+ Retrieve: :: Source Adapters read records from APIs and
  databases. They isolate details of retrieving data from our main
  task of creating a summary report.
+ Merge: :: lossless.
+ Compact: :: lossless (optional column density filter)
+ Reduce: :: Group and aggregate by time-period ("4 hours"). Gives up
  time resolution.
+ Render: :: For each report, use analysis of the data to be rendered
  to determine which parts are informationally dense and which are
  sparse so they can be rendered differently.

** Sources
Sources without a timestamp per record cannot currently be
processed unless a timestamp is artificially created.

** Merge
Source records are merged by timestamp into a single Data-Frame (DF).

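A sketch of the merge, assuming each Source Adapter already yields a
timestamp-indexed frame; the =source= tagging column is an assumption
added for illustration:

#+begin_src python
  import pandas as pd

  def merge_sources(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
      """Stack per-source frames into one time-ordered DataFrame.

      The merge is lossless: columns that only some sources provide are
      kept and left as NaN in the other sources' rows.
      """
      tagged = [df.assign(source=name) for name, df in frames.items()]
      return pd.concat(tagged, axis="index").sort_index()
#+end_src
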
** Compact
The DF is Compacted by removing columns and rows that are not
used.

Optionally, a "density threshold" can be
provided. When the ratio of Values/Rows for a column is below the
threshold, the column is removed. This is common for fields provided
by APIs but only sporadically used in the field. This is dynamic,
data-dependent filtering. A field might not be used for a while (so the
column is removed), then start being used (column kept).

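A sketch of the compaction step, with the optional density filter; the
function and parameter names are assumptions:

#+begin_src python
  import pandas as pd

  def compact(df: pd.DataFrame, density_threshold: float = 0.0) -> pd.DataFrame:
      """Drop rows and columns that carry no values at all.

      With a non-zero threshold, also drop any column whose
      Values/Rows ratio (fraction of non-empty cells) falls below it.
      """
      out = df.dropna(axis="index", how="all").dropna(axis="columns", how="all")
      if density_threshold > 0:
          density = out.notna().mean()  # per-column Values/Rows
          out = out.loc[:, density >= density_threshold]
      return out
#+end_src
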
** Reduce
The data frame is reduced by grouping by time-period (e.g. "2
hours") and aggregating the values over the period.

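For a timestamp-indexed frame, the reduction might be no more than a
grouped aggregation; collecting each column's values into a list per
period is one aggregation choice among several:

#+begin_src python
  import pandas as pd

  def reduce_by_period(df: pd.DataFrame, period: str = "2h") -> pd.DataFrame:
      """Group a timestamp-indexed frame into fixed time periods.

      Each cell of the result holds the list of values seen during that
      period; within-period time resolution is given up.
      """
      return df.groupby(pd.Grouper(freq=period)).agg(list)
#+end_src
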
** Render
The naive approach to presenting data is in spreadsheet-like table
format. This works great for data that fits in a small cell but not
for wide data (such as text descriptions or lists of elements).

Our sources contain a wide diversity of data. Some fields are simple
scalar values, and might not be populated at all for many days.
Other fields are text fields that may be 5,000 characters or more
long. It's challenging to render data that is common and rare, short
and long, general and specific. We don't know what the data diversity
looks like since it may change radically from week to week.

After many unsuccessful attempts at rendering in this changing data
landscape, we realized that a static solution is doomed to failure.
Instead, we must adapt to the data diversity for every report. This
has led to partitioning the data values for a night into a few "type
buckets" and rendering each bucket in a different way. For instance,
we render "common, short, scalars" into a table. But we render "rare,
short, scalars" as item lists (below the table, in the same period).

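A sketch of one such partition-and-render step, using jinja; the
template, the density-based split, and all names are illustrative
assumptions (real templates would live in their own files):

#+begin_src python
  import pandas as pd
  from jinja2 import Template

  # Hypothetical per-period template: dense columns as a table row,
  # sparse columns as an item list below it.
  PERIOD_TEMPLATE = Template("""
  <h3>{{ period }}</h3>
  {{ dense.to_html(index=False) }}
  <ul>
  {% for name, value in sparse.items() %}
    <li><b>{{ name }}</b>: {{ value }}</li>
  {% endfor %}
  </ul>
  """)

  def render_period(period, row: pd.Series, density: pd.Series,
                    threshold: float = 0.5) -> str:
      """Render one time period, splitting columns by how often they are used."""
      dense_cols = density[density >= threshold].index
      dense = row[dense_cols].to_frame().T          # "common, short, scalars"
      sparse = row.drop(dense_cols).dropna()        # "rare, short, scalars"
      return PERIOD_TEMPLATE.render(period=period, dense=dense, sparse=sparse)
#+end_src
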
At various times, the target user has been seen as:
1. Upper Management: "What happened last night?"
   This is our focus.

2. Operating Specialist: "What did we do a couple days ago? Is it
   similar to our current problem? Same solution?"
   # Ignore
   This is beyond our scope. It might be possible to provide something
   relatively easily (as a new page), but only if detailed content is
   provided.

3. Engineers: "What broke? What are the details that will help us fix
   it?" (diagnosing)
   Lynne may be doing something useful in this area.

We cannot create a single summary that will serve all potential
users. We cannot predict who the users will be. Therefore, we must
be able to *generate different reports*. We don't want every report
to require a new application.

Solution: The back-end (Retrieve, Merge, Compact, Reduce) creates a
common data structure that can be used by all reports. A different
rendering is created for each user type.

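Because the rendering is template driven, serving a different user
type could be as small a change as selecting a different template; the
template file names below are purely hypothetical:

#+begin_src python
  from jinja2 import Environment, FileSystemLoader

  # One template per audience, all fed the same reduced data structure.
  env = Environment(loader=FileSystemLoader("templates"))
  TEMPLATES = {
      "management": "night_summary.html",   # "What happened last night?"
      "specialist": "shift_detail.html",    # hypothetical future report
  }

  def render_report(audience: str, context: dict) -> str:
      """Render the shared back-end data with the audience's template."""
      return env.get_template(TEMPLATES[audience]).render(**context)
#+end_src
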
* Assumptions
1. *The screen real estate available for Night Summary is limited.*
   I use an iPad (2360 x 1640 pixels) to view it. If Night Summary is
   not usable in that amount of space, I consider it a failure. Users
   may have big screens but should not need them for the Night
   Summary.

2. *Not for diagnosing problems.*
   Diagnostics need more interactivity and the ability to drill down
   to fine-grained details. Neither is appropriate for a night
   *summary* report.

3. We *cannot predict how the distribution of values will change* in the
   data sources over the next weeks or months.

4. We will *not know who the real users are unless we see people using* the
   app.

5. One report *cannot satisfy the diversity of all possible users*.
   Different use-cases imply different reporting and different content.

6. /"Throw it against the wall and see what sticks."/

* Future
** Beyond a Night Summary
The same technique used to summarize a night into periods could be used
to summarize a week into periods (such as nights). The differences
would be in:
- the data density threshold that determines what fields are removed
- the style of rendering

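In terms of the hypothetical helpers sketched in the /Details/ section
above, that re-use might amount to calling the same reduce step with a
coarser period and a different density threshold:

#+begin_src python
  # A week of merged data, reduced one night (one day) at a time.
  weekly_df = reduce_by_period(compact(full_df, density_threshold=0.1), period="1D")
#+end_src
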
** Beyond Notebooks
We chose implementation via Notebooks so development (prototyping)
could be fast. Notebooks might have been a non-starter except that
Times Square allows them to be presented as a parameterized web page
to end users. Our hope was to factor out the back-end code from the
notebooks so that we could (somehow) later replace the notebook with a
"real GUI" that could offer greater interaction.

With the addition of the template-based rendering of HTML, we have
markedly decreased the gap between what we have and what typical web
frameworks (such as Django) need. By storing the pre-rendered data in
a (small) database, we can collect multiple lower-level data-frames to
be summarized into higher-level data-frames (nights to week, weeks to
month). From the stored data-frames, we can provide GUI applications
such as LOVE with web-service access to the pre-rendered data.
Through different HTML templates, we can serve customized reports to
various types of users (provided the data they need is already
somewhere in our sources).

* Cutting room floor (TL;DR) :noexport:

** Design elements
Merge sources by date-time column into a single wide and long
data-frame. The intent is to use this combined full_df for everything
else. A variant of the full_df would be the logical choice for a
small summary-oriented database held by the back-end and served to the
GUI.

The full_df is compacted, reduced, and rendered for use.

** cut
Insight into the Night Summary problem: Pure tables are not great for our
data because some fields are simple scalars, but some are lists or
large chunks of text. This creates uneven usage of white space when
rendering as a table (e.g. the text gets squeezed into a column so
narrow that it takes up more vertical space). But even if we remove
the text, there are some scalars that are rare. A column that contains
mostly nothing wastes horizontal space. I now have a way to
dynamically move fields from the table to a list below the table,
keeping the table to just one row per period (e.g. a 4-hour block). It
can detect that a column is 95% empty, remove the column, and put the
few values in a list. It does this in a data-dependent, dynamic way
using a template system to generate the HTML.

* POSTSCRIPT :noexport:
/(this section is here to keep Document Comments out of the way)/
source: /home/pothiers/orgfiles/designs.org

Something like this can be inserted into the doc by invoking the
export dispatcher and selecting "insert template" (C-c C-e #).

#+TITLE: Night Summary
#+SUBTITLE: using a Single Unified Time Log (SUTL)
#+AUTHOR: Steve Pothier
#+EMAIL: [email protected]
#+DESCRIPTION: Personal design notes
#+KEYWORDS:
#+LANGUAGE: en
#+OPTIONS: H:3 num:1 toc:nil \n:nil @:t ::t |:t ^:nil -:t f:t *:t <:t
#+OPTIONS: TeX:t LaTeX:t skip:nil d:nil todo:t pri:nil tags:nil
#+INFOJS_OPT: view:nil toc:t ltoc:t mouse:underline buttons:0 path:http://orgmode.org/org-info.js
#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+LINK_UP:
#+LINK_HOME:
#+XSLT:

#+LATEX_HEADER: \setlength{\parindent}{0em}\parskip 7.2pt

+LATEX_HEADER: \usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
+CAPTION: CCML Model Creation Workflow

#! +LaTeX_HEADER: \usepackage{fancyhdr}
#! +LaTeX_HEADER: \pagestyle{fancy}
#! +LaTeX_HEADER: \fancyhf{}
#! +LaTeX_HEADER: \fancyhead[OC,EC]{DRAFT\\NOIRLab Proprietary}
#! +LaTeX_HEADER: \fancyfoot[OC,EC]{NOIRLab Proprietary\\DRAFT}
#! +LaTeX_HEADER: \fancyfoot[RO, LE] {\thepage}
#! +LaTeX_HEADER: \renewcommand{\headrulewidth}{0.4pt}
#! +LaTeX_HEADER: \renewcommand{\footrulewidth}{0.4pt}
#! #+LaTeX_HEADER: \usepackage{draftwatermark}
#! #+LaTeX_HEADER: \SetWatermarkText{DRAFT}
#! #+LaTeX_HEADER: \SetWatermarkScale{1.5}
#! #+LATEX_HEADER: \usepackage[margin=2.5cm]{geometry}
#!
#! +TEXT: This white paper includes data that shall not be disclosed outside of
#! +TEXT: NOIRLab or NSF and shall not be duplicated, used, or disclosed,
#! +TEXT: in whole or in part, for any purpose other than to evaluate this
#! +TEXT: white paper.

#+TEXT: \newpage
#+TEXT: [TABLE-OF-CONTENTS]
#+TEXT: \newpage
#+LaTeX_HEADER: \newpage
#+LATEX_HEADER: \usepackage[margin=0.5in]{geometry}