
# Data Model

## Implementation

- Lives in `hydrate_model.py`

## System Design

High-level illustration (Whimsical chart)

## Notes on Data Transformation

- Since data is received monthly, an admin would automate a process that drops it (securely) to S3 so that hydration can begin (a minimal sketch of such a drop follows this list).
- This scales well, since each added transformation task is trivial to monitor.
- Authentication/security is important here, since we don't want just anyone sending data to our S3 buckets.
- This admin owns all backend failure recovery.
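
As a rough illustration of the automated drop, here is a minimal sketch using boto3; the bucket name, prefix, and KMS encryption choice are assumptions, not part of the actual design.

```python
import boto3
from botocore.config import Config

# Hypothetical bucket and prefix; the real names depend on the deployment.
BUCKET = "movie-data-raw"
PREFIX = "monthly-drops"


def drop_to_s3(local_path: str, month: str) -> None:
    """Upload one monthly data file to S3 with server-side encryption."""
    s3 = boto3.client("s3", config=Config(retries={"max_attempts": 5}))
    key = f"{PREFIX}/{month}/{local_path.rsplit('/', 1)[-1]}"
    s3.upload_file(
        local_path,
        BUCKET,
        key,
        # Encrypt at rest; ties into the authentication/security note above.
        ExtraArgs={"ServerSideEncryption": "aws:kms"},
    )


if __name__ == "__main__":
    drop_to_s3("movies_2021_09.csv", "2021-09")
```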

## Notes on Data Storage

  • S3 & Redshift are inexpensive relative to the power they provide. Both together can support thousands of individual data sources and transformations
  • Storing the data as closely to the way it came in makes debugging and data integrity issues easier to spot than if we front-load aggregated calculations into hydrate_model.py.
  • Minimal failures should come from here since the data is prepared in the prior step.
  • Authentication treated the same here as Data Transformation
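
To show how the stored raw files would reach Redshift, here is a hedged sketch of a bulk load using psycopg2 and Redshift's COPY command; the connection string, IAM role ARN, table name, and file format are all hypothetical.

```python
import psycopg2  # standard Postgres driver; also works against Redshift

# Hypothetical connection details and IAM role, for illustration only.
CONN_INFO = "host=cluster.example.redshift.amazonaws.com port=5439 dbname=movies user=loader"
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-s3-reader"

# COPY keeps the data close to the way it arrived: one raw staging table
# per source, with aggregation deferred to query time.
COPY_SQL = """
    COPY raw_movies
    FROM 's3://movie-data-raw/monthly-drops/2021-09/'
    IAM_ROLE %s
    FORMAT AS CSV
    IGNOREHEADER 1;
"""


def load_month() -> None:
    """Bulk-load one month's raw S3 drop into a Redshift staging table."""
    with psycopg2.connect(CONN_INFO) as conn, conn.cursor() as cur:
        cur.execute(COPY_SQL, (IAM_ROLE,))
```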

## Notes on Data Serving

- The frontend would start with selectors for genres and production companies, each allowing users to choose one or many values.
- The final two selectors would decide which metric is aggregated and how (average budget, total revenue, etc.).
- These selectors feed a SQL Builder that generates a query from the selections (see the sketch after this list).
- This does add the need for SQL experts and analysts to keep things neat and to spec, but for self-serve data reporting I prefer the philosophy of 'have as many eyes as possible on things' over 'set it and forget it.'
- Altogether, the four selectors form a 'question' that runs the built SQL on Redshift.
- After running a 'question' and fetching data, ideally offer the raw data for download as well as a simple visualization for quick analyses.
- This ideally lives in Looker, but it can be engineered simply with a Tornado server or similar.
- Authentication/security is less of a concern here due to the lack of PII and the 'on-rails' nature of building SQL queries from selectors with limited choices.
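
To make the 'on-rails' SQL Builder idea concrete, here is a minimal Python sketch; the table and column names (`movies`, `genre`, `production_company`, `budget`, `revenue`) and the whitelisted metrics are assumptions.

```python
# Whitelists implement the 'on-rails' selectors: only these values can ever
# be interpolated into the generated SQL.
ALLOWED_METRICS = {"budget", "revenue"}
ALLOWED_AGGS = {"average": "AVG", "total": "SUM"}


def build_question_sql(genres: list[str], companies: list[str],
                       metric: str, agg: str) -> tuple[str, list[str]]:
    """Turn the four selectors into one parameterized Redshift query."""
    if metric not in ALLOWED_METRICS or agg not in ALLOWED_AGGS:
        raise ValueError("selector outside the allowed on-rails choices")
    genre_ph = ", ".join(["%s"] * len(genres))
    company_ph = ", ".join(["%s"] * len(companies))
    sql = (
        f"SELECT genre, {ALLOWED_AGGS[agg]}({metric}) AS value "
        f"FROM movies "
        f"WHERE genre IN ({genre_ph}) "
        f"AND production_company IN ({company_ph}) "
        f"GROUP BY genre"
    )
    return sql, [*genres, *companies]


# Example 'question': average budget for two genres across two studios.
sql, params = build_question_sql(
    ["Action", "Comedy"], ["A24", "Warner Bros."], "budget", "average",
)
```

Only the whitelisted metric and aggregation names are interpolated into the SQL string; everything the user selects flows through query parameters, which keeps the builder safe without per-query review.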