Normalize Bottle Labels #210
For the variants, I think Ardbeg is a really strong litmus test, specifically this bottle: https://peated.com/bottles/40876 Another important thing we must consider: casks are sometimes important, sometimes not. Every SMWS bottle is single cask, so we need to make sure the case of "a single variant" is really smooth and looks no different than a single non-variant bottle flow (e.g. Laphroaig 10). One last thing that needs to be thought about now that I'm writing this out: what about special releases of typical labelings? IMO they should probably go under a variant, e.g. the 225th Anniversary Edition.
A note about proof/ABV/etc: these are not primary variant factors, so we may also need to determine what makes a variant unique, and what data should be approximate. For example, if the proof is "120-130", we don't actually care. That's not enough to create a new variant, it's just variable (as is expected, tbqh). So do we even store proof? ABV is an important thing to some degree, but how do we deal w/ the fact that it varies?
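One way to picture "proof doesn't create a new variant" is an identity key that simply leaves ABV out. This is a minimal sketch, not the project's schema: the `BottleDetails` shape, the `variantKey` helper, and the sample values are all hypothetical, and which fields belong in the key is exactly the open question in this thread.

```typescript
// Hypothetical shape; field names are illustrative, not the real schema.
interface BottleDetails {
  brand: string;
  name: string;
  edition: string | null;
  singleCask: boolean;
  abv: number | null; // varies between batches, so it is left out of the key
}

// Build an identity key from the fields assumed (for this sketch) to define a variant.
function variantKey(bottle: BottleDetails): string {
  return [
    bottle.brand,
    bottle.name,
    bottle.edition ?? "",
    bottle.singleCask ? "single-cask" : "",
  ]
    .map((part) => part.trim().toLowerCase())
    .join("|");
}

// Two made-up entries that differ only in ABV.
const batchA: BottleDetails = {
  brand: "Example Distillery",
  name: "Cask Strength",
  edition: null,
  singleCask: false,
  abv: 60.5,
};
const batchB: BottleDetails = { ...batchA, abv: 62.1 };

console.log(variantKey(batchA) === variantKey(batchB)); // true
```

ABV is still stored on the record here; it just doesn't take part in deciding whether two entries are the same variant.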
The now most critically important question: what do I call them? Thinking 'edition' for now.
Working branch is Going to keep the `bottle` table for the aggregations, and use the new `bottle_edition` table - first to copy all the existing data into it - and eventually to be the canonical reference for full bottle details. Every existing bottle will at minimum have one row in `bottle_edition`, and then we'll collapse a bunch of bottles into each other.
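The "every bottle has at least one edition row" rule can be sketched as an idempotent backfill. This works on in-memory arrays rather than the real tables, and the `Bottle` / `BottleEdition` shapes and the `backfillEditions` helper are assumptions made up for illustration.

```typescript
// Hypothetical row shapes; the real tables will have more columns.
interface Bottle {
  id: number;
  fullName: string;
}

interface BottleEdition {
  bottleId: number;
  fullName: string;
}

// Return the edition rows plus one new row for every bottle that has none yet.
// Running it again on its own output adds nothing.
function backfillEditions(
  bottles: Bottle[],
  editions: BottleEdition[],
): BottleEdition[] {
  const covered = new Set(editions.map((edition) => edition.bottleId));
  const missing = bottles
    .filter((bottle) => !covered.has(bottle.id))
    .map((bottle) => ({ bottleId: bottle.id, fullName: bottle.fullName }));
  return [...editions, ...missing];
}
```

Collapsing bottles into each other would then be a separate step that repoints several edition rows at one parent `bottleId`.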
Rethinking this with fresh eyes this morning.
This should make it cleaner to actually get this change done, as right now, looking at renaming things and breaking up variants from bottles, it's just too many changes and it's not completely objective. For the bottle details page, this means you'll still be able to permalink every bottle, and we'll simply add an "Editions" (tbd) section on it that shows the other bottles. That will show both for the parent bottles as well as all other editions. We'll also still need the 'edition' (nullable) string column on the bottles table.
Pushed singleCask and caskStrength flags (and name detections). Working on getting edition in now, and migrating BottleAlias.name to be a mirror of Bottle.fullName, which means Bottle.fullName will become less used in the UI (e.g. when we want to break up the "Laphroaig", "12-year-old", and "225th Anniversary" components).
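For reference, name detection for the two flags could look roughly like this. It is a sketch only: the `detectCaskFlags` helper and its regexes are assumptions, not the code that was pushed, and real labels will need more patterns than these two.

```typescript
interface CaskFlags {
  singleCask: boolean;
  caskStrength: boolean;
}

// Detect the flags from phrases in the bottle name (case-insensitive,
// allowing a space or hyphen between the words).
function detectCaskFlags(name: string): CaskFlags {
  return {
    singleCask: /\bsingle[\s-]cask\b/i.test(name),
    caskStrength: /\bcask[\s-]strength\b/i.test(name),
  };
}

console.log(detectCaskFlags("Angel's Envy Cask Strength 2020"));
// { singleCask: false, caskStrength: true }
```

A plain "Laphroaig 10" would come back with both flags false, which matches the idea that the flags only fire when the label says so.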
I'm realizing my primary issue is likely from trying to generate a unique label as a string. Let's take this random 40 year: Tomatin 40-year-old The vintage year matters, more so than it does with many others. Do you have to duplicate the vintage year into the edition now? That's silly. What you want to do is just fill out the bottle information in as much detail as you can, and have the system understand if it's a duplicate or not. The problem is two things:
The first issue I think can be addressed through generated names. We can look at all bottles in a series on a write, and generate a descriptive name (particularly with the subtext field). Or we can be dumb about it for now and just do some rule-based heuristics for the display name. The second issue is likely just going to need dupe detection. There are various techniques we can use to identify duplicates, help merge them, and help avoid future duplicates. Mostly this comes down to making it very easy to identify potential matches in the bottle search and add bottle flows. So I think the next step, after I clean up some data, is likely to figure out the unique constraint solution. I'll try to keep edition one field for now, and continue to overload it with batch/series/etc information.
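As a starting point for dupe detection, here's a sketch that normalizes names before comparing them, so the add bottle flow can surface likely matches. The `normalizeName` and `findPotentialDuplicates` helpers and the specific rewrite rules (age statements collapsed to "N-year-old", punctuation dropped) are assumptions for illustration; exact-match-after-normalization is the dumbest version and would miss fuzzier duplicates.

```typescript
// Reduce a bottle name to a comparable form.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/['’]/g, "") // drop apostrophes so "angel's" and "angels" agree
    .replace(/(\d+)[\s-]*(?:years?[\s-]*old|years?|yo)\b/g, "$1-year-old")
    .replace(/[^a-z0-9-]+/g, " ") // collapse remaining punctuation and whitespace
    .trim();
}

// Return the existing names that normalize to the same string as the candidate.
function findPotentialDuplicates(candidate: string, existing: string[]): string[] {
  const target = normalizeName(candidate);
  return existing.filter((name) => normalizeName(name) === target);
}

console.log(normalizeName("Tomatin 40 Year Old")); // "tomatin 40-year-old"
```

The same normalized string could back a unique constraint later, but that only works once we agree on which fields go into it.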
Fresh eyes this morning, I have a mental model for how to deemphasize editions in the database (thus removing a lot of the noise for beginners). The core concern that I still need to solve to pull this off, though, is the approach to naming editions. Right now there are a lot of variables in play, but effectively we need Bottle.name to become the bottling series, and Bottle.edition to become the descriptor of the individual bottle. I want to take a common scenario that poses the UX problem I'm having: Angel's Envy Cask Strength 2020
However, in this case, 2020 is also the Release Year. I wanted to avoid filling in duplicate details - we already have some silliness with the name vs statedAge. Maybe we should just ignore the release year field as a goal right now though? Force filling in the edition for these variable details, try to push the user to enter the right information, and then build some tooling to improve over time.
Two open scenarios that are more tricky:
Here's a thought exercise:
Exercise:
Ok those two work, but we're still stuck here:
Some more challenges: Kilchoman Spring Release 2010. What's the series name? Spring Release?
One obvious rule we can add:
This doesn't solve for "what about the release/batch/edition". We could continue to keep that as a separate field. None of this helps us with deduping yet, or creating those series concepts.
Need to determine if
After living with this for a couple weeks, I'm not sold on this editions field. Sure, it's hypothetically better to dedupe things, but it feels more tedious from a manual input point of view. You're sitting there, staring at a bottle label, and you just have to ask yourself "wtf is the name and wtf is the edition". That's not fun. I may revert it and combine edition back into name. Doesn't mean we can't still pull out some of the things above.
I think the best approach here is going to be to do the following:
Some thoughts:
We'll need to be conscious of when we use the full label, as they're quite long and aren't going to render well in a LOT of scenarios. An example of a full label is The Macallan 12-Year 225th Anniversary Single Malt Scotch Whisky. First passes at building a prompt have not been going super well, as the outputs are still unreliable (e.g. sometimes it'll remove flavor text, like the spirit type, but other times it'll leave part of it in, like "Islay"). I still think we want some kind of aggregate, particularly so you can record something like a tasting when you don't know which release you're trying. Take any of these editions where they release a stated year bottling that's effectively the same as other years.
Still struggling on this one, and it's probably the biggest blocker for progress right now. I've done a bit of work on the moderation queue, and have been testing some prompts with OpenAI's LLMs. The two main things that have to be solved for are the overall normalization schema (which is going to require this ruleset), and then a secondary approach that takes an abstract bottle label and is able to plug it into a search scheme. Example of a prompt I was using to test some simple normalization technique to help me on the moderation queue:
(The main thing is being able to actually identify the brand vs the rest of the bottle label)
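For comparison with the LLM approach, the brand-vs-rest split can also be sketched as a plain longest-prefix match against known brand names. This is a rule-based stand-in, not the prompt being tested here; the `splitBrand` helper and the idea of passing in a brand list are assumptions, and it does nothing about flavor text like the spirit type.

```typescript
interface SplitLabel {
  brand: string | null;
  rest: string;
}

// Split a label into brand and remainder by matching the longest known
// brand name at the start of the label (case-insensitive, whole words only).
function splitBrand(label: string, knownBrands: string[]): SplitLabel {
  const trimmed = label.trim();
  const lower = trimmed.toLowerCase();
  // Try longer brands first so "The Macallan" wins over "The".
  const byLength = [...knownBrands].sort((x, y) => y.length - x.length);
  for (const brand of byLength) {
    const prefix = brand.toLowerCase();
    if (lower === prefix || lower.startsWith(prefix + " ")) {
      return { brand, rest: trimmed.slice(brand.length).trim() };
    }
  }
  return { brand: null, rest: trimmed };
}

console.log(splitBrand("The Macallan 12-Year 225th Anniversary", ["The Macallan"]));
// { brand: "The Macallan", rest: "12-Year 225th Anniversary" }
```

Labels that don't start with a known brand fall through with `brand: null`, which is where the moderation queue (or an LLM pass) would still have to step in.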
We're going to take a clean pass at bottling and solve this once and for all.
The plan is the following:
In general, I think what we're trying to do here is implicitly create "series" but in a more structured manner than something like Whiskybase does.
For bottle attributes, one of the biggest things we have to determine is which attributes enforce that it's a variant. This is primarily going to be single cask focused.
Here's a list with the help of ChatGPT. Some of these are deterministic via the name, others are not whatsoever.
Things that show up in the name and are non deterministic:
Some unknowns:
Other things we should consider:
Realistically most of these are going to be focused on the variants. The parent probably only includes a few key items (which cannot change between children):
Stated Age - some distillers have different ages for each release