proposal: automatic output file naming #80

Open
anthonywu opened this issue Oct 18, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@anthonywu
Collaborator

Currently the pattern for automatic image naming is image.* and the subsequent images are smartly indexed as image-N.* etc.

However, I think we can make some improvements:

  1. always add the dimensions to the image name like image-512x512.png
  2. automatically summarize the prompt into a file name - so if you had a prompt about a dog playing in a park, the filename can be summarized by some tool as dog-play-in-park.*. The tool can be something traditional/fast like nltk or some other modern embedding/ranking tool for finding the most relevant keywords.
  3. allow users to opt-in to automatic placeholders such as
  • {iso_date}: ISO date, yyyy-mm-dd
  • {unix_timestamp}: Unix timestamp, as from date +%s or the Python equivalent
  • {seed}
  • other placeholders that represent the various params: guidance, quantize, etc

I'm not trying to scope creep on what we support, just enough that every file name can reasonably be expected to be unique (seed and prompt summary, at the minimum).
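A minimal sketch of how the opt-in placeholders could be expanded, assuming plain Python `str.format` templates. The function name `expand_placeholders` and the `{dimensions}` placeholder are hypothetical and not part of mflux.

```python
import time
from datetime import date
from typing import Optional


def expand_placeholders(
    template: str,
    *,
    seed: int,
    width: int,
    height: int,
    now: Optional[float] = None,
) -> str:
    """Fill opt-in placeholders in an output file name template."""
    timestamp = time.time() if now is None else now
    values = {
        "iso_date": date.fromtimestamp(timestamp).isoformat(),  # yyyy-mm-dd, local time
        "unix_timestamp": str(int(timestamp)),  # whole seconds, like `date +%s`
        "seed": str(seed),
        "dimensions": f"{width}x{height}",  # hypothetical placeholder name
    }
    # str.format raises KeyError for a placeholder that is not in `values`.
    return template.format(**values)


print(expand_placeholders("image-{dimensions}-{seed}.png", seed=42, width=512, height=512))
# prints "image-512x512-42.png"
```

Templates without any placeholder pass through unchanged, so the current image.* default would keep working under this approach.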

We can also make an API for an OutputFileNamer: we'll provide reasonable defaults, and any users of the library can inject their own customization as needed.
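One possible shape for such an API, as a sketch only. Everything here (`GenerationInfo`, the `OutputFileNamer` protocol, `DefaultNamer`, `resolve_output_path`) is a hypothetical name chosen for illustration, not existing mflux code. The indexed fallback mirrors the current image-N.* behavior.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol


@dataclass(frozen=True)
class GenerationInfo:
    """Hypothetical bundle of the parameters a namer may want to use."""
    prompt: str
    seed: int
    width: int
    height: int


class OutputFileNamer(Protocol):
    def name(self, info: GenerationInfo, extension: str) -> str:
        """Return a file name (without directory) for one generated image."""
        ...


class DefaultNamer:
    """Default: dimensions plus seed, e.g. image-512x512-42.png."""

    def name(self, info: GenerationInfo, extension: str) -> str:
        return f"image-{info.width}x{info.height}-{info.seed}.{extension}"


def resolve_output_path(
    directory: Path,
    info: GenerationInfo,
    namer: Optional[OutputFileNamer] = None,
    extension: str = "png",
) -> Path:
    """Ask the namer for a name, then append -N if that file already exists."""
    base = Path((namer or DefaultNamer()).name(info, extension))
    candidate = directory / base
    index = 1
    while candidate.exists():
        candidate = directory / f"{base.stem}-{index}{base.suffix}"
        index += 1
    return candidate
```

A library user could then pass any object with a compatible `name` method, for example one that builds the name from a prompt summary, without subclassing anything.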

@azrahello
Contributor

azrahello commented Oct 19, 2024

I would like to ask if it is possible to embed the prompt as EXIF data in a PNG file, so that when the image is imported into photographic software, the prompt appears as a description. In general, could the JSON created with the --metadata option also be embedded as EXIF data?

@filipstrand
Owner

@azrahello We have been embedding this information in the generated PNG file for a while now (even if you do not pass the --metadata flag). For example, for an image image.png generated by mflux, if you run

exiftool image.png

then you will see this kind of information (including the prompt):

File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 1024
Image Height                    : 1024
Bit Depth                       : 8
Color Type                      : RGB
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Exif Byte Order                 : Big-endian (Motorola, MM)
Warning                         : Invalid EXIF text encoding for UserComment
User Comment                    : {'mflux_version': '0.2.1', 'model': 'schnell', 'seed': '1728244816', 'steps': '6', 'guidance': 'None', 'precision': 'mlx.core.bfloat16', 'quantization': 'None', 'generation_time': '610.10 seconds', 'lora_paths': 'None', 'lora_scales': 'None', 'prompt': 'blue bird', 'controlnet_image': 'None', 'controlnet_strength': 'None'}
Image Size                      : 1024x1024
Megapixels                      : 1.0

I did not spend too much time on this feature, and there are probably better ways to structure this information so that it can be read by various image applications. E.g. it would be nice if it was shown in the info section in macOS etc. (right now it is not shown).
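For anyone who wants to read these values back programmatically: the User Comment value in the exiftool output above is formatted as a Python dict literal with single quotes, so `json.loads` would reject it as-is. A small sketch using `ast.literal_eval` from the standard library follows; the helper name `parse_user_comment` is made up for this example, and the string is copied from the output above.

```python
import ast


def parse_user_comment(raw: str) -> dict:
    """Parse the dict-literal string stored in the User Comment field."""
    metadata = ast.literal_eval(raw)  # safe: only evaluates Python literals
    if not isinstance(metadata, dict):
        raise ValueError("User Comment does not contain a dict literal")
    return metadata


raw = (
    "{'mflux_version': '0.2.1', 'model': 'schnell', 'seed': '1728244816', "
    "'steps': '6', 'guidance': 'None', 'precision': 'mlx.core.bfloat16', "
    "'quantization': 'None', 'generation_time': '610.10 seconds', "
    "'lora_paths': 'None', 'lora_scales': 'None', 'prompt': 'blue bird', "
    "'controlnet_image': 'None', 'controlnet_strength': 'None'}"
)

metadata = parse_user_comment(raw)
print(metadata["prompt"])  # prints "blue bird"
```

Note that every value, including the seed and the missing ones ('None'), comes back as a string in this format.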

[Image: Screenshot 2024-10-19 at 15 09 52]

@filipstrand
Owner

@anthonywu I like your suggestions here

@anthonywu
Collaborator Author

anthonywu commented Jan 21, 2025

The intent here is to turn a long prompt paragraph into a few keywords for purposes of generating a file name from a user prompt.

After some research, I considered using the keybert library: https://github.com/MaartenGr/KeyBERT but here I will pause to consider the complexities that would be introduced as config expectations from maintainers and users:

  • choose an embedding model
  • if choosing a model such as sentence_transformer, then choose a non-default model from its model list; even if we just choose a default and allow overrides, that adds several flags that clutter up the CLI
  • choose a default maximum length for the output file name, which informs how many "top N" keywords or key phrases we select
  • a pip install of any of these adds dependencies to the mflux project; for example, pip install keybert would add joblib, scikit-learn, scipy, sentence-transformers, and threadpoolctl

Feedback and ideas welcome from the community here. Add some convenience at the cost of complexity?
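For comparison, here is a dependency-free baseline using only the standard library. It is a stand-in, not KeyBERT or nltk: it just drops a small hand-picked stopword list, ranks the remaining words by frequency, and keeps whole keywords up to a length limit. The function name, the stopword list, and the defaults are all assumptions for illustration. There is no stemming, so "playing" stays "playing".

```python
import re
from collections import Counter

# Tiny hand-picked stopword list; a real tool would use a fuller one.
STOPWORDS = frozenset(
    "a an and are as at by for from in into is it of on or the to with".split()
)


def summarize_prompt(prompt: str, max_keywords: int = 4, max_length: int = 40) -> str:
    """Turn a prompt into a short hyphenated slug of its most frequent words."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    candidates = [w for w in words if w not in STOPWORDS and len(w) > 2]

    counts = Counter(candidates)
    first_seen = {}
    for position, word in enumerate(candidates):
        first_seen.setdefault(word, position)

    # Most frequent first; ties broken by order of appearance in the prompt.
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))[:max_keywords]
    ranked.sort(key=lambda w: first_seen[w])  # restore prompt order for readability

    parts = []
    for word in ranked:
        if len("-".join(parts + [word])) > max_length:
            break  # keep whole keywords only, never cut one in half
        parts.append(word)
    return "-".join(parts) or "image"


print(summarize_prompt("A dog playing in a park"))  # prints "dog-playing-park"
```

This obviously cannot judge relevance the way an embedding model can, but it shows how far a zero-dependency default would get before the complexity trade-off above kicks in.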

@gmorain

gmorain commented Jan 22, 2025

Hello @anthonywu

I fully understand your concern about the pulled-in dependencies. Let me just suggest an alternative solution that would potentially "externalise" the complexity and dependencies, since naming the files is not really part of the generation process: as all the metadata (including the prompt) is embedded in the image, could that feature be part of a secondary tool to "batch rename" the outputs (i.e. iterating over all the files in a folder and applying your keyword extraction to the prompt recovered from the image metadata to rename each file)? It could become either another repository, like the Streamlit frontend, or an optional component you could install with pip install mflux[renamer]. A little less user-friendly, of course...

Best,
Gilles

@anthonywu
Collaborator Author

I think an opt-in renamer would be a good option if metadata is available.

If the user did not elect to install the optional libraries, the renamer can do a no-op and print a message.
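A rough sketch of that no-op behavior, assuming the optional library is detected with `importlib.util.find_spec`. The function names are hypothetical, and the mflux[renamer] extra in the message is only the name suggested above, not something that exists today.

```python
import importlib.util
from typing import Callable


def optional_dependency_available(module_name: str = "keybert") -> bool:
    """Check whether a top-level module is importable without importing it."""
    return importlib.util.find_spec(module_name) is not None


def rename_if_possible(rename: Callable[[], None], module_name: str = "keybert") -> bool:
    """Run `rename` only if the optional dependency is installed."""
    if not optional_dependency_available(module_name):
        # No-op path: tell the user why nothing happened and how to enable it.
        print(
            f"Skipping rename: optional dependency '{module_name}' is not installed "
            "(hypothetical install hint: pip install mflux[renamer])."
        )
        return False
    rename()
    return True
```

Image generation itself would never depend on the result; the return value only reports whether the rename step ran.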

Taking this further, using uv to run such a tool in its own venv is something I can explore. Perhaps this can be done in a single standalone .py script even.
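As a sketch of what such a standalone script might look like: the comment block at the top is PEP 723 inline script metadata, which `uv run` reads to build an isolated environment (a keyword library would be listed under `dependencies`). Assumptions to flag: the script name, all function names, and reading the prompt by shelling out to exiftool with `-s3 -UserComment` and parsing the value as a Python dict literal. The slug function is a naive placeholder.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical rename_outputs.py: rename images from their embedded prompt."""
import ast
import re
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def read_prompt_with_exiftool(path: Path) -> Optional[str]:
    """Return the embedded prompt, or None if it cannot be read (assumes exiftool is on PATH)."""
    try:
        result = subprocess.run(
            ["exiftool", "-s3", "-UserComment", str(path)],
            capture_output=True, text=True, check=True,
        )
        metadata = ast.literal_eval(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError, SyntaxError):
        return None
    return metadata.get("prompt") if isinstance(metadata, dict) else None


def simple_slug(prompt: str) -> str:
    """Naive placeholder slug; swap in a keyword extractor here."""
    return re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")[:40] or "image"


def plan_renames(
    files: List[Path],
    read_prompt: Callable[[Path], Optional[str]],
    make_slug: Callable[[str], str],
) -> List[Tuple[Path, Path]]:
    """Compute (source, destination) pairs without touching the file system."""
    taken = set(files)
    plan = []
    for src in sorted(files):
        prompt = read_prompt(src)
        if not prompt:
            continue  # no embedded prompt: leave the file alone
        stem = make_slug(prompt)
        dst = src.with_name(f"{stem}{src.suffix}")
        index = 1
        while dst != src and dst in taken:  # avoid clobbering another file
            dst = src.with_name(f"{stem}-{index}{src.suffix}")
            index += 1
        if dst != src:
            taken.discard(src)
            taken.add(dst)
            plan.append((src, dst))
    return plan


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print("usage: rename_outputs.py FOLDER")
        return
    files = list(Path(argv[1]).glob("*.png"))
    for src, dst in plan_renames(files, read_prompt_with_exiftool, simple_slug):
        print(f"{src.name} -> {dst.name}")
        src.rename(dst)


if __name__ == "__main__":
    main(sys.argv)
```

With uv installed, this would be invoked as `uv run rename_outputs.py some_folder`. Keeping `plan_renames` free of file system access makes a dry-run mode easy to add later.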

@anthonywu
Collaborator Author

Even though #120 satisfies some of my original request, I'll keep this open because I think I can get the date/timestamp and the dimensions into the standard naming conventions, as well as allow some third-party programmatic control via output namer hooks.
