Focused Generation with Attention does not work as expected #11

YunjieYu · 2024-11-28T08:08:01Z

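First of all, many thanks to the author for such an interesting piece of work: combining a VLM with Flux to unlock image understanding for Flux.
The GridDot Panel for Semantic-Aware Generation mentioned in the article is very interesting: it enables region-level control over the reference image, as in your armored-soldier example.
However, my test results fall short of expectations, and I wonder whether I set center_x, center_y, and radius incorrectly. So I would like to confirm what exactly the --center_x, --center_y, and --radius arguments of main.py mean. Suppose we have determined the center of the region of interest in the reference image: are center_x and center_y the coordinates of that center normalized by the image width w and height h, respectively?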

erwold · 2024-11-28T16:32:04Z

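GridDot is admittedly hard to understand without a front-end interface. This is a front-end I built myself, and I hope it helps you understand it. Also, if you read the code, you will see that the --center_x, --center_y, and --radius parameters all operate on qwen2_hidden_state. center_x/center_y is the center point you pick on the image, and radius is a range parameter: it determines how the weights spread out from the center point when selecting. If radius is small, only a small portion of qwen2_hidden_state around the center point is selected, and everything else is multiplied by nearly 0.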
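
The weighting described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the Gaussian falloff, the function names `focus_weights` and `apply_focus`, and the assumption that center_x, center_y, and radius are fractions of the token grid are all hypothetical. The thread only states that hidden states far from the center are multiplied by nearly 0.

```python
import math


def focus_weights(grid_h, grid_w, center_x, center_y, radius):
    """Return a grid_h x grid_w nested list of weights in (0, 1].

    Assumption: center_x, center_y and radius are fractions (0..1) of the
    token grid, and the falloff is Gaussian. The real code may differ.
    """
    weights = []
    for row in range(grid_h):
        # Normalized y coordinate of the middle of this grid row.
        y = (row + 0.5) / grid_h
        row_weights = []
        for col in range(grid_w):
            # Normalized x coordinate of the middle of this grid column.
            x = (col + 0.5) / grid_w
            dist_sq = (x - center_x) ** 2 + (y - center_y) ** 2
            row_weights.append(math.exp(-dist_sq / (2 * radius ** 2)))
        weights.append(row_weights)
    return weights


def apply_focus(hidden_state, weights):
    """Scale each token vector by the weight of its grid cell.

    hidden_state is a row-major list with one vector per grid cell
    (a stand-in for qwen2_hidden_state).
    """
    flat = [w for row in weights for w in row]
    return [[w * v for v in token] for w, token in zip(flat, hidden_state)]
```

With a small radius such as 0.1, tokens in the corners of a 4×4 grid are scaled to almost zero, while a radius of 1.0 leaves most of the grid largely intact.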

erwold · 2024-11-28T16:34:11Z

So the Focused Generation with Attention mechanism essentially works by selecting a region of qwen2_hidden_state, i.e. choosing which part of the reference image is used for the subsequent generation.

YunjieYu · 2024-11-29T03:11:03Z

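Thanks for the reply. After opening this issue, I also looked at the code that implements Focused Generation with Attention. As you said, center_x, center_y, and radius all operate in qwen2_hidden_state space. When I first saw this mechanism, I wondered whether it could be used like a LoRA: pick a region of interest (for example, a portrait) and use Focused Generation with Attention to generate that portrait consistently. It is now clear to me that center_x, center_y, and radius are controlled in qwen2_hidden_state space, whereas everyday editing and interaction happen in image space, and qwen2_hidden_state and the image are not the same space. So my question is whether center_x and center_y are the center of the region of interest (wc, hc) normalized by the image width and height (w, h):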

center_x = wc / w
center_y = hc / h

For example, for an image with w×h = 720×1280, suppose the center of my region of interest is (180, 640). Should center_x then be set to 0.25 and center_y to 0.5?
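
The arithmetic behind this question can be written as a small helper. It only encodes the convention proposed here (pixel center divided by image width and height), which this thread does not confirm; the function name `to_normalized_center` is hypothetical and is not part of main.py.

```python
def to_normalized_center(wc, hc, w, h):
    """Convert a pixel-space center (wc, hc) to fractions of image size.

    Assumption: --center_x and --center_y expect values in [0, 1]
    relative to the image width and height. This is the convention
    being asked about, not a confirmed one.
    """
    if w <= 0 or h <= 0:
        raise ValueError("image width and height must be positive")
    return wc / w, hc / h


# The example from the question: a 720x1280 image with center (180, 640).
center_x, center_y = to_normalized_center(180, 640, 720, 1280)
print(center_x, center_y)  # 0.25 0.5
```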
