Focused Generation with Attention not working as expected #11

Open
YunjieYu opened this issue Nov 28, 2024 · 3 comments

Comments


YunjieYu commented Nov 28, 2024

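First of all, many thanks to the author for such an interesting piece of work: combining a VLM with Flux to unlock Flux's image-understanding capability.
The GridDot Panel for Semantic-Aware Generation mentioned in the article is very interesting: it enables region-level control over the reference image, as in your armored-soldier example.
However, my test results fall short of expectations, and I wonder whether I set center_x, center_y, and radius incorrectly. So I would like to confirm what the main.py arguments --center_x, --center_y, and --radius actually mean. Suppose we have determined the center of the region of interest in the reference image: are center_x and center_y that center's coordinates normalized by the image width w and height h, respectively?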

Owner

erwold commented Nov 28, 2024

[Four attached images: screenshots of the front-end interface mentioned in this comment]

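GridDot is admittedly a bit hard to understand without a front-end interface. This is a front end I built myself; I hope it helps you understand. Also, if you read the code, you will see that the --center_x, --center_y, and --radius parameters all operate on qwen2_hidden_state. center_x/center_y is the center point you pick on the image, and radius is a range parameter that controls how the weights spread out from that center point when selecting. If radius is very small, only a small portion of qwen2_hidden_state around the center point is selected, and everything else is multiplied by almost 0.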
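
To make the role of radius concrete, here is a minimal sketch of a radial weighting over a grid of hidden-state tokens. It is not the repository's code: the Gaussian falloff, the function names `radial_weights` and `focus_hidden_state`, and the assumption that the hidden states form a `grid_h × grid_w` token grid in row-major order are all hypothetical, chosen only to illustrate "small radius keeps a small neighbourhood, everything else is multiplied by almost 0".

```python
import numpy as np

def radial_weights(grid_h, grid_w, center_x, center_y, radius):
    """Per-token weights that decay with distance from (center_x, center_y).

    Coordinates are normalized to [0, 1]. The Gaussian falloff is an
    assumption for illustration; the real implementation may differ.
    """
    # Normalized coordinates of each token's centre on the grid.
    ys = (np.arange(grid_h) + 0.5) / grid_h
    xs = (np.arange(grid_w) + 0.5) / grid_w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    dist_sq = (xx - center_x) ** 2 + (yy - center_y) ** 2
    return np.exp(-dist_sq / (2.0 * radius ** 2))

def focus_hidden_state(hidden, grid_h, grid_w, center_x, center_y, radius):
    """Scale a (grid_h * grid_w, dim) array of token features by the weights.

    `hidden` is a stand-in for qwen2_hidden_state; its layout is assumed.
    """
    weights = radial_weights(grid_h, grid_w, center_x, center_y, radius)
    return hidden * weights.reshape(-1, 1)

# Hypothetical 8x8 token grid with 4-dimensional features.
hidden = np.ones((8 * 8, 4))
focused = focus_hidden_state(hidden, 8, 8, center_x=0.5, center_y=0.5, radius=0.05)
```

With a small radius such as 0.05, tokens far from the centre receive weights that are numerically close to zero, while a larger radius lets more of the grid contribute.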

Owner

erwold commented Nov 28, 2024

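So the Focused Generation with Attention mechanism essentially works by selecting a region of qwen2_hidden_state, that is, choosing which part of the reference image is used for the subsequent generation.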

Author

YunjieYu commented Nov 29, 2024

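Thanks for the reply. After opening this issue, I also read the code for Focused Generation with Attention. As you said, center_x, center_y, and radius all operate in qwen2_hidden_state space. When I first saw this mechanism, I wondered whether it could be used like a LoRA: pick a region of interest (for example, a particular person's portrait) and use Focused Generation with Attention to reproduce that portrait consistently. I now understand that center_x, center_y, and radius are applied in qwen2_hidden_state space, but everyday editing and interaction happen in image space, and qwen2_hidden_state is not the same space as the image. So my question is: do center_x and center_y represent the center of the region of interest (wc, hc) normalized by the image width and height (w, h)?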

center_x = wc / w
center_y = hc / h

For example, for an image with w×h of 720×1280, suppose the center of my region of interest is (180, 640). Should center_x then be set to 0.25 and center_y to 0.5?
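
The arithmetic in this question can be written as a small helper. This only encodes the convention proposed here (pixel center divided by image width and height); the maintainer has not confirmed it in this thread, and `normalize_center` is a hypothetical name, not a function from the repository.

```python
def normalize_center(wc, hc, w, h):
    """Map a pixel-space center (wc, hc) to normalized coordinates.

    Assumes the convention proposed in this issue, which is unconfirmed:
    center_x = wc / w and center_y = hc / h.
    """
    if w <= 0 or h <= 0:
        raise ValueError("image width and height must be positive")
    return wc / w, hc / h

# The 720x1280 example from the question.
center_x, center_y = normalize_center(180, 640, 720, 1280)
print(center_x, center_y)  # 0.25 0.5
```

If the convention holds, these values would be passed as --center_x 0.25 --center_y 0.5.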
