Congratulations on your amazing work! This open-source project is truly a significant contribution to the community. I have a few questions about certain aspects of the paper and would greatly appreciate any clarification from you or anyone else who might have answers:
Equation 5: Since x_1 represents the full-resolution latent and x_0 is the lowest-resolution latent, could you explain the rationale behind applying a downsampling function to x_0? This aspect is a bit unclear to me.
Equation 6: Why do you downsample x_{s_k} by 2^{k+1} and then upsample it afterward? Would there be a specific reason for not directly downsampling x_{s_k} by 2^k instead?
Selection of s_k and e_k: Could you elaborate on how s_k and e_k are chosen? I read your ICLR 2025 rebuttal regarding the time windows, but I'm still unclear about the normalized timestep. Specifically, if your framework comprises four stages, could you specify the range for each time window?
I appreciate any support from you all!!
$x_0$ is full-resolution noise (with the same resolution as $x_1$); we apply downsampling to obtain low-resolution noise.
This is to match what happens at inference, which first runs at the lower-resolution pyramid stage and then performs some kind of upsampling, resulting in a pixelated latent.
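To make the difference concrete, here is a minimal NumPy sketch comparing direct downsampling by $2^k$ with downsampling by $2^{k+1}$ followed by a 2x upsampling. The operators are assumptions chosen for illustration (average pooling and nearest-neighbour upsampling), and the helper names `downsample` and `upsample_nearest` are hypothetical; the paper's actual resampling functions may differ.

```python
import numpy as np

def downsample(x, factor):
    """Average-pool a 2D array by an integer factor (assumed operator)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis (assumed operator)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
x_full = rng.standard_normal((16, 16))  # stand-in for a full-resolution latent
k = 1

# Option A: downsample directly by 2^k.
direct = downsample(x_full, 2 ** k)

# Option B: downsample by 2^(k+1), then upsample by 2.
# Same spatial size as option A, but constant over 2x2 blocks ("pixelated").
pixelated = upsample_nearest(downsample(x_full, 2 ** (k + 1)), 2)

print(direct.shape, pixelated.shape)
print(np.allclose(direct, pixelated))
```

Both arrays have the same shape, but they are different tensors: option B carries only the information available at the coarser stage, which is the kind of input a stage would see at inference after the previous stage's output is upsampled.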
Great question! For $K=4$, we specify the time windows as $[0, \frac{1}{4}], [\frac{1}{7}, \frac{1}{2}], [\frac{1}{3}, \frac{3}{4}], [\frac{3}{5}, 1]$
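As a quick sanity check on these numbers, the sketch below stores the four windows as exact fractions, in the order listed above, and computes how much each window overlaps the next. The variable names are illustrative only, and the snippet does not assume which window corresponds to which resolution.

```python
from fractions import Fraction as F

# Time windows [s_k, e_k] for K = 4, copied from the reply above.
windows = [
    (F(0), F(1, 4)),
    (F(1, 7), F(1, 2)),
    (F(1, 3), F(3, 4)),
    (F(3, 5), F(1)),
]

# Each window must be a valid interval.
assert all(s < e for s, e in windows)

# Overlap between consecutive windows: end of window k minus start of window k+1.
overlaps = [windows[k][0 + 1] - windows[k + 1][0] for k in range(len(windows) - 1)]

for k, (s, e) in enumerate(windows):
    print(f"window {k}: [{s}, {e}] length {e - s}")
print("overlaps:", [str(o) for o in overlaps])
```

The consecutive windows overlap rather than partitioning $[0, 1]$: the overlaps come out to $\frac{3}{28}$, $\frac{1}{6}$, and $\frac{3}{20}$.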
Oh, right, thanks for your clarification. Figure 1b confused me into thinking that $x_0$ is originally the lowest-resolution noise, rather than the full-resolution one being downsampled.