Data augmentation path for Kinetics dataset #737

Open
LSKhappychild opened this issue Oct 25, 2024 · 0 comments

Thanks for your great work.
I am currently looking at the augmentation path used inside the Kinetics dataset.

            for i in range(num_decode):
                for _ in range(num_aug):
                    idx += 1
                    f_out[idx] = frames_decoded[i].clone()
                    time_idx_out[idx] = time_idx_decoded[i, :]

                    f_out[idx] = f_out[idx].float()
                    f_out[idx] = f_out[idx] / 255.0

                    if self.mode in ["train"] and self.cfg.DATA.SSL_COLOR_JITTER:
                        f_out[idx] = transform.color_jitter_video_ssl(
                            f_out[idx],
                            bri_con_sat=self.cfg.DATA.SSL_COLOR_BRI_CON_SAT,
                            hue=self.cfg.DATA.SSL_COLOR_HUE,
                            p_convert_gray=self.p_convert_gray,
                            moco_v2_aug=self.cfg.DATA.SSL_MOCOV2_AUG,
                            gaussan_sigma_min=self.cfg.DATA.SSL_BLUR_SIGMA_MIN,
                            gaussan_sigma_max=self.cfg.DATA.SSL_BLUR_SIGMA_MAX,
                        )

                    if self.aug and self.cfg.AUG.AA_TYPE:
                        aug_transform = create_random_augment(
                            input_size=(f_out[idx].size(1), f_out[idx].size(2)),
                            auto_augment=self.cfg.AUG.AA_TYPE,
                            interpolation=self.cfg.AUG.INTERPOLATION,
                        )
                        # T H W C -> T C H W.
                        f_out[idx] = f_out[idx].permute(0, 3, 1, 2)
                        list_img = self._frame_to_list_img(f_out[idx])
                        list_img = aug_transform(list_img)
                        f_out[idx] = self._list_img_to_frames(list_img)
                        f_out[idx] = f_out[idx].permute(0, 2, 3, 1)

The code above is the flow inside the Kinetics dataset's __getitem__ function.
After the decoding backend returns the decoded frames as unsigned integer pixel values, each frame is cast to float and scaled to the range [0, 1] by dividing by 255.
However, the RandAugment implementation (rand_augment.py) uses several PIL image operations, and some of them, for example autocontrast, assume that the input frame holds unsigned integer values in the range 0 to 255.
Is this the intended procedure? I am not sure whether it is valid to apply PIL's ImageOps to float-cast frames.
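
To make the concern concrete, here is a minimal pure-Python sketch, not the repository's code. It assumes that frames are mapped back to the 0 to 255 integer range before any 8-bit operation runs and rescaled to [0, 1] afterwards; whether the helpers in the excerpt (_frame_to_list_img, _list_img_to_frames) do this is not shown above. The names to_uint8, to_unit_float and stretch_contrast_uint8 are hypothetical, and the stretch is a simplified min-max stand-in for an autocontrast-style operation rather than PIL's implementation.

```python
def to_uint8(values):
    """Map floats in [0, 1] to integers in [0, 255] (hypothetical helper)."""
    return [min(255, max(0, round(v * 255))) for v in values]


def to_unit_float(values):
    """Map integers in [0, 255] back to floats in [0, 1] (hypothetical helper)."""
    return [v / 255.0 for v in values]


def stretch_contrast_uint8(pixels):
    """Simplified min-max contrast stretch that assumes a 0-255 input range.

    This is only a stand-in for an autocontrast-style operation; it is not
    PIL's implementation.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch.
        return list(pixels)
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]


# Three normalized pixel values, as produced by `frame.float() / 255.0`.
frame = [0.2, 0.4, 0.6]

as_uint8 = to_uint8(frame)                    # back to the 8-bit range
stretched = stretch_contrast_uint8(as_uint8)  # 8-bit operation
restored = to_unit_float(stretched)           # back to [0, 1]

print(as_uint8, stretched, restored)
```

If the conversion step is skipped, an operation written for 0 to 255 inputs receives values between 0 and 1, which is the situation I am asking about.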

Thanks, as always.
