Data augmentation path for Kinetics dataset #737

LSKhappychild · 2024-10-25T13:49:33Z

Thanks for your great work.
I am currently looking at the augmentation path used inside the Kinetics dataset.

            for i in range(num_decode):
                for _ in range(num_aug):
                    idx += 1
                    f_out[idx] = frames_decoded[i].clone()
                    time_idx_out[idx] = time_idx_decoded[i, :]

                    f_out[idx] = f_out[idx].float()
                    f_out[idx] = f_out[idx] / 255.0

                    if self.mode in ["train"] and self.cfg.DATA.SSL_COLOR_JITTER:
                        f_out[idx] = transform.color_jitter_video_ssl(
                            f_out[idx],
                            bri_con_sat=self.cfg.DATA.SSL_COLOR_BRI_CON_SAT,
                            hue=self.cfg.DATA.SSL_COLOR_HUE,
                            p_convert_gray=self.p_convert_gray,
                            moco_v2_aug=self.cfg.DATA.SSL_MOCOV2_AUG,
                            gaussan_sigma_min=self.cfg.DATA.SSL_BLUR_SIGMA_MIN,
                            gaussan_sigma_max=self.cfg.DATA.SSL_BLUR_SIGMA_MAX,
                        )

                    if self.aug and self.cfg.AUG.AA_TYPE:
                        aug_transform = create_random_augment(
                            input_size=(f_out[idx].size(1), f_out[idx].size(2)),
                            auto_augment=self.cfg.AUG.AA_TYPE,
                            interpolation=self.cfg.AUG.INTERPOLATION,
                        )
                        # T H W C -> T C H W.
                        f_out[idx] = f_out[idx].permute(0, 3, 1, 2)
                        list_img = self._frame_to_list_img(f_out[idx])
                        list_img = aug_transform(list_img)
                        f_out[idx] = self._list_img_to_frames(list_img)
                        f_out[idx] = f_out[idx].permute(0, 2, 3, 1)

The code above is the flow inside the Kinetics dataset's __getitem__ function.
After the decoding backend returns the decoded frames as unsigned 8-bit integer pixel values, each frame is cast to float and scaled to [0, 1] by dividing by 255.
However, the RandAugment implementation (rand_augment.py) uses several PIL image operations, and some of these, for example the autocontrast function, assume that the frame being augmented holds unsigned integer values in the range 0 to 255.
Is this the intended procedure? I am not sure whether it is valid to apply PIL's ImageOps to float-cast frames.
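
To make the concern concrete, here is a minimal pure-Python sketch, not the repository's code. It uses a simplified threshold operation in the spirit of PIL's solarize, which assumes a 0..255 range, and a hypothetical helper `to_uint8` that rescales [0, 1] floats back to 8-bit values. Whether `_frame_to_list_img` performs such a rescaling before the PIL ops run is an assumption to verify. As far as I know, torchvision's `ToPILImage` multiplies float tensors by 255 and casts to 8-bit, which would make the path valid if that is what the helper uses.

```python
def to_uint8(values):
    """Map floats in [0, 1] to integers in [0, 255] (hypothetical helper)."""
    out = []
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"expected a value in [0, 1], got {v}")
        out.append(int(round(v * 255.0)))
    return out


def solarize(values, threshold=128):
    """Invert every value at or above the threshold, assuming a 0..255 range."""
    return [255 - v if v >= threshold else v for v in values]


# Pixels as returned by the decoder (8-bit), then scaled as in __getitem__.
decoded = [0, 64, 128, 200, 255]
normalized = [v / 255.0 for v in decoded]

# Applied directly to [0, 1] floats, the threshold is never reached,
# so the operation silently does nothing.
unscaled_result = solarize(normalized)

# Rescaled to 8-bit first, the operation behaves as designed.
restored = to_uint8(normalized)
rescaled_result = solarize(restored)

print(unscaled_result == normalized)  # True: no pixel was changed
print(restored == decoded)            # True: the 8-bit values are recovered
print(rescaled_result)                # [0, 64, 127, 55, 0]
```

So the question reduces to which of the two cases the pipeline is in: if the float frames are rescaled to 8-bit when they are turned into PIL images, the range-dependent ops see the values they expect; if not, ops like this one would be ineffective or wrong.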

Thanks as always.
