In MVIT, why does the accuracy drop significantly after I add the DETECTION.ENABLE: TRUE option? #741

Open
Sakai-developer opened this issue Nov 25, 2024 · 0 comments

Sakai-developer commented Nov 25, 2024

This is my config file, MVITv2_S_16x4.yaml:

TRAIN:
  ENABLE: False
  DATASET: kinetics
  BATCH_SIZE: 16
  EVAL_PERIOD: 10
  CHECKPOINT_PERIOD: 10
  CHECKPOINT_FILE_PATH: "/home/sakai/data/sourcecode/py/mvit-v2/demo/Kinetics/MViTv2_S_16x4_k400_f302660347.pyth"
  AUTO_RESUME: True
DATA:
  USE_OFFSET_SAMPLING: True
  DECODING_BACKEND: torchvision
  NUM_FRAMES: 16
  SAMPLING_RATE: 4
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 224
  INPUT_CHANNEL_NUM: [3]
  TRAIN_JITTER_SCALES_RELATIVE: [0.08, 1.0]
  TRAIN_JITTER_ASPECT_RELATIVE: [0.75, 1.3333]
MVIT:
  ZERO_DECAY_POS_CLS: False
  USE_ABS_POS: False
  REL_POS_SPATIAL: True
  REL_POS_TEMPORAL: True
  DEPTH: 16
  NUM_HEADS: 1
  EMBED_DIM: 96
  PATCH_KERNEL: (3, 7, 7)
  PATCH_STRIDE: (2, 4, 4)
  PATCH_PADDING: (1, 3, 3)
  MLP_RATIO: 4.0
  QKV_BIAS: True
  DROPPATH_RATE: 0.2
  NORM: "layernorm"
  MODE: "conv"
  CLS_EMBED_ON: True
  DIM_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  HEAD_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  POOL_KVQ_KERNEL: [3, 3, 3]
  POOL_KV_STRIDE_ADAPTIVE: [1, 8, 8]
  POOL_Q_STRIDE: [[0, 1, 1, 1], [1, 1, 2, 2], [2, 1, 1, 1], [3, 1, 2, 2], [4, 1, 1, 1], [5, 1, 1, 1], [6, 1, 1, 1], [7, 1, 1, 1], [8, 1, 1, 1], [9, 1, 1, 1], [10, 1, 1, 1], [11, 1, 1, 1], [12, 1, 1, 1], [13, 1, 1, 1], [14, 1, 2, 2], [15, 1, 1, 1]]
  DROPOUT_RATE: 0.0
  DIM_MUL_IN_ATT: True
  RESIDUAL_POOLING: True
AUG:
  NUM_SAMPLE: 2
  ENABLE: True
  COLOR_JITTER: 0.4
  AA_TYPE: rand-m7-n4-mstd0.5-inc1
  INTERPOLATION: bicubic
  RE_PROB: 0.25
  RE_MODE: pixel
  RE_COUNT: 1
  RE_SPLIT: False
MIXUP:
  ENABLE: True
  ALPHA: 0.8
  CUTMIX_ALPHA: 1.0
  PROB: 1.0
  SWITCH_PROB: 0.5
  LABEL_SMOOTH_VALUE: 0.1
SOLVER:
  ZERO_WD_1D_PARAM: True
  BASE_LR_SCALE_NUM_SHARDS: True
  CLIP_GRAD_L2NORM: 1.0
  BASE_LR: 0.0001
  COSINE_AFTER_WARMUP: True
  COSINE_END_LR: 1e-6
  WARMUP_START_LR: 1e-6
  WARMUP_EPOCHS: 30.0
  LR_POLICY: cosine
  MAX_EPOCH: 200
  MOMENTUM: 0.9
  WEIGHT_DECAY: 0.05
  OPTIMIZING_METHOD: adamw
  COSINE_AFTER_WARMUP: True
MODEL:
  NUM_CLASSES: 400
  ARCH: mvit
  MODEL_NAME: MViT
  LOSS_FUNC: soft_cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: False
  DATASET: kinetics
  BATCH_SIZE: 64
  NUM_SPATIAL_CROPS: 1
  NUM_ENSEMBLE_VIEWS: 5
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
DETECTION:
  ENABLE: True

DEMO:
  ENABLE: True
  VIS_MODE: "top-k"
  THREAD_ENABLE: False
  LABEL_FILE_PATH: "/home/sakai/data/sourcecode/py/mvit-v2/demo/Kinetics/kinetics_classnames.json" # Add local label file path here.
  OUTPUT_FILE:
  INPUT_VIDEO:
  DETECTRON2_CFG: "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
  DETECTRON2_WEIGHTS: detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl
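
To make the combination of settings in question easier to see, here is a minimal Python sketch that pulls out the handful of keys that relate to the title (detection mode, demo mode, dataset, class count, checkpoint). The dict copies only a few values from the config above; the helper names `get`, `summarize`, and `detection_on_kinetics` are hypothetical and not part of any library. It only surfaces the combination being asked about and does not diagnose the cause of the accuracy drop.

```python
# Subset of the config above, written as a plain nested dict.
# Only keys relevant to the question are included.
cfg = {
    "TRAIN": {
        "DATASET": "kinetics",
        "CHECKPOINT_FILE_PATH": "/home/sakai/data/sourcecode/py/mvit-v2/demo/Kinetics/MViTv2_S_16x4_k400_f302660347.pyth",
    },
    "MODEL": {"NUM_CLASSES": 400, "MODEL_NAME": "MViT"},
    "MVIT": {"CLS_EMBED_ON": True},
    "DETECTION": {"ENABLE": True},
    "DEMO": {
        "ENABLE": True,
        "DETECTRON2_CFG": "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml",
    },
}


def get(config, dotted_key, default=None):
    """Look up a nested key such as 'DETECTION.ENABLE' in a plain dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def summarize(config):
    """Collect the settings that relate to the question into one flat dict."""
    keys = [
        "DETECTION.ENABLE",
        "DEMO.ENABLE",
        "TRAIN.DATASET",
        "MODEL.NUM_CLASSES",
        "MVIT.CLS_EMBED_ON",
        "TRAIN.CHECKPOINT_FILE_PATH",
    ]
    return {key: get(config, key) for key in keys}


def detection_on_kinetics(config):
    """True when detection mode is enabled together with the Kinetics dataset."""
    return bool(get(config, "DETECTION.ENABLE", False)) and (
        get(config, "TRAIN.DATASET") == "kinetics"
    )


if __name__ == "__main__":
    for key, value in summarize(cfg).items():
        print(f"{key}: {value}")
    print("detection enabled on a Kinetics setup:", detection_on_kinetics(cfg))
```

Running the same summary on a copy of the config with `DETECTION.ENABLE` set to `False` gives a side-by-side view of the only setting that differs between the two demo runs.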

This is the demo:
[Screenshot 2024-11-25 09-42-11]
