gshashank84/RegNet

RegNet

Implementation of RegNet (in TensorFlow)

Reference Paper: "Designing Network Design Spaces"

Below is a short summary of the reference paper:


  • Network:

    • stem
    • body
    • head
  • body:

    • stage1
    • stage2
    • stage3
    • stage4
  • stage i

    • block1
    • block2
    • block3
    • ...
    • block di
  • stem

    • 3x3 Conv stride=2 filters=w0 (32)
  • head

    • AvgPool
    • Dense units=n (for n classes)
  • stage parameter

    • number of blocks di
  • block parameters

    • width wi
    • bottleneck ratio bi
    • group width gi
  • All the blocks in a stage are identical except the first block

  • The first block uses stride=2 Conv.

  • wi refers to the number of channels (in a block)

  • r x r refers to the resolution (width and height) of the output feature maps

  • body contains only 4 stages
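
The structure above implies that the spatial resolution halves at the stem and again at the first block of each stage. Below is a minimal sketch of that bookkeeping. It assumes a 224x224 input and "same" padding for every stride-2 convolution, neither of which is stated in this summary, and `feature_map_sizes` is a hypothetical helper rather than part of this repository.

```python
import math

def feature_map_sizes(input_size=224, num_stages=4):
    """Return the side length r of the square feature map after the stem
    and after each stage, assuming every stride-2 conv uses 'same' padding."""
    sizes = {}
    r = math.ceil(input_size / 2)      # stem: 3x3 conv, stride=2
    sizes["stem"] = r
    for i in range(1, num_stages + 1):
        r = math.ceil(r / 2)           # first block of each stage uses stride=2
        sizes[f"stage{i}"] = r
    return sizes

print(feature_map_sizes())
# {'stem': 112, 'stage1': 56, 'stage2': 28, 'stage3': 14, 'stage4': 7}
```

Under these assumptions the head's AvgPool sees a 7x7 feature map whose channel count is the width of the last stage.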


  • AnyNetXa:

    • any possible model within its parameter combinations (no constraints)
  • AnyNetXb:

    • bottleneck ratio bi is fixed across all stages
  • AnyNetXc:

    • group width gi is fixed across all stages
  • AnyNetXd:

    • stage width wi+1 is greater than or equal to the previous width wi
  • AnyNetXe:

    • stage depth di+1 is greater than or equal to the previous depth di
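
The AnyNetX constraints can be checked mechanically for a given 4-stage configuration. The sketch below assumes that each space adds its constraint on top of the previous one (Xa, then Xb, then Xc, and so on) and that the width and depth comparisons are non-strict (>=), as in the reference paper. `tightest_design_space` is a hypothetical helper name used only for illustration.

```python
def tightest_design_space(widths, depths, bottlenecks, groups):
    """Name the most constrained AnyNetX space that a per-stage config satisfies.

    Assumption: the spaces are nested, each adding one constraint to the last.
    """
    def non_decreasing(xs):
        return all(a <= b for a, b in zip(xs, xs[1:]))

    if len(set(bottlenecks)) != 1:     # Xb needs one shared bottleneck ratio
        return "AnyNetXa"
    if len(set(groups)) != 1:          # Xc needs one shared group width
        return "AnyNetXb"
    if not non_decreasing(widths):     # Xd needs w(i+1) >= w(i)
        return "AnyNetXc"
    if not non_decreasing(depths):     # Xe needs d(i+1) >= d(i)
        return "AnyNetXd"
    return "AnyNetXe"

# Per-stage values taken from the RegNetX-200MF and RegNetX-800MF entries in this README.
print(tightest_design_space([24, 56, 152, 368], [1, 1, 4, 7], [1] * 4, [8] * 4))     # AnyNetXe
print(tightest_design_space([64, 128, 288, 672], [1, 3, 7, 5], [1] * 4, [16] * 4))   # AnyNetXd
```

The second call falls back to AnyNetXd because its last stage is shallower than its third stage.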
  • RegNet:

    • per block width wj, where j is index of blocks.
    • observations:
      • in good models from the design space, the block width wj is well approximated by a linear function of the block position j
      • wj = 48*(j+1) for 0<=j<=20
    • Proposed approach:
      • d is the total depth, j is the index of the block position
      • uj = w0 + wa*j for 0<=j<d (Eqn1)
      • w0 is the initial width (>0)
      • wa is the slope (>0)
      • an additional parameter wm (>0) is introduced
      • given uj and wm, find the value sj that satisfies the following equation
      • uj = w0* (wm)**sj (Eqn2)
      • compute sj for each block j, i.e. sj = log(uj/w0) / log(wm)
      • to quantize uj, round sj to the nearest integer
      • i.e. [sj] (rounded off)
      • then compute the per-block width wj by
      • wj = w0* (wm)**[sj] (Eqn3)
    • 6 parameters:
      • d, w0, wa, wm, b, g
    • Sampled models have constraints:
      • d < 64
      • w0, wa < 256
      • 1.5 <= wm <= 3
      • b <= 2
      • g > 1
    • parameters observed in good models:
      • wm = 2
      • w0 = wa
      • the third stage has a higher number of blocks, whereas the last stage has a smaller number of blocks
      • g increases with model size, whereas d saturates for large models
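
The width quantization in Eqn1 to Eqn3 can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code from this repository. The function names, the final rounding to an integer channel count, and the example parameters (d=8, w0=wa=48, wm=2) are assumptions. Grouping consecutive blocks of equal width into stages follows the reference paper.

```python
import math
from itertools import groupby

def regnet_widths(d, w0, wa, wm):
    """Per-block widths w_j computed from Eqn1, Eqn2 and Eqn3."""
    widths = []
    for j in range(d):
        u_j = w0 + wa * j                          # Eqn1: linear width
        s_j = math.log(u_j / w0) / math.log(wm)    # Eqn2 solved for s_j
        w_j = w0 * wm ** round(s_j)                # Eqn3: quantized width
        widths.append(int(round(w_j)))             # assumption: integer channels
    return widths

def to_stages(widths):
    """Group consecutive blocks of equal width into (stage widths, stage depths)."""
    runs = [(w, len(list(g))) for w, g in groupby(widths)]
    return [w for w, _ in runs], [n for _, n in runs]

widths = regnet_widths(d=8, w0=48, wa=48, wm=2)
print(widths)              # [48, 96, 192, 192, 192, 384, 384, 384]
print(to_stages(widths))   # ([48, 96, 192, 384], [1, 1, 3, 3])
```

Because the widths are piecewise constant, the number of stages falls out of the parameters. This sketch does not force it to be 4.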
  • RegNetX-200MF

    • di = [1,1,4,7]
    • wi = [24,56,152,368]
    • g = 8
    • b = 1
    • wa = 36, w0 = 24, wm = 2.5
    • 2.7 Million Parameters
    • error rate 30.8%
  • RegNetX-400MF

    • di = [1,2,7,12]
    • wi = [32,64,160,384]
    • g = 16
    • b = 1
    • wa = 24, w0 = 24, wm = 2.5
    • 5.2 Million Parameters
    • error rate 27.2%
  • RegNetX-600MF

    • di = [1,3,5,7]
    • wi = [48,96,240,528]
    • g = 24
    • b = 1
    • wa = 37, w0 = 48, wm = 2.2
    • 6.2 Million Parameters
    • error rate 25.5%
  • RegNetX-800MF

    • di = [1,3,7,5]
    • wi = [64,128,288,672]
    • g = 16
    • b = 1
    • wa = 36, w0 = 56, wm = 2.3
    • 7.3 Million Parameters
    • error rate 24.8%
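
The four configurations above can be written down as data and sanity-checked. The sketch below copies di, wi, g and b from the list and derives the total depth and the number of convolution groups per stage. It assumes, following the reference paper, that the bottleneck has wi / b channels and that this count must be divisible by g. The dictionary layout and `groups_per_stage` are hypothetical and not taken from this repository.

```python
# Stage depths (d), stage widths (w), group width (g) and bottleneck ratio (b),
# copied from the RegNetX entries listed in this README.
REGNETX = {
    "RegNetX-200MF": {"d": [1, 1, 4, 7], "w": [24, 56, 152, 368], "g": 8, "b": 1},
    "RegNetX-400MF": {"d": [1, 2, 7, 12], "w": [32, 64, 160, 384], "g": 16, "b": 1},
    "RegNetX-600MF": {"d": [1, 3, 5, 7], "w": [48, 96, 240, 528], "g": 24, "b": 1},
    "RegNetX-800MF": {"d": [1, 3, 7, 5], "w": [64, 128, 288, 672], "g": 16, "b": 1},
}

def groups_per_stage(cfg):
    """Number of conv groups per stage: bottleneck channels (w / b) divided by g."""
    result = []
    for w in cfg["w"]:
        bottleneck = w // cfg["b"]
        if bottleneck % cfg["g"] != 0:
            raise ValueError("bottleneck width must be divisible by group width")
        result.append(bottleneck // cfg["g"])
    return result

for name, cfg in REGNETX.items():
    print(name, "total depth:", sum(cfg["d"]), "groups:", groups_per_stage(cfg))
# RegNetX-200MF total depth: 13 groups: [3, 7, 19, 46]
# RegNetX-400MF total depth: 22 groups: [2, 4, 10, 24]
# RegNetX-600MF total depth: 16 groups: [2, 4, 10, 22]
# RegNetX-800MF total depth: 16 groups: [4, 8, 18, 42]
```

Every listed stage width is an exact multiple of its group width, so a grouped 3x3 convolution can be built for each stage without adjusting the widths.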
