Considerations:

  1. Detecting moving objects outside the label space of the object detector (e.g., a moose walking on the road, large debris rolling across the road due to wind, etc.)
    1. An unsupervised approach does not give this capability
    2. Some supervised approaches provide instance segmentations and bounding boxes (which can be fed into the tracker) to account for these objects
  2. Which static objects to filter out using motion segmentation?
    1. All
      • The most naïve and the most dangerous option; we do not want to exclude traffic lights, road signs, etc.
      • Easiest to implement
    2. Pedestrians + Cars:
      • Argus++ made this design choice.
      • It should make ACAR’s job easier, and there are not many instances of static pedestrians, but it does raise some safety concerns in my opinion.
    3. Cars only
      • Seems the most logical option

Potential Solutions for us:

  1. Unsupervised motion segmentation
  2. Supervised motion segmentation: generally uses two inputs, RGB and optical flow
    1. Architecture one (YOLACT based); code not available

      1. YOLACT code is publicly available
        1. changes needed to get the desired architecture
          1. add the motion input
          2. fuse the features
          3. add the motion mask
        2. tweak the training (a rough sketch of the alternating schedule is below)
          1. train the semantic head for k steps
          2. train the motion head for k steps
          3. alternatively, replace the semantic head with the motion head
            1. no need to alter the training loop

      (architecture diagram)
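      A rough, hypothetical sketch of the alternating schedule above, assuming a PyTorch-style setup; TwoHeadNet, the toy backbone, and the random tensors are placeholders for the actual YOLACT modules and data loader, not the real code:

      ```python
      import torch
      import torch.nn as nn

      class TwoHeadNet(nn.Module):
          # toy stand-in: shared backbone with a semantic head and a motion head
          def __init__(self):
              super().__init__()
              self.backbone = nn.Conv2d(3, 16, 3, padding=1)   # stand-in for the shared backbone
              self.semantic_head = nn.Conv2d(16, 5, 1)         # semantic classes
              self.motion_head = nn.Conv2d(16, 2, 1)           # moving / static

          def forward(self, x):
              feats = torch.relu(self.backbone(x))
              return self.semantic_head(feats), self.motion_head(feats)

      def set_requires_grad(module, flag):
          for p in module.parameters():
              p.requires_grad = flag

      model = TwoHeadNet()
      optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
      criterion = nn.CrossEntropyLoss()
      k = 100  # steps per head before switching

      def train_head(active, frozen, steps):
          # freeze one head, train the backbone + the other head for `steps` iterations
          set_requires_grad(frozen, False)
          set_requires_grad(active, True)
          for _ in range(steps):
              x = torch.randn(2, 3, 64, 64)                    # dummy batch; swap in the real loader
              sem_t = torch.randint(0, 5, (2, 64, 64))
              mot_t = torch.randint(0, 2, (2, 64, 64))
              sem_out, mot_out = model(x)
              loss = criterion(sem_out, sem_t) if active is model.semantic_head else criterion(mot_out, mot_t)
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()

      # alternate: k steps on the semantic head, then k steps on the motion head
      for _ in range(3):
          train_head(model.semantic_head, model.motion_head, k)
          train_head(model.motion_head, model.semantic_head, k)
      ```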

    2. Architecture two (SOLO based)

      • No image
      • Similar to architecture one; code is available
      • Runs at only ~5 fps, though with better accuracy
    3. Architecture three (SMS Net; fully convolutional)

      1. Code publicly available; uses TensorFlow
      2. Simplest to implement (7 fps)
      3. Only detects moving cars.
      4. Can use it as a litmus test with the detector and tracker to see if motion segmentation is worth it.
      5. Converting the semantic masks to bounding boxes should be trivial. Is it? (see the sketch below)

      (architecture diagram)
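      Point 5 above, sketched: a hedged example of turning a binary motion mask into boxes with connected-component labelling via scipy; the min_area threshold is an assumption:

      ```python
      import numpy as np
      from scipy import ndimage

      def mask_to_boxes(mask, min_area=50):
          """Return [x_min, y_min, x_max, y_max] boxes for each connected blob in a binary mask."""
          labelled, num = ndimage.label(mask)
          boxes = []
          for ys, xs in ndimage.find_objects(labelled):
              if (ys.stop - ys.start) * (xs.stop - xs.start) < min_area:
                  continue  # skip blobs whose bounding box is tiny (likely noise)
              boxes.append([xs.start, ys.start, xs.stop, ys.stop])
          return boxes

      # toy mask with two separate blobs
      mask = np.zeros((100, 100), dtype=np.uint8)
      mask[10:30, 10:40] = 1
      mask[60:90, 50:80] = 1
      print(mask_to_boxes(mask))   # -> [[10, 10, 40, 30], [50, 60, 80, 90]]
      ```

      The catch: touching or occluding instances merge into one component, so per-instance boxes would need instance masks (or the detector's boxes) rather than a single semantic mask.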

Action Plan:

  1. Try SMSNet
  2. Use off-the-shelf FlowNet2 for optical flow computation
  3. Create a custom evaluation script
    1. compares the masks with the detector's bounding boxes and eliminates boxes that are static (see the sketch after this plan)
      1. IoU threshold
      2. also keeps the semantic labels for further identification
    2. compare with ground-truth detections on the ROAD dataset
  4. If good, start with YOLACT based architecture
    1. same inputs and outputs
    2. faster
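
A rough sketch of the evaluation step in 3 above, assuming detections come as (label, [x1, y1, x2, y2]) tuples and one binary motion mask per frame; the coverage threshold, the "cars only" whitelist (consideration 2.3), and the helper names are assumptions, not an existing API:

```python
import numpy as np

MOTION_FILTERED_LABELS = {"car"}   # consideration 2.3: only filter static cars
COVERAGE_THRESHOLD = 0.3           # stand-in for the IoU threshold in the plan

def box_motion_overlap(box, motion_mask):
    """Fraction of the box area covered by the motion mask (simple proxy for the IoU check)."""
    x1, y1, x2, y2 = [int(v) for v in box]
    crop = motion_mask[y1:y2, x1:x2]
    area = max((x2 - x1) * (y2 - y1), 1)
    return crop.sum() / area

def filter_static_boxes(detections, motion_mask):
    """Drop detections of filtered classes whose boxes do not overlap the motion mask."""
    kept = []
    for label, box in detections:
        if label in MOTION_FILTERED_LABELS and box_motion_overlap(box, motion_mask) < COVERAGE_THRESHOLD:
            continue   # static instance of a filtered class -> eliminate
        kept.append((label, box))
    return kept

# toy example: a moving car, a parked car, and a traffic light that is never filtered
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:40, 20:40] = 1                                  # motion only where the first car is
dets = [("car", [20, 20, 40, 40]), ("car", [60, 60, 80, 80]), ("traffic_light", [5, 5, 10, 15])]
print(filter_static_boxes(dets, mask))                  # the parked car is dropped
```

The same loop can then be run against the ROAD ground-truth boxes to count how many true detections the filter removes, which is the signal for whether motion segmentation is worth keeping in the pipeline.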