docs: update 0412.md

서민정
Commit a9b9f8a90178da33c78c937cd5f03dfdb80bafbe a9b9f8a9 1 parent a8b3e922
Showing 4 changed files with 100 additions and 4 deletions
post/0412.md
post/images/backbone.png
post/images/d2_detail.png
post/images/output_of_fpn.png
--- a/post/0412.md
+++ b/post/0412.md
@@ -168,7 +168,9 @@ Input: Images
 Output: Multi-Scaled Feature Maps
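-Images are fed into the backbone network as input. As output, it produces a pyramid of feature maps at different scales. The feature maps produced by the backbone are used as input to both of the following stages: the Region Proposal Network and the ROI Heads.
+Images are fed into the backbone network as input. As output, it produces a pyramid of feature maps at different scales, extracted from the input images. In Base-RCNN-FPN, the output features are called P2 (1/4 scale), P3 (1/8), P4 (1/16), P5 (1/32), and P6 (1/64). The feature maps produced by the backbone are used as input to both of the following stages: the Region Proposal Network and the ROI Heads.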
+
+`Q. I don't fully understand what "Note that non-FPN (‘C4’) architecture’s output feature is only from the 1/16 scale." means.`
 * Region Proposal Network
@@ -176,7 +178,7 @@ Input: Mutli-Scaled Feature Maps
 Output: Object Regions(Region Proposals)
-The Region Proposal Network derives object regions from the multi-scale feature maps. You can think of it as obtaining the boxes in which objects are located. The result of this stage is also used as input to the ROI Heads.
+The Region Proposal Network derives object regions from the multi-scale feature maps. It yields 1000 box proposals, each with a confidence score. The result of this stage is also used as input to the ROI Heads.
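As a rough illustration of what "box proposals with confidence scores" means, the sketch below keeps the highest-scoring boxes from a list of candidates. The function name `top_k_proposals` and the toy data are hypothetical. This is not detectron2's actual RPN code, which also applies steps such as non-maximum suppression.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) box corners; a hypothetical type alias for this sketch.
Box = Tuple[float, float, float, float]


def top_k_proposals(boxes: List[Box], scores: List[float], k: int = 1000):
    """Return the k highest-scoring (box, score) pairs, best first."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    ranked = sorted(zip(boxes, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy candidates with made-up confidence scores.
candidates = [(0, 0, 10, 10), (5, 5, 50, 60), (20, 30, 40, 80)]
confidences = [0.20, 0.95, 0.60]
proposals = top_k_proposals(candidates, confidences, k=2)
```

With `k=2`, only the two most confident boxes survive. With the default `k=1000`, all three toy candidates are kept.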
 * ROI Heads
@@ -187,11 +189,105 @@ Output: Box
 This stage is very similar to the RPN, but it produces more finely tuned boxes.
 ![d2_detail](./images/d2_detail.png)
-https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd Continue from here tomorrow!
+## Structure of the detectron2 repository
+
+The detectron2 repository is structured as follows.
+```
+// copy from Digging into Detectron2
+detectron2
+├─checkpoint  <- checkpointer and model catalog handlers
+├─config      <- default configs and handlers
+├─data        <- dataset handlers and data loaders
+├─engine      <- predictor and trainer engines
+├─evaluation  <- evaluator for each dataset
+├─export      <- converter of detectron2 models to caffe2 (ONNX)
+├─layers      <- custom layers e.g. deformable conv.
+├─model_zoo   <- pre-trained model links and handler
+├─modeling   
+│  ├─meta_arch <- meta architecture e.g. R-CNN, RetinaNet
+│  ├─backbone  <- backbone network e.g. ResNet, FPN
+│  ├─proposal_generator <- region proposal network
+│  └─roi_heads <- head networks for pooled ROIs e.g. box, mask heads
+├─solver       <- optimizer and scheduler builders
+├─structures   <- structure classes e.g. Boxes, Instances, etc
+└─utils        <- utility modules e.g. visualizer, logger, etc
+```
+
+1. Backbone Network:
+    FPN (backbone/fpn.py)
+    └ ResNet (backbone/resnet.py)
+2. Region Proposal Network:
+    RPN(proposal_generator/rpn.py)
+    ├ StandardRPNHead (proposal_generator/rpn.py)
+    └ RPNOutput (proposal_generator/rpn_outputs.py)
+3. ROI Heads (Box Head):
+    StandardROIHeads (roi_heads/roi_heads.py)
+    ├ ROIPooler (poolers.py)
+    ├ FastRCNNConvFCHead (roi_heads/box_heads.py)
+    ├ FastRCNNOutputLayers (roi_heads/fast_rcnn.py)
+    └ FastRCNNOutputs (roi_heads/fast_rcnn.py)
+
+
+## Deeper into the Backbone Network
+> The role of the backbone network is to extract features from the input image.
+![backbone](./images/backbone.png)
+
+The **input** to the backbone network is an image tensor of shape (B, 3, H, W), where B, H, and W are the batch size, image height, and image width. Note that the input color channels are ordered BGR, not RGB. Feeding RGB images will likely give worse results than BGR.
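The channel-order point can be sketched in a few lines. The helper `rgb_to_bgr` below is a hypothetical name, and it assumes the image is stored as height x width x 3 nested lists so the example has no dependencies. With an array library, the same swap is usually done by reversing the channel axis with a slice.

```python
def rgb_to_bgr(image):
    """Reverse the channel order of an H x W x 3 image given as nested lists."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 1 x 2 toy image: one pure-red and one pure-blue pixel, in RGB order.
rgb_image = [[[255, 0, 0], [0, 0, 255]]]
bgr_image = rgb_to_bgr(rgb_image)
```

Applying the function twice returns the original image, since reversing three channels is its own inverse.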
+
+The **output** is a set of feature maps of shape (B, C, H/S, W/S), where C is the number of channels and S is the stride.
+
+For example, feeding an input image of height 800 and width 1280 into the backbone network produces the following outputs.
+```
+# By default, C = 256
+output["p2"].shape -> torch.Size([1, 256, 200, 320]) # stride = 4 
+output["p3"].shape -> torch.Size([1, 256, 100, 160]) # stride = 8
+output["p4"].shape -> torch.Size([1, 256, 50, 80])   # stride = 16
+output["p5"].shape -> torch.Size([1, 256, 25, 40])   # stride = 32
+output["p6"].shape -> torch.Size([1, 256, 13, 20])   # stride = 64
+```
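The shapes listed above follow from dividing the input size by each stride. The sketch below reproduces them. The name `fpn_output_shapes` is hypothetical, and rounding fractional sizes up is an assumption chosen to match the 13 rows of P6 (800 / 64 = 12.5), not a statement about how detectron2 computes sizes internally.

```python
import math

# Strides of the FPN outputs as listed above.
FPN_STRIDES = {"p2": 4, "p3": 8, "p4": 16, "p5": 32, "p6": 64}


def fpn_output_shapes(batch, height, width, channels=256):
    """Estimate (B, C, H/S, W/S) per level, rounding fractional sizes up."""
    return {
        name: (batch, channels, math.ceil(height / s), math.ceil(width / s))
        for name, s in FPN_STRIDES.items()
    }


shapes = fpn_output_shapes(batch=1, height=800, width=1280)
for name, shape in shapes.items():
    print(name, shape)
```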
+
+Visualizing the features produced by the FPN gives the result below. It shows that P6 has a larger receptive field than P2. In other words, the FPN can extract multi-scale feature maps.
+![output of fpn](./images/output_of_fpn.png)
+
+I'll leave the detailed structure of the ResNet inside the backbone, and of the FPN built on top of it, for later. (It's too hard 😥)
+
+## How to load ground truth from a dataset and how the loaded data are processed before being fed to the network
+In Base-RCNN-FPN, the ground truth data is used by the RPN and the Box Head.
+The annotated data contains box labels (the location and size of each object) and category labels (object class IDs). The category labels are used only by the ROI Heads, because the RPN does not learn to classify object categories.
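To make the split between box labels and category labels concrete, here is a minimal sketch of one annotated image. The field names follow detectron2's dataset-dict convention, but the values and the variable names `rpn_targets` and `roi_targets` are made up for illustration, and fields such as `bbox_mode` are omitted.

```python
# One image's record; the values are invented for this sketch.
record = {
    "file_name": "images/0001.jpg",  # hypothetical path
    "height": 800,
    "width": 1280,
    "annotations": [
        {"bbox": [100.0, 200.0, 300.0, 400.0], "category_id": 0},
        {"bbox": [50.0, 60.0, 120.0, 180.0], "category_id": 2},
    ],
}

# The RPN only needs to know where objects are, so it uses the boxes alone.
rpn_targets = [ann["bbox"] for ann in record["annotations"]]

# The ROI heads need both the box and the object class.
roi_targets = [(ann["bbox"], ann["category_id"]) for ann in record["annotations"]]
```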
+
+#### Loading annotation data
+
+#### Mapping data
+
+## Deeper into the Region Proposal Network
+
+## Deeper into the ROI(Box) Head
+
 # Tutorial of Detectron2
-https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=h9tECBQCvMv3
+I tend to forget things unless I write down an explanation for each step, so I'm going to split [the official Detectron2 tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=QHnVupBBn9eR) into its code blocks and annotate each one.
+
+First, install the dependencies. Since detectron2 is built on PyTorch, import the required packages.
+```python
+# install dependencies: 
+!pip install pyyaml==5.1
+import torch, torchvision
+print(torch.__version__, torch.cuda.is_available())
+!gcc --version
+# opencv is pre-installed on colab
+```
+
+Now, let's install detectron2.
+```python
+# install detectron2: (Colab has CUDA 10.1 + torch 1.8)
+# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
+import torch
+assert torch.__version__.startswith("1.8")   # need to manually install torch 1.8 if Colab changes its default version
+!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
+# exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime
+```
+
 ### Reference
--- a/post/images/backbone.png 0 → 100644
+++ b/post/images/backbone.png 0 → 100644
--- a/post/images/d2_detail.png
+++ b/post/images/d2_detail.png
--- a/post/images/output_of_fpn.png 0 → 100644
+++ b/post/images/output_of_fpn.png 0 → 100644