Deep Learning/Object Detection

Faster RCNN

Naranjito 2024. 1. 30. 18:27

 

  • Faster RCNN

 

 

1. CNN(VGG16) : Input the image to the ConvNet and get the feature map

2. RPN : Extract region proposals through anchor boxes

3. RoI Projection : Get the RoI feature map through RoI pooling

4. Fast R-CNN : Feed the RoI feature map to Fast R-CNN; the RPN and Fast R-CNN are trained by alternating training


  • Anchor box

 

 

A method for capturing objects of various sizes: the same concept as a bounding box, but with predefined, differing scales and aspect ratios.

It predefines a total of nine different anchor boxes from three scales ([128, 256, 512]) and three aspect ratios ([1:1, 1:2, 2:1]).
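The nine anchor shapes can be derived in a few lines. This is a minimal sketch (the function name `anchor_shapes` is mine, not from any library): for each scale s and ratio r = w/h, the width and height are chosen so the area stays s².

```python
import numpy as np

# Sketch: derive the 9 anchor (w, h) pairs from 3 scales and 3 aspect
# ratios, keeping the area w * h equal to scale^2 for every ratio.
def anchor_shapes(scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    shapes = []
    for s in scales:
        for r in ratios:           # r = w / h
            h = s / np.sqrt(r)     # from w * h = s^2 with w = r * h
            w = r * h
            shapes.append((w, h))
    return np.array(shapes)

shapes = anchor_shapes()
print(shapes.shape)  # (9, 2)
```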


  • Predefined Anchor box
For the 1:2 aspect ratio (w = ½h), the box dimensions follow from the area constraint:

$$w \times h = s^2, \quad w = \frac{1}{2} h \;\Rightarrow\; \frac{1}{2} h^2 = s^2, \quad h = \sqrt{2s^2}, \quad w = \frac{\sqrt{2s^2}}{2}$$

 

w = width
h = height
s = scale

 

The anchor boxes are created at the center of each grid cell in the original image. The anchor positions are fixed by the sub-sampling ratio applied to the original image, and the nine predefined anchor boxes are created around each anchor.

 

In the figure above, the size of the original image is 600x800 and the sub-sampling ratio is 1/16. In this case, the number of anchors is 1900 (= 600/16 x 800/16), so a total of 17100 (= 1900 x 9) anchor boxes are produced.
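The counts above can be checked with simple arithmetic. One assumption in this sketch: the grid sizes are rounded up (ceil), which is how 600/16 yields the 38x50 = 1900 grid the post states.

```python
import math

# Worked check of the anchor counts: a 600x800 image with sub-sampling
# ratio 1/16 gives a 38x50 grid of anchors (sizes rounded up), with 9
# anchor boxes generated per anchor.
def anchor_counts(height, width, sub_sampling=16, boxes_per_anchor=9):
    grid_h = math.ceil(height / sub_sampling)
    grid_w = math.ceil(width / sub_sampling)
    anchors = grid_h * grid_w
    return anchors, anchors * boxes_per_anchor

anchors, boxes = anchor_counts(600, 800)
print(anchors, boxes)  # 1900 17100
```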

 

Using this method, nine times more bounding boxes are created than with conventional fixed-size bounding boxes, making it possible to capture objects of a much wider variety of sizes.


  • RPN(Region Proposal Network)

 

 

A network that extracts region proposals from the original image.

When anchor boxes are created from the original image, a large number of region proposals are generated.

The RPN outputs a class score and bounding box coefficients for each region proposal. The class score only classifies whether or not an object is contained.


 

Worked example for the image above.

 

1) VGG16 : Get the feature map (8x8) after applying the sub-sampling ratio (1/100) to the original image (800x800); the number of channels is 512. 8x8x512

2) Convolution : 3x3 Conv; convolution operates on the feature map from (1), with padding so as to keep the original feature map size.

3) Get the Feature Map : 8x8x512 

4) Convolution for Class Score : 1x1 Conv; convolution operates on the feature map from (3). 8x8x2x9

5) Convolution for Bounding Box Coefficients : 1x1 Conv; convolution operates on the feature map from (3). 8x8x4x9
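Steps 4) and 5) can be sketched at the shape level in plain numpy (a real implementation would use a DL framework; `conv1x1` is my illustrative name). A 1x1 convolution is just a per-pixel matrix multiply over the channel axis, so the spatial size 8x8 is preserved while the channels change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shape-level sketch of the RPN head: a 1x1 convolution over an
# (H, W, C) feature map is a matmul over the channel axis, leaving
# the spatial dimensions untouched.
def conv1x1(feature_map, out_channels):
    in_channels = feature_map.shape[-1]
    weights = rng.standard_normal((in_channels, out_channels)) * 0.01
    return feature_map @ weights              # (H, W, out_channels)

feature_map = rng.standard_normal((8, 8, 512))   # output of step 3)
cls_scores = conv1x1(feature_map, 2 * 9)         # step 4): 8x8x(2x9)
bbox_coeffs = conv1x1(feature_map, 4 * 9)        # step 5): 8x8x(4x9)
print(cls_scores.shape, bbox_coeffs.shape)
```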

Result of RPN(Region Proposal Network)


  • Classifier
$$p = \begin{cases} 1 & \text{if } IoU > 0.7 \\ -1 & \text{if } IoU < 0.3 \\ 0 & \text{otherwise} \end{cases}$$

 

1 : Positive, there is an object

-1 : Negative, there is no object

0 : neglected (not used in training)
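The labeling rule above can be sketched as follows (function names `iou` and `label_anchor` are mine): each anchor is compared against its best-matching ground-truth box, and the IoU thresholds 0.7 and 0.3 decide positive, negative, or neglected.

```python
# Sketch of the anchor labeling rule: IoU against the best-matching
# ground-truth box decides positive (1), negative (-1), or neglected (0).
# Boxes are (x1, y1, x2, y2).
def iou(box_a, box_b):
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_boxes, hi=0.7, lo=0.3):
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best > hi:
        return 1       # positive: contains an object
    if best < lo:
        return -1      # negative: background
    return 0           # neglected during training

gt = [(0, 0, 100, 100)]
print(label_anchor((0, 0, 100, 100), gt))      # 1  (IoU = 1.0)
print(label_anchor((200, 200, 300, 300), gt))  # -1 (IoU = 0.0)
print(label_anchor((0, 0, 100, 50), gt))       # 0  (IoU = 0.5)
```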


  • Bounding Box Regression

 

- Bounding box regression t

 

It uses 4 coordinate values (t); t is a vector:

$$t_x = (x - x_a)/w_a \qquad t_y = (y - y_a)/h_a \qquad t_w = \log(w/w_a) \qquad t_h = \log(h/h_a)$$

 

 

t_x , t_y : center coordinates of the box
t_w , t_h : width and height of the box
x , y , w , h : predicted box
x_a , y_a , w_a , h_a : anchor box
- Ground-truth vector t*

 

$$t^*_x = (x^* - x_a)/w_a \qquad t^*_y = (y^* - y_a)/h_a \qquad t^*_w = \log(w^*/w_a) \qquad t^*_h = \log(h^*/h_a)$$

 

x* , y* , w* , h* : ground-truth box
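The parameterization above is invertible, which is what lets the network regress offsets rather than raw coordinates. A minimal sketch (boxes as (cx, cy, w, h); `encode`/`decode` are my names):

```python
import numpy as np

# Sketch of the bounding box parameterization: encode turns a box into
# the offset vector t relative to an anchor; decode inverts it.
def encode(box, anchor):
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def decode(t, anchor):
    xa, ya, wa, ha = anchor
    return np.array([t[0] * wa + xa, t[1] * ha + ya,
                     wa * np.exp(t[2]), ha * np.exp(t[3])])

anchor = (50.0, 50.0, 128.0, 128.0)
box = (60.0, 40.0, 100.0, 150.0)
t = encode(box, anchor)
print(np.allclose(decode(t, anchor), box))  # True
```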
  • Multi-task loss
$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p^*_i) + \lambda \frac{1}{N_{reg}} \sum_i p^*_i L_{reg}(t_i, t^*_i)$$

 

i : index of an anchor in the mini-batch

p_i : predicted probability that anchor i contains an object

p*_i : ground-truth label; 1 if the anchor is positive, 0 if it is negative

t_i : parameterized coordinates (coefficients) of the predicted bounding box

t*_i : parameterized coordinates of the ground-truth box

L_cls : classification loss (log loss)

L_reg : bounding box regression loss; it is active only for positive anchor boxes (only when there is an object), since there is no need to estimate a bounding box for negatives, i.e. the background

N_cls : mini-batch size (set to 256 in the paper)

N_reg : number of anchor locations (set to 2400 in the paper)

λ : balancing parameter (default=10)
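The multi-task loss can be sketched numerically (assumptions: binary log loss for L_cls, smooth L1 for L_reg; `rpn_loss` and `smooth_l1` are my names):

```python
import numpy as np

# Numeric sketch of the multi-task loss: log loss for classification,
# smooth L1 for regression, the latter gated by p* so only positive
# anchors contribute.
def smooth_l1(x):
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls=256, n_reg=2400, lam=10.0):
    eps = 1e-7
    l_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    l_reg = smooth_l1(t - t_star).sum(axis=1) * p_star  # positives only
    return l_cls.sum() / n_cls + lam * l_reg.sum() / n_reg

p = np.array([0.9, 0.2])        # predicted object probabilities
p_star = np.array([1.0, 0.0])   # ground-truth labels
t = np.zeros((2, 4)); t_star = np.zeros((2, 4))
loss = rpn_loss(p, p_star, t, t_star)
print(loss > 0)  # True
```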
  • Training Faster R-CNN

 

 

1) Feature extraction by pre-trained VGG16

 

- Input : 800x800x3 sized image

- Process : feature extraction by pre-trained VGG16, sub-sampling ratio is 1/16

- Output : 50x50x512 sized feature map

 

2) Generate Anchors by Anchor generation layer

 

Before extracting region proposals, anchor boxes must first be created for the original image.

- Input : 800x800x3 sized image

- Process : generate anchors

- Output : 22500(=50x50x9) anchor boxes

 

3) Class scores and Bounding box regressor by RPN

 

- Input : 50x50x512 sized feature map

- Process : Region proposal by RPN

- Output : class scores(50x50x2x9 sized feature map) and bounding box regressors(50x50x4x9 sized feature map)

 

4) Select anchors for training Fast R-CNN

 

- Input : top-N ranked anchor boxes (after applying non-maximum suppression to remove redundant, overlapping boxes), ground truth boxes (positive if IoU is over 0.5, negative if IoU is between 0.1 and 0.5)

- Process : select region proposals for training Fast R-CNN

- Output : positive/negative samples with target regression coefficients

 

5) Max pooling by RoI pooling

 

- Input : 50x50x512 sized feature map, positive/negative samples with target regression coefficients

- Process : RoI pooling

- Output : 7x7x512 sized feature map
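Step 5) can be sketched as follows (a simplified `roi_pool` of my own, not torchvision's: it divides an RoI on the feature map into a 7x7 grid of roughly equal bins and max-pools each bin):

```python
import numpy as np

# Sketch of RoI pooling: split the RoI into a 7x7 grid of bins and take
# the channel-wise max of each bin, so any RoI size maps to 7x7x512.
def roi_pool(feature_map, roi, output_size=7):
    x1, y1, x2, y2 = roi                        # RoI in feature-map coords
    ys = np.linspace(y1, y2, output_size + 1).astype(int)
    xs = np.linspace(x1, x2, output_size + 1).astype(int)
    c = feature_map.shape[-1]
    out = np.zeros((output_size, output_size, c))
    for i in range(output_size):
        for j in range(output_size):
            bin_ = feature_map[ys[i]:max(ys[i + 1], ys[i] + 1),
                               xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = bin_.max(axis=(0, 1))   # max pool within the bin
    return out

fm = np.random.default_rng(0).standard_normal((50, 50, 512))
pooled = roi_pool(fm, (5, 5, 40, 40))
print(pooled.shape)  # (7, 7, 512)
```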

 

6) Train Fast R-CNN by Multi-task loss

 

- Input : 7x7x512 sized feature map

- Process 

    - feature extraction by fc layer

    - classification by Classifier

    - bounding box regression by Bounding box regressor

    - Train Fast R-CNN by Multi-task loss

- Output : loss (log loss + smooth L1 loss)
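The shapes in steps 1)-6) above chain together as a quick end-to-end arithmetic check (all sizes taken from the text):

```python
# End-to-end shape check of training steps 1)-6).
image = (800, 800, 3)
sub_sampling = 16
fm_h, fm_w = image[0] // sub_sampling, image[1] // sub_sampling  # step 1)
anchors = fm_h * fm_w * 9                                        # step 2)
cls_shape = (fm_h, fm_w, 2 * 9)    # step 3) class scores
reg_shape = (fm_h, fm_w, 4 * 9)    # step 3) box regressors
pooled_shape = (7, 7, 512)         # step 5) after RoI pooling
print((fm_h, fm_w), anchors, cls_shape, reg_shape, pooled_shape)
# (50, 50) 22500 (50, 50, 18) (50, 50, 36) (7, 7, 512)
```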

 

https://herbwood.tistory.com/10

https://bkshin.tistory.com/entry/%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0-Faster-R-CNN-%ED%86%BA%EC%95%84%EB%B3%B4%EA%B8%B0

https://incredible.ai/deep-learning/2018/03/17/Faster-R-CNN/#training-rpn

https://towardsdatascience.com/understanding-and-implementing-faster-r-cnn-a-step-by-step-guide-11acfff216b0
