- R-CNN (Regions with Convolutional Neural Network features)
One of the first models to apply deep learning to object detection.
- Region Proposal : Use Selective Search to identify a number of candidate object bounding boxes ("regions of interest")
- CNN : Extract a feature vector from each region with a CNN
- SVM : Classify the feature vectors with SVMs
- Bounding Box Regression : Apply bounding box regression to refine the location of the predicted boxes
Step 1) Region Proposal
R-CNN starts by generating a set of candidate regions in the input image that are likely to contain objects. These regions are referred to as "region proposals" or "region candidates." They may overlap and they vary in size and aspect ratio.
R-CNN does not generate these proposals itself; instead, it relies on an external method. The original paper uses Selective Search, and other proposal methods such as EdgeBoxes can be substituted.
Selective Search, for example, starts from an over-segmentation of the image and repeatedly merges the most similar neighboring segments, using image cues such as color, texture, size, and shape compatibility, to create a diverse set of region proposals.
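The bottom-up grouping idea behind Selective Search can be illustrated with a toy sketch. This is not the real algorithm: it assumes each initial segment is summarized by a single mean color value, and it uses the color difference as the only similarity cue. The function name `toy_hierarchical_grouping` and the region dictionary layout are hypothetical, chosen only for this illustration.

```python
def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def toy_hierarchical_grouping(regions, adjacency):
    """Greedily merge the most similar adjacent regions and collect boxes.

    regions:   dict id -> {"box": (x1, y1, x2, y2), "size": int, "color": float}
    adjacency: iterable of (id_a, id_b) pairs of neighboring regions
    Returns the boxes of all initial and merged regions as proposals.
    """
    regions = dict(regions)
    pairs = {frozenset(p) for p in adjacency}
    proposals = [r["box"] for r in regions.values()]
    next_id = max(regions) + 1

    def dissimilarity(pair):
        a, b = sorted(pair)
        # Toy cue: absolute difference of mean colors (ids break ties).
        return (abs(regions[a]["color"] - regions[b]["color"]), a, b)

    while pairs:
        best = min(pairs, key=dissimilarity)
        a, b = sorted(best)
        ra, rb = regions.pop(a), regions.pop(b)
        size = ra["size"] + rb["size"]
        merged = {
            "box": union_box(ra["box"], rb["box"]),
            "size": size,
            # Size-weighted mean color of the merged region.
            "color": (ra["color"] * ra["size"] + rb["color"] * rb["size"]) / size,
        }
        regions[next_id] = merged
        proposals.append(merged["box"])

        # Re-point neighbors of a or b to the merged region.
        updated = set()
        for p in pairs:
            if p == best:
                continue
            if a in p or b in p:
                (other,) = p - {a, b}
                updated.add(frozenset((other, next_id)))
            else:
                updated.add(p)
        pairs = updated
        next_id += 1
    return proposals


segments = {
    0: {"box": (0, 0, 10, 10), "size": 100, "color": 0.1},
    1: {"box": (10, 0, 20, 10), "size": 100, "color": 0.2},
    2: {"box": (20, 0, 30, 10), "size": 100, "color": 0.9},
}
boxes = toy_hierarchical_grouping(segments, [(0, 1), (1, 2)])
```

Every intermediate merge contributes a box, which is why a grouping hierarchy yields proposals at many scales. The actual method combines several similarity measures and runs over multiple color spaces, which this sketch leaves out.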
Step 2) Warping
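Approximately 2,000 region proposals are extracted per image, and each one is anisotropically warped (its aspect ratio is not preserved) to a fixed input size of 227 × 227 pixels, as required by the CNN. Before warping, each box is enlarged so that the warped frame contains 16 pixels of surrounding image context around the original box.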
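The context padding can be worked out with a little arithmetic. The sketch below assumes a 227 × 227 warped frame and 16 pixels of context per side, so the original box must occupy the inner 195 × 195 pixels after warping. The function name `dilate_box_for_context` is hypothetical, and handling of boxes that extend past the image border is left out.

```python
def dilate_box_for_context(box, warped_size=227, context=16):
    """Enlarge a box so that, once warped to warped_size x warped_size,
    the original box is surrounded by `context` pixels on every side.

    box is (x1, y1, x2, y2) in image coordinates. Width and height are
    scaled independently, matching the anisotropic warp.
    """
    x1, y1, x2, y2 = box
    # The original box must map onto the inner (warped_size - 2*context) pixels.
    scale = warped_size / (warped_size - 2 * context)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) / 2.0 * scale
    half_h = (y2 - y1) / 2.0 * scale
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# A 195 x 195 box needs exactly 16 extra pixels per side, because the
# warp from the dilated 227 x 227 crop is then the identity.
dilated = dilate_box_for_context((0, 0, 195, 195))
```

For boxes of other sizes the added margin in image pixels differs, because the crop is rescaled afterwards; only the margin in the warped frame is fixed at 16 pixels.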
Step 3) Feature Extraction
Each warped region is passed through the CNN to extract features. The CNN used is AlexNet; it is pre-trained on a large dataset such as ImageNet for generic feature representation and then fine-tuned on warped region proposals from the detection dataset.
The output of the CNN is a high-dimensional (4,096-dimensional) feature vector representing the content of the region proposal.
Step 4) Object Classification
The extracted feature vectors from the region proposals are fed into Support Vector Machines (SVMs) for classification. For each class, a separate binary linear SVM is trained to determine whether or not the region proposal contains an instance of that class.
During SVM training in the original paper, the positive samples for a class are its ground-truth boxes. The negative samples are proposals whose IoU (intersection over union) with every ground-truth box of that class is below 0.3; proposals in between are ignored.
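At test time, the "one SVM per class" design amounts to one dot product per class for each region. The sketch below assumes each trained SVM is already available as a weight vector and a bias; the function name `score_region`, the class names, and the two-dimensional features are hypothetical and only keep the example small (the real feature vectors are much longer).

```python
def score_region(feature, svms):
    """Score one region's feature vector with every per-class linear SVM.

    feature: list of floats (the CNN feature vector of the region)
    svms:    dict class_name -> (weights, bias)
    Returns a dict class_name -> decision value w . x + b.
    """
    return {
        name: sum(w * x for w, x in zip(weights, feature)) + bias
        for name, (weights, bias) in svms.items()
    }


# Hypothetical toy classifiers; real weights come from SVM training.
toy_svms = {
    "cat": ([1.0, 0.0], -0.5),
    "dog": ([0.0, -1.0], 0.5),
}
scores = score_region([1.0, 2.0], toy_svms)
```

A region can score highly for several classes or for none, since each SVM decides independently. The per-class scores are later used to rank boxes within each class.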
Step 5) Bounding Box Regression
For each class, a separate regression model is trained to refine the location and size of the bounding box around the detected object. The bounding box regression helps improve the accuracy of object localization by adjusting the initially proposed bounding box to better fit the object's actual boundaries.
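How a regressor's output adjusts a box can be sketched as follows, assuming the parameterization from the R-CNN paper: the center is shifted by a fraction of the proposal's width and height, and the width and height are scaled in log space. The function name `apply_box_deltas` and the example deltas are hypothetical; in practice the deltas are predicted from the region's CNN features.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine a proposal box with predicted offsets (dx, dy, dw, dh).

    proposal is (x1, y1, x2, y2). dx and dy shift the center by a
    fraction of the box size; dw and dh rescale width and height.
    """
    x1, y1, x2, y2 = proposal
    dx, dy, dw, dh = deltas
    pw, ph = x2 - x1, y2 - y1
    px, py = x1 + 0.5 * pw, y1 + 0.5 * ph

    gx = pw * dx + px          # new center x
    gy = ph * dy + py          # new center y
    gw = pw * math.exp(dw)     # new width
    gh = ph * math.exp(dh)     # new height
    return (gx - 0.5 * gw, gy - 0.5 * gh, gx + 0.5 * gw, gy + 0.5 * gh)


unchanged = apply_box_deltas((0, 0, 100, 50), (0.0, 0.0, 0.0, 0.0))
shifted = apply_box_deltas((0, 0, 100, 50), (0.1, 0.0, 0.0, 0.0))
widened = apply_box_deltas((0, 0, 100, 50), (0.0, 0.0, math.log(2.0), 0.0))
```

Expressing the shift relative to the box size and the scaling in log space makes the targets independent of the absolute size of the proposal, so one regressor can serve small and large boxes alike.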
Step 6) Non-Maximum Suppression (NMS)
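For each class independently, R-CNN applies greedy non-maximum suppression to eliminate duplicate or highly overlapping bounding boxes: a box is rejected if its IoU (intersection over union) with a higher-scoring box that has already been selected exceeds a threshold.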
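A minimal sketch of greedy NMS for the boxes of a single class is shown below. The function names `iou` and `nms` and the default threshold of 0.5 are assumptions made for illustration; the threshold is a tunable parameter.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that was already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap either and is kept.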
https://blog.roboflow.com/what-is-r-cnn/
https://velog.io/@whiteamericano/R-CNN-%EC%9D%84-%EC%95%8C%EC%95%84%EB%B3%B4%EC%9E%90