PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Naranjito 2026. 4. 22. 16:50

Point Cloud

- It is a set of points spread in a 3-dimensional space.

- A point can be expressed as x, y, and z values.

- It is irregular, meaning that the points do not lie on a regular grid and their density is not uniform.

Various ways to express a 3D object.


- Point Cloud : A set of points spread in a 3-dimensional space.

- Voxel (Volumetric Pixel) : A 3D model that is expressed on a Voxel Grid.

- Mesh : A 3D model that is composed of 2D triangles.

Due to the irregular nature of Point Cloud, most studies convert it into 3D Voxels or 2D images before using it. But there are problems with the conversion process.

1. It is inefficient to convert to 3D Voxel since there is a lot of empty space in the Voxel Grid.

As shown in the figure below, a 3D matrix is used to represent a rabbit as Voxels. However, since most of the grid is empty, a lot of memory is wasted.

2. Information is lost in the process of changing from Point Cloud to Voxel.

- This is because Point Cloud represents the rabbit in more detail than Voxel.

- It is similar to the loss of data caused by converting analog information into digital information.
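Both problems can be checked with a small NumPy sketch. This is only an illustration: the point cloud is a hypothetical sphere surface (not the rabbit from the figure), and the `voxelize` helper and the 32 * 32 * 32 resolution are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical point cloud: 2,000 points on the surface of a unit sphere.
points = rng.normal(size=(2000, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)


def voxelize(points, resolution):
    """Return a boolean occupancy grid and the cell index of every point."""
    mins = points.min(axis=0)
    maxs = points.max(axis=0)
    # Scale coordinates to [0, resolution] and truncate to integer cell indices.
    scaled = (points - mins) / (maxs - mins) * resolution
    idx = np.clip(scaled.astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid, idx


grid, idx = voxelize(points, resolution=32)

# Problem 1: most cells of the grid are empty.
occupied_fraction = grid.sum() / grid.size

# Problem 2: points that share a cell become indistinguishable,
# so their exact positions are lost.
unique_cells = np.unique(idx, axis=0).shape[0]

print(f"occupied cells: {occupied_fraction:.2%}")
print(f"{len(points)} points -> {unique_cells} occupied cells")
```

A finer grid loses less information but leaves an even larger share of cells empty, which is the trade-off described above.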

What led to the development of PointNet

Due to these problems, PointNet was developed. PointNet is a network that takes the Point Cloud as input as it is, without any conversion.

What is PointNet?

A network that extracts the overall global feature of the point cloud.

The structure of PointNet

1. Fully Connected Layer(FC Layer)

The red dotted line box in the pipeline.

input points : n * 3
⬇
Multiply by the 3 * 3 transform (T-Net) matrix : Affine Transformation Matrix
⬇
output points : n * 3
⬇
h(Hidden Layer, Fully Connected Layer, Multi Layer Perceptron), row-by-row
⬇
output points : n * 64
⬇
h(Hidden Layer, Fully Connected Layer, Multi Layer Perceptron), row-by-row
⬇
output points : n * 1024
⬇
Max Pooling, column-by-column
⬇
global feature : 1024
⬇
h(Hidden Layer, Fully Connected Layer, Multi Layer Perceptron)
⬇
Classification
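The pipeline above can be followed shape by shape in a minimal NumPy sketch. It is not a trained model: the weights are random, the T-Net is replaced by an identity matrix, and the layer widths (64, 64), (64, 128, 1024), (512, 256) and the 40 classes are assumptions taken from the paper's architecture figure, not from this post. The max pooling step (g in the next section) sits between n * 1024 and the final layers.

```python
import numpy as np

rng = np.random.default_rng(0)


def shared_mlp(x, sizes):
    """Apply the same randomly initialised dense + ReLU layers to every row."""
    for out_dim in sizes:
        w = rng.normal(scale=0.1, size=(x.shape[-1], out_dim))
        x = np.maximum(x @ w, 0.0)
    return x


n, num_classes = 1024, 40                 # assumed sizes for illustration
points = rng.normal(size=(n, 3))          # input points : n * 3

transform = np.eye(3)                     # stand-in for the 3 * 3 T-Net output
aligned = points @ transform              # output points : n * 3

local_feat = shared_mlp(aligned, [64, 64])            # n * 64, row-by-row
point_feat = shared_mlp(local_feat, [64, 128, 1024])  # n * 1024, row-by-row
global_feat = point_feat.max(axis=0)                  # 1024, max over the n points

hidden = shared_mlp(global_feat[None, :], [512, 256])          # 1 * 256
logits = hidden @ rng.normal(scale=0.1, size=(256, num_classes))
predicted_class = int(np.argmax(logits))

print(local_feat.shape, point_feat.shape, global_feat.shape, logits.shape)
```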

2. Symmetry Function for Unordered Input

The red box in the pipeline, which is for Object Classification.

The points in a Point Cloud are unordered. If the input has N points, the network must always output the same result for all N! permutations of them.

In order to do that, this network uses a symmetric function, which gives the same result even if the order of the input values is different.

⬇

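h : Hidden Layer (Fully Connected Layer, Multi Layer Perceptron), applied to each point

g : Max Pooling (extracts representative values), a symmetric function

f : The function on the whole point set, approximated as f(x1, ..., xn) ≈ g(h(x1), ..., h(xn))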

⬇

Even if the order of the 4 data points is changed, the result values are always the same.
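This can be verified with a small sketch that tries every ordering of 4 points. The one-layer `h` with random weights and the 8-dimensional feature size are assumptions for illustration only; the real h is a deeper MLP.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 8))   # hypothetical weights for a one-layer h


def h(points):
    """Per-point feature: the same layer is applied to every row."""
    return np.maximum(points @ w, 0.0)


def g(features):
    """Max pooling over the points (column-wise maximum)."""
    return features.max(axis=0)


def f(points):
    """Set function built as g(h(x1), ..., h(xn))."""
    return g(h(points))


points = rng.normal(size=(4, 3))      # 4 points, so 4! = 24 orderings
reference = f(points)

orderings = list(itertools.permutations(range(4)))
all_equal = all(
    np.allclose(f(points[list(order)]), reference) for order in orderings
)
print(len(orderings), all_equal)
```

Because the maximum of a set of numbers does not depend on the order in which they are listed, the check holds for any weights, not only for this random choice.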

3. Local and Global Information Aggregation

The red box in the pipeline, which is for Segmentation.

Segmentation assigns a label to every point, so the network should know each point's surrounding information. That is why it needs the local feature as well as the global one.

So, it concatenates the global feature (1024, copied to each of the n points) with the local features (n * 64), which gives n * 1088.

Then, through h(Hidden Layer, Fully Connected Layer, Multi Layer Perceptron), it creates per-point features (n * 128) that include both Global and Local information.
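The shapes in this step can be traced with a short sketch. The feature values are random placeholders, and the single dense layer that maps 1088 to 128 is an assumed stand-in for the MLP h, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                          # small n, for illustration

local_feat = rng.normal(size=(n, 64))          # per-point local features
point_feat = rng.normal(size=(n, 1024))        # per-point features before pooling
global_feat = point_feat.max(axis=0)           # one global feature : 1024

# Copy the global feature to every point, then concatenate with the local ones.
tiled = np.broadcast_to(global_feat, (n, 1024))
combined = np.concatenate([local_feat, tiled], axis=1)    # n * 1088

# Hypothetical single layer standing in for h : 1088 -> 128 per point.
w = rng.normal(scale=0.05, size=(1088, 128))
per_point = np.maximum(combined @ w, 0.0)                 # n * 128

print(combined.shape, per_point.shape)
```

Every row of `combined` ends with the same 1024 global values, so each point sees its own local description next to a summary of the whole object.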

4. Joint Alignment Network

Even if the object is moved or rotated, the network must output the same result.

⬇

In order to do that, PointNet uses the Joint Alignment Network.

4-1. T-Net with Input Transformation

- The first T-Net : Its purpose is to put all points into a standardized (canonical) space.

input points : n * 3
⬇
Multiply by the 3 * 3 transform (T-Net) matrix : Affine Transformation Matrix
⬇
output points : n * 3
⬇
h(Hidden Layer, Fully Connected Layer, Multi Layer Perceptron)
⬇
output points : n * 64

∴ 3d ➔ 64d
Each point goes from 3d to 64d after feature extraction.
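The idea of the first T-Net can be sketched as a tiny network that looks at the whole point set and predicts one 3 * 3 matrix. This is a toy version under stated assumptions: the weights `w1` and `w2` are random and untrained, the hidden size 16 is arbitrary, and adding the identity matrix so that the initial transform is close to "no change" is a common choice assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights of a toy T-Net.
w1 = rng.normal(scale=0.1, size=(3, 16))
w2 = rng.normal(scale=0.01, size=(16, 9))


def tnet(points):
    """Predict a 3 * 3 transform from an (n, 3) point set."""
    per_point = np.maximum(points @ w1, 0.0)   # shared layer, row-by-row
    pooled = per_point.max(axis=0)             # max pooling over the points
    return np.eye(3) + (pooled @ w2).reshape(3, 3)


points = rng.normal(size=(100, 3))             # input points : n * 3
transform = tnet(points)                       # 3 * 3 matrix
aligned = points @ transform                   # output points : n * 3

# The predicted matrix does not depend on the order of the points.
same_for_reversed = np.allclose(tnet(points[::-1]), transform)
print(transform.shape, aligned.shape, same_for_reversed)
```

The second T-Net follows the same pattern, except that it takes n * 64 features and predicts a 64 * 64 matrix.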

- The second T-Net : The purpose is to standardize the 64d Feature space.

input points : n * 64
⬇
Multiply by the 64 * 64 transform (T-Net) matrix : Feature Transformation Matrix
⬇
Add a regularization term to the softmax loss, because the 64 * 64 matrix has far more parameters than the 3 * 3 one and is harder to optimize.
⬇
output points : n * 64

∴ 64d ➔ 64d

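- Softmax Loss Regularization : The term keeps the 64 * 64 feature transformation matrix A close to an orthogonal matrix, L_reg = ||I - A A^T||_F^2.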
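In the paper, this regularization term pushes the 64 * 64 feature transformation matrix A toward an orthogonal matrix by penalizing the squared Frobenius norm of I - A A^T. A minimal sketch of that penalty follows; the function name `orthogonality_penalty`, the noise level, and the `reg_weight` mentioned in the comment are assumptions for illustration.

```python
import numpy as np


def orthogonality_penalty(a):
    """Squared Frobenius norm of (I - A A^T); zero when A is orthogonal."""
    k = a.shape[0]
    diff = np.eye(k) - a @ a.T
    return float(np.sum(diff ** 2))


rng = np.random.default_rng(0)

# An orthogonal 64 * 64 matrix (from a QR decomposition) and a perturbed copy.
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
noisy = q + 0.1 * rng.normal(size=(64, 64))

penalty_orthogonal = orthogonality_penalty(q)
penalty_noisy = orthogonality_penalty(noisy)

# During training the penalty would be added to the classification loss, e.g.
# total_loss = softmax_loss + reg_weight * penalty  (reg_weight is a hyperparameter)
print(penalty_orthogonal, penalty_noisy)
```

An orthogonal transform only rotates or reflects the feature space, so it does not lose information, which is one reason to keep the learned matrix close to it.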

https://medium.com/@parkie0517/pointnet-deep-learning-on-point-sets-for-3d-classification-and-segmentation-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0-a623ee58b359

https://www.youtube.com/watch?v=Zrn-g_LCHTc

https://arxiv.org/pdf/1612.00593
