[Paper Reading 4] Dynamic Routing between Capsules (2)

2022. 3. 21. 22:44

1. Dynamic Routing

i) Algorithm

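Because a capsule's output is a vector, a powerful dynamic routing mechanism becomes possible: each capsule's output can be assigned to an appropriate parent capsule in the layer above. At the start of routing, the output is routed to every possible parent, scaled down by coupling coefficients that a softmax makes sum to 1. For each possible parent, the capsule computes a "prediction vector" by multiplying its own output by a weight matrix. If this prediction vector has a large scalar product with the parent capsule's output, top-down feedback increases the coupling coefficient for that parent (and, because of the softmax, lowers it for the other parents), strengthening that child-parent connection. This "routing-by-agreement" mechanism is far more effective than max-pooling, which acts as a local pool that keeps only the most active feature and ignores the rest.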

Let us go through this again by following the equations.

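s_j is the weighted sum of the "prediction vectors" u^, with the coupling coefficients c_ij as the weights. Each prediction vector u^ is computed as W*u, i.e., the product of a transformation matrix W and the output u of a capsule in the layer below.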
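
Written out in LaTeX, the relations described above look as follows. The notation follows the paper; the logit b_{ij} does not appear in the text above and is used here only to make the softmax explicit.

```latex
\hat{u}_{j|i} = W_{ij}\, u_i,
\qquad
s_j = \sum_i c_{ij}\, \hat{u}_{j|i},
\qquad
c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}
```

Here i indexes capsules in the lower layer and j (or k) indexes parent capsules in the layer above.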

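Before routing, the coupling coefficients are softmax(0, 0, ..., 0), so they all have the same value. As routing iterates, they are updated by the agreement u^*v. This agreement is a dot product, so it is larger the more similar the two vectors are; it is treated like a log likelihood and added to the logit from which the softmax computes the coupling coefficient.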
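
The routing loop can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the function names (`squash`, `softmax`, `route`), the tensor shapes, the random toy inputs, and the default of 3 iterations (the number the paper reports using) are all assumptions made for the example.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Scale the length into [0, 1) while keeping the direction.
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(b, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """u_hat has shape (num_children, num_parents, dim): prediction vectors."""
    n_child, n_parent, _ = u_hat.shape
    b = np.zeros((n_child, n_parent))            # logits start at 0
    for _ in range(num_iters):
        c = softmax(b, axis=1)                   # per child, sums to 1 over parents
        s = (c[:, :, None] * u_hat).sum(axis=0)  # weighted sum per parent
        v = squash(s)                            # parent outputs, (num_parents, dim)
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)  # add agreement u_hat . v
    return v, c

# Toy example (hypothetical sizes): 6 child capsules of dim 8, 3 parents of dim 16.
rng = np.random.default_rng(0)
u = rng.normal(size=(6, 8))
W = rng.normal(size=(6, 3, 16, 8)) * 0.1
u_hat = np.einsum('ijkl,il->ijk', W, u)          # u_hat[i, j] = W[i, j] @ u[i]
v, c = route(u_hat)
```

With `num_iters=1` every coupling coefficient is still 1/3 in this toy setup, which matches the uniform softmax(0, ..., 0) starting point; later iterations shift weight toward parents whose output agrees with the child's prediction.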

ii) Squash function

We want the length of a vector to represent the probability that an entity is present in the current input, and its orientation to represent the entity's properties. At the same time, the values inside the vector must not explode (become too large), so a non-linear function is needed.

The "squashing" function is a non-linear function that scales a vector down while preserving its orientation. The length of a short vector shrinks toward 0, and the length of a long vector never exceeds 1, no matter how large it was.

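As a small illustration of the squashing behavior described above, here is a NumPy sketch using the paper's formula v = (|s|^2 / (1 + |s|^2)) * s / |s|. The function name, the `eps` term (added only to avoid division by zero), and the sample vectors are assumptions for this example.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # |s|^2 / (1 + |s|^2) sets the new length; s / |s| keeps the direction.
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

short = squash(np.array([0.01, 0.0]))    # short input vector
long_ = squash(np.array([100.0, 0.0]))   # long input vector

# The short vector's length is close to 0, the long one's is just below 1.
print(np.linalg.norm(short), np.linalg.norm(long_))
```

Both outputs still point along the first axis, so only the length changes.
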
2. Margin Loss

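Because the length of a capsule vector represents the probability that an entity exists, the top-level digit capsule for class k should have a long instantiation vector when an entity of class k is present in the image. T_k is 1 when class k is present (and 0 otherwise), with m+ = 0.9 and m- = 0.1. The lambda value down-weights the loss for digit classes that are absent, which keeps the initial learning from shrinking the vector lengths of all the digit capsules.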
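
The margin loss can be sketched as follows, using the paper's per-class form L_k = T_k max(0, m+ - |v_k|)^2 + lambda (1 - T_k) max(0, |v_k| - m-)^2. The function name, the sample lengths, and summing over classes are assumptions for this example; lambda = 0.5 is the value the paper reports, which the text above does not state.

```python
import numpy as np

def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    # Present classes are penalized when their capsule is shorter than m_plus.
    present = targets * np.maximum(0.0, m_plus - lengths) ** 2
    # Absent classes are penalized (down-weighted by lam) when longer than m_minus.
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_minus) ** 2
    return (present + absent).sum(axis=-1)

# Hypothetical capsule lengths for 3 classes; only class 0 is present.
lengths = np.array([0.95, 0.05, 0.30])
targets = np.array([1.0, 0.0, 0.0])
loss = margin_loss(lengths, targets)
```

In this toy case only the third class contributes: its length 0.30 exceeds m- by 0.2, giving 0.5 * 0.2^2 = 0.02.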

3. Experiment

i) MNIST

Capsule Representation

Robustness to Affine Transformations (affNIST)

ii) multi-MNIST

Multiple overlapping digits can be identified (segmented) through a parallel attention mechanism.

iii) Other Datasets

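CapsNet was also applied to other datasets such as CIFAR10 and smallNORB (with the same architecture as used for MNIST), and the results are encouraging given that this is still early-stage research.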

4. Conclusion & Summary

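Replacing the scalar, the traditional output of a neural net, with a vector allows the output to carry more information.
The biggest drawback of CNN max-pooling is that it keeps only the locally most active neuron; dynamic routing instead reflects information from all neurons, minimizing information loss.
Learning part-whole relationships through "routing by agreement" lets the network learn hierarchical spatial relationships.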

License: Attribution, Non-Commercial


인공지능 브로커