[Paper Review] On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

Section 6.1 covers tempering, the noisy Dirichlet model, and data augmentation with a BNN on a 2D classification problem,
Section 6.2 visualizes the effect of data augmentation on the limiting distribution of SG-MCMC using a Gaussian process regression model,
and Section 6.3 reports the conclusions for BNNs on image classification problems.

(a) shows the data and the decision boundary obtained with a BNN. The BNN uses an i.i.d. Gaussian prior, $\mathcal{N}(0, 0.3^2)$, and posterior samples were drawn with full-batch Hamiltonian Monte Carlo (HMC). As the panel shows, the model does not fit the data exactly, but it gives a plausible fit on the training data.
(b) shows the effect of data augmentation, applying random reflections about the $x_1$ and $x_2$ axes (the augmented data points are drawn translucent). To evaluate the network under data augmentation, HMC was run to sample from the posterior in Eq. (16). The BNN fits worse here than in (a), where no data augmentation was used.
Applying a tempered likelihood (c) or the noisy Dirichlet model (d) makes the fit on the training data much better. Both methods fit the training data almost perfectly, which agrees with the assumption of low aleatoric uncertainty. (See Appendix H.1 for details.)
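The random reflections used for panel (b) can be sketched in a few lines. This is only an illustration, assuming each coordinate's sign is flipped independently with probability 1/2; the function names `random_reflect` and `augment` and the toy points are hypothetical and are not taken from the paper.

```python
import numpy as np

def random_reflect(points, rng):
    """Flip the sign of each coordinate independently with probability 1/2.

    Flipping x2 reflects a point about the x1 axis, and flipping x1
    reflects it about the x2 axis (assumed augmentation scheme).
    """
    signs = rng.choice([-1.0, 1.0], size=points.shape)
    return points * signs

def augment(points, labels, k, rng):
    """Return k randomly reflected copies of every point; labels are unchanged."""
    aug_x = np.concatenate([random_reflect(points, rng) for _ in range(k)])
    aug_y = np.tile(labels, k)
    return aug_x, aug_y

rng = np.random.default_rng(0)
x = np.array([[1.0, 2.0], [-0.5, 0.3]])  # toy 2D inputs
y = np.array([0, 1])                     # toy class labels
aug_x, aug_y = augment(x, y, k=4, rng=rng)
```

Reflections only change signs, so every augmented point keeps the absolute coordinates and the label of the point it was generated from.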

Figure 4 visualizes the posterior from Eq. (16), introduced in Section 5, for $K=1,4,10$; it shows the posterior at the original data points rather than at the augmented ones.
Because predictions on the training data are independent of the augmented data points, the posterior in Eq. (16) is equivalent to tempering the training-data posterior with a warm temperature equal to the number of augmentations $K$.
Consequently, the larger the number of augmentations, which softens the likelihood, the lower the confidence on the training data.
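The loss of confidence under warm tempering can be checked numerically for GP regression. The sketch below relies on one fact: a Gaussian likelihood raised to the power $1/T$ is proportional to a Gaussian with noise variance $T\sigma^2$. The RBF kernel, the lengthscale, the noise variance, and the inputs are all assumptions chosen for illustration, not the paper's settings.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def train_posterior_std(x, noise_var, temperature):
    """Posterior std of the latent GP function at the training inputs,
    with the Gaussian likelihood raised to the power 1 / temperature."""
    k = rbf(x, x)
    # N(y | f, s^2)^(1/T) is proportional to N(y | f, T * s^2),
    # so tempering only inflates the observation noise.
    a = k + temperature * noise_var * np.eye(len(x))
    cov = k - k @ np.linalg.solve(a, k)
    return np.sqrt(np.clip(np.diag(cov), 0.0, None))

x = np.linspace(-2.0, 2.0, 8)  # toy training inputs
stds = {
    K: train_posterior_std(x, noise_var=0.01, temperature=K).mean()
    for K in (1, 4, 10)
}
```

The average posterior standard deviation at the training inputs grows with the temperature, which matches the reading that a larger $K$ means less confidence on the training data.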

For image classification, this section shows that

the noisy Dirichlet model does not need tempering to reach optimal performance,
both the noisy Dirichlet model and the tempered softmax likelihood can successfully express our beliefs about label noise in the data,
and data augmentation softens the BNN likelihood, with the optimal temperature depending on the complexity of the data augmentation policy.

No cold posterior effect in the noisy Dirichlet model

Figure 5 shows the effect of posterior tempering under the standard softmax classification likelihood and under the noisy Dirichlet model with noise parameter $\alpha_{\epsilon}=10^{-6}$ (CIFAR-10).
As reported in "How Good is the Bayes Posterior in Deep Neural Networks Really?", the softmax likelihood performs best at $T=10^{-3}$, whereas tempering brings no meaningful gain for the noisy Dirichlet model.
In short, as analyzed in Section 4, tempering and the Dirichlet model can be seen as alternative ways of expressing our beliefs about aleatoric uncertainty, so the noisy Dirichlet model achieves strong performance without tempering.
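To make the role of $T$ concrete, here is a minimal sketch of a likelihood-tempered softmax log posterior, assuming the form $\frac{1}{T}\sum_i \log p(y_i \mid x_i, w) + \log p(w)$ up to a constant. Whether the prior is also tempered differs between setups, and the helper names, logits, and prior value below are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def tempered_log_posterior(logits, labels, log_prior, temperature):
    """Unnormalized log posterior with the likelihood raised to 1 / temperature."""
    log_lik = log_softmax(logits)[np.arange(len(labels)), labels].sum()
    return log_lik / temperature + log_prior

logits = np.array([[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]])  # toy network outputs
labels = np.array([0, 1])
log_prior = -3.0                                        # toy log p(w)

standard = tempered_log_posterior(logits, labels, log_prior, temperature=1.0)
cold = tempered_log_posterior(logits, labels, log_prior, temperature=1e-3)
```

At $T=10^{-3}$ the log-likelihood term is weighted 1000 times more heavily than at $T=1$, so the likelihood dominates the prior and the posterior concentrates on weights that fit the labels.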

Modeling label noise

In a classification task, aleatoric uncertainty corresponds to the amount of label noise in the data. As the paper argues, standard BNNs misrepresent aleatoric uncertainty, and likelihood tempering and the noisy Dirichlet model are effective ways to encode information about the amount of label noise.
Figure 6 shows the BMA test accuracy and negative log-likelihood for the standard softmax likelihood, the tempered softmax likelihood, and the noisy Dirichlet model.
The experiments use CIFAR-10 and Tiny Imagenet, with $T \in \{10^{-5},~10^{-4},~10^{-3},~10^{-2},~10^{-1},~1,~3,~10\}$ for the tempered softmax likelihood model and $\alpha_{\epsilon} \in \{10^{-6},~10^{-5},~10^{-4},~10^{-3},~10^{-2},~10^{-1}\}$ for the noisy Dirichlet model.
On both CIFAR-10 and Tiny Imagenet, and at every level of label noise, explicitly modeling aleatoric uncertainty through tempering or the noisy Dirichlet model far outperforms the standard softmax likelihood. Moreover, different amounts of label noise call for different values of the temperature or $\alpha_{\epsilon}$.
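For readers who want to reproduce a label-noise setting, the sketch below injects symmetric label noise. It is an assumption, not the paper's documented procedure: each label is flipped with probability `noise_rate` to a different class chosen uniformly, and the function name `corrupt_labels` and the toy labels are hypothetical.

```python
import numpy as np

def corrupt_labels(labels, noise_rate, num_classes, rng):
    """Flip each label with probability noise_rate to a different class."""
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    # A random offset in [1, num_classes - 1] guarantees the class changes.
    offsets = rng.integers(1, num_classes, size=flip.sum())
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy, flip

rng = np.random.default_rng(0)
clean = rng.integers(0, 10, size=1000)  # toy 10-class labels
noisy, flipped = corrupt_labels(clean, noise_rate=0.2, num_classes=10, rng=rng)
```

The returned mask marks exactly the corrupted examples, so the realized noise level can be measured as `flipped.mean()`.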

Data augmentation leads to underfitting on train data

The results here show the NLL with and without data augmentation: the larger the temperature and the $\alpha_{\epsilon}$ parameter, the worse the runs with data augmentation perform.
This indicates that data augmentation softens the likelihood, which in turn makes the training data harder to fit.
