vimco_estimator¶

tfsnippet.vimco_estimator(log_values, latent_log_joint, axis=None, keepdims=False, name=None)¶

Derive the gradient estimator for \(\mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\Big[\log \frac{1}{K} \sum_{k=1}^K f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\Big]\), by the VIMCO (Mnih and Rezende, 2016) algorithm.
\[\begin{split}\begin{aligned} &\nabla\,\mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\Big[\log \frac{1}{K} \sum_{k=1}^K f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\Big] \\ &\quad = \mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\bigg[{\sum_{k=1}^K \hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)}) \, \nabla \log q(\mathbf{z}^{(k)}|\mathbf{x})}\bigg] + \mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\bigg[{\sum_{k=1}^K \widetilde{w}_k\,\nabla\log f(\mathbf{x},\mathbf{z}^{(k)})}\bigg] \end{aligned}\end{split}\]
where \(w_k = f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\), \(\widetilde{w}_k = w_k / \sum_{i=1}^K w_i\), and:
\[\begin{split}\begin{aligned} \hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)}) &= \hat{L}(\mathbf{z}^{(1:K)}) - \log \frac{1}{K} \bigg(\hat{f}(\mathbf{x},\mathbf{z}^{(-k)})+\sum_{i \neq k} f(\mathbf{x},\mathbf{z}^{(i)})\bigg) \\ \hat{L}(\mathbf{z}^{(1:K)}) &= \log \frac{1}{K} \sum_{k=1}^K f(\mathbf{x},\mathbf{z}^{(k)}) \\ \hat{f}(\mathbf{x},\mathbf{z}^{(-k)}) &= \exp\big(\frac{1}{K-1} \sum_{i \neq k} \log f(\mathbf{x},\mathbf{z}^{(i)})\big) \end{aligned}\end{split}\]

Args:
- log_values: Log values of the target function given z and x, i.e., \(\log f(\mathbf{z},\mathbf{x})\).
- latent_log_joint: Values of \(\log q(\mathbf{z}|\mathbf{x})\).
- axis: The sampling axes to be reduced in the outputs.
- keepdims (bool): When axis is specified, whether or not to keep the reduced axes. (default False)
- name (str): Default name of the name scope. If not specified, generate one according to the method name.

Returns:

- tf.Tensor: The surrogate for optimizing the original target. Maximizing/minimizing this surrogate via gradient descent will effectively maximize/minimize the original target.
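To make the learning signals above concrete, here is a hedged NumPy sketch (not the tfsnippet implementation; the function name `vimco_signals` is hypothetical) that computes \(\hat{L}(\mathbf{z}^{(1:K)})\) and the per-sample signals \(\hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)})\) from a vector of \(\log f\) values:

```python
import numpy as np

def vimco_signals(log_values):
    """Compute the VIMCO learning signals from log f(x, z^(k)) values.

    log_values: 1-D array of shape [K], the per-sample log f values.
    Returns (L_hat, signals) where L_hat is the multi-sample objective
    log (1/K) sum_k f, and signals[k] = L_hat(z^(k) | z^(-k)).
    This is an illustrative sketch of the equations above, not the
    actual tfsnippet code path.
    """
    K = log_values.shape[0]
    # L_hat(z^(1:K)) = log (1/K) sum_k f = logsumexp(log_values) - log K
    L_hat = np.logaddexp.reduce(log_values) - np.log(K)
    signals = np.empty(K)
    for k in range(K):
        rest = np.delete(log_values, k)
        # f_hat(x, z^(-k)) is the geometric mean of the other K-1 values,
        # so its log is the arithmetic mean of the other log values.
        log_f_hat = rest.mean()
        # log (1/K) * (f_hat + sum_{i != k} f_i), in log space
        log_mix = np.logaddexp.reduce(np.append(rest, log_f_hat)) - np.log(K)
        signals[k] = L_hat - log_mix
    return L_hat, signals
```

As a sanity check on the math: when every sample attains the same value of \(\log f\), the geometric-mean baseline \(\hat{f}\) equals that common value, so every learning signal is exactly zero.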