vimco_estimator

tfsnippet.vimco_estimator(log_values, latent_log_joint, axis=None, keepdims=False, name=None)

Derive the gradient estimator for \(\mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\Big[\log \frac{1}{K} \sum_{k=1}^K f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\Big]\), by the VIMCO (Mnih and Rezende, 2016) algorithm.

\[\begin{aligned}
&\nabla \, \mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\Big[\log \frac{1}{K} \sum_{k=1}^K f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\Big] \\
&\quad = \mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\bigg[{\sum_{k=1}^K \hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)}) \, \nabla \log q(\mathbf{z}^{(k)}|\mathbf{x})}\bigg] +
\mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\bigg[{\sum_{k=1}^K \widetilde{w}_k \, \nabla \log f(\mathbf{x},\mathbf{z}^{(k)})}\bigg]
\end{aligned}\]

where \(w_k = f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\), \(\widetilde{w}_k = w_k / \sum_{i=1}^K w_i\), and:

\[\begin{split}\begin{aligned} \hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)}) &= \hat{L}(\mathbf{z}^{(1:K)}) - \log \frac{1}{K} \bigg(\hat{f}(\mathbf{x},\mathbf{z}^{(-k)})+\sum_{i \neq k} f(\mathbf{x},\mathbf{z}^{(i)})\bigg) \\ \hat{L}(\mathbf{z}^{(1:K)}) &= \log \frac{1}{K} \sum_{k=1}^K f(\mathbf{x},\mathbf{z}^{(k)}) \\ \hat{f}(\mathbf{x},\mathbf{z}^{(-k)}) &= \exp\big(\frac{1}{K-1} \sum_{i \neq k} \log f(\mathbf{x},\mathbf{z}^{(i)})\big) \end{aligned}\end{split}\]
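For concreteness, the following is a minimal NumPy sketch of the leave-one-out learning signal \(\hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)})\) for a single example; the helper name and shapes are illustrative and not part of the tfsnippet API:

    import numpy as np
    from scipy.special import logsumexp

    def vimco_learning_signal(log_w):
        """Compute L_hat(z^(k) | z^(-k)) for each k, where
        log_w[k] = log f(x, z^(k)).  Illustrative helper only."""
        K = log_w.shape[0]
        # L_hat(z^(1:K)) = log (1/K) sum_k f(x, z^(k)), computed stably.
        L_hat = logsumexp(log_w) - np.log(K)
        signals = np.empty(K)
        for k in range(K):
            log_w_rest = np.delete(log_w, k)
            # f_hat(x, z^(-k)): geometric mean of the remaining K-1 values,
            # taken in log space.
            log_f_hat = log_w_rest.mean()
            # log (1/K) (f_hat(x, z^(-k)) + sum_{i != k} f(x, z^(i)))
            log_mix = logsumexp(np.append(log_w_rest, log_f_hat)) - np.log(K)
            signals[k] = L_hat - log_mix
        return signals

Each `signals[k]` multiplies the score term \(\nabla \log q(\mathbf{z}^{(k)}|\mathbf{x})\) in the first expectation of the estimator above.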
Args:
    log_values: Log values of the target function given z and x, i.e.,
        \(\log f(\mathbf{z},\mathbf{x})\).
    latent_log_joint: Values of \(\log q(\mathbf{z}|\mathbf{x})\).
    axis: The sampling axes to be reduced in outputs.
    keepdims (bool): When axis is specified, whether or not to keep
        the reduced axes. (default False)
    name (str): Default name of the name scope.
        If not specified, generate one according to the method name.
Returns:
    tf.Tensor: The surrogate for optimizing the original target.
        Maximizing/minimizing this surrogate via gradient descent will
        effectively maximize/minimize the original target.
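
A usage sketch follows, assuming the TensorFlow 1.x environment that tfsnippet targets; the variable names, shapes, and random initial values stand in for real model outputs and are purely illustrative:

    import tensorflow as tf
    import tfsnippet as spt

    K, batch_size = 16, 32
    # log_values: log f(x, z^(k)), e.g. log p(x,z) - log q(z|x) for an
    # importance-weighted objective; latent_log_joint: log q(z^(k)|x).
    # The sampling axis is axis 0 in this sketch.
    log_f = tf.get_variable(
        'log_f', initializer=tf.random_normal([K, batch_size]))
    log_q = tf.get_variable(
        'log_q', initializer=tf.random_normal([K, batch_size]))

    # Reduce over the sampling axis to get a per-example surrogate.
    surrogate = spt.vimco_estimator(log_f, log_q, axis=0)

    # Maximize the original multi-sample objective by minimizing the
    # negated surrogate via gradient descent.
    loss = -tf.reduce_mean(surrogate)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)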