vimco_estimator¶

tfsnippet.vimco_estimator(log_values, latent_log_joint, axis=None, keepdims=False, name=None)¶

Derive the gradient estimator for \(\mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\Big[\log \frac{1}{K} \sum_{k=1}^K f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\Big]\), by the VIMCO (Mnih and Rezende, 2016) algorithm.
\[\begin{split}\begin{aligned} &\nabla\,\mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\Big[\log \frac{1}{K} \sum_{k=1}^K f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\Big] \\ &\quad = \mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\bigg[{\sum_{k=1}^K \hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)}) \, \nabla \log q(\mathbf{z}^{(k)}|\mathbf{x})}\bigg] + \mathbb{E}_{q(\mathbf{z}^{(1:K)}|\mathbf{x})}\bigg[{\sum_{k=1}^K \widetilde{w}_k\,\nabla\log f(\mathbf{x},\mathbf{z}^{(k)})}\bigg] \end{aligned}\end{split}\]
where \(w_k = f\big(\mathbf{x},\mathbf{z}^{(k)}\big)\), \(\widetilde{w}_k = w_k / \sum_{i=1}^K w_i\), and:
\[\begin{split}\begin{aligned} \hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)}) &= \hat{L}(\mathbf{z}^{(1:K)}) - \log \frac{1}{K} \bigg(\hat{f}(\mathbf{x},\mathbf{z}^{(-k)})+\sum_{i \neq k} f(\mathbf{x},\mathbf{z}^{(i)})\bigg) \\ \hat{L}(\mathbf{z}^{(1:K)}) &= \log \frac{1}{K} \sum_{k=1}^K f(\mathbf{x},\mathbf{z}^{(k)}) \\ \hat{f}(\mathbf{x},\mathbf{z}^{(-k)}) &= \exp\big(\frac{1}{K-1} \sum_{i \neq k} \log f(\mathbf{x},\mathbf{z}^{(i)})\big) \end{aligned}\end{split}\]

Args:
- log_values: Log values of the target function given z and x, i.e., \(\log f(\mathbf{z},\mathbf{x})\).
- latent_log_joint: Values of \(\log q(\mathbf{z}|\mathbf{x})\).
- axis: The sampling axes to be reduced in the outputs.
- keepdims (bool): When axis is specified, whether or not to keep the reduced axes. (default False)
- name (str): Default name of the name scope. If not specified, generate one according to the method name.

Returns:

- tf.Tensor: The surrogate for optimizing the original target. Maximizing/minimizing this surrogate via gradient descent will effectively maximize/minimize the original target.
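To make the learning signals above concrete, here is a hedged NumPy sketch (not the tfsnippet implementation; the function name `vimco_signals` is hypothetical) that computes \(\hat{L}(\mathbf{z}^{(1:K)})\) and the per-sample signals \(\hat{L}(\mathbf{z}^{(k)}|\mathbf{z}^{(-k)})\) from a vector of \(\log f\) values:

```python
import numpy as np

def vimco_signals(log_values):
    """Compute the VIMCO learning signals from log f(x, z^(k)) values.

    log_values: 1-D array of shape [K], the per-sample log f values.
    Returns (L_hat, signals) where L_hat is the multi-sample objective
    log (1/K) sum_k f, and signals[k] = L_hat(z^(k) | z^(-k)).
    This is an illustrative sketch of the equations above, not the
    actual tfsnippet code path.
    """
    K = log_values.shape[0]
    # L_hat(z^(1:K)) = log (1/K) sum_k f = logsumexp(log_values) - log K
    L_hat = np.logaddexp.reduce(log_values) - np.log(K)
    signals = np.empty(K)
    for k in range(K):
        rest = np.delete(log_values, k)
        # f_hat(x, z^(-k)) is the geometric mean of the other K-1 values,
        # so its log is the arithmetic mean of the other log values.
        log_f_hat = rest.mean()
        # log (1/K) * (f_hat + sum_{i != k} f_i), in log space
        log_mix = np.logaddexp.reduce(np.append(rest, log_f_hat)) - np.log(K)
        signals[k] = L_hat - log_mix
    return L_hat, signals
```

As a sanity check on the math: when every sample attains the same value of \(\log f\), the geometric-mean baseline \(\hat{f}\) equals that common value, so every learning signal is exactly zero.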