geomstats: a Python Package for Riemannian Geometry in Machine Learning
iVQA: Inverse Visual Question Answering
This paper proposes the inverse problem of visual question answering (iVQA) and explores its suitability as a benchmark for visuo-linguistic understanding. The iVQA task is to generate a question that corresponds to a given image and answer pair. Since the answers are less informative than the questions, and the questions have less learnable bias, an iVQA model needs to understand the image better than a VQA model does in order to succeed. The authors pose question generation as a multi-modal dynamic inference process and propose an iVQA model that can gradually adjust its focus of attention, guided by both a partially generated question and the answer. For evaluation, apart from existing linguistic metrics, they propose a new ranking metric. This metric is based on the ground-truth question’s rank among a list of distractors, which allows the drawbacks of different algorithms and sources of error to be studied. Experimental results show that this model can generate diverse, grammatically correct, and content-correlated questions that match the given answer.
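The ranking idea can be illustrated with a small sketch. It assumes the model assigns each candidate question a score where higher is better (for example a log-likelihood); the summary does not say how candidates are scored or how ties are handled, so the function name, the scores, and the tie rule below are all hypothetical.

```python
def ground_truth_rank(scores, gt_index):
    """Return the 1-based rank of the ground-truth question.

    scores:   one model score per candidate question (higher = better);
              the scoring function itself is an assumption.
    gt_index: position of the ground-truth question in `scores`.
    Ties count against the ground truth (a pessimistic rank).
    """
    gt_score = scores[gt_index]
    better_or_equal = sum(
        1 for i, s in enumerate(scores) if i != gt_index and s >= gt_score
    )
    return 1 + better_or_equal


# Hypothetical scores: index 2 is the ground truth, the rest are distractors.
scores = [-2.3, -0.7, -1.5, -3.1]
rank = ground_truth_rank(scores, gt_index=2)
```

A rank of 1 means the model prefers the true question over every distractor; inspecting which distractors outrank it is one way to study sources of error.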
A Spline Theory of Deep Networks
This paper builds a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. The key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product. The authors explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization.
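The input-conditioned affine view can be checked numerically on a toy case. The sketch below assumes a two-layer ReLU network with random weights (the layer sizes, weights, and function names are illustrative, not taken from the paper) and recovers the matrix `A` and offset `b` such that the output equals `A @ x + b` for a given input `x`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ReLU network: 4 inputs -> 8 hidden units -> 3 outputs (sizes are arbitrary).
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)


def forward(x):
    """Ordinary forward pass through the network."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2


def local_affine_map(x):
    """Return (A, b) with forward(x) == A @ x + b for this particular x.

    The ReLU activation pattern depends on x, so A and b are
    signal-dependent; each row of A acts as a per-output template.
    """
    mask = (W1 @ x + b1 > 0).astype(float)  # which hidden units are active
    A = W2 @ (mask[:, None] * W1)           # zero the rows of inactive units
    b = W2 @ (mask * b1) + b2
    return A, b


x = rng.normal(size=4)
A, b = local_affine_map(x)
same = np.allclose(forward(x), A @ x + b)
```

Each output is then the inner product of `x` with one row of `A`, plus an offset, which is the template-matching reading described above.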
The unreasonable effectiveness of the forget gate
Given the success of the gated recurrent unit, a natural question is whether all the gates of the long short-term memory (LSTM) network are necessary. Previous research has shown that the forget gate is one of the most important gates in the LSTM. This paper shows that a forget-gate-only version of the LSTM with chrono-initialized biases not only provides computational savings but also outperforms the standard LSTM on multiple benchmark datasets and competes with some of the best contemporary models. The proposed network, the JANET, achieves accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the standard LSTM, which yields accuracies of 98.5% and 91%.
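A forget-gate-only recurrent step might look roughly like the sketch below. This is a simplified illustration, not the published JANET equations: it assumes the forget gate `f` also controls how much of the candidate state is written (via `1 - f`), that the hidden state equals the cell state, and that the forget bias is drawn as `log(U(1, t_max - 1))` in the spirit of chrono initialization. The names `forget_gate_only_step`, `chrono_forget_bias`, and `t_max` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def chrono_forget_bias(hidden_size, t_max, rng):
    """Assumed chrono-style init: b_f ~ log(Uniform(1, t_max - 1))."""
    return np.log(rng.uniform(1.0, t_max - 1.0, size=hidden_size))


def forget_gate_only_step(x, h_prev, c_prev, params):
    """One step of a simplified forget-gate-only cell (a sketch)."""
    W_f, U_f, b_f, W_c, U_c, b_c = params
    f = sigmoid(W_f @ x + U_f @ h_prev + b_f)        # forget gate
    c_tilde = np.tanh(W_c @ x + U_c @ h_prev + b_c)  # candidate state
    c = f * c_prev + (1.0 - f) * c_tilde             # single gate does both jobs
    return c, c                                      # hidden state = cell state


rng = np.random.default_rng(0)
n_in, n_hid, t_max = 1, 16, 784  # e.g. pixel-by-pixel input; sizes are illustrative
b_f = chrono_forget_bias(n_hid, t_max, rng)
params = (
    rng.normal(scale=0.1, size=(n_hid, n_in)),
    rng.normal(scale=0.1, size=(n_hid, n_hid)),
    b_f,
    rng.normal(scale=0.1, size=(n_hid, n_in)),
    rng.normal(scale=0.1, size=(n_hid, n_hid)),
    np.zeros(n_hid),
)

h = c = np.zeros(n_hid)
for x in rng.normal(size=(20, n_in)):  # run over a short random sequence
    h, c = forget_gate_only_step(x, h, c, params)
```

Compared with a standard LSTM step, this drops the input and output gates and their weight matrices, which is where the computational savings come from.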
Spectral Normalization for Generative Adversarial Networks
One of the challenges in the study of generative adversarial networks is the instability of their training. This paper proposes a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. The new normalization technique is computationally light and easy to incorporate into existing implementations. The authors tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to the previous training stabilization techniques.
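As a rough illustration of the normalization itself, the sketch below divides a weight matrix by an estimate of its largest singular value, obtained here by power iteration. The function name, iteration count, and matrix shape are assumptions made for the example; the sketch runs many iterations to get a tight estimate on a fixed matrix and is not meant to mirror any particular training-time implementation.

```python
import numpy as np


def spectral_normalize(W, n_iters=200, eps=1e-12, seed=0):
    """Return (W / sigma, sigma), where sigma estimates the top singular value.

    Power iteration alternates between the left (u) and right (v)
    singular-vector estimates; sigma = u^T W v at the end.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v
    return W / sigma, sigma


W = np.random.default_rng(1).normal(size=(6, 4))  # stand-in for a layer's weights
W_sn, sigma = spectral_normalize(W)

# After normalization the largest singular value should be close to 1.
top_singular_value = np.linalg.svd(W_sn, compute_uv=False)[0]
```

Because only matrix-vector products are needed, the estimate is cheap relative to a full singular value decomposition, which fits the "computationally light" description above.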