A New Backpropagation Algorithm without Gradient Descent
FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
This paper comes to us from researchers at IBM Research. The graph convolutional networks (GCN) recently proposed by Kipf and Welling are an effective graph model for semi-supervised learning. This model, however, was originally designed to be learned with the presence of both training and test data. Moreover, the recursive neighborhood expansion across layers poses time and memory challenges for training with large, dense graphs. To relax the requirement of simultaneous availability of test data, the authors interpret graph convolutions as integral transforms of embedding functions under probability measures. Such an interpretation allows for the use of Monte Carlo approaches to consistently estimate the integrals, which in turn leads to a batched training scheme as we propose in this work—FastGCN. Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference. The paper includes a comprehensive set of experiments to demonstrate its effectiveness compared with GCN and related models. In particular, training is orders of magnitude more efficient while predictions remain comparably accurate.
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Researchers from the CMU Computer Science department formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. The authors propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively.
Twin Networks: Matching the Future for Sequence Generation
A group of researchers including Yoshua Bengio propose a simple technique for encouraging generative RNNs to plan ahead. They train a “backward” recurrent network to generate a given sequence in reverse order, and they encourage states of the forward model to predict cotemporal states of the backward model. The backward network is used only during training, and plays no role during sampling or inference. The authors hypothesize that the approach eases modeling of long-term dependencies by implicitly forcing the forward states to hold information about the longer-term future (as contained in the backward states). The paper shows empirically that the approach achieves 9% relative improvement for a speech recognition task, and achieves significant improvement on a COCO caption generation task.
Deep Learning for Sentiment Analysis : A Survey
Deep Learning: An Introduction for Applied Mathematicians
Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective.
Measuring the tendency of CNNs to Learn Surface Statistical Regularities
Deep CNNs are known to exhibit the following peculiarity: on the one hand they generalize extremely well to a test set, while on the other hand they are extremely sensitive to so-called adversarial perturbations. The extreme sensitivity of high performance CNNs to adversarial examples casts serious doubt that these networks are learning high level abstractions in the data set. This paper is concerned with the following question: How can a deep CNN that does not learn any high level semantics of the data set manage to generalize so well? The goal of this article is to measure the tendency of CNNs to learn surface statistical regularities of the data set.
On the Origin of Deep Learning
Recent Advances in Recurrent Neural Networks
Recurrent neural networks (RNNs) are capable of learning features and long term dependencies from sequential and time-series data. The RNNs have a stack of non-linear units where at least one connection between units forms a directed cycle. A well-trained RNN can model any dynamical system; however, training RNNs is mostly plagued by issues in learning long-term dependencies. This paper presents a survey on RNNs and several new advances for newcomers and professionals in the field. The fundamentals and recent advances are explained and the research challenges are introduced.
Deep Learning: A Critical Appraisal
Although deep learning has historical roots going back decades, neither the term “deep learning” nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton’s now classic (2012) deep network model of Imagenet. What has the field discovered in the five subsequent years? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the popular press, AI contrarian Gary Marcus presents ten concerns for deep learning, and suggest that deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.