# Learning Machine Learning and NLP from 187 Quora Questions

Quora has become a great resource for machine learning. Many top researchers are active on the site answering questions on a regular basis.

Here are some of the main AI-related topics on Quora. If you have a Quora account, you can subscribe to these topics to customize your feed.

- Computer-Science (5.6M followers)
- Machine-Learning (1.1M followers)
- Artificial-Intelligence (635K followers)
- Deep-Learning (167K followers)
- Natural-Language-Processing (155K followers)
- Classification-machine-learning (119K followers)
- Artificial-General-Intelligence (82K followers)
- Convolutional-Neural-Networks-CNNs (25K followers)
- Computational-Linguistics (23K followers)
- Recurrent-Neural-Networks (17.4K followers)

While Quora has FAQ pages for many topics (e.g. FAQ for Machine Learning), they are far from comprehensive. In this post, I’ve tried to provide a more thorough Quora FAQ for several machine learning and NLP topics.

Quora doesn’t have much structure, and many questions you find on the site are either poorly answered or extremely specific. I’ve tried to include only popular questions that have good answers on general interest topics.

- How do I learn machine learning?
- What is machine learning?
- What is machine learning in layman’s terms?
- What is the difference between statistics and machine learning?
- What machine learning theory do I need to know in order to be a successful machine learning practitioner?
- What are the top 10 data mining or machine learning algorithms?
- What exactly is a “hyperparameter” in machine learning terminology?
- How does a machine-learning engineer decide which neural network architecture (feed-forward, recurrent or CNN) to use to solve their problem?
- What’s the difference between gradient descent and stochastic gradient descent?
- How can I avoid overfitting?
- What is the role of the activation function in a neural network?
- What is the difference between a cost function and a loss function in machine learning?
- What is the difference between a parametric learning algorithm and a nonparametric learning algorithm?
- What is regularization in machine learning?
- What is the difference between L1 and L2 regularization?
- What is the difference between Dropout and Batch Normalization?
- What is an intuitive explanation for PCA?
- When and where do we use SVD?
- What is an intuitive explanation of the relation between PCA and SVD?
- Which is your favorite Machine Learning algorithm?
- What is the future of machine learning?
- What are the Top 10 problems in Machine Learning for 2017?

- What are the advantages of different classification algorithms?
- What are the advantages of using a decision tree for classification?
- What are the disadvantages of using a decision tree for classification?
- What are the advantages of logistic regression over decision trees?
- How does randomization in a random forest work?
- Which algorithm is better for non linear classification?
- What is the difference between Linear SVMs and Logistic Regression?
- How can I apply an SVM for categorical data?
- How do I select SVM kernels?
- How is root mean square error (RMSE) and classification related?
- Why is “naive Bayes” naive?

- How would linear regression be described and explained in layman’s terms?
- What is an intuitive explanation of a multivariate regression?
- Why is logistic regression considered a linear model?
- Logistic Regression: Why sigmoid function?
- When should we use logistic regression and Neural Network?
- How are linear regression and gradient descent related?
- What is the intuition behind SoftMax function?
- What is softmax regression?

- What is supervised learning?
- What does “supervision” exactly mean in the context of supervised machine learning?
- Why isn’t supervised machine learning more automated?
- What are the advantages and disadvantages of a supervised learning machine?
- What are the main supervised machine learning methods?
- What is the difference between supervised and unsupervised learning algorithms?

- How do I learn reinforcement learning?
- What’s the best way and what are the best resources to start learning about deep reinforcement learning?
- What is the difference between supervised learning and reinforcement learning?
- How does one learn a reward function in Reinforcement Learning (RL)?
- What is the Future of Deep Reinforcement Learning (DL + RL)?
- Is it possible to use reinforcement learning to solve any supervised or unsupervised problem?
- What are some practical applications of reinforcement learning?
- What is the difference between Q-learning and R-learning?
- In what way can Q-learning and neural networks work together?

- Why is unsupervised learning important?
- What is the future of deep unsupervised learning?
- What are some issues with Unsupervised Learning?
- What is unsupervised learning with example?
- Why could generative models help with unsupervised learning?
- What are some recent and potentially upcoming breakthroughs in unsupervised learning?
- Can neural networks be used to solve unsupervised learning problems?
- What is the state of the art of Unsupervised Learning, and is human-like Unsupervised Learning possible in the near future?
- Why is reinforcement learning not considered unsupervised learning?

- What is deep learning?
- What is the difference between deep learning and usual machine learning?
- As a beginner, how should I study deep learning?
- What are the best resources to learn about deep learning?
- What’s the most effective way to get started with Deep Learning?
- Is there something that Deep Learning will never be able to learn?
- What are the limits of deep learning?
- What is next for deep learning?
- What other ML areas can replace deep learning in the future?
- What is the best back propagation (deep learning) presentation for dummies?
- Does anyone ever use a softmax layer mid-neural network rather than at the end?
- What’s the difference between backpropagation and backpropagation through time?
- What is the best visual explanation for the back propagation algorithm for neural networks?
- What is the practical usage of batch normalization in neural networks?
- In layman’s terms, what is batch normalisation, what does it do, and why does it work so well?
- Does using Batch Normalization reduce the capacity of a deep neural network?
- What is an intuitive explanation of Deep Residual Networks?
- Is fine tuning a pre-trained model equivalent to transfer learning?
- What would be a practical use case for Generative models?
- Is cross-validation heavily used in Deep Learning or is it too expensive to be used?
- What is the importance of Deep Residual Networks?
- Where is Sparsity important in Deep Learning?
- Why are Autoencoders considered a failure?
- In deep learning, why don’t we use the whole training set to compute the gradient?

- What is a convolutional neural network?
- What is an intuitive explanation for convolution?
- How do convolutional neural networks work?
- How long will it take for me to go from machine learning basics to convolutional neural network?
- Why are convolutional neural networks well-suited for image classification problems?
- Is a pooling layer necessary in CNN? Can it be replaced by convolution?
- How can the filters used in Convolutional Neural Networks be optimized or reduced in size?
- Is the number of hidden layers in a convolutional neural network dependent on size of data set?
- How can convolutional neural networks be used for non-image data?
- Can I use Convolution neural network to classify small number of data, 668 images?
- Why are CNNs better at classification than RNNs?
- What is the difference between a convolutional neural network and a multilayer perceptron?
- What makes convolutional neural network architectures different?
- What’s an intuitive explanation of 1x1 convolution in ConvNets?
- Why does the convolutional neural network have higher accuracy, precision, and recall rather than other methods like SVM, KNN, and Random Forest?
- How can I train Convolutional Neural Networks (CNN) with non symmetric images of different sizes?
- How can I choose the dimensions of my convolutional filters and pooling in convolutional neural network?
- Why would increasing the amount of training data decrease the performance of a convolutional neural network?
- How can I explain that applying max-pooling/subsampling in CNN doesn’t cause information loss?
- How do Convolutional Neural Networks develop more complex features?
- Why don’t they use activation functions in some CNNs for some last convolution layers?
- What methods are used to increase the inference speed of convolutional neural networks?
- What is the usefulness of batch normalization in very deep convolutional neural network?
- Why do we use fully connected layer at the end of a CNN instead of convolution layers?
- What may be the cause of this training loss curve for a convolution neural network?
- The convolutional neural network I’m trying to train is settling at a particular training loss value and a training accuracy just after a few epochs. What can be the possible reasons?
- Why do we use shared weights in the convolutional layers of CNN?
- What are the advantages of Fully Convolutional Networks over CNNs?
- How is Fully Convolutional Network (FCN) different from the original Convolutional Neural Network (CNN)?

- Artificial Intelligence: What is an intuitive explanation for recurrent neural networks?
- How are RNNs storing ‘memory’?
- What are encoder-decoder models in recurrent neural networks?
- Why do Recurrent Neural Networks (RNN) combine the input and hidden state together and not separately?
- What is an intuitive explanation of LSTMs and GRUs?
- Are GRU (Gated Recurrent Unit) a special case of LSTM?
- How many time-steps can LSTM RNNs remember inputs for?
- How does attention model work using LSTM?
- How do RNNs differ from Markov Chains?
- For modelling sequences, what are the pros and cons of using Gated Recurrent Units in place of LSTMs?
- What is exactly the attention mechanism introduced to RNN (recurrent neural network)? It would be nice if you could make it easy to understand!
- Is there any intuitive or simple explanation for how attention works in the deep learning model of an LSTM, GRU, or neural network?
- Why is it a problem to have exploding gradients in a neural net (especially in an RNN)?
- For a sequence-to-sequence model in RNN, does the input have to contain only sequences or can it accept contextual information as well?
- Can “generative adversarial networks” be used in sequential data in recurrent neural networks? How effective would they be?
- What is the difference between states and outputs in LSTM?
- What is the advantage of combining Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN)?
- Which is better for text classification: CNN or RNN?
- How are recurrent neural networks different from convolutional neural networks?

- As a beginner in Natural Language processing, from where should I start?
- What is the relation between sentiment analysis, natural language processing and machine learning?
- What is the current state of the art in natural language processing?
- What is the state of the art in natural language understanding?
- Which publications would you recommend reading for someone interested in natural language processing?
- What are the basics of natural language processing?
- Could you please explain the choice constraints of the pros/cons while choosing Word2Vec, GloVe or any other thought vectors you have used?
- How do you explain NLP to a layman?
- How do I explain NLP, text mining, and their difference in layman’s terms?
- What is the relationship between N-gram and Bag-of-words in natural language processing?
- Is deep learning suitable for NLP problems like parsing or machine translation?
- What is a simple explanation of a language model?
- What is the definition of word embedding (word representation)?
- How is Computational Linguistics different from Natural Language Processing?
- Natural Language Processing: What is a useful method to generate vocabulary for large corpus of data?
- How do I learn Natural Language Processing?
- Natural Language Processing: What are good algorithms related to sentiment analysis?
- What makes natural language processing difficult?
- What are the ten most popular algorithms in natural language processing?
- What is the most interesting new work in deep learning for NLP in 2017?
- How is word2vec different from the RNN encoder decoder?
- How does word2vec work?
- What’s the difference between word vectors, word representations and vector embeddings?
- What are some interesting Word2Vec results?
- How do I measure the semantic similarity between two documents?
- What is the state of the art in word sense disambiguation?
- What is the main difference between word2vec and fastText?
- In layman terms, how would you explain the Skip-Gram word embedding model in natural language processing (NLP)?
- In layman’s terms, how would you explain the continuous bag of words (CBOW) word embedding technique in natural language processing (NLP)?
- What is natural language processing pipeline?
- What are the available APIs for NLP (Natural Language Processing)?
- How does perplexity function in natural language processing?
- How is deep learning used in sentiment analysis?

- Was Jürgen Schmidhuber right when he claimed credit for GANs at NIPS 2016?
- Can “generative adversarial networks” be used in sequential data in recurrent neural networks? How effective would they be?
- What are the (existing or future) use cases where using Generative Adversarial Network is particularly interesting?
- Can autoencoders be considered as generative models?
- Why are two separate neural networks used in Generative Adversarial Networks?
- What is the advantage of generative adversarial networks compared with other generative models?
- What are some exciting future applications of Generative Adversarial Networks?
- Do you have any ideas on how to get GANs to work with text?
- In what way are Adversarial Networks related or different to Adversarial Training?
- What are the pros and cons of using generative adversarial networks (a type of neural network)?
- Can Generative Adversarial networks use multi-class labels?
