Category Archives: Natural Language Processing

Topic 24 – Introduction to Nature Language Processing

Why do I need to learn about nature language processing?

Natural language processing (NLP) has become more and more interesting. Speech recognition, speech synthesis, autonomous driving and chat bots are examples of breakthrough achievements in the field.

Nowadays a key skill of software developer is the ability to use nature language processing algorithms and tools to solve real-world problems related to text, audio, natural language sentences and speech.

What can I do after finishing learning about nature language processing?

You will be to create software that could recognize speech, translate text to speech, translate a sentence from English to French, answer a customer’s question.

That sounds fun! What should I do now?

Please read
– this Daniel Jurafsky and James H. Martin (2014). Speech and Language Processing. Pearson book, and
– this Christopher D. Manning and Hinrich Schiitze (1999). Foundations of Statistical Natural Language Processing. MIT Press book first.

After that please audit these Natural Language Processing Specialization courses and this Stanford CS224N – NLP with Deep Learning, Winter 2023 course (Lecture Notes).

Terminology Review:

  • Natural Language Processing.
  • Text Classification (e.g. Spam Detection).
  • Named Entity Recognition.
  • Chatbots.
  • Speech Processing.
  • Speech Recognition.
  • Speech Synthesis.
  • Machine Translation.
  • Corpus: A body of texts.
  • Token: a word or a number or a punctuation mark.
  • Collocation: compounds (e.g. disk drive), phrasal verbs (e.g. make up), and other stock phrases (e.g. bacon and eggs).
  • Unigram: word.
  • Bigrams: pairs of words that occur commonly.
  • Trigrams: 3 words that occur commonly.
  • N-grams: n words that occur commonly.
  • Hypothesis Testing.
  • t-Test.
  • Likelihood Ratios.
  • Language Model: statistical model of word sequences.
  • Naive Bayes.
  • Hidden Markov Models.
  • Bag-of-Words Model.
  • Term Frequency–Inverse Document Frequency (TF–IDF).
  • Bag-of-n-Grams.
  • One-Hot Representation: You have a vocabulary of n words and you represent each word using a vector that is n bits long, in which all bits are zero except for one bit that is set to 1.
  • Word Embedding (Featurized Representation) is the transformation from words to dense vector.
  • Euclidean Distance, Dot Product Similarity, Cosine Similarity.
  • Embedding Matrix.
  • Neural Language Model.
  • Word2Vec: Skip-Gram Model, Bag-of-Words Model.
  • Negative Sampling.
  • GloVe, Global Vectors.
  • Recurrent Neural Networks.
  • Backpropagation Through Time.
  • Recurrent Neural Net Language Model (RNNLM).
  • Gated Recurrent Unit (GRU).
  • Long Short Term Memory (LSTM).
  • Bidirectional RNN.
  • Deep RNNs.
  • Sequence to Sequence Model.
  • Teacher Forcing.
  • Image Captioning.
  • Greedy Search.
  • Beam Search, Length Normalization.
  • BLEU (BiLingual Evaluation Understudy) Score.
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Score.
  • F1 Score.
  • Minimum Bayes-Risk.
  • Attention Mechanism.
  • Self-Attention (Scaled and Dot-Product Attention): Queries, Keys and Values.
  • Positional Encoding.
  • Masked Self-Attention.
  • Multi-Head Attention.
  • Residual Dropout.
  • Label Smoothing.
  • Transformer Encoder.
  • Transformer Decoder.
  • Transformer Encoder-Decoder.
  • Cross-Attention.
  • Byte Pair Encoding.
  • BERT (Bidirectional Encoder Representations from Transformers).

After finishing learning about natural language processing please click Topic 25 – Introduction to Distributed Systems to continue.