Natural Language Processing | Software Development

Why do I need to learn about natural language processing?

Natural language processing (NLP) has attracted growing interest thanks to breakthroughs in speech recognition, speech synthesis, and chatbots.

Nowadays, a key skill for software developers is the ability to use NLP algorithms, foundation models, and tools to solve real-world problems involving text, audio, natural language sentences, and speech.

What can I do after finishing learning about natural language processing?

You will be able to create software that can recognize speech, convert text to speech, translate a sentence from English to French, or answer a customer’s question.

That sounds fun! What should I do now?

First, please take a quick look at the following two books to grasp the core concepts and classical methods in natural language processing:

After that, please audit this course, Sequence Models, to learn the core concepts of sequence models and gain hands-on experience with them.

After that, please watch these videos to learn about audio signal processing for machine learning.

After that, please audit the courses below to learn how to understand and generate natural language using deep learning models:

After that, please read the books below to learn how to use large language models to build NLP applications:

Terminology Review:

Natural Language Processing.
Text Classification (e.g. Spam Detection).
Named Entity Recognition.
Chatbots.
Speech Processing.
Speech Recognition.
Speech Synthesis.
Machine Translation.
Corpus: A body of texts.
Token: a word, a number, or a punctuation mark.
Collocation: compounds (e.g. disk drive), phrasal verbs (e.g. make up), and other stock phrases (e.g. bacon and eggs).
Unigram: a single word.
Bigram: a sequence of two adjacent words.
Trigram: a sequence of three adjacent words.
N-gram: a sequence of n adjacent words.
∞×∞
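
The token and n-gram terms above can be illustrated with a minimal sketch. It assumes a very rough whitespace tokenizer and a made-up one-sentence corpus; the helper names `tokenize` and `ngrams` are hypothetical and chosen only for this example.

```python
from collections import Counter

def tokenize(text):
    """Rough tokenizer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus, made up for illustration only.
corpus = "the cat sat on the mat and the cat slept"
tokens = tokenize(corpus)

# Counting n-grams shows which word sequences occur most often.
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

print(bigram_counts.most_common(1))  # [(('the', 'cat'), 2)]
```

Frequent bigrams such as ("the", "cat") are only candidates for collocations; deciding whether a pair is a true collocation usually needs a statistical test like those listed in the next group.
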
Hypothesis Testing.
t-Test.
Likelihood Ratios.
Language Model: statistical model of word sequences.
Naive Bayes.
Hidden Markov Models.
∞×∞
Bag-of-Words Model.
Term Frequency–Inverse Document Frequency (TF–IDF).
Bag-of-n-Grams.
One-Hot Representation: You have a vocabulary of n words and you represent each word using a vector that is n bits long, in which all bits are zero except for one bit that is set to 1.
Word Embedding (Featurized Representation): a mapping from words to dense vectors.
Euclidean Distance, Dot Product Similarity, Cosine Similarity.
Embedding Matrix.
Neural Language Model.
Word2Vec: Skip-Gram Model, Continuous Bag-of-Words (CBOW) Model.
Negative Sampling.
GloVe, Global Vectors.
∞×∞
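
To make the representation terms above more concrete, here is a small sketch that contrasts one-hot vectors with dense embeddings. The three-word vocabulary and the two-dimensional embedding values are invented for illustration and are not taken from any trained model.

```python
import math

# Hypothetical three-word vocabulary.
vocab = ["king", "queen", "apple"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Vector of len(vocab) entries: all zeros except a 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def dot(u, v):
    """Dot product similarity."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Made-up dense embeddings (assumed values, for illustration only).
embedding = {
    "king": [0.9, 0.8],
    "queen": [0.8, 0.9],
    "apple": [-0.7, 0.1],
}

print(one_hot("queen"))                                       # [0, 1, 0]
print(cosine_similarity(one_hot("king"), one_hot("queen")))   # 0.0
print(cosine_similarity(embedding["king"], embedding["queen"]))
print(cosine_similarity(embedding["king"], embedding["apple"]))
```

Any two distinct one-hot vectors have cosine similarity 0, so they carry no notion of relatedness. Dense embeddings can place related words close together, which is the motivation for learning them with models such as Word2Vec or GloVe.
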
Recurrent Neural Networks.
Backpropagation Through Time.
Recurrent Neural Net Language Model (RNNLM).
Gated Recurrent Unit (GRU).
Long Short-Term Memory (LSTM).
Bidirectional RNN.
Deep RNNs.
Sequence to Sequence Model.
Teacher Forcing.
Image Captioning.
Greedy Search.
Beam Search, Length Normalization.
BLEU (BiLingual Evaluation Understudy) Score.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Score.
F1 Score.
Minimum Bayes-Risk.
∞×∞
Attention Mechanism.
Self-Attention (Scaled Dot-Product Attention): Queries, Keys, and Values.
Positional Encoding.
Masked Self-Attention.
Multi-Head Attention.
Layer Normalization.
Residual Dropout.
Sparse Attention.
Label Smoothing.
Transformer Encoder.
Transformer Decoder.
Transformer Encoder-Decoder.
Cross-Attention.
Byte Pair Encoding.
BERT (Bidirectional Encoder Representations from Transformers).
Mixture-of-Experts Layer.
∞×∞
Natural Language Generation.
Random Sampling.
Top-k Sampling.
Nucleus Sampling.
Temperature.
∞×∞
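
The decoding terms above can be sketched in a few lines. This is an illustrative, standard-library-only version that assumes the model has already produced a list of logits for the next token; the function names and the example logits are hypothetical.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    keep_set = set(keep)
    total = sum(probs[i] for i in keep_set)
    return [probs[i] / total if i in keep_set else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Top-k sampling: keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])

def nucleus_filter(probs, p):
    """Nucleus (top-p) sampling: keep the smallest set whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)

def sample(probs, rng):
    """Random sampling: draw one token index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Made-up logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
rng = random.Random(0)
print(sample(top_k_filter(probs, 2), rng))
print(sample(nucleus_filter(probs, 0.8), rng))
```

With these example logits, both filters remove the least likely token before sampling, so only index 0 or 1 can be drawn.
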
Chinchilla Scaling Laws.
Distributed Data Parallel.
Zero Redundancy Optimizer (ZeRO).
FlashAttention.
Mixed Precision Training.
∞×∞
Supervised Fine-Tuning.
Instruction Tuning.
LoRA (Low-Rank Adaptation): freezes the pretrained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture.
Quantized LoRA.
∞×∞
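
The LoRA idea described above can be sketched for a single linear layer. This is a toy, pure-Python illustration rather than how any particular library implements it: the class name `LoRALinear`, the `alpha / rank` scaling, and the initialization (small random A, zero B) are assumptions that follow the common description of the method.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (hypothetical sketch)."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weights: never updated during fine-tuning
        # A starts small and random, B starts at zero, so the update is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# Toy 4x4 identity matrix standing in for a pretrained weight.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(W, rank=1)
x = [1.0, 2.0, 3.0, 4.0]

print(layer.forward(x) == matvec(W, x))  # True: the update is zero before training
print(layer.trainable_parameters())      # 8, versus 16 values in W
```

The saving grows with layer size: for a d × d weight, the adapter trains 2 · r · d values instead of d², which is why a small rank r keeps fine-tuning cheap.
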
Preference Tuning.
RLHF (Reinforcement Learning from Human Feedback).
Generalized Advantage Estimation.
Proximal Policy Optimization: PPO-Clip, PPO-KL Penalty.
Direct Preference Optimization (DPO).
∞×∞
Pass@k.
Group Relative Policy Optimization (GRPO).
Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO).
∞×∞

After finishing natural language processing, please click on Topic 25 – Introduction to Distributed Systems to continue.

Topic 24 – Introduction to Natural Language Processing
