HuggingFace: LLM Course

0. SETUP

Commonly used libraries

  • transformers
  • datasets
  • torch
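
A minimal setup sketch (versions unpinned; run in a terminal, or prefix with ! in a notebook):

pip install transformers datasets torch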

1. TRANSFORMER MODELS

Pipeline() Function

The pipeline() function in the 🤗 Transformers library simplifies using models by bundling preprocessing, model inference, and postprocessing into a single call.

from transformers import pipeline

text = "Huggingface is awesome!"

# sentiment analysis:
e2e_model = pipeline("sentiment-analysis")
e2e_model(text)

Tip: We can pass several sentences in one go!

e2e_model([
	"I've been waiting for a HuggingFace course my whole life.", 
	"I hate this so much!"
])
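
Each input comes back as a dict with a predicted label and a confidence score; roughly (exact scores depend on the default checkpoint):

# [{'label': 'POSITIVE', 'score': 0.9598...},
#  {'label': 'NEGATIVE', 'score': 0.9995...}]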

Tasks and Pipeline() Compatibility

  • NLP pipelines (in the tables below, ✓ = available via pipeline(), ✗ = not)
Task | Description | Pipeline()
feature-extraction | Extract vector representations of text. | ✓
fill-mask | Fill in masked tokens in a text. | ✓
question-answering | Retrieve the answer to a question from a given text. | ✓
sentence-similarity | Determine how similar two texts are. | ✗
summarization | Create a shorter version of a text while preserving key information. | ✓
table-question-answering | Answer a question about information in a given table. | ✓
text-classification | Classify text into predefined categories. | ✓
text-generation | Generate text from a prompt. | ✓
text-ranking | Rank a set of texts by their relevance to a query. | ✗
token-classification | Assign a label to individual tokens in a text (e.g., NER). | ✓
translation | Convert text from one language to another. | ✓
zero-shot-classification | Classify text without prior training on the specific labels. | ✓
  • Vision pipelines
Task | Description | Pipeline()
depth-estimation | Estimate the depth of objects present in an image. | ✓
image-classification | Identify objects in an image. | ✓
image-feature-extraction | Extract semantically meaningful features from an image. | ✓
image-segmentation | Divide an image into segments, mapping each pixel to an object. | ✓
image-to-image | Transform an image (e.g., inpainting, colorization, super-resolution). | ✓
image-to-text | Generate text descriptions of images. | ✓
image-to-video | Generate a video from an input image (optionally guided by a text prompt). | ✗
keypoint-detection | Identify meaningful, distinctive points or features in an image. | ✗
mask-generation | Generate masks that identify a specific object or region of interest in an image. | ✓
object-detection | Locate and identify objects in images. | ✓
video-classification | Assign a label or class to an entire video. | ✓
text-to-image | Generate images from input text. | ✗
text-to-video | Generate a consistent sequence of images (a video) from text. | ✗
unconditional-image-generation | Generate images without any conditioning input. | ✗
video-to-video | Transform an input video into a new video with altered visual style, motion, or content. | ✗
zero-shot-image-classification | Classify images into classes unseen during the model's training. | ✓
zero-shot-object-detection | Detect objects and their classes in images without prior training on those classes. | ✓
text-to-3d | Produce 3D output from text input. | ✗
image-to-3d | Produce 3D output from image input. | ✗
  • Audio pipelines
Task | Description | Pipeline()
audio-classification | Classify audio into categories. | ✓
audio-to-audio | A family of tasks where the input is audio and the output is one or more generated audios (e.g., speech enhancement, source separation). | ✗
automatic-speech-recognition | Convert speech to text. | ✓
text-to-audio | Convert text to spoken audio. | ✓
  • Multimodal pipelines
Task | Description | Pipeline()
any-to-any | Understand two or more modalities and output two or more modalities. | ✗
audio-text-to-text | Generate textual responses or summaries based on both audio input and text prompts. | ✗
document-question-answering | Take a (document, question) pair as input and return an answer in natural language. | ✗
visual-document-retrieval | Search for relevant image-based documents (e.g., PDFs) given a text query. | ✓
image-text-to-text | Take an image and a text prompt as input and output text. | ✓
video-text-to-text | Take a video and a text prompt as input and output text. | ✗
visual-question-answering | Answer open-ended questions about an image, given a text prompt. | ✓
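
As a sketch, here is one task from each family above via pipeline(); default checkpoints are downloaded on first use, and the file paths are hypothetical placeholders:

from transformers import pipeline

# NLP: zero-shot classification with caller-supplied candidate labels.
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

# Vision: image classification on a local file (hypothetical path).
image_classifier = pipeline("image-classification")
image_classifier("path/to/image.jpg")

# Audio: speech-to-text on a local file (hypothetical path).
asr = pipeline("automatic-speech-recognition")
asr("path/to/audio.flac")

# Multimodal: answer a free-form question about an image.
vqa = pipeline("visual-question-answering")
vqa(image="path/to/image.jpg", question="What is in the picture?")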

Classification of Transformer Models

1: Encoder-Based (Auto-Encoding)

  • Focus: Understanding context, generating embeddings.
  • Mechanism: Bidirectional attention (sees past & future tokens).
  • Training: Masked Language Modeling (MLM).
  • Use Cases: Text classification, sentiment analysis, Named Entity Recognition (NER), question answering (understanding).
  • Examples: BERT, RoBERTa.
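
A minimal sketch of MLM in action, assuming the bert-base-uncased checkpoint:

from transformers import pipeline

# An encoder model fills the masked token using context from both sides.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
unmasker("Paris is the [MASK] of France.")  # top prediction: 'capital'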

2: Decoder-Based (Auto-Regressive)

  • Focus: Generating new text by predicting the next token.
  • Mechanism: Unidirectional (causal) attention; sees only past tokens.
  • Training: Causal Language Modeling (CLM): predict the next token in a sequence.
  • Use Cases: Text generation, summarization, chatbots, code generation.
  • Examples: GPT series, LLaMA, Claude.
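
A minimal sketch of auto-regressive generation, assuming the gpt2 checkpoint:

from transformers import pipeline

# A decoder model predicts one token at a time, attending only to the past.
generator = pipeline("text-generation", model="gpt2")
generator("In this course, we will teach you how to", max_new_tokens=20)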

3: Encoder-Decoder Based (Sequence-to-Sequence)

  • Focus: Transforming one sequence into another.
  • Mechanism: Encoder-decoder architecture: the encoder processes the input, and the decoder generates output conditioned on the encoder's representation.
  • Training: Maps an input sequence to an output sequence.
  • Use Cases: Machine translation, abstractive summarization, text style transfer.
  • Examples: T5, BART, neural machine translation (NMT) models.
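
A minimal sequence-to-sequence sketch, assuming the t5-small checkpoint:

from transformers import pipeline

# The encoder reads the whole input; the decoder generates the translation
# conditioned on the encoder's representation.
translator = pipeline("translation_en_to_fr", model="t5-small")
translator("HuggingFace is awesome!")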

Auto-Encoding Models

Model: BaseAutoEncodingModel

# Model: BaseAutoEncodingModel
BaseAutoEncodingModel(
	'embedder': BaseAutoEncodingModelEmbedder(...),
	'encoder': BaseAutoEncodingModelEncoder(...),
	'pooler': BaseAutoEncodingModelPooler(...)
)

Module: BaseAutoEncodingModelEmbedder

# Module: BaseAutoEncodingModelEmbedder
BaseAutoEncodingModelEmbedder(
    'word_emb': WordEmbedder(...), 
    'pos_emb': PositionEmbedder(...), 
    'tok_type_emb': TokenTypeEmbedder(...), 
    'layer_norm': LayerNorm(...), 
    'dropout': Dropout(...)
)

Module: BaseAutoEncodingModelEncoder

# Module: BaseAutoEncodingModelEncoder
BaseAutoEncodingModelEncoder(
    'layers': ModuleList(
		'layer': N x BaseAutoEncodingModelEncoderLayer(
		    'attention': BaseAutoEncodingModelAttention(...), 
	        'intermediate': BaseAutoEncodingModelIntermediate(...), 
	        'output': BaseAutoEncodingModelOutput(...)
		)
	)
)

# SubModule: BaseAutoEncodingModelAttention
BaseAutoEncodingModelAttention(
	'self_attention': BaseAutoEncodingModelSelfAttention(
		'Q': Linear(...), 
		'K': Linear(...), 
		'V': Linear(...), 
		'dropout': Dropout(...)
	),
	'self_output': BaseAutoEncodingModelSelfOutput(
		'dense': Linear(...), 
		'layer_norm': LayerNorm(...), 
		'dropout': Dropout(...)
	),
)

# SubModule: BaseAutoEncodingModelIntermediate
BaseAutoEncodingModelIntermediate(
	'dense': Linear(...), 
	'activation': Activation(...)
)

# SubModule: BaseAutoEncodingModelOutput
BaseAutoEncodingModelOutput(
	'dense': Linear(...), 
	'layer_norm': LayerNorm(...), 
	'dropout': Dropout(...) 
)

Module: BaseAutoEncodingModelPooler

BaseAutoEncodingModelPooler(
	'dense': Linear(...), 
	'activation': Activation(...)
)
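
The generic layout above mirrors what printing a real auto-encoding model shows; for example, BERT's top-level children (embeddings, encoder, pooler) correspond to the embedder/encoder/pooler blocks sketched here:

from transformers import AutoModel

# Print the concrete module tree of a real encoder-only model.
model = AutoModel.from_pretrained("bert-base-uncased")
print(model)  # BertModel(embeddings=..., encoder=..., pooler=...)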

Timeline: Transformers

Time | Model | Novelty
2017 | Transformer architecture | Introduced the encoder-decoder transformer for the machine translation task and outperformed the SOTA.
2018 | GPT | First pretrained transformer model; fine-tuned for various NLP tasks, obtaining SOTA performance.