HuggingFace: LLM Course

0. SETUP

Commonly used libraries

  • transformers
  • datasets
  • torch
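
These can be installed with pip:

pip install transformers datasets torch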

1. TRANSFORMER MODELS

pipeline() Function

The pipeline() function in the 🤗 Transformers library simplifies using models by integrating preprocessing and postprocessing steps.

from transformers import pipeline

text = "Huggingface is awesome!"

# sentiment analysis:
e2e_model = pipeline("sentiment-analysis")
e2e_model(text)
Tip: we can pass several sentences in one go!

e2e_model([
	"I've been waiting for a HuggingFace course my whole life.", 
	"I hate this so much!"
])
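
Under the hood, pipeline() chains three steps: preprocessing with a tokenizer, a model forward pass, and postprocessing of the logits. A rough equivalent for the sentiment-analysis example above (the checkpoint name is an assumption: it is the pipeline's default at the time of writing, and defaults can change):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed default checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

inputs = tokenizer(text, return_tensors="pt")    # preprocessing
with torch.no_grad():
	logits = model(**inputs).logits              # model forward pass
probs = torch.softmax(logits, dim=-1)            # postprocessing
print(model.config.id2label[int(probs.argmax())], float(probs.max()))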

Tasks and pipeline() Compatibility

  • NLP Pipelines
Task | Description | pipeline()
feature-extraction | Extract vector representations of text. | ✓
fill-mask | Fill in masked tokens in a text. | ✓
question-answering | Retrieve the answer to a question from a given text. | ✓
sentence-similarity | Determine how similar two texts are. | ✗
summarization | Create a shorter version of a text while preserving key information. | ✓
table-question-answering | Answer a question about information in a given table. | ✓
text-classification | Classify text into predefined categories. | ✓
text-generation | Generate text from a prompt. | ✓
text-ranking | Rank a set of texts by their relevance to a query. | ✗
token-classification | Assign a label to individual tokens in a text (e.g. NER). | ✓
translation | Convert text from one language to another. | ✓
zero-shot-classification | Classify text without prior training on the specific labels. | ✓
  • Vision pipelines
Task | Description | pipeline()
depth-estimation | Estimate the depth of objects in an image. | ✓
image-classification | Identify objects in an image. | ✓
image-feature-extraction | Extract semantically meaningful features from an image. | ✓
image-segmentation | Divide an image into segments, mapping each pixel to an object. | ✓
image-to-image | Transform an image (e.g. inpainting, colorization, super-resolution). | ✓
image-to-text | Generate text descriptions of images. | ✓
image-to-video | Generate a video from an input image. | ✗
keypoint-detection | Identify meaningful, distinctive points or features in an image. | ✗
mask-generation | Generate masks that identify a specific object or region of interest in an image. | ✓
object-detection | Locate and identify objects in images. | ✓
video-classification | Assign a label or class to an entire video. | ✓
text-to-image | Generate images from input text. | ✗
text-to-video | Generate a consistent sequence of images (a video) from text. | ✗
unconditional-image-generation | Generate images without any conditioning input. | ✗
video-to-video | Transform an input video into a new video with altered visual style, motion, or content. | ✗
zero-shot-image-classification | Classify images into classes unseen during training. | ✓
zero-shot-object-detection | Detect objects and their classes in images without prior training on those classes. | ✓
text-to-3d | Take text input and produce 3D output. | ✗
image-to-3d | Take image input and produce 3D output. | ✗
  • Audio pipelines
Task | Description | pipeline()
audio-classification | Classify audio into categories. | ✓
audio-to-audio | A family of tasks where the input is audio and the output is one or more generated audios (e.g. speech enhancement, source separation). | ✗
automatic-speech-recognition | Convert speech to text. | ✓
text-to-audio | Convert text to spoken audio. | ✓
  • Multimodal pipelines
Task | Description | pipeline()
any-to-any | Understand two or more modalities and output two or more modalities. | ✗
audio-text-to-text | Generate textual responses or summaries from both audio input and text prompts. | ✗
document-question-answering | Take a (document, question) pair as input and return an answer in natural language. | ✗
visual-document-retrieval | Search for relevant image-based documents (e.g. PDFs) given a text query. | ✓
image-text-to-text | Take an image and a text prompt and output text. | ✓
video-text-to-text | Take a video and a text prompt and output text. | ✗
visual-question-answering | Answer open-ended questions about an image, guided by a text prompt. | ✓
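
Any task marked ✓ above can be instantiated by name. For example, zero-shot-classification lets you supply the candidate labels at inference time:

from transformers import pipeline

clf = pipeline("zero-shot-classification")
clf(
	"This is a course about the Transformers library",
	candidate_labels=["education", "politics", "business"],
)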

Classification of LLMs

1: Encoder Based (Auto-Encoding Transformers)

  • Focus: Understanding context, generating embeddings.
  • Mechanism: Bidirectional attention (sees past & future tokens).
  • Training: Masked Language Modeling (MLM).
  • Use Cases: Text classification, sentiment analysis, Named Entity Recognition (NER), question answering (understanding).
  • Examples: BERT, RoBERTa.
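
A quick way to see MLM-style understanding in action is the fill-mask pipeline; bert-base-uncased is one illustrative checkpoint among many:

from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
unmasker("The capital of France is [MASK].")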

2: Decoder Based (Auto-Regressive Transformers)

  • Focus: Generating new text by predicting the next token.
  • Mechanism: Unidirectional attention (sees only past tokens).
  • Training: Predicts next word in a sequence.
  • Use Cases: Text generation, summarization, chatbots, code generation.
  • Examples: GPT series, LLaMA, Claude.
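
Auto-regressive generation via the text-generation pipeline, with gpt2 as an illustrative checkpoint:

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
generator("In this course, we will teach you how to", max_new_tokens=30)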

3: Encoder + Decoder Based Transformers

  • Focus: Transforming one sequence into another.
  • Mechanism: Encoder-Decoder architecture: the encoder processes the input; the decoder generates output conditioned on the encoder's representation.
  • Training: Maps input sequence to output sequence.
  • Use Cases: Machine translation, abstractive summarization, text style transfer.
  • Examples: T5, BART, NMT models.
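
Sequence-to-sequence in action via the translation pipeline; t5-small is a small checkpoint that supports English→French out of the box:

from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
translator("HuggingFace is awesome!")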

Auto-Encoding Models

Model: BaseAutoEncodingModel

# Model: BaseAutoEncodingModel
BaseAutoEncodingModel(
	'embedder': BaseAutoEncodingModelEmbedder(...),
	'encoder': BaseAutoEncodingModelEncoder(...),
	'pooler': BaseAutoEncodingModelPooler(...)
)

Module: BaseAutoEncodingModelEmbedder

# Module: BaseAutoEncodingModelEmbedder
BaseAutoEncodingModelEmbedder(
    'word_emb': WordEmbedder(...), 
    'pos_emb': PositionEmbedder(...), 
    'tok_type_emb': TokenTypeEmbedder(...), 
    'layer_norm': LayerNorm(...), 
    'dropout': Dropout(...)
)

Module: BaseAutoEncodingModelEncoder

# Module: BaseAutoEncodingModelEncoder
BaseAutoEncodingModelEncoder(
    'layers': ModuleList(
		'layer': N x BaseAutoEncodingModelEncoderLayer(
		    'attention': BaseAutoEncodingModelAttention(...), 
	        'intermediate': BaseAutoEncodingModelIntermediate(...), 
	        'output': BaseAutoEncodingModelOutput(...)
		)
	)
)

# SubModule: BaseAutoEncodingModelAttention
BaseAutoEncodingModelAttention(
	'self_attention': BaseAutoEncodingModelSelfAttention(
		'Q': Linear(...), 
		'K': Linear(...), 
		'V': Linear(...), 
		'dropout': Dropout(...)
	),
	'self_output': BaseAutoEncodingModelSelfOutput(
		'dense': Linear(...), 
		'layer_norm': LayerNorm(...), 
		'dropout': Dropout(...)
	),
)

# SubModule: BaseAutoEncodingModelIntermediate
BaseAutoEncodingModelIntermediate(
	'dense': Linear(...), 
	'activation': Activation(...)
)

# SubModule: BaseAutoEncodingModelOutput
BaseAutoEncodingModelOutput(
	'dense': Linear(...), 
	'layer_norm': LayerNorm(...), 
	'dropout': Dropout(...) 
)

Module: BaseAutoEncodingModelPooler

# Module: BaseAutoEncodingModelPooler
BaseAutoEncodingModelPooler(
	'dense': Linear(...), 
	'activation': Activation(...)
)
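
The same three-part layout (embeddings → encoder stack → pooler) shows up with concrete names when you print a real checkpoint; a quick sketch using bert-base-uncased:

from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
print(model)  # embeddings, encoder with 12 layers, pooler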

Train a Sequence-to-Sequence Model (Hindi → English)

import torch.nn.functional as F
import torch.nn as nn
import transformers
import tokenizers
import torch
import os
import math
from torch.utils.data import TensorDataset, DataLoader
from tqdm import tqdm
import glob

if os.path.exists("tokenizer.json"):
    tokenizer = tokenizers.Tokenizer.from_file("tokenizer.json")
else:
    tokenizer = tokenizers.SentencePieceUnigramTokenizer()

    tokenizer.train(
        files=['./notebooks/processed.hi', './notebooks/processed.en'],
        vocab_size=8000,
        show_progress=True,
        special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"]
    )
    tokenizer.save("tokenizer.json")

unk_token_id = tokenizer.token_to_id("[UNK]")
pad_token_id = tokenizer.token_to_id("[PAD]")
cls_token_id = tokenizer.token_to_id("[CLS]")
sep_token_id = tokenizer.token_to_id("[SEP]")
mask_token_id = tokenizer.token_to_id("[MASK]")
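
# Optional sanity check: round-trip an arbitrary sentence through the trained
# tokenizer (any line from the corpus works just as well).
sample = tokenizer.encode("This is a test sentence.")
print(sample.tokens)
print(tokenizer.decode(sample.ids))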

class TranslationModelConfig(transformers.PretrainedConfig):
    model_type = "translation-hi2en"

    def __init__(
            self,
            vocab_size=8000,
            d_model=512,
            num_encoder_layers=6,
            num_decoder_layers=6,
            num_heads=8,
            dim_feedforward=512,
            dropout=0.1,
            pad_token_id=pad_token_id,
            eos_token_id=sep_token_id,
            decoder_start_token_id=cls_token_id,
            initializer_range=0.02,
            max_position_embeddings=512,
            **kwargs):
        super().__init__(pad_token_id=pad_token_id, eos_token_id=eos_token_id, **kwargs)
        self.vocab_size = vocab_size
        self.d_model = d_model
        self.num_encoder_layers = num_encoder_layers
        self.num_decoder_layers = num_decoder_layers
        self.num_heads = num_heads
        self.dim_feedforward = dim_feedforward
        self.dropout = dropout
        self.decoder_start_token_id = decoder_start_token_id
        self.initializer_range = initializer_range
        self.max_position_embeddings = max_position_embeddings


class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout = 0.1, max_len = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        # Standard sinusoidal positional encoding:
        #   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
        #   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
        # div_term is the 10000^(-2i/d_model) factor, computed in log space
        # for numerical stability.
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))

        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        # Registered as a buffer: saved with the module, but not a learned parameter.
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        seq_len = x.size(1)
        x = x + self.pe[:, :seq_len]
        return self.dropout(x)


class TranslationModel(transformers.PreTrainedModel):
    config_class = TranslationModelConfig

    def __init__(self, config):
        super().__init__(config)

        self.cfg = config

        self.src_embedding = nn.Embedding(config.vocab_size, config.d_model, padding_idx=config.pad_token_id)
        self.tgt_embedding = nn.Embedding(config.vocab_size, config.d_model, padding_idx=config.pad_token_id)

        self.positional_encoding = PositionalEncoding(config.d_model, config.dropout, config.max_position_embeddings)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=config.d_model,
            nhead=config.num_heads,
            dim_feedforward=config.dim_feedforward,
            dropout=config.dropout,
            batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=config.num_encoder_layers)

        decoder_layer = nn.TransformerDecoderLayer(
            d_model=config.d_model,
            nhead=config.num_heads,
            dim_feedforward=config.dim_feedforward,
            dropout=config.dropout,
            batch_first=True
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=config.num_decoder_layers)

        self.output_layer = nn.Linear(config.d_model, config.vocab_size)

        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=self.cfg.initializer_range)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=self.cfg.initializer_range)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)

    def forward(self, src, tgt, src_mask=None, tgt_mask=None):
        max_len = self.cfg.max_position_embeddings
        if src.size(1) > max_len:
            src = src[:, :max_len]
            if src_mask is not None:
                src_mask = src_mask[:, :max_len]
        if tgt.size(1) > max_len:
            tgt = tgt[:, :max_len]
            if tgt_mask is not None and tgt_mask.size(0) > max_len:
                tgt_mask = tgt_mask[:max_len, :max_len]

        src_emb = self.positional_encoding(self.src_embedding(src) * math.sqrt(self.cfg.d_model))
        tgt_emb = self.positional_encoding(self.tgt_embedding(tgt) * math.sqrt(self.cfg.d_model))

        memory = self.encoder(src_emb, src_key_padding_mask=src_mask)

        output = self.decoder(tgt_emb, memory, tgt_mask=tgt_mask, memory_key_padding_mask=src_mask)

        logits = self.output_layer(output)

        return logits

model = TranslationModel(TranslationModelConfig())

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Ignore [PAD] positions when computing the loss.
criterion = nn.CrossEntropyLoss(ignore_index=pad_token_id)

batch_size = 32

class PairedDataset(torch.utils.data.Dataset):
    def __init__(self, src_data, tgt_data):
        self.tokenizer = tokenizer
        assert len(src_data) == len(tgt_data), "Source and target data must have the same length."
        self.src_data = src_data
        self.tgt_data = tgt_data
        self.max_len = 512

    def __len__(self):
        return len(self.src_data)

    def __getitem__(self, idx):
        src_ids = self.tokenizer.encode(self.src_data[idx]).ids[:self.max_len]
        # Wrap the target in [CLS] ... [SEP] so the decoder has a start token and an
        # end-of-sequence token, matching decoder_start_token_id / eos_token_id above.
        tgt_ids = self.tokenizer.encode(self.tgt_data[idx]).ids[:self.max_len - 2]
        tgt_ids = [cls_token_id] + tgt_ids + [sep_token_id]
        return src_ids, tgt_ids

dataset = PairedDataset(
    open('./notebooks/processed.hi', 'r').readlines(),
    open('./notebooks/processed.en', 'r').readlines()
)

train_size = int(0.8 * len(dataset))
val_size = int(0.1 * len(dataset))
test_size = len(dataset) - train_size - val_size
train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, val_size, test_size])

def collate_batch(batch):
    # Pad every sequence in the batch to the longest one, using [PAD].
    src = torch.nn.utils.rnn.pad_sequence(
        [torch.tensor(item[0]) for item in batch], batch_first=True, padding_value=pad_token_id)
    tgt = torch.nn.utils.rnn.pad_sequence(
        [torch.tensor(item[1]) for item in batch], batch_first=True, padding_value=pad_token_id)
    return src, tgt

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_batch)

num_epochs = 500

checkpoint_dir = "checkpoints"
os.makedirs(checkpoint_dir, exist_ok=True)

def checkpoint_epoch(path):
    # Sort numerically by epoch: a plain lexicographic sort would put
    # "checkpoint_epoch_10.pt" before "checkpoint_epoch_2.pt".
    return int(path.rsplit("_", 1)[-1].split(".")[0])

latest_checkpoint = None
checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_epoch_*.pt")), key=checkpoint_epoch)
if checkpoints:
    latest_checkpoint = checkpoints[-1]

start_epoch = 0
if latest_checkpoint is not None:
    print(f"Loading checkpoint from {latest_checkpoint} to resume training...")
    checkpoint = torch.load(latest_checkpoint, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    start_epoch = checkpoint['epoch']
    print(f"Resuming from epoch {start_epoch}")

print("\nStarting training loop on actual data...")
for epoch in range(start_epoch, num_epochs):
    model.train()
    total_loss = 0

    for batch_idx, (src_batch, tgt_batch) in enumerate(tqdm(train_dataloader, ncols=80)):
        src_batch, tgt_batch = src_batch.to(device), tgt_batch.to(device)

        # Key-padding mask: True marks [PAD] positions for attention to ignore.
        src_mask = (src_batch == pad_token_id)

        # Teacher forcing: the decoder sees tokens [0..n-1] and learns to predict tokens [1..n].
        decoder_input = tgt_batch[:, :-1]
        target_labels = tgt_batch[:, 1:]

        decoder_input_seq_len = decoder_input.size(1)
        # Causal mask: True above the diagonal blocks attention to future positions.
        causal_tgt_mask = torch.triu(torch.ones(decoder_input_seq_len, decoder_input_seq_len, device=device), diagonal=1).bool()

        output_logits = model(src_batch, decoder_input, src_mask=src_mask, tgt_mask=causal_tgt_mask)

        loss = criterion(output_logits.view(-1, model.cfg.vocab_size), target_labels.reshape(-1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch+1}/{num_epochs}, Average Loss: {avg_loss:.4f}")

    checkpoint_path = os.path.join(checkpoint_dir, f"checkpoint_epoch_{epoch+1}.pt")
    torch.save({
        'epoch': epoch + 1,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': avg_loss
    }, checkpoint_path)

    checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_epoch_*.pt")), key=checkpoint_epoch)
    if len(checkpoints) > 2:
        os.remove(checkpoints[0])  # keep only the two most recent checkpoints

print("\nTraining loop on actual data finished.")
print("If the 'Average Loss' is decreasing over epochs, your model is training!")