HuggingFace: LLM Course
0. SETUP
Commonly used libraries
- transformers
- datasets
- torch
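A quick sanity check confirms the stack is in place (installation, if needed, is typically pip install transformers datasets torch):
# Verify the commonly used libraries import cleanly and print their versions.
import transformers, datasets, torch
print(transformers.__version__, datasets.__version__, torch.__version__)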
1. TRANSFORMER MODELS
The pipeline() Function
The pipeline() function in the 🤗 Transformers library ties a model together with its preprocessing and postprocessing steps, so you can pass raw text in and get an interpretable prediction out.
from transformers import pipeline

text = "Huggingface is awesome!"

# Sentiment analysis: downloads a default pretrained checkpoint on first use.
e2e_model = pipeline("sentiment-analysis")
e2e_model(text)
# e.g. [{'label': 'POSITIVE', 'score': 0.9998}]
Tip
We can pass several sentences in one go!
e2e_model([
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!"
])
# Returns one result dict per input sentence.
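A specific checkpoint can also be pinned via the model argument, which avoids surprises if a task's default ever changes; the checkpoint below is the widely used SST-2 fine-tune of DistilBERT:
# Pin an explicit checkpoint instead of relying on the task default.
clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
clf("Huggingface is awesome!")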
Tasks and pipeline() Compatibility
- NLP Pipelines
| Task | Description | pipeline() |
|---|---|---|
| feature-extraction | Extract vector representations of text. | ✓ |
| fill-mask | Fill in the masked token(s) in a text. | ✓ |
| question-answering | Retrieve the answer to a question from a given text. | ✓ |
| sentence-similarity | Determine how similar two texts are. | ✗ |
| summarization | Create a shorter version of a text while preserving key information. | ✓ |
| table-question-answering | Answer a question about information in a given table. | ✓ |
| text-classification | Classify text into predefined categories. | ✓ |
| text-generation | Generate text from a prompt. | ✓ |
| text-ranking | Rank a set of texts based on their relevance to a query. | ✗ |
| token-classification | Assign a label to individual tokens in a text (e.g. NER, part-of-speech tagging). | ✓ |
| translation | Convert text from one language to another. | ✓ |
| zero-shot-classification | Classify text without prior training on specific labels. | ✓ |
- Vision pipelines
| Task | Description | pipeline() |
|---|---|---|
| depth-estimation | Estimate the depth of different objects present in an image. | ✓ |
| image-classification | Identify objects in an image. | ✓ |
| image-feature-extraction | Extract semantically meaningful features given an image. | ✓ |
| image-segmentation | Divide an image into segments, mapping each pixel to an object. | ✓ |
| image-to-image | Transform an image (e.g. inpainting, colorization, super-resolution). | ✓ |
| image-to-text | Generate text descriptions of images. | ✓ |
| image-to-video | Generate a video from an image (optionally guided by a text prompt). | ✗ |
| keypoint-detection | Identify meaningful distinctive points or features in an image. | ✗ |
| mask-generation | Generate masks that identify a specific object or region of interest in a given image. | ✓ |
| object-detection | Locate and identify objects in images. | ✓ |
| video-classification | Assign a label or class to an entire video. | ✓ |
| text-to-image | Generate images from input text. | ✗ |
| text-to-video | Generate a consistent sequence of images (a video) from text. | ✗ |
| unconditional-image-generation | Generate images without any conditioning input. | ✗ |
| video-to-video | Transform input video into a new video with altered visual styles, motion, or content. | ✗ |
| zero-shot-image-classification | Classify images into classes unseen during training. | ✓ |
| zero-shot-object-detection | Detect objects and their classes in images, without any prior training or knowledge of the classes. | ✓ |
| text-to-3d | Generate 3D output from text input. | ✗ |
| image-to-3d | Generate 3D output from image input. | ✗ |
- Audio pipelines
| Task | Description | pipeline() |
|---|---|---|
| audio-classification | Classify audio into categories. | ✓ |
| audio-to-audio | Family of tasks where the input is audio and the output is one or more generated audios (e.g. speech enhancement, source separation). | ✗ |
| automatic-speech-recognition | Convert speech to text. | ✓ |
| text-to-audio | Generate audio (e.g. speech or music) from text. | ✓ |
- Multimodal pipelines
| Task | Description | pipeline() |
|---|---|---|
| any-to-any | Understand two or more modalities and output two or more modalities. | ✗ |
| audio-text-to-text | Generate textual responses or summaries based on both audio input and text prompts. | ✗ |
| document-question-answering | Take a (document, question) pair as input and return an answer in natural language. | ✗ |
| visual-document-retrieval | Search for relevant image-based documents (e.g. PDFs) given an input text prompt. | ✓ |
| image-text-to-text | Take in an image and text prompt and output text. | ✓ |
| video-text-to-text | Take in a video and a text prompt and output text. | ✗ |
| visual-question-answering | Answer open-ended questions about an image, guided by a text prompt. | ✓ |
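As a usage sketch for the tables above: any task marked ✓ is instantiated by passing its name to pipeline(). Zero-shot classification is a representative example because the candidate labels are supplied at call time:
from transformers import pipeline

# Zero-shot classification: labels are chosen at inference time,
# no task-specific fine-tuning required.
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
# Returns the input sequence plus a score per candidate label.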
Classification of Transformer Models
1: Encoder-Based (Auto-Encoding Transformers)
- Focus: Understanding context, generating embeddings.
- Mechanism: Bidirectional attention (sees past & future tokens).
- Training: Masked Language Modeling (MLM).
- Use Cases: Text classification, sentiment analysis, Named Entity Recognition (NER), question answering (understanding).
- Examples: BERT, RoBERTa.
2: Decoder-Based (Auto-Regressive Transformers)
- Focus: Generating new text, predicting next token.
- Mechanism: Unidirectional attention (sees only past tokens).
- Training: Causal Language Modeling (CLM), i.e. next-token prediction.
- Use Cases: Text generation, summarization, chatbots, code generation.
- Examples: GPT series, LLaMA, Claude.
3: Encoder-Decoder Transformers (Sequence-to-Sequence)
- Focus: Transforming one sequence into another.
- Mechanism: Encoder-decoder architecture; the encoder processes the input, and the decoder generates the output conditioned on the encoder's representation (via cross-attention).
- Training: Maps input sequence to output sequence.
- Use Cases: Machine translation, abstractive summarization, text style transfer.
- Examples: T5, BART, NMT models such as MarianMT.
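A minimal sketch tying each family above to a representative checkpoint (the checkpoint choices are illustrative, not prescribed):
from transformers import pipeline

# Encoder-only (auto-encoding): masked-token prediction with BERT.
fill = pipeline("fill-mask", model="bert-base-uncased")
fill("Paris is the [MASK] of France.")  # top prediction: "capital"

# Decoder-only (auto-regressive): next-token generation with GPT-2.
gen = pipeline("text-generation", model="gpt2")
gen("Transformers are", max_new_tokens=10)

# Encoder-decoder (seq2seq): translation with T5.
trans = pipeline("translation_en_to_fr", model="t5-small")
trans("HuggingFace is awesome!")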
Auto-Encoding Models
Model: BaseAutoEncodingModel
# Model: BaseAutoEncodingModel
BaseAutoEncodingModel(
  'embedder': BaseAutoEncodingModelEmbedder(...),
  'encoder': BaseAutoEncodingModelEncoder(...),
  'pooler': BaseAutoEncodingModelPooler(...)
)
Module: BaseAutoEncodingModelEmbedder
# Module: BaseAutoEncodingModelEmbedder
BaseAutoEncodingModelEmbedder(
  'word_emb': WordEmbedder(...),
  'pos_emb': PositionEmbedder(...),
  'tok_type_emb': TokenTypeEmbedder(...),
  'layer_norm': LayerNorm(...),
  'dropout': Dropout(...)
)
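A minimal PyTorch sketch of the embedder above, assuming BERT-base-style sizes (vocab 30522, hidden 768, 512 positions, 2 token types); the class and field names mirror the diagram, not any real library class:
import torch
import torch.nn as nn

class Embedder(nn.Module):
    def __init__(self, vocab_size=30522, max_pos=512, type_vocab=2, hidden=768):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)      # token identity
        self.pos_emb = nn.Embedding(max_pos, hidden)          # token position
        self.tok_type_emb = nn.Embedding(type_vocab, hidden)  # sentence A / B
        self.layer_norm = nn.LayerNorm(hidden)
        self.dropout = nn.Dropout(0.1)

    def forward(self, input_ids, token_type_ids):
        pos = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.word_emb(input_ids) + self.pos_emb(pos) + self.tok_type_emb(token_type_ids)
        return self.dropout(self.layer_norm(x))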
Module: BaseAutoEncodingModelEncoder
# Module: BaseAutoEncodingModelEncoder
BaseAutoEncodingModelEncoder(
  'layers': ModuleList(
    'layer': N x BaseAutoEncodingModelEncoderLayer(
      'attention': BaseAutoEncodingModelAttention(...),
      'intermediate': BaseAutoEncodingModelIntermediate(...),
      'output': BaseAutoEncodingModelOutput(...)
    )
  )
)
# SubModule: BaseAutoEncodingModelAttention
BaseAutoEncodingModelAttention(
  'self_attention': BaseAutoEncodingModelSelfAttention(
    'Q': Linear(...),
    'K': Linear(...),
    'V': Linear(...),
    'dropout': Dropout(...)
  ),
  'self_output': BaseAutoEncodingModelSelfOutput(
    'dense': Linear(...),
    'layer_norm': LayerNorm(...),
    'dropout': Dropout(...)
  )
)
# SubModule: BaseAutoEncodingModelIntermediate
BaseAutoEncodingModelIntermediate(
  'dense': Linear(...),
  'activation': Activation(...)
)
# SubModule: BaseAutoEncodingModelOutput
BaseAutoEncodingModelOutput(
  'dense': Linear(...),
  'layer_norm': LayerNorm(...),
  'dropout': Dropout(...)
)
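The attention submodule above boils down to Q/K/V projections followed by scaled dot-product attention; a single-head PyTorch sketch (BERT actually splits the 768 hidden dims into 12 heads of 64) might look like:
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.Q = nn.Linear(hidden, hidden)
        self.K = nn.Linear(hidden, hidden)
        self.V = nn.Linear(hidden, hidden)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):  # x: (batch, seq_len, hidden)
        q, k, v = self.Q(x), self.K(x), self.V(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = self.dropout(scores.softmax(dim=-1))
        return weights @ v  # same shape as x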
Module: BaseAutoEncodingModelPooler
# Module: BaseAutoEncodingModelPooler
BaseAutoEncodingModelPooler(
  'dense': Linear(...),
  'activation': Activation(...)
)
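Printing a concrete checkpoint shows the same structure with real names (bert-base-uncased calls these blocks embeddings, encoder, and pooler):
from transformers import AutoModel

# Inspect a real auto-encoding model's module tree.
model = AutoModel.from_pretrained("bert-base-uncased")
print(model)  # embeddings / encoder (12 x BertLayer) / pooler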