---
title: "HuggingFace: LLM Course"
tags:
- HuggingFace
- LLMs
- Course
---
The `pipeline()` function in the 🤗 Transformers library simplifies using models by integrating the preprocessing and postprocessing steps around the model call.
from transformers import pipeline
text = "Huggingface is awesome!"
# sentiment analysis:
e2e_model = pipeline("sentiment-analysis")
e2e_model(text)
We can pass several sentences in one go!
e2e_model([
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
])
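To make the "preprocessing, model, postprocessing" idea concrete, here is a minimal sketch of the three stages in plain Python. It is not how 🤗 Transformers implements `pipeline()`: the vocabulary `TOY_VOCAB`, the scores in `TOY_WEIGHTS`, and all function names are hypothetical stand-ins for a real tokenizer and model.

```python
import math

# Hypothetical stand-ins for a real tokenizer vocabulary and model weights.
TOY_VOCAB = {"<unk>": 0, "i": 1, "love": 2, "hate": 3, "this": 4, "awesome": 5}
# One made-up (negative, positive) score pair per token id.
TOY_WEIGHTS = [(0.0, 0.0), (0.0, 0.0), (-1.0, 2.0), (2.0, -1.0), (0.0, 0.0), (-1.0, 2.5)]
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Lowercase, replace punctuation with spaces, split, map words to ids."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in cleaned.split()]


def toy_model(token_ids):
    """Sum the per-token scores into one logit per label."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        negative, positive = TOY_WEIGHTS[token_id]
        logits[0] += negative
        logits[1] += positive
    return logits


def postprocess(logits):
    """Softmax the logits and report the most likely label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(inputs):
    """Accept one string or a list of strings; always return a list of dicts."""
    texts = [inputs] if isinstance(inputs, str) else list(inputs)
    return [postprocess(toy_model(preprocess(text))) for text in texts]


results = toy_pipeline(["I love this", "I hate this so much!"])
```

Each input string flows through the same three functions, which is why passing a list works the same way as passing a single sentence.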
Task | Description | Pipeline() |
---|---|---|
feature-extraction | Extract vector representations of text. | ✓ |
fill-mask | Fill in masked tokens in a text. | ✓ |
question-answering | Retrieve the answer to a question from a given text. | ✓ |
sentence-similarity | Determine how similar two texts are. | ✗ |
summarization | Create a shorter version of a text while preserving key information. | ✓ |
table-question-answering | Answer a question about information in a given table. | ✓ |
text-classification | Classify text into predefined categories. | ✓ |
text-generation | Generate text from a prompt. | ✓ |
text-ranking | Rank a set of texts based on their relevance to a query. | ✗ |
token-classification | Natural language understanding task that assigns a label to individual tokens in a text. | ✓ |
translation | Convert text from one language to another. | ✓ |
zero-shot-classification | Classify text without prior training on specific labels. | ✓ |
Task | Description | Pipeline() |
---|---|---|
depth-estimation | Estimate the depth of different objects present in an image. | ✓ |
image-classification | Identify objects in an image. | ✓ |
image-feature-extraction | Extract semantically meaningful features given an image. | ✓ |
image-segmentation | Divide an image into segments, mapping each pixel to an object. | ✓ |
image-to-image | Transform an image (e.g., inpainting, colorization, super-resolution). | ✓ |
image-to-text | Generate text descriptions of images. | ✓ |
image-to-video | Generate a video from an input image, optionally influenced by text prompts. | ✗ |
keypoint-detection | Identify meaningful distinctive points or features in an image. | ✗ |
mask-generation | Generate masks that identify a specific object or region of interest in a given image. | ✓ |
object-detection | Locate and identify objects in images. | ✓ |
video-classification | Assign a label or class to an entire video. | ✓ |
text-to-image | Generate images from input text. | ✗ |
text-to-video | Generate a consistent sequence of images from text. | ✗ |
unconditional-image-generation | Generate images without any conditioning input. | ✗ |
video-to-video | Transform an input video into a new video with altered visual styles, motion, or content. | ✗ |
zero-shot-image-classification | Classify images into classes that were not seen during training. | ✓ |
zero-shot-object-detection | Detect objects and their classes in images, without any prior training or knowledge of the classes. | ✓ |
text-to-3d | Produce 3D output from text input. | ✗ |
image-to-3d | Produce 3D output from image input. | ✗ |
Task | Description | Pipeline() |
---|---|---|
audio-classification | Classify audio into categories. | ✓ |
audio-to-audio | Family of tasks in which the input is an audio signal and the output is one or more generated audio signals (e.g., speech enhancement, source separation). | ✗ |
automatic-speech-recognition | Convert speech to text. | ✓ |
text-to-audio | Convert text to spoken audio. | ✓ |
Task | Description | Pipeline() |
---|---|---|
any-to-any | Understand two or more modalities and output two or more modalities. | ✗ |
audio-text-to-text | Generate textual responses or summaries based on both audio input and text prompts. | ✗ |
document-question-answering | Take a (document, question) pair as input and return an answer in natural language. | ✗ |
visual-document-retrieval | Search for relevant image-based documents, such as PDFs, based on an input text prompt. | ✓ |
image-text-to-text | Take in an image and text prompt and output text. | ✓ |
video-text-to-text | Take in a video and a text prompt and output text. | ✗ |
visual-question-answering | Answer open-ended questions about an image, given a text prompt. | ✓ |
# Model: BaseAutoEncodingModel
BaseAutoEncodingModel(
    'embedder': BaseAutoEncodingModelEmbedder(...),
    'encoder': BaseAutoEncodingModelEncoder(...),
    'pooler': BaseAutoEncodingModelPooler(...)
)
# Module: BaseAutoEncodingModelEmbedder
BaseAutoEncodingModelEmbedder(
    'word_emb': WordEmbedder(...),
    'pos_emb': PositionEmbedder(...),
    'tok_type_emb': TokenTypeEmbedder(...),
    'layer_norm': LayerNorm(...),
    'dropout': Dropout(...)
)
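The embedder listing names three lookup tables followed by layer normalization and dropout, but not how they are combined. Below is a minimal sketch, assuming (as in BERT-style encoders) that the three embeddings are summed element-wise before normalization. The tiny tables and the `embed` and `layer_norm` helpers are hypothetical; dropout and the learned scale and shift of layer normalization are left out.

```python
import math


def layer_norm(vec, eps=1e-12):
    """Normalize one vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def embed(token_ids, token_type_ids, word_emb, pos_emb, tok_type_emb):
    """Sum word, position, and token-type embeddings per token, then normalize."""
    output = []
    for position, (token_id, type_id) in enumerate(zip(token_ids, token_type_ids)):
        summed = [
            w + p + t
            for w, p, t in zip(word_emb[token_id], pos_emb[position], tok_type_emb[type_id])
        ]
        output.append(layer_norm(summed))
    return output


# Made-up tables: vocabulary of 3 tokens, 2 positions, 2 token types, hidden size 4.
word_emb = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5], [0.3, -0.2, 0.8, 0.0]]
pos_emb = [[0.0, 0.1, 0.0, 0.1], [0.2, 0.0, 0.2, 0.0]]
tok_type_emb = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]

hidden = embed([1, 2], [0, 0], word_emb, pos_emb, tok_type_emb)
```

The result has one normalized vector per input token, which is the shape the encoder layers consume.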
# Module: BaseAutoEncodingModelEncoder
BaseAutoEncodingModelEncoder(
    'layers': ModuleList(
        'layer': N x BaseAutoEncodingModelEncoderLayer(
            'attention': BaseAutoEncodingModelAttention(...),
            'intermediate': BaseAutoEncodingModelIntermediate(...),
            'output': BaseAutoEncodingModelOutput(...)
        )
    )
)
# SubModule: BaseAutoEncodingModelAttention
BaseAutoEncodingModelAttention(
    'self_attention': BaseAutoEncodingModelSelfAttention(
        'Q': Linear(...),
        'K': Linear(...),
        'V': Linear(...),
        'dropout': Dropout(...)
    ),
    'self_output': BaseAutoEncodingModelSelfOutput(
        'dense': Linear(...),
        'layer_norm': LayerNorm(...),
        'dropout': Dropout(...)
    )
)
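The `Q`, `K`, and `V` projections in the self-attention submodule feed scaled dot-product attention. The sketch below shows that computation for a single head in plain Python. It is a simplification under stated assumptions: one head, no biases, no dropout, and no `self_output` step; the helper names `matmul`, `softmax`, and `self_attention` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden, w_q, w_k, w_v):
    """Single-head attention: softmax(Q K^T / sqrt(d_head)) V.

    hidden is seq_len x d_model; w_q, w_k, w_v are d_model x d_head.
    """
    q, k, v = matmul(hidden, w_q), matmul(hidden, w_k), matmul(hidden, w_v)
    d_head = len(q[0])
    k_transposed = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_transposed)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Three tokens with hidden size 2; identity projections keep the numbers readable.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, weights = self_attention(hidden, identity, identity, identity)
```

Each row of `weights` is a probability distribution over the input tokens, and each row of `context` is the corresponding weighted average of the value vectors.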
# SubModule: BaseAutoEncodingModelIntermediate
BaseAutoEncodingModelIntermediate(
    'dense': Linear(...),
    'activation': Activation(...)
)
# SubModule: BaseAutoEncodingModelOutput
BaseAutoEncodingModelOutput(
    'dense': Linear(...),
    'layer_norm': LayerNorm(...),
    'dropout': Dropout(...)
)
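Together, the intermediate and output submodules form the position-wise feed-forward part of an encoder layer. A minimal sketch for one token vector follows, assuming a GELU activation and a residual connection before the layer norm, as in BERT-style encoders; the listing itself specifies neither. Dropout and the learned layer-norm parameters are omitted, and all names and weights are hypothetical.

```python
import math


def gelu(x):
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(vec, weight, bias):
    """Dense layer; weight holds one row per output feature."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def layer_norm(vec, eps=1e-12):
    """Normalize to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def feed_forward(x, w_in, b_in, w_out, b_out):
    expanded = [gelu(h) for h in linear(x, w_in, b_in)]  # intermediate: dense + activation
    projected = linear(expanded, w_out, b_out)           # output: dense back to hidden size
    return layer_norm([p + r for p, r in zip(projected, x)])  # assumed residual + layer norm


# Made-up weights: hidden size 2, intermediate size 4.
x = [1.0, -1.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b_in = [0.0, 0.0, 0.0, 0.0]
w_out = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
b_out = [0.0, 0.0]

out = feed_forward(x, w_in, b_in, w_out, b_out)
```

The intermediate layer widens the representation and the output layer narrows it back, so `out` has the same length as `x`.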
# Module: BaseAutoEncodingModelPooler
BaseAutoEncodingModelPooler(
    'dense': Linear(...),
    'activation': Activation(...)
)
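The pooler reduces the encoder's per-token outputs to a single vector. The sketch below assumes a BERT-style pooler, which takes the first token's hidden state and applies a dense layer followed by `tanh`; the listing does not say which token or activation is used, so treat both as assumptions. The `pooler` function and its weights are hypothetical.

```python
import math


def pooler(hidden_states, weight, bias):
    """Dense + tanh over the first token's hidden state (assumed BERT-style)."""
    first_token = hidden_states[0]
    return [
        math.tanh(sum(w * x for w, x in zip(row, first_token)) + b)
        for row, b in zip(weight, bias)
    ]


# Two tokens with hidden size 2; only the first token influences the result.
hidden_states = [[0.5, -0.5], [9.0, 9.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

pooled = pooler(hidden_states, weight, bias)
```

The pooled vector is what a sentence-level head, such as a classifier, would typically consume.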
Year | Model | Novelty |
---|---|---|
2017 | Transformer Architecture | Introduced the encoder-decoder Transformer for machine translation, outperforming the previous state of the art. |
2018 | GPT | First pretrained Transformer model; fine-tuned on various NLP tasks, achieving state-of-the-art results. |