Introduction to LLMs in Python
Iván Palomares Carrascosa, PhD
Senior Data Science & AI Manager
Pipelines: pipeline()

Auto classes (AutoModel class)

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "I am an example sequence for text classification."

# Minimal classification head: a single linear layer on top of the pooled output
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc = nn.Linear(input_size, num_classes)

    def forward(self, x):
        return self.fc(x)
from_pretrained(): loads the pretrained tokenizer or model specified by model_name
model_name: model checkpoint
AutoModel does not provide a task-specific head

inputs = tokenizer(
    text, return_tensors="pt", padding=True, truncation=True, max_length=64)
outputs = model(**inputs)
pooled_output = outputs.pooler_output

print("Hidden states size: ", outputs.last_hidden_state.shape)
print("Pooled output size: ", pooled_output.shape)

# The head is randomly initialized (untrained), so the probabilities
# below are not meaningful predictions yet
classifier_head = SimpleClassifier(pooled_output.size(-1), num_classes=2)
logits = classifier_head(pooled_output)
probs = torch.softmax(logits, dim=1)
print("Predicted Class Probabilities:", probs)
Hidden states size: torch.Size([1, 11, 768])
Pooled output size: torch.Size([1, 768])
Predicted Class Probabilities:
tensor([[0.4334, 0.5666]], grad_fn=<SoftmaxBackward0>)
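The step from pooled output to class probabilities can be pictured with a minimal plain-Python sketch. It is not the PyTorch implementation: the 4-dimensional pooled vector, the weights, and the bias are hypothetical toy values standing in for the 768-dimensional vector and the randomly initialized nn.Linear layer.

```python
import math

def linear(vector, weights, bias):
    """One linear layer: a weighted sum per output class, plus a bias."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values: 4 dimensions instead of 768, 2 classes
pooled = [0.5, -1.0, 0.25, 2.0]
weights = [[0.1, 0.0, -0.2, 0.3],    # row for class 0
           [-0.1, 0.2, 0.4, 0.1]]    # row for class 1
bias = [0.0, 0.1]

logits = linear(pooled, weights, bias)
probs = softmax(logits)
print("Logits:", logits)
print("Probabilities:", probs)
```

The class with the larger logit always gets the larger probability; softmax only rescales the scores so they sum to 1.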
outputs:
pooler_output: high-level, aggregated representation of the sequence
last_hidden_state: raw, unaggregated hidden states

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "The quality of the product was just okay."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits

# Class indices start at 0, star ratings at 1, hence the + 1
predicted_class = torch.argmax(logits, dim=1).item()
print(f"Predicted class index: {predicted_class + 1} star.")
Predicted class index: 3 star.
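How the star rating follows from the logits can be sketched in plain Python. The five logit values are hypothetical, and the mapping "index 0 = 1 star" is taken from the `predicted_class + 1` in the example above. The sketch also checks that applying softmax first would not change the predicted class.

```python
import math

# Hypothetical logits for five classes (index 0 = 1 star, ..., index 4 = 5 stars)
logits = [-1.2, 0.3, 2.1, 0.9, -0.7]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def softmax(values):
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

predicted_class = argmax(logits)
stars = predicted_class + 1  # shift from 0-based index to 1-based rating

# Softmax preserves the ordering of the scores, so the argmax is the same
same_after_softmax = argmax(softmax(logits)) == predicted_class

print(f"Predicted class index: {predicted_class}, rating: {stars} star.")
print("Same prediction after softmax:", same_after_softmax)
```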
AutoModelForSequenceClassification class:
outputs.logits have already passed through the classification head's linear layer

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "This is a simple example for text generation,"
inputs = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(inputs, max_length=26)

# Decode the generated token IDs back into text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Text:")
print(generated_text)
Generated Text:
This is a simple example for text generation, but it's also
a good way to get a feel for how the text is generated.
AutoModelForCausalLM class:
Checkpoint: "gpt2"
Model head for next-word prediction
generate() takes the prompt and generates new tokens until the whole sequence, prompt included, reaches max_length tokens
Raw outputs are decoded before printing the prompt extended with the generated text
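The generation loop can be pictured with a toy greedy decoder in plain Python. This is a sketch, not the library's generate(): the real method works on token IDs and also supports sampling and beam search. The score table NEXT_TOKEN_SCORES and the "<eos>" marker are hypothetical. In the sketch, max_length bounds the total sequence length with the prompt included, which is also how Hugging Face's generate() interprets it.

```python
# Hypothetical toy "model": for each token, scores for candidate next tokens
NEXT_TOKEN_SCORES = {
    "text": {"generation": 2.0, "is": 0.5},
    "generation": {"is": 1.5, "text": 0.2},
    "is": {"fun": 1.0, "text": 0.8},
    "fun": {"<eos>": 3.0},
}

def generate(prompt_tokens, max_length, eos_token="<eos>"):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:          # prompt tokens count toward the limit
        scores = NEXT_TOKEN_SCORES.get(tokens[-1])
        if not scores:                       # no known continuation
            break
        next_token = max(scores, key=scores.get)
        if next_token == eos_token:          # model signals end of sequence
            break
        tokens.append(next_token)
    return tokens

prompt = ["text"]
short = generate(prompt, max_length=3)
full = generate(prompt, max_length=10)
print(" ".join(short))
print(" ".join(full))
```

With max_length=3 the loop stops because of the length limit; with max_length=10 it stops earlier because the toy model emits its end-of-sequence marker.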
from datasets import load_dataset
from torch.utils.data import DataLoader

dataset = load_dataset("imdb")
train_data = dataset["train"]
dataloader = DataLoader(train_data, batch_size=2, shuffle=True)

for batch in dataloader:
    for i in range(len(batch["text"])):
        print(f"Example {i + 1}:")
        print("Text:", batch["text"][i])
        print("Label:", batch["label"][i])
Example 1:
Text: Much worse than the original. It was actually *painf(...)
Label: tensor(0)
Example 2:
Text: I have to agree with Cal-37 it's a great movie, spec(...)
Label: tensor(1)
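What batching with shuffling amounts to can be sketched in plain Python. This is a simplified stand-in for DataLoader, not its implementation: the helper iter_batches and the five records are hypothetical, labels stay plain integers instead of tensors, and there is no parallel loading.

```python
import random

def iter_batches(records, batch_size, shuffle=False, seed=None):
    """Yield batches as dicts of lists, one list per field."""
    indices = list(range(len(records)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = [records[i] for i in indices[start:start + batch_size]]
        # Collate: regroup a list of records field by field
        yield {key: [rec[key] for rec in chunk] for key in chunk[0]}

# Hypothetical records with the same fields as the IMDb examples
records = [
    {"text": "Great movie", "label": 1},
    {"text": "Terrible plot", "label": 0},
    {"text": "Loved the cast", "label": 1},
    {"text": "Too long", "label": 0},
    {"text": "A classic", "label": 1},
]

batches = list(iter_batches(records, batch_size=2, shuffle=True, seed=0))
for batch in batches:
    for i in range(len(batch["text"])):
        print(f"Example {i + 1}:", batch["text"][i], "| Label:", batch["label"][i])
```

Five records with batch_size=2 give three batches, the last one holding the single leftover record.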
load_dataset(): loads a dataset from the Hugging Face Hub
DataLoader class: simplifies iteration, batch processing and parallelism

from datasets import load_dataset

dataset = load_dataset("stanfordnlp/shp", "askculinary")
train_data = dataset["train"]

# train_data[i] returns the i-th record as a dict
for i in range(5):
    example = train_data[i]
    print(f"Example {i + 1}:")
    print("Title:", example["post_id"])
    print("Paragraph:", example["history"])
    print()
Example 1:
Title: himc90
Paragraph: In an interview right before receiving the 2013
Nobel prize in physics, Peter Higgs stated that he (...)
Example 2 (...)
Using a dataset from the stanfordnlp catalogue
Displaying some text information from the data instances
Input + target (labels) pairs
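A minimal sketch of what such pairs look like, assuming records with a "text" input and a "label" target as in the IMDb examples. The two records below are hypothetical, shortened stand-ins.

```python
# Hypothetical records: "text" is the model input, "label" the target
records = [
    {"text": "Much worse than the original.", "label": 0},
    {"text": "It's a great movie.", "label": 1},
]

# Build (input, target) pairs, then split them into two aligned sequences
pairs = [(rec["text"], rec["label"]) for rec in records]
inputs, targets = zip(*pairs)

for text, label in pairs:
    print(f"Input: {text!r} -> Target: {label}")
```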
