Making models smaller with quantization

Fine-Tuning with Llama 3

Francesca Donadoni

Curriculum Manager, DataCamp

What is quantization?

 

  • Reduces the precision of a model
  • 32-bit float to:
    • 8-bit integer
    • 4-bit integer
  • Quantization-aware training
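
To make the idea of reduced precision concrete, here is a minimal sketch of one common scheme, absmax quantization to 8-bit integers. The function names and sample weights are made up for illustration; real libraries use more elaborate variants.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared absmax scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.03, 1.0]  # illustrative values only
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

print(codes)     # small integers that fit in one byte each
print(restored)  # close to, but not exactly, the original weights
```

Each restored value differs from the original by at most half of the scale, which is the precision given up in exchange for the smaller storage.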



Types of quantization

 

  • Weight quantization: reduces the precision of the weights
  • Activation quantization: reduces the precision of the activation values
  • Post-training quantization: reduces the precision of the model after training
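
The three types can be seen together in a small sketch. It assumes a single "layer" that is just a dot product: the weights are quantized once after training (post-training weight quantization), and each input is quantized when it arrives (activation quantization). All names and numbers are illustrative.

```python
def absmax_quantize(values, bits=8):
    """Quantize floats to signed integers of the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


# Weight quantization: done once, after training has finished
weights = [0.8, -0.35, 0.1, 0.6]
w_codes, w_scale = absmax_quantize(weights)


def quantized_dot(activations):
    # Activation quantization: done at inference time for each input
    a_codes, a_scale = absmax_quantize(activations)
    int_sum = sum(w * a for w, a in zip(w_codes, a_codes))  # integer arithmetic
    return int_sum * w_scale * a_scale  # rescale back to float


x = [0.9, 2.0, -0.5, 0.25]
exact = sum(w * a for w, a in zip(weights, x))
approx = quantized_dot(x)
print(exact, approx)
```

The two printed values should be close; the gap between them is the quantization error introduced by both steps.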

Configuring quantization with bitsandbytes

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    # Set the precision (load_in_4bit or load_in_8bit)
    load_in_4bit=True,
    # Set the quantization type ("fp4" for 4-bit float, "nf4" for normalized 4-bit float)
    bnb_4bit_quant_type="nf4",
    # Set the compute precision (32-bit float or 16-bit bfloat)
    bnb_4bit_compute_dtype=torch.bfloat16,
)
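
Roughly speaking, the three settings describe how weights are stored and how they are used: each weight is kept as a 4-bit code, the quantization type decides which 16 values those codes stand for, and the compute dtype is the format the weights are converted back to for the actual math. The sketch below illustrates that split with a hypothetical, evenly spaced codebook. It is not the NF4 codebook, whose levels are non-uniform, and it does not reproduce the bitsandbytes implementation.

```python
# Hypothetical 16-level codebook: evenly spaced values in [-1, 1].
# NF4 uses a different, non-uniform set of levels; this is for illustration only.
CODEBOOK = [-1 + 2 * i / 15 for i in range(16)]


def quantize_block(block):
    """Store one scale per block plus a 4-bit codebook index per value."""
    scale = max(abs(v) for v in block) or 1.0
    indices = [
        min(range(16), key=lambda i: abs(CODEBOOK[i] - v / scale))
        for v in block
    ]
    return indices, scale


def dequantize_block(indices, scale):
    """Recover approximate floats right before they are used in computation."""
    return [CODEBOOK[i] * scale for i in indices]


block = [0.12, -0.40, 0.33, 0.05]  # illustrative weights
indices, scale = quantize_block(block)
restored = dequantize_block(indices, scale)

print(indices)   # each index fits in 4 bits (0 to 15)
print(restored)  # approximate weights used for the computation
```

Swapping in a different codebook changes the quantization type without touching the rest of the code, which is the role `bnb_4bit_quant_type` plays in the configuration.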

Loading a model with quantization

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Pass the config so the weights are quantized as they are loaded
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Llama3-ChatQA-1.5-8B",
    quantization_config=bnb_config,
)
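
A back-of-the-envelope calculation shows why loading in 4-bit matters. The sketch assumes a round figure of 8 billion parameters and counts only the raw weights; actual usage is somewhat higher because of quantization scales, layers kept in higher precision, and activations.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate storage for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8_000_000_000  # assumed round figure for an 8B model

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

Under these assumptions the weights shrink from about 32 GB at 32-bit precision to about 4 GB at 4-bit precision.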

Using the quantized model

promptstr = """System: You are a helpful chatbot who answers questions about planets.
User: Explain the history of Mars
Assistant: """

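# tokenizer is assumed to be the tokenizer loaded for the same checkpoint
inputs = tokenizer.encode(promptstr, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=200)
# Decode only the newly generated tokens, skipping the prompt
decoded_outputs = tokenizer.decode(outputs[0, inputs.shape[1]:], skip_special_tokens=True)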
print(decoded_outputs)
Here is a brief history of Mars:
- 4.6 billion years ago: Mars formed as part of the solar system.
- 3.8 billion years ago: Mars had a thick atmosphere and liquid water on its surface.
- 3.8 billion years ago to 3.5 billion years ago: Mars lost its magnetic field and atmosphere, 
and became a cold, dry planet.
- 3.5 billion years ago to present: Mars has been cold and dry, with a thin atmosphere.
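
The slice `outputs[0, inputs.shape[1]:]` is what keeps the prompt out of the printed answer: for a causal language model, `generate` returns the prompt tokens followed by the new ones. The toy sketch below shows the same idea with plain lists; the token ids are invented stand-ins, not real tokenizer output.

```python
# Toy stand-ins for token ids; real ids come from the tokenizer.
prompt_ids = [101, 7, 42, 9]               # the encoded prompt
generated_ids = prompt_ids + [55, 66, 77]  # prompt followed by new tokens

# Same idea as outputs[0, inputs.shape[1]:] in the slide code
new_ids = generated_ids[len(prompt_ids):]
print(new_ids)

# max_length counts the prompt too, so the room left for new tokens is:
max_length = 200
new_token_budget = max_length - len(prompt_ids)
print(new_token_budget)
```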

Fine-tuning a quantized model

  • Fully quantized models do not support fine-tuning directly
  • LoRA adaptation: train small adapter weights while the quantized base weights stay frozen
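from trl import SFTTrainer

# peft_config (the LoRA settings), ds, tokenizer, and training_arguments
# are assumed to be defined beforehand.
# Argument names follow the TRL version used in the course; newer releases
# may expect some of them in SFTConfig instead.
trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    train_dataset=ds,
    max_seq_length=250,
    dataset_text_field="conversation",
    tokenizer=tokenizer,
    args=training_arguments,
)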
trainer.train()
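
To see why LoRA is a good fit here, it helps to count parameters. LoRA leaves a weight matrix of shape d_out × d_in untouched and trains two small matrices of rank r in its place, so only r × (d_in + d_out) values need gradients. The sketch below assumes a 4096 × 4096 projection and rank 8 purely for illustration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (parameters in the full matrix, parameters in the LoRA adapter)."""
    full = d_out * d_in
    adapter = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, adapter


# Assumed sizes for illustration: one 4096 x 4096 projection, rank 8
full, adapter = lora_param_counts(4096, 4096, rank=8)
print(full, adapter, f"{adapter / full:.2%}")
```

With these numbers the adapter holds well under 1% of the parameters of the matrix it adapts, which is why the quantized base weights can stay frozen.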

Let's practice!
