huggingface-gemma-recipes
is an open-source project officially maintained by Hugging Face that provides minimal example code and tutorials for the Google Gemma family of models. Its core goal is to help developers quickly get started with Gemma model inference, fine-tuning, and a range of practical application scenarios.
The project covers the multimodal capabilities of the Gemma 3 family of models, with examples spanning text, audio, and image inputs.
The project provides a unified model inference interface, supporting quick loading and use of Gemma models:
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_id = "google/gemma-3n-e4b-it"  # or google/gemma-3n-e2b-it
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id).to(device)

def model_generation(model, messages):
    # Build model inputs from the chat-formatted messages (text, image, and/or audio)
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    input_len = inputs["input_ids"].shape[-1]
    inputs = inputs.to(model.device, dtype=model.dtype)

    with torch.inference_mode():
        generation = model.generate(**inputs, max_new_tokens=32, disable_compile=False)
        # Keep only the newly generated tokens, dropping the prompt
        generation = generation[:, input_len:]

    decoded = processor.batch_decode(generation, skip_special_tokens=True)
    print(decoded[0])
# Text Question Answering
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the capital of France?"}
        ]
    }
]
model_generation(model, messages)
# Speech-to-Text
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in English:"},
            {"type": "audio", "audio": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/speech.wav"},
        ]
    }
]
model_generation(model, messages)
# Image Captioning
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
            {"type": "text", "text": "Describe this image."}
        ]
    }
]
model_generation(model, messages)
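Because the helper accepts an arbitrary content list per turn, prompts can also interleave several modalities in a single message. The following snippet, which reuses the demo image and audio URLs above, is an illustrative sketch rather than an example taken from the recipes themselves:

# Interleaved image + audio + text in one user turn (illustrative sketch)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/airplane.jpg"},
            {"type": "audio", "audio": "https://huggingface.co/datasets/ariG23498/demo-data/resolve/main/speech.wav"},
            {"type": "text", "text": "Describe this image, then transcribe the speech segment."}
        ]
    }
]
model_generation(model, messages)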
The project provides various fine-tuning solutions and scripts:
# Install core dependencies
$ pip install -U -q transformers timm
# Install complete dependencies (for fine-tuning)
$ pip install -U -q -r requirements.txt
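A quick sanity check that the core dependencies are importable (the printed version numbers will vary by environment):

# Verify that the core packages installed correctly
import transformers
import timm

print(transformers.__version__, timm.__version__)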
huggingface-gemma-recipes/
├── notebooks/ # Jupyter notebook tutorials
│ └── fine_tune_gemma3n_on_t4.ipynb
├── scripts/ # Fine-tuning scripts
│ ├── ft_gemma3n_image_vt.py
│ ├── ft_gemma3n_audio_vt.py
│ └── ft_gemma3n_image_trl.py
├── requirements.txt # Dependency list
└── README.md # Project description
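The fine-tuning scripts in scripts/ target single-GPU setups such as a T4, which usually implies parameter-efficient training. As a rough sketch of what a LoRA setup with peft looks like, building on the model loaded earlier; the rank, alpha, and target modules below are illustrative assumptions, not values copied from the official scripts:

# Illustrative LoRA setup with peft; hyperparameters and target modules are
# assumptions, not taken from the repository's fine-tuning scripts.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                      # adapter rank (assumed value)
    lora_alpha=32,             # scaling factor (assumed value)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable

The ft_gemma3n_image_trl.py name suggests that script drives training through TRL; consult the scripts themselves for the exact trainer configuration and datasets they expect.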
As an open-source project officially maintained by Hugging Face, this repository benefits from official upkeep, minimal runnable examples, and ready-made fine-tuning scripts.
huggingface-gemma-recipes
is a high-quality open-source project that offers a complete path for working with Gemma models. Whether you are a beginner or an experienced developer, you will find suitable resources and guidance. Its multimodal support and flexible fine-tuning options make it a valuable tool in current AI development.