Horovod is an open-source distributed deep learning training framework originally developed by Uber. Its goal is to make distributed deep learning training simple, fast, and easy to use.
Horovod primarily supports the following deep learning frameworks:
- TensorFlow
- Keras
- PyTorch
- Apache MXNet
Horovod supports the following communication mechanisms:
- MPI (Message Passing Interface), used to coordinate the worker processes
- NCCL (NVIDIA Collective Communications Library), used for efficient GPU collective operations
- Gloo, an open-source collective communications library that works without an MPI installation
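If you are unsure which of these backends your installed build actually includes, Horovod exposes simple query functions. Here is a minimal sketch, assuming Horovod is installed with TensorFlow support:

import horovod.tensorflow as hvd

# Each call returns True if Horovod was compiled with that backend.
print('MPI built: ', hvd.mpi_built())
print('NCCL built:', hvd.nccl_built())
print('Gloo built:', hvd.gloo_built())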
The installation process for Horovod depends on the deep learning framework and communication mechanism you are using. Typically, you need to install MPI or NCCL first, and then install Horovod using pip.
For example, to install Horovod with TensorFlow support and NCCL-based GPU operations, you can run the following command:
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod[tensorflow]
Please refer to the Horovod official documentation for more detailed installation instructions: https://github.com/horovod/horovod
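The build is controlled through environment variables set at pip install time. The following is a sketch based on the build flags documented in the Horovod install guide; adjust the extras to your framework:

# CPU-only build using Gloo, so no MPI installation is required
HOROVOD_WITH_GLOO=1 pip install horovod[tensorflow]

# Require MPI support (the build fails instead of silently skipping it)
HOROVOD_WITH_MPI=1 pip install horovod[tensorflow]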
Using Horovod for distributed training typically involves the following steps:
- Call hvd.init() to initialize Horovod.
- Pin each process to a single GPU based on its local rank (hvd.local_rank()).
- Scale the learning rate by the number of workers (hvd.size()).
- Average gradients across workers: wrap the original optimizer with the DistributedOptimizer provided by Horovod when using Keras model.fit, or wrap the gradient tape with hvd.DistributedGradientTape in a custom training loop.
- Broadcast the initial variable states from rank 0 to all other processes, so every worker starts from the same weights.
Here is a simple example of using Horovod for TensorFlow distributed training:
import tensorflow as tf
import horovod.tensorflow as hvd

# 1. Initialize Horovod
hvd.init()

# 2. Pin each process to a single GPU, chosen by local rank
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# 3. Load the dataset
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.cast(mnist_images[..., None] / 255.0, tf.float32),
     tf.cast(mnist_labels, tf.int64)))
dataset = dataset.repeat().shuffle(10000).batch(128)

# 4. Build the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, [3, 3], activation='relu'),
    tf.keras.layers.Conv2D(64, [3, 3], activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])

# 5. Define the optimizer, scaling the learning rate by the number of workers
opt = tf.keras.optimizers.Adam(0.001 * hvd.size())

# 6. Define the loss function and metric
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
metric = tf.keras.metrics.SparseCategoricalAccuracy()

# 7. Define the training step. hvd.DistributedGradientTape averages the
# gradients across all workers before they are applied. (With Keras
# model.fit you would wrap the optimizer in hvd.DistributedOptimizer
# instead; using both would reduce the gradients twice.)
@tf.function
def train_step(images, labels, first_batch):
    with tf.GradientTape() as tape:
        probs = model(images, training=True)
        loss = loss_fn(labels, probs)
    tape = hvd.DistributedGradientTape(tape)
    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))

    # 8. Broadcast initial variable states from rank 0 to all other
    # processes. This runs after the first step, once the model and
    # optimizer variables have been created on every rank.
    if first_batch:
        hvd.broadcast_variables(model.variables, root_rank=0)
        hvd.broadcast_variables(opt.variables(), root_rank=0)

    metric.update_state(labels, probs)
    return loss

# 9. Training loop: the steps are divided among the workers, and only
# rank 0 prints progress.
for batch, (images, labels) in enumerate(dataset.take(10000 // hvd.size())):
    loss = train_step(images, labels, batch == 0)
    if batch % 10 == 0 and hvd.rank() == 0:
        print('batch: %d, loss: %.4f, accuracy: %.2f' % (batch, loss, metric.result()))
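Horovod launches one process per GPU. Assuming the script above is saved as train.py (a hypothetical file name; server1 and server2 are placeholder hostnames), it can be launched with the horovodrun wrapper:

# Run on a single machine with 4 GPUs
horovodrun -np 4 python train.py

# Run on two machines with 4 GPUs each
horovodrun -np 8 -H server1:4,server2:4 python train.py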
Horovod is a powerful distributed deep learning framework that can help you train large-scale deep learning models more easily and quickly. By spreading training across multiple GPU or CPU nodes, Horovod can significantly reduce wall-clock training time.