
Tutorial: Using Slurm for Execution

This tutorial shows how to use the Scheduler library with Slurm to run optimization trials on a cluster. Running trials as Slurm jobs lets you scale up an optimization by distributing them across multiple compute nodes.

Prerequisites

  • Scheduler library installed with Slurm support (pip install -e ".[slurm]", run from the project directory; the quotes keep shells such as zsh from expanding the brackets)
  • Access to a Slurm cluster
  • Basic understanding of Slurm concepts (partitions, job submission, etc.)
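Before going further, it can help to confirm that the Slurm client commands are reachable from the machine where the optimization will be launched. The following is a minimal sketch using only the Python standard library; the helper name missing_slurm_commands is hypothetical, and which commands the Scheduler library actually calls is an assumption.

```python
import shutil

# Standard Slurm client commands. Which of these the Scheduler library
# actually needs is an assumption; sbatch is the usual submission command.
SLURM_COMMANDS = ("sbatch", "squeue", "sacct", "scancel")


def missing_slurm_commands(commands=SLURM_COMMANDS):
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


missing = missing_slurm_commands()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All Slurm client commands found.")
```

If any command is reported missing, you are probably not on a login node, or the Slurm environment has not been loaded in your shell.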

Step 1: Import Required Libraries

from ax.service.ax_client import AxClient
from scheduler import AxScheduler, SlurmRunner

Step 2: Define Your Objective Function

For this example, we'll use a script-based objective instead of a Python function, since this is more common in HPC environments:

A toy function goes here
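As a stand-in, here is a minimal sketch of what such a script might look like. Everything in it is an assumption made for illustration: the formulas are invented, and this tutorial does not specify how the scheduler passes parameters to the script or reads metrics back. The sketch assumes command-line flags named after the parameters and a JSON object printed to standard output, so adapt both to whatever interface your installation expects.

```python
import argparse
import json
import math


def simulate(field_strength, detector_length, detector_radius):
    """Toy detector model (invented formulas, for illustration only)."""
    # Stronger field and longer detector give better (smaller) resolution.
    resolution = 1.0 / (field_strength * detector_length ** 2)
    # Larger detector volume captures more particles.
    acceptance = 1.0 - math.exp(-detector_radius * detector_length / 10.0)
    # Cost grows with field strength and detector size.
    cost = 2.0 * field_strength * detector_length * detector_radius ** 2
    return {"resolution": resolution, "acceptance": acceptance, "cost": cost}


def main(argv):
    # Hypothetical interface: one flag per parameter, metrics printed as JSON.
    parser = argparse.ArgumentParser(description="Toy detector simulation")
    parser.add_argument("--field_strength", type=float, required=True)
    parser.add_argument("--detector_length", type=float, required=True)
    parser.add_argument("--detector_radius", type=float, required=True)
    args = parser.parse_args(argv)
    metrics = simulate(args.field_strength, args.detector_length, args.detector_radius)
    print(json.dumps(metrics))
    return metrics


# Explicit arguments keep the sketch runnable on its own;
# a real script would call main(sys.argv[1:]) instead.
main(["--field_strength", "2.0", "--detector_length", "6.0", "--detector_radius", "2.0"])
```

The three metric names match the objectives and the outcome constraint used in the experiment definition in Step 3.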

Step 3: Initialize Ax Client and Define Parameter Space

from ax.service.utils.instantiation import ObjectiveProperties

# Initialize Ax client
ax_client = AxClient()

# Define the parameter space
ax_client.create_experiment(
    name="slurm_optimization",
    parameters=[
        {
            "name": "field_strength",
            "type": "range",
            "bounds": [1.0, 3.0],
            "value_type": "float",
        },
        {
            "name": "detector_length",
            "type": "range",
            "bounds": [4.0, 8.0],
            "value_type": "float",
        },
        {
            "name": "detector_radius",
            "type": "range",
            "bounds": [1.0, 3.0],
            "value_type": "float",
        },
    ],
    # AxClient expects ObjectiveProperties values here, not plain strings
    objectives={
        "resolution": ObjectiveProperties(minimize=True),
        "acceptance": ObjectiveProperties(minimize=False),
    },
    # Trials whose cost exceeds 100.0 are treated as infeasible
    outcome_constraints=["cost <= 100.0"],
)
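Before submitting anything to the cluster, it can be useful to draw a few points from this parameter space and try them by hand. The sketch below is not part of the Scheduler library or Ax: it copies the bounds from the experiment definition, samples a point uniformly, and checks a metrics dictionary against the cost constraint. The helper names sample_point and is_feasible are hypothetical.

```python
import random

# Bounds copied from the parameter space defined above.
BOUNDS = {
    "field_strength": (1.0, 3.0),
    "detector_length": (4.0, 8.0),
    "detector_radius": (1.0, 3.0),
}


def sample_point(bounds=BOUNDS, rng=None):
    """Draw one parameterization uniformly at random within the bounds."""
    rng = rng or random.Random()
    return {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}


def is_feasible(metrics, max_cost=100.0):
    """Mirror the outcome constraint "cost <= 100.0"."""
    return metrics["cost"] <= max_cost


point = sample_point(rng=random.Random(0))  # seeded for reproducibility
print(point)
```

Passing a sampled point to your objective script on a login node is a cheap way to catch interface problems before any job reaches the queue.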

Step 4: Create a Slurm Runner

# Create a runner for Slurm execution
runner = SlurmRunner(
    partition="compute",       # Specify your partition
    time_limit="00:30:00",     # 30 minutes per job
    memory="4G",               # 4GB of memory per job
    cpus_per_task=4,           # 4 CPUs per task
    config={
        'modules': ['python/3.9'],  # Modules to load
        'sbatch_options': {
            'account': 'eic-project',       # Your account/allocation
            'mail-user': 'user@example.com',  # Email for notifications
            'mail-type': 'END,FAIL'           # When to send notifications
        }
    }
)
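To make these settings more concrete, the sketch below renders a batch script that uses the standard sbatch directives corresponding to them (--partition, --time, --mem, --cpus-per-task, plus the extra options). This is an illustration only: the function render_sbatch_script is hypothetical, and the script SlurmRunner actually generates may look different.

```python
def render_sbatch_script(command, partition, time_limit, memory, cpus_per_task,
                         modules=(), sbatch_options=None):
    """Build the text of a batch script from runner-style settings (illustrative)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
    ]
    # Extra options are passed through as long-form sbatch flags.
    for key, value in (sbatch_options or {}).items():
        lines.append(f"#SBATCH --{key}={value}")
    lines.append("")
    # Environment modules are loaded before the command runs.
    for module in modules:
        lines.append(f"module load {module}")
    lines.append(command)
    return "\n".join(lines) + "\n"


script = render_sbatch_script(
    command="python ./simulation_script.py",
    partition="compute",
    time_limit="00:30:00",
    memory="4G",
    cpus_per_task=4,
    modules=["python/3.9"],
    sbatch_options={
        "account": "eic-project",
        "mail-user": "user@example.com",
        "mail-type": "END,FAIL",
    },
)
print(script)
```

Reading the settings this way also shows where to look when a job is rejected: each runner argument maps to a flag you can test by hand with sbatch.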

Step 5: Create the Scheduler and Set Script Objective

# Create the scheduler
scheduler = AxScheduler(
    ax_client,
    runner,
    config={
        'job_output_dir': './slurm_outputs',  # Directory for job outputs
        'synchronous': False,                  # Run trials asynchronously
        'monitoring_interval': 30              # Check status every 30 seconds
    }
)

# Set the script objective
scheduler.set_script_objective(
    script_path="./simulation_script.py",
    script_options={
        'interpreter': 'python',  # Use Python to run the script
        'timeout': 1200           # Timeout in seconds (20 minutes)
    }
)
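Note that the script timeout (1200 seconds) is shorter than the Slurm time limit set on the runner (00:30:00, i.e. 1800 seconds), which leaves room for module loading and job startup. The sketch below checks that margin; the helper names are hypothetical, and it handles only the HH:MM:SS and D-HH:MM:SS forms of Slurm time strings, not the shorter forms Slurm also accepts.

```python
def slurm_time_to_seconds(time_limit):
    """Convert "HH:MM:SS" or "D-HH:MM:SS" to seconds (other Slurm forms not handled)."""
    days = 0
    if "-" in time_limit:
        day_part, time_limit = time_limit.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in time_limit.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def timeout_margin(time_limit, script_timeout):
    """Seconds left between the script timeout and the Slurm time limit."""
    return slurm_time_to_seconds(time_limit) - script_timeout


margin = timeout_margin("00:30:00", 1200)
if margin <= 0:
    print("Script timeout is not shorter than the Slurm time limit.")
else:
    print(f"Margin before Slurm kills the job: {margin} s")
```

If the margin were zero or negative, Slurm could kill the job before the script-level timeout had a chance to fire, and the trial would presumably be reported as a Slurm failure rather than a timeout.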

Step 6: Run the Optimization

# Run the optimization
best_params = scheduler.run_optimization(max_trials=20)

# Print the results
print("Best parameters:")
print(best_params)

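# With two objectives there is no single best trial. get_best_trial() returns a
# tuple (it has no .values attribute) and targets single-objective experiments,
# so inspect the Pareto-optimal trials instead.
pareto = ax_client.get_pareto_optimal_parameters()
print("Pareto-optimal trials:")
for trial_index, (params, (means, _covariances)) in pareto.items():
    print(trial_index, params, means)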
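Because the experiment minimizes resolution and maximizes acceptance at the same time, "best" generally means a set of trade-off points rather than one trial. The sketch below shows the idea on made-up numbers: it drops results that violate the cost constraint, then keeps only those not dominated by another feasible result. It is a plain-Python illustration of Pareto filtering on observed values, not how Ax computes its frontier internally.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in both objectives and strictly better in one."""
    no_worse = a["resolution"] <= b["resolution"] and a["acceptance"] >= b["acceptance"]
    better = a["resolution"] < b["resolution"] or a["acceptance"] > b["acceptance"]
    return no_worse and better


def pareto_front(results, max_cost=100.0):
    """Feasible results that no other feasible result dominates."""
    feasible = [r for r in results if r["cost"] <= max_cost]
    return [r for r in feasible if not any(dominates(o, r) for o in feasible)]


# Made-up trial results, for illustration only.
results = [
    {"trial": 0, "resolution": 0.020, "acceptance": 0.6, "cost": 80.0},
    {"trial": 1, "resolution": 0.030, "acceptance": 0.5, "cost": 70.0},
    {"trial": 2, "resolution": 0.010, "acceptance": 0.4, "cost": 90.0},
    {"trial": 3, "resolution": 0.005, "acceptance": 0.9, "cost": 150.0},
]

for r in pareto_front(results):
    print(r)
```

In this made-up data, trial 3 is excluded for exceeding the cost limit, and trial 1 is excluded because trial 0 is better on both objectives.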

Complete Example

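The script below combines Steps 1 through 6 into a single file:

from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties
from scheduler import AxScheduler, SlurmRunner

# Define the experiment: three range parameters, two objectives, one constraint
ax_client = AxClient()
ax_client.create_experiment(
    name="slurm_optimization",
    parameters=[
        {"name": "field_strength", "type": "range", "bounds": [1.0, 3.0], "value_type": "float"},
        {"name": "detector_length", "type": "range", "bounds": [4.0, 8.0], "value_type": "float"},
        {"name": "detector_radius", "type": "range", "bounds": [1.0, 3.0], "value_type": "float"},
    ],
    objectives={
        "resolution": ObjectiveProperties(minimize=True),
        "acceptance": ObjectiveProperties(minimize=False),
    },
    outcome_constraints=["cost <= 100.0"],
)

# Configure how each trial is submitted to Slurm
runner = SlurmRunner(
    partition="compute",
    time_limit="00:30:00",
    memory="4G",
    cpus_per_task=4,
    config={
        'modules': ['python/3.9'],
        'sbatch_options': {
            'account': 'eic-project',
            'mail-user': 'user@example.com',
            'mail-type': 'END,FAIL'
        }
    }
)

# Create the scheduler and point it at the objective script
scheduler = AxScheduler(
    ax_client,
    runner,
    config={
        'job_output_dir': './slurm_outputs',
        'synchronous': False,
        'monitoring_interval': 30
    }
)
scheduler.set_script_objective(
    script_path="./simulation_script.py",
    script_options={
        'interpreter': 'python',
        'timeout': 1200
    }
)

# Run the optimization and report the results
best_params = scheduler.run_optimization(max_trials=20)
print("Best parameters:")
print(best_params)

pareto = ax_client.get_pareto_optimal_parameters()
print("Pareto-optimal trials:")
for trial_index, (params, (means, _covariances)) in pareto.items():
    print(trial_index, params, means)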

Next Steps