The Eventual platform is built around three core abstractions that give you the serverless experience of a cloud data warehouse, but for every other modality of data. Understanding these concepts is key to using the ev SDK and daft (our multimodal query engine) effectively.

Jobs

Jobs are procedures composed of daft operations that run on the Eventual platform. They automatically handle scaling, retries, and fault tolerance, so you can focus on your business logic.

What is a Job?

A Job is a Python function decorated with @job.main() that defines what work should be performed. Jobs are:
  • Automatically Distributed: Your code runs across multiple machines without configuration
  • Fault Tolerant: Built-in retry logic and error handling
  • Scalable: Automatically scales based on workload
  • Monitored: Full logging and metrics collection

Basic Job Example

from ev import Env, Job

# Create environment
env = Env("3.11").pip_install(["daft==0.5.9"])

# Create job
job = Job("data_processor", env)

@job.main()
def main():
    """Process data using daft."""
    import daft
    
    # Read data using daft - automatically distributed
    df = daft.read_parquet("s3://input/data.parquet")
    
    # Process with daft operations - scales across cluster
    df = df.where(df["status"] == "active")
    df = df.with_column("processed_at", daft.lit("2024-01-01"))
    
    # Write results with daft - fault tolerant
    df.write_parquet("s3://output/processed.parquet")
    
    print(f"Processed {len(df)} rows")
    return 0
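
A note on execution: daft DataFrames are lazy, so where and with_column only build a query plan, and nothing runs until a sink such as write_parquet, show, or collect executes it. A minimal local sketch of this behavior (plain daft on an in-memory DataFrame, no platform submission involved):

import daft

# Building the plan: nothing is computed yet
df = daft.from_pydict({"status": ["active", "inactive", "active"], "value": [1, 2, 3]})
df = df.where(df["status"] == "active")

# Execution is triggered here, when results are materialized
df.show()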

Job Lifecycle

  1. Submit: Job is submitted to the Eventual platform
  2. Schedule: Platform schedules execution on available compute resources
  3. Execute: Job function runs with automatic scaling and fault tolerance
  4. Monitor: Progress is tracked with full logging and metrics
  5. Complete: Results are returned and resources are cleaned up

Environments

Environments define the runtime context for your jobs, including Python dependencies and configuration. They ensure your jobs have everything needed to run successfully.

What is an Environment?

An Environment specifies:
  • Python Dependencies: Packages required by your job
  • Environment Variables: Configuration values and secrets
  • Files: Additional files needed at runtime
  • System Configuration: Runtime settings and resource requirements

Creating Environments

from ev import Env, Job

# Create environment with dependencies
env = Env("3.11").pip_install([
    "daft==0.5.9",
    "requests==2.31.0"
])

job = Job("data_fetcher", env)

@job.main()
def main():
    import daft
    import requests
    
    # Download data
    response = requests.get("https://api.example.com/data")
    data = response.json()
    
    # Process with daft
    df = daft.from_pylist(data)
    
    # Show summary
    df.show()
    print(f"Processed {len(df)} records")
    return 0
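
pip_install only covers packages. The same Env object can also carry environment variables and extra files, which the examples later on this page rely on; a minimal sketch (the URL and directory names are placeholders):

from ev import Env

# Interpreter version and Python dependencies
env = Env("3.11").pip_install(["daft==0.5.9", "requests==2.31.0"])

# Configuration the job reads at runtime via os.environ
env.environ["API_BASE_URL"] = "https://api.example.com"

# Extra files shipped alongside the job code
env.include(["config/"])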

Resources

Resources are reference-able entities that can be used across jobs and shared within your organization. They provide abstractions over infrastructure components like data volumes, ML models, and compute clusters.

What are Resources?

Resources represent:
  • Data Volumes: S3 buckets, databases, file systems
  • ML Models: Trained models, embeddings, checkpoints
  • Compute Resources: GPU clusters, specialized hardware
  • External Services: APIs, databases, third-party systems

Using Resources

from ev import Env, Job

# Create environment
env = Env("3.11").pip_install(["daft==0.5.9"])

# Define volume paths as environment variables
env.environ["DATA_PATH"] = "s3://company-data-bucket/customers/"
env.environ["MODEL_PATH"] = "s3://company-models/production/"

job = Job("customer_processor", env)

@job.main()
def main():
    import daft
    import os
    
    data_path = os.environ["DATA_PATH"]
    model_path = os.environ["MODEL_PATH"]
    
    # Read from data volume
    df = daft.read_parquet(data_path + "*.parquet")
    
    # Process data
    df = df.where(df["status"] == "active")
    
    print(f"Processed {len(df)} customer records")
    return 0
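
Because the data location arrives through the environment, the same job body works against whichever volume you point it at, and any daft operation can run over that data. As a further sketch, a small aggregation (the segment column here is hypothetical, purely for illustration):

import daft

# Hypothetical follow-up: count active customers per segment
df = daft.read_parquet("s3://company-data-bucket/customers/*.parquet")
df = df.where(df["status"] == "active")
counts = df.groupby("segment").agg(daft.col("segment").count().alias("n_customers"))
counts.show()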

Resource Benefits

  • Reusability: Define once, use across multiple jobs
  • Versioning: Track versions and metadata
  • Sharing: Share resources across teams
  • Governance: Control access and permissions

Putting It All Together

Here’s how Jobs, Environments, and Resources work together:
from ev import Env, Job

# Define environment with all dependencies
env = Env("3.11").pip_install([
    "daft==0.5.9",
    "torch==2.0.0",
    "torchvision==0.15.0",
    "pillow==9.0.0",
    "s3fs"
])

# Set up paths and configuration
env.environ["IMAGE_DATA_PATH"] = "s3://company-images/products/"
env.environ["MODEL_PATH"] = "s3://models/product-classifier-v3.pkl"
env.environ["MODEL_CACHE_DIR"] = "/tmp/models"
env.environ["MODEL_VERSION"] = "3.0.0"

# Include model files
env.include(["models/", "config/"])

# Create job using environment
job = Job("product_classifier", env)

@job.main()
def main():
    import daft
    import torch
    import torchvision.transforms as transforms
    from PIL import Image
    import os
    
    image_data_path = os.environ["IMAGE_DATA_PATH"]
    model_path = os.environ["MODEL_PATH"]
    model_version = os.environ["MODEL_VERSION"]
    
    # Load images metadata from path
    df = daft.read_parquet(image_data_path + "metadata.parquet")
    
    # Load model (torch.load needs a local file or file-like object,
    # so stream the checkpoint from S3 via fsspec/s3fs)
    import fsspec
    with fsspec.open(model_path, "rb") as f:
        model = torch.load(f)
    model.eval()
    
    # Set up image transforms
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                           std=[0.229, 0.224, 0.225])
    ])
    
    # Process images (simplified example)
    def classify_image_path(image_path):
        # In practice, would load and classify the actual image
        # For demo, return mock classification
        return "electronics"
    
    # Apply classification using daft
    df = df.with_column(
        "category",
        df["image_path"].apply(classify_image_path, return_dtype=daft.DataType.string())
    )
    
    # Save results
    df.write_parquet(image_data_path + "classified/")
    
    print(f"Classified {len(df)} images using model version {model_version}")
    return 0
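
The classifier above is mocked out. In a real job you would materialize the image bytes and run the model over them; one way to sketch that with daft's image expressions is shown below (the image_path URLs, the placeholder label, and the UDF body are illustrative assumptions, not platform APIs):

import daft

# Sketch: download, decode, and resize product images natively in daft
df = daft.read_parquet("s3://company-images/products/metadata.parquet")
df = df.with_column("image", df["image_path"].url.download().image.decode())
df = df.with_column("image", df["image"].image.resize(224, 224))

# A UDF would wrap the torch model; this placeholder just returns a fixed label
@daft.udf(return_dtype=daft.DataType.string())
def classify(images):
    return ["electronics" for _ in images.to_pylist()]

df = df.with_column("category", classify(df["image"]))
df.show()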

How They Work Together

  1. Environment Setup: The environment installs PyTorch and sets up the model cache directory
  2. Resource Loading: The job loads images from the data volume and the ML model from the model resource
  3. Distributed Processing: daft automatically distributes the image classification across the cluster
  4. Result Storage: Classified results are saved back to the data volume

Next Steps

Now that you understand the core concepts, you can dive deeper into each area. Ready to see these concepts in action? Check out our image processing example to see how Jobs, Environments, and Resources work together in a real-world scenario.