Jason M. Oliver · jmoliver.ai
I solve infrastructure problems that live between ownership boundaries.

Principal Systems · Reliability · Platform Engineering

Jason M. Oliver

Principal Platform & Distributed Systems Engineer

I work across distributed infrastructure where compute, storage, networking, and workload behavior intersect.

My work follows the full data path: client behavior, compute, network fabric, storage, virtualization, orchestration, telemetry, and the assumptions teams make between those layers.

I am most useful on ambiguous platform issues where the system is technically available, yet workload behavior, latency, throughput, or reliability shows that something deeper is wrong.

Compute & GPU Clusters · High-Performance Storage · Networking & RDMA · AI/ML Workloads · Telemetry · Workload Shape · Failure Domains · Operational Context

Why me

I solve problems at the boundaries between systems, teams, and assumptions.

I do not reduce infrastructure work to a product list. The hard problems usually live in the interaction between layers: a workload pattern that exposes a storage constraint, a fabric behavior that looks like an application problem, a telemetry gap that hides the true failure domain, or a platform decision that only fails under pressure.

That is the work I am built for: tracing reality, narrowing ambiguity, explaining the path clearly, and helping teams move from scattered symptoms to a practical operating picture.

Core domains

Infrastructure areas I work across

Compute & GPU Clusters

GPU workload pressure, scheduler placement, host behavior, firmware, drivers, Linux, and resource contention.

Distributed Storage Systems

Scale-out NAS, S3/object, metadata behavior, NVMe/NVMe-oF, data-path latency, and platform constraints.

Networking & RDMA

RDMA, RoCE, InfiniBand, Ethernet, MTU, PMTUD, congestion, retransmits, and fabric-level evidence.

AI/HPC Workloads

Data access patterns, throughput, pipeline reliability, GPU starvation, and infrastructure behavior under load.

Proof signal

15+ years: enterprise infrastructure experience
Dell Technologies: storage, VMware, escalation engineering
VAST Data: distributed storage, AI/HPC, customer reliability
VCP-DCV / WCNA / SNIA: virtualization, storage, and packet-level diagnostics

What I actually do

Practical work, not buzzwords.

Platform Reliability

I help determine why infrastructure that appears healthy is still failing the workload, customer, or operational objective.

Performance Diagnostics

I trace behavior across client, compute, network, storage, virtualization, telemetry, and orchestration layers.

Storage & Data Path Analysis

I reason through NAS, S3/object, metadata, NVMe/NVMe-oF, RDMA, latency, throughput, and topology constraints.

AI/HPC Infrastructure

I look at GPU utilization, data access patterns, scheduling, fabric behavior, and storage pressure as one system.

Escalation Leadership

I turn scattered symptoms into an operating picture that engineering, support, product, and customer teams can act on.

Automation & Tooling

I build repeatable diagnostics, evidence collection, runbooks, and workflows that reduce time-to-understanding.