Distributed Infrastructure · AI/HPC Infrastructure · Storage Platforms · Virtualization · Reliability Engineering

Jason M. Oliver

Principal Platform & Distributed Systems Engineer

I work across distributed infrastructure where compute, storage, networking, and workload behavior have to be understood together.

My work follows the full data path: client behavior, compute, network fabric, storage, virtualization, orchestration, telemetry, and the assumptions teams make between those layers.

I am most useful on ambiguous platform issues where the system is technically available, but workload behavior, latency, throughput, or reliability shows that something deeper is wrong.

jason@jmoliver.ai

Why Me

I solve problems at the boundaries between systems, teams, and assumptions.

I do not reduce infrastructure work to a product list. The hard problems usually live in the interaction between layers: a workload pattern that exposes a storage constraint, a fabric behavior that looks like an application problem, a telemetry gap that hides the true failure domain, or a platform decision that only fails under pressure.

That is the work I am built for: tracing reality, narrowing ambiguity, explaining the path clearly, and helping teams move from scattered symptoms to a practical operating picture.

Areas of Work

Compute, storage, networking, and workload behavior.

Compute & GPU Clusters

GPU workload pressure, Slurm/Kubernetes scheduler behavior, host behavior, firmware, drivers, Linux, and resource contention.

Distributed Storage Systems

Scale-out NAS, S3/object, metadata behavior, NVMe/NVMe-oF, data-path latency, and platform constraints.

Networking & RDMA

RDMA, RoCE, InfiniBand, Ethernet, MTU, PMTUD, congestion, retransmits, and fabric-level evidence.

AI/HPC Workloads

Data access patterns, throughput, pipeline reliability, workload readiness, GPU starvation, and infrastructure behavior under load.

Credentials & Context

15+ years: enterprise infrastructure experience

Dell Technologies: storage, VMware, escalation engineering

VAST Data: distributed storage, AI/HPC, customer reliability

VCP-DCV / WCNA / SNIA: virtualization, storage, and packet-level diagnostics

What I Actually Do

Core areas of work.

Platform Reliability

I help determine why infrastructure that appears healthy is still failing the workload, customer, or operational objective.

Platform Design & Validation

I translate ambiguous requirements into practical platform designs, validation paths, runbooks, and implementation guidance.

Storage & Data Path Analysis

I reason through NAS, S3/object, metadata, NVMe/NVMe-oF, RDMA, latency, throughput, and topology constraints.

AI/HPC Infrastructure

I look at GPU utilization, data access patterns, scheduling, fabric behavior, and storage pressure as one system.

Escalation Leadership

I turn scattered symptoms into an operating picture that engineering, support, product, and customer teams can act on.

Diagnostics & Automation

I build repeatable diagnostics, evidence collection, automation-backed validation, runbooks, and workflows that reduce time-to-understanding.

Focused Advisory Services

For teams that need infrastructure evidence, not generic IT guesswork.

Architecture & Reliability Review

Review platform design, workload readiness, failure domains, operational risk, and what the environment can safely support.

Cross-Layer Diagnostics

Reason through performance and reliability symptoms across compute, storage, network, virtualization, Linux, telemetry, and workload behavior.

Runbooks & Operating Clarity

Turn ambiguous troubleshooting paths into practical diagnostics, evidence collection, escalation notes, runbooks, and handoff material.

Core premise: healthy infrastructure is not defined by “it’s up.” It is defined by whether it performs as expected under real workload.

IT Consulting • Cloud Management • Network Support • Computer Networking • Backup & Recovery Systems • Cybersecurity Review • Information Management • Project Management • Program Management • Technical Writing

Where to Go Next

A few direct paths depending on what you want to understand first.

Systems Model

A visual way to show how I look across compute, storage, network, workload behavior, telemetry, failure domains, and operations.

Work Examples

Short examples of the kinds of infrastructure problems I have been involved in, without exposing customer details.

Recruiter View

A resume-aligned summary, role alignment, contact path, and downloadable PDF.

Services

Focused advisory and diagnostic work for infrastructure that is up, but not performing under real workload.