Network Observability Tools Case Study | Senior Product Designer

Role

Senior Product Designer

Year

2022-2024

Company

Internal Platform

Focus

Systems Observability

Impact

61%

Reduction in incident detection time

Achieved through relational visualization and a unified dashboard.

20%

Faster response time

Improved operational resilience through actionable dashboards.

The Challenge

Traditional observability tools monitor metrics: CPU load, latency, error rates. But a global fulfillment network doesn't fail in isolation. Thousands of sites, hundreds of technical teams, and thousands of interdependent services form a dependency graph with millions of connections. When something breaks, the question isn't "which metric spiked?" but "which dependency failed?" Before this platform, incident resolvers had no unified view of these relationships. They manually navigated 7 disparate data sources, including alarm consoles, graph databases, monitoring dashboards, network analyzers, system logs, and deployment trackers, while 6 additional metadata systems fed site context in the background. The real challenge wasn't monitoring; it was understanding how services depended on each other at scale.

"Currently if a Power or Dual WAN outage occurs at a site, we can't tell the difference if it's ISP or Power related."

The Solution

Working closely with the engineering team, we designed a single-pane dashboard that turns a complex dependency graph into something an incident resolver can read at a glance. The key design decision was structuring the interface around dependencies rather than individual metrics, so resolvers could trace a failure path instead of checking tools one by one. The system automatically highlights the probable root cause, and we built in an explainability framework to support the ML-driven trust scores planned for future releases.

Relational Visualization

A visual language that maps how services depend on each other — making invisible relationships visible at a glance.

Probable Cause Engine

Automated root cause detection that traces the failure path and highlights where the chain breaks.
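
To make the idea of tracing a failure path concrete, here is a minimal sketch, not the platform's actual engine. It assumes a hypothetical `depends_on` adjacency map and a set of currently unhealthy services, and it treats any unhealthy service with no unhealthy dependencies of its own as a probable cause. All service names are invented for illustration.

```python
from collections import deque


def probable_causes(depends_on, unhealthy, start):
    """Walk failing dependencies from `start` and return the services
    where the chain breaks: unhealthy nodes whose own dependencies are
    all healthy. `depends_on` and `unhealthy` are illustrative inputs."""
    seen = {start}
    queue = deque([start])
    causes = []
    while queue:
        service = queue.popleft()
        # Only follow edges into dependencies that are themselves failing.
        failing_deps = [d for d in depends_on.get(service, []) if d in unhealthy]
        if service in unhealthy and not failing_deps:
            causes.append(service)
        for dep in failing_deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return causes


# Hypothetical site topology: a portal depends on auth and a WAN link,
# and the WAN link depends on an ISP circuit and on site power.
depends_on = {
    "site-portal": ["auth", "wan-link"],
    "wan-link": ["isp-circuit", "site-power"],
}
unhealthy = {"site-portal", "wan-link", "site-power"}

print(probable_causes(depends_on, unhealthy, "site-portal"))
```

In this toy topology the trace ends at `site-power` rather than `isp-circuit`, which is the ISP-versus-power distinction resolvers could not make before.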

Unified Information Architecture

Aggregated alerts and consistent navigation patterns that reduce cognitive load across thousands of services.
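
As a rough illustration of alert aggregation, the sketch below collapses raw alerts into one summary row per site and service, ordered so the most severe groups surface first. The alert fields (`site`, `service`, `severity`) and the severity ranking are assumptions made for this example, not the platform's real schema.

```python
from collections import defaultdict

# Assumed severity ranking; lower number means more urgent.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


def aggregate_alerts(alerts):
    """Group alerts by (site, service) and keep the worst severity
    and a count for each group."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[(alert["site"], alert["service"])].append(alert)

    summaries = []
    for (site, service), items in grouped.items():
        worst = min(items, key=lambda a: SEVERITY_ORDER[a["severity"]])
        summaries.append({
            "site": site,
            "service": service,
            "count": len(items),
            "severity": worst["severity"],
        })
    # Most severe first, then the noisiest groups.
    summaries.sort(key=lambda s: (SEVERITY_ORDER[s["severity"]], -s["count"]))
    return summaries


# Hypothetical alerts from two sites.
alerts = [
    {"site": "SITE-A", "service": "wan-link", "severity": "minor"},
    {"site": "SITE-A", "service": "wan-link", "severity": "critical"},
    {"site": "SITE-B", "service": "auth", "severity": "major"},
]
for row in aggregate_alerts(alerts):
    print(row)
```

A resolver then sees one row per affected service instead of every individual alarm.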

Reflection

Visualizing dependencies rather than isolated metrics fundamentally changed how operators reasoned about incidents — they stopped chasing individual alarms and started tracing causal chains. Running UX discovery in parallel with backend development meant we could validate design decisions against real data within the same sprint. The biggest accelerator was establishing a design system where every backend concept had a direct UI counterpart — status cards, metric badges, dependency connectors. Engineers could ship new monitoring views by referencing the design system alone, without waiting for custom specs.
