Resume
Software Engineer, SRE
Software Engineer, SRE @ SpaceX (Starlink)
Jun. 2025 - Jan. 2026 · Sunnyvale, CA
Go, Kubernetes, ArgoCD, Prometheus, Grafana, Alertmanager, CockroachDB, Jsonnet, Helm, cert-manager, HashiCorp Vault, Playwright, GCP
SRE & Infrastructure
- Spearheaded in-house Kubernetes operators for CockroachDB using Go and kubebuilder, building 8 controllers for authentication, backup restoration, schema migrations, connection pooling, and multi-region data synchronization.
- Executed zero-downtime production database migrations using ArgoCD and an in-house Kubernetes operator, coordinating schema changes while maintaining 99.9%+ availability and data integrity.
- Engineered Kubernetes operator conversion webhook to upgrade 100+ legacy custom resources for 60+ microservices, eliminating manual effort and ensuring seamless API upgrades.
- Transformed infrastructure provisioning from Helm to Jsonnet for 10+ microservices across all environments, improving configurability and eliminating YAML-based overlay sprawl.
- Automated TLS certificate and secrets management with cert-manager, HashiCorp Vault, and external-secrets, replacing manual credential rotation across all environments.
- Right-sized Kubernetes replicas across 60+ services using Prometheus and Grafana, achieving $50K/month ($600K annualized) in savings and defining capacity for the GCP-to-on-premises migration.
- Built database monitoring system using Prometheus and Grafana; migrated legacy alerting infrastructure to Alertmanager, reducing mean time to detection by 40% across production clusters.
- Maintained 99.9%+ uptime SLO for starlink.com serving millions of users through on-call rotation, triaging and resolving critical incidents including data replication failures and failovers.
- Standardized CI test infrastructure across 700+ Playwright tests, identifying flaky tests and triaging them to owners, driving a 60% reduction in flaky tests and faster deployment cycles.
Software Engineer @ Supermicro
Nov. 2023 - May 2025 · San Jose, CA
Kubernetes, ArgoCD, Prometheus, Grafana, GitOps, Docker, Ansible, Helm, Kustomize, DroneCI, MariaDB, Kafka, Rook-Ceph, TypeScript, React, Express, GraphQL, Jest
SRE & Infrastructure
- Architected and operated 25+ highly available, on-premises Kubernetes clusters managing hundreds of servers across 4 regions; automated bare-metal provisioning and configuration management with Ansible and ensured 99.9%+ availability via Kube-VIP, MetalLB, HAProxy, and Nginx.
- Owned ArgoCD platform end-to-end, driving GitOps adoption across the organization to manage 20+ applications across 25+ Kubernetes clusters, reducing deployment time by 90%.
- Engineered CI/CD pipelines as sole infrastructure engineer on a 4-person team, building with ArgoCD, DroneCI, Helm, and Kustomize to reduce deployment lead time by 50% and enable self-service deployments for engineering teams.
- Implemented observability stack with Prometheus and Grafana, enabling real-time monitoring and alerting across 25+ production clusters with 500+ custom metrics.
- Built multi-purpose containerized development environments with Docker Compose, cutting setup time by 60% and reducing image sizes by 80%.
- Mentored 5+ engineers on GitOps best practices, achieving team-wide adoption and establishing standardized deployment workflows.
Full Stack Development
- Reduced GraphQL query latency by 95%, implementing efficient data fetching patterns that cut API response times from seconds to milliseconds.
- Refactored legacy codebases and increased test coverage from ~40% to 97%+ using Jest, reducing production bugs by 70% and improving maintainability.
- Developed and optimized 10+ mission-critical services using TypeScript, React, Express, GraphQL Apollo, and RESTful APIs, supporting global manufacturing operations across 10+ facilities and reducing production line downtime by 50%.
- Deployed and managed distributed storage infrastructure, orchestrating multi-cluster data synchronization, migrations, and disaster recovery for 2 TB of data using Rook-Ceph, MariaDB-Galera, and Bash/Python scripting, achieving 99.9%+ uptime across production environments.
- Administered and tuned MariaDB, MSSQL, Cassandra, and ScyllaDB databases and Kafka event pipelines, ensuring sub-second query performance across distributed system components.
Software Engineer @ NavisX-AlluringSelf (Startup)
Mar. 2023 - Oct. 2023 · Remote
React, Next.js, TypeScript, TailwindCSS, Zustand
Full Stack Development
- Led engineering at a 4-person startup, architecting a full-stack application with React and Next.js and shipping an MVP to 100+ beta users within 4 months.
Cornell Tech, Cornell University
May 2022 · New York, NY
Master of Engineering in Computer Science
Merit Scholarship
University of California, San Diego
Jun. 2021 · La Jolla, CA
Bachelor of Science in Computer Science
GPA: 3.94 | Provost Honors