Lead – Production Support (SaaS) & Service Reliability
About Rezolve.ai
We’re an AI-first SaaS company leveraging the latest advances in Generative AI. We are proud to build a world-class Agentic AI solution for employee support that is disrupting ITSM and HR operations. Rezolve.ai is recognized by Gartner and Forrester for its rapid adoption and end-user benefits. We are in an exciting growth phase and are looking for experienced, ambitious professionals who want to accelerate their own career goals and ours.
Location: Bangalore (On-site)
Department: Engineering / Operations
About the Role
We are looking for a Production Support Lead to manage and mature our 24×7 global support operations for a mission-critical SaaS platform. The primary responsibility of this role is ensuring high product availability, platform stability, rapid incident resolution, and a data-driven reduction in repetitive issues.
This leader will own major incident response, drive actionable root-cause analysis (RCA), identify recurring problem patterns, and work closely with engineering to ensure structural fixes are prioritized. Customer satisfaction, uptime, and operational excellence are the core measurements of success.
Experience in release management is beneficial but not the main focus of the role.
Key Responsibilities
Production Support Ownership (Primary Scope)
- Lead global 24×7 SaaS production support, including support rosters, on-call rotations, and incident workflows.
- Act as the highest-level escalation point for critical incidents, outages, and customer-impacting issues.
- Ensure high platform availability by proactively detecting, triaging, and mitigating issues before they impact customers.
- Drive structured, high-quality RCA for all priority incidents and ensure meaningful corrective and preventive actions (CAPA) are implemented.
- Use a data-driven approach to identify recurring problems and work with engineering to systematically eliminate them.
- Build trend dashboards to track MTTR, repetitive incidents, alert noise, ticket patterns, and customer impact.
- Own and evolve standard operating procedures, runbooks, support documentation, and escalation playbooks.
- Ensure SLAs/OLAs around availability, responsiveness, and resolution time are consistently met across global teams.
- Maintain continuous communication with internal teams and customers during major incidents and service disruptions.
Customer Satisfaction & Product Stability
- Ensure customer-reported issues are resolved quickly and transparently, with root causes addressed rather than band-aided.
- Collaborate closely with Customer Success to ensure support quality, issue follow-through, and communication clarity.
- Monitor end-to-end product health, proactively flag risks, and champion improvements that directly enhance customer experience.
- Drive internal alignment and prioritization for issues that meaningfully impact stability, usability, or customer trust.
- Maintain a customer-centric mindset, balancing quick recovery with long-term stability and prevention.
Service Reliability & Continuous Improvement
- Build operational KPIs, dashboards, and insights to drive decisions based on real-time and historical data.
- Reduce repetitive incidents and noisy alerts by improving monitoring quality, thresholds, and log visibility.
- Partner with engineering, QA, and DevOps to strengthen platform resilience, scalability, and readiness.
- Run environment readiness checks and operational reviews to minimize outages and improve uptime.
- Drive cross-team accountability for problem resolution and long-term fixes.
- Foster a culture that values stability, reliability, transparency, and continuous learning.
Release Management (Secondary Scope / “Plus”)
- Support the release team with coordination and governance as needed (not the primary responsibility).
- Ensure releases do not introduce regressions or recurring issues through basic quality checks and readiness reviews.
- Maintain visibility into deployment schedules to properly prepare support teams.
Skills & Qualifications
- 8+ years in production support, SaaS operations, service delivery, or support management, including 3+ years in a lead/manager role.
- Proven experience managing global 24×7 SaaS support operations, on-call rotations, and escalation processes.
- Strong experience in major incident handling, RCA, data-driven problem resolution, and reducing recurring issues.
- Deep understanding of SaaS availability metrics, uptime, reliability best practices, and customer expectations.
- Strong knowledge of ITIL concepts, especially Incident, Problem, Change, and Major Incident Management.
- Solid communication skills, especially in high-pressure, customer-impacting situations.
- Basic-to-intermediate understanding of:
  - SQL queries for troubleshooting
  - Cloud infrastructure (AWS/Azure basics)
  - Monitoring/logging tools (Grafana, Splunk, ELK, Datadog)
  - Environment structure (prod, staging, QA, etc.)
- Experience with release management or DevOps workflows is a plus, not required.
- SaaS experience is highly preferred.