You Lead the Way. We’ve Got Your Back.
At American Express, we know that with the right backing, people and businesses have the power to progress in incredible ways. Whether we’re supporting our customers’ financial confidence to move ahead, taking commerce to new heights, or encouraging people to explore the world, our colleagues are constantly redefining what’s possible — and we’re proud to back each other every step of the way. When you join #TeamAmex, you become part of a diverse community of over 60,000 colleagues, all with a common goal to deliver an exceptional customer experience every day.
American Express has embarked on an exciting transformation driven by an energetic new team of high performers. This group is nimble and creative, with the power to shape our technology and product roadmap. Service Operations is responsible for providing reliable platforms for hundreds of critical applications and utilities within American Express.
Purpose of the Role:
The primary focus of this role is to provide technical expertise and tooling that ensure the highest level of reliability and availability for critical applications. The role provides consultation and strategic recommendations by quickly assessing and remediating complex availability issues, and is responsible for driving automation and efficiencies that increase the quality, availability, and auto-healing of complex processes.
Deliver technical solutions to identified problems and make architectural recommendations, while ensuring process conformance for incident, problem, and change management and for compliance
Drive the resolution of problems by identifying the root cause and implementing architecture/code fixes
Analyze the existing technology environment and architecture to develop technical and architectural recommendations that improve system and application performance, and lead the implementation of those improvements
Drive continuous improvement to reduce MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection) on production impacts. Proactively address stability gaps across the infrastructure environment.
Collaborate with business users and technical SMEs to analyze, triage, and creatively resource product development and enhancements for loads into the operational data warehouse
Automate and optimize existing processes, and execute process improvements
Provide post-implementation production support for processes
Critical Factors to Success:
Development: Enable creation and updating of logging standards to streamline dashboard creation and ensure usability of logging repository.
Automation: Responsible for evaluating and implementing orchestration, automation, and tooling solutions to ensure that processes are consistent and that repetitive tasks are performed with higher accuracy and fewer defects.
Operational Readiness: Responsible for availability, proactive monitoring and alerting, and performance (reducing latency and increasing efficiency), including testing, for technical platforms.
Production Support: Ensure application data flows are accurate and up to date, with the objective of increasing the knowledge base of all support teams and driving reliability.
6+ years of engineering and/or architecture experience in a complex environment, such as a large-scale web infrastructure or development team
Experience supporting a 24/7 enterprise environment with on-call responsibilities for production support.
Understanding of monitoring technologies, including various logging frameworks and tooling, with a focus on logging, time-series, or machine-learning products.
Bachelor’s degree in a related field preferred; relevant industry experience can substitute
Knowledge of platform engineering and understanding of architecture and application system design.
Maintain and enhance the monitoring framework (data collection, alert aggregation, and alerting logic).
Detailed understanding of applicable programming methodologies
Problem solving and analytical skills
Programming languages and frameworks
Has an “Automation First” mindset.
Combines deep technical expertise with systematic, rational root cause analysis to identify opportunities to make things faster and better
Scripting Languages (Bash/Python/Perl)
Splunk, Dynatrace, ELK/Kibana
Load & performance testing tools (e.g., JMeter)
VMware, Unix, Citrix, Linux, Solaris, Windows, OpenStack, etc.
Good understanding of public cloud offerings and microservice architecture.
Exposure to AIOps would be a plus
Enterprise Leadership Behaviors
Set The Agenda: Define What Winning Looks Like, Put Enterprise Thinking First, Lead with an External Perspective
Bring Others With You: Build the Best Team, Seek & Provide Coaching Feedback, Make Collaboration Essential
Do It The Right Way: Communicate Frequently, Candidly & Clearly, Make Decisions Quickly & Effectively, Live the Blue Box Values, Great Leadership Demands Courage
Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.