Job Description
The Senior Automation Site Reliability Engineer will drive cross-team initiatives that improve Delta engineering practices through greater accountability and deliver higher uptime and better performance for the business. The ideal candidate has prior experience implementing observability plans covering logs, metrics, and traces.
Responsibilities:
- Participate in projects, solution implementations, upgrades, and enhancements while fostering relationships with internal customers and stakeholders.
- Meet reliability targets for business applications, as defined by SLIs and SLOs
- Install, configure, test, and maintain operating systems, application software and system management tools
- Automate tasks to improve reliability, repeatability, and scalability.
- Proactively ensure the highest levels of systems and infrastructure availability
- Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement those fixes
- Follow security best practices and build capacity provisioning and redundancy strategies
- Write and maintain custom scripts that increase system efficiency and reduce the human intervention time on tasks.
Requirements:
- Experience creating and modifying Ansible playbooks and roles.
- Experience creating and modifying job templates, projects, inventories, workflows, and other Ansible Tower objects
- Experience in Python Development.
- Experience with the architecture of Ansible plays, roles, and workflows.
- Well-versed with service integration and automation frameworks.
- Knowledge of ITSM tools like ServiceNow and integration with Ansible Tower
- Automation tools SME with expertise in implementing, managing, configuring, and troubleshooting applications.
- Expert in scripting - Python
- Experience in SRE
- Knowledge of DB, middleware, and Linux IMS life cycle
- Knowledge of web services (REST API, SOAP)
- Knowledge of ITIL processes
- Knowledge of Windows and Unix platforms
- Network troubleshooting and configuration
- Hardware and software troubleshooting, including experience building and replacing system hardware.
- Excellent customer service skills with the ability to communicate clearly with all levels of employees, external vendors, and management teams.
- Experience with both on-prem and cloud-based infrastructure (AWS) in terms of deployment, support, monitoring, administration and troubleshooting
- 3-6 years of experience with scripting and automation (Python, Ansible)
- 2+ years of experience in SRE
- 2+ years of experience in AWS Cloud