Requirements & Qualifications
– Bachelor’s degree in Software Engineering, Computer Science, Information Technology, Project Management, or a related field.
– Minimum of 8 years of working experience in this or a similar role.
– Strong knowledge of Linux/Unix systems and administration.
– Strong knowledge of cloud infrastructure and services, particularly AWS.
– Experience with containerization and orchestration technologies such as Docker and Kubernetes.
– Experience with automation and configuration management tools such as AWS CDK, Terraform, Ansible, Puppet, or Chef.
– Experience with monitoring and logging tools such as Prometheus, Grafana, ELK or equivalent.
– Experience implementing observability platforms using product suites such as Datadog, New Relic, ELK, or Prometheus.
– Familiarity with build tools like GitLab CI, Travis, or equivalent.
– Strong scripting skills in languages such as Bash, Python, or Ruby.
– Experience with networking concepts and protocols.
– Experience with database management and administration.
– Familiarity with service-mesh technologies such as Istio and Linkerd.
– Experience with modern cloud development practices (microservices architectures, REST interfaces, etc.).
– Experience with source code management using Git.
– Deep hands-on technical expertise.
– Strong understanding of software development methodologies and principles.
– Strong problem-solving and analytical skills.
– Good communication and collaboration skills.
Responsibilities
– Designing, implementing, and maintaining high-availability systems.
– Proactively monitoring and troubleshooting production systems to identify and resolve issues.
– Creating and maintaining automated systems for deployment, scaling, and monitoring.
– Managing incident response and conducting post-mortem analysis to prevent similar issues from recurring.
– Collaborating with development teams to resolve any production issues in a timely manner.
– Continuously improving the performance, scalability, and reliability of the systems under management.
– Participating in on-call rotation for incident response.
– Developing and maintaining documentation for processes and procedures.