Site Reliability Engineer
1 day ago
Sword Services Greece S.A. is seeking to recruit a high-caliber Site Reliability Engineer. The successful candidate will be responsible for ensuring the reliability, performance, and availability of our critical platforms: Kong (API Management), Solace (Messaging), Mulesoft (iPaaS), and Informatica (ETL). This role requires a deep understanding of distributed systems, cloud technologies, and a passion for building resilient and scalable platforms.
Responsibilities
- Ensure the reliability and availability of the Kong, Solace, Mulesoft, and Informatica platforms, applying SRE principles of automation, monitoring, and continuous improvement.
- Proactively identify and resolve potential issues before they impact production environments, using data-driven insights and predictive analysis.
- Develop and implement comprehensive monitoring and alerting systems to ensure platform health and performance.
- Collaborate with the Support team and conduct thorough post-incident reviews to continuously improve platform reliability.
- Conduct root cause analysis (RCA) for incidents and implement preventative measures, with a focus on automation and systemic solutions.
- Collaborate with development, operations, and security teams to ensure smooth platform operations, promoting a culture of shared responsibility for reliability.
- Take ownership of platform SLAs and SLOs, ensuring they are met or exceeded, and proactively identifying opportunities for improvement.
- Evaluate and implement new tools and technologies to improve platform reliability and efficiency, staying up-to-date with the latest SRE trends and technologies.
Chaos Engineering & Resilience
- Design, implement, and execute chaos engineering experiments to proactively identify weaknesses and vulnerabilities in the integration platforms.
- Develop and maintain a chaos engineering framework to systematically test the resilience of the platforms under various failure scenarios.
- Analyze the results of chaos experiments and collaborate with engineering teams to implement improvements to enhance platform resilience.
- Participate in the design and implementation of fault-tolerant and self-healing systems.
Disaster Recovery & Business Continuity
- Collaborate with DevOps engineers to develop, maintain, and test disaster recovery plans for the integration platforms.
- Participate in disaster recovery exercises to validate the effectiveness of the plans and identify areas for improvement.
- Ensure that disaster recovery plans are aligned with business continuity requirements.
- Implement and maintain backup and recovery procedures for critical platform components.
Upstream/Downstream Dependency Management
- Analyze the dependencies of the integration platforms on other systems (e.g., API Gateway, backend services) and assess the impact of their reliability on the overall service.
- Implement monitoring and alerting to detect issues in upstream and downstream systems that could affect the integration platforms.
- Collaborate with other teams to improve the reliability and performance of dependent systems.
- Design and implement strategies for handling failures in dependent systems, such as circuit breakers, retries, and fallbacks.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in a similar role, with a focus on platform reliability and operations, preferably with experience in a Site Reliability Engineering (SRE) environment.
- Strong understanding of Kong API Gateway, Solace PubSub+, Mulesoft Anypoint Platform, and Informatica PowerCenter.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- Proficiency in scripting and programming languages such as Python, Bash, or Go.
- Experience with infrastructure-as-code (IaC) tools such as Terraform or Ansible.
- Experience with monitoring and alerting tools such as Datadog.
- Strong understanding of networking concepts and protocols.
- Excellent problem-solving and troubleshooting skills.
- Excellent communication and collaboration skills, with the ability to effectively communicate technical information to both technical and non-technical audiences.
- Strong understanding of Site Reliability Engineering (SRE) principles and practices.
- Experience with containerization technologies such as Docker and Kubernetes.
- Experience with CI/CD pipelines and automation tools.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer Expert, Google Cloud Professional Cloud Architect).
- Experience with Agile development methodologies.
Applications must be in English.
-
Site Reliability Engineer
3 days ago
Athens, Attica, Greece Coca-Cola HBC Full time €25,000 - €45,000 per year
Department: Data & Automation, Digital & Technology Platform Services. As a Site Reliability Engineer - Master Data Governance, you will play a critical role in ensuring the reliability, scalability, automation, and performance of our Data Governance production systems. You will work closely with our development and operations teams to build and maintain...
-
Site Reliability Engineer
1 day ago
Athens, Attica, Greece Vodafone Full time €25,000 - €45,000 per year
Join Us
At Vodafone, we're not just shaping the future of connectivity for our customers – we're shaping the future for everyone who joins our team. When you work with us, you're part of a global mission to connect people, solve complex challenges, and create a sustainable and more inclusive world. If you want to grow your career whilst finding the perfect...
-
Site Reliability Engineer
3 days ago
Athens, Attica, Greece Coca-Cola HBC Full time
Department: Data & Automation, Digital & Technology Platform Services. Location: Greece. As an Operations Engineer specializing in SAP BW, you will play a crucial role in ensuring the reliability, scalability, automation, and performance of our SAP BW system. You will collaborate closely with development and operations teams to enhance system reliability, optimize...
-
Senior Site Reliability Engineer
1 week ago
Athens, Attica, Greece Playnetic Full time €90,000 - €120,000 per year
Established in 2023, Playnetic is a new player in the world of gaming entertainment. We design and build slot games from scratch - from idea to release. Our games will be played in regulated markets globally through industry-leading operators. Our innovative gaming content is centred around our core values: quality gaming, dedicated customer service, and...
-
Ass. Site Reliability Engineer
2 weeks ago
Athens, Attica, Greece Vodafone Full time €30,000 - €60,000 per year
Join Us
At Vodafone, we're not just shaping the future of connectivity for our customers – we're shaping the future for everyone who joins our team. When you work with us, you're part of a global mission to connect people, solve complex challenges, and create a sustainable and more inclusive world. If you want to grow your career whilst finding the perfect...
-
Lead Site Reliability Engineer
1 week ago
Athens, Attica, Greece Workable Full time €60,000 - €120,000 per year
For over 31,000 growing businesses and HR teams seeking a comprehensive, all-in-one HR suite, Workable emerges as the premier solution. We uniquely combine the world's most widely adopted Applicant Tracking System (Workable Recruiting) with a full-spectrum employee management system (Workable HR). At Workable, we empower companies to focus on what truly...
-
Site Reliability Engineer
3 days ago
Athens, Attica, Greece wherewework Hellas Full time €40,000 - €60,000 per year
On behalf of: Peoplora Intl.
Job details
- Own service reliability: SLOs/SLIs, capacity planning, and incident management.
- Automate deployments and recovery; reduce toil with tooling and runbooks.
- Co-design for performance and resilience with engineering teams.
This position is for one of our international clients. Please apply by completing your CV on our...
-
DevOps Site Reliability Engineer
3 days ago
Athens, Attica, Greece HFM Full time €40,000 - €60,000 per year
About HFM: HFM is an internationally acclaimed multi-asset broker, delivering cutting-edge trading tools, platforms, and conditions to traders worldwide. We are committed to innovation, transparency, and excellence in the financial markets.
Your role at HFM: Assist in monitoring production systems to ensure availability, reliability, and performance....
-
DevOps / Site Reliability Engineer (SRE)
2 weeks ago
Athens, Attica, Greece NXCODE Full time €40,000 - €60,000 per year
We are looking for a DevOps / Site Reliability Engineer with experience in Linux server administration, process automation, and infrastructure optimization. You will be responsible for the maintenance, security, and continuous availability of the...
-
Senior DevOps/Site Reliability Engineer
1 day ago
Athens, Attica, Greece eSHARE Full time €60,000 - €90,000 per year
Who is eSHARE? eSHARE is a leading provider of enterprise software solutions for file sharing and content collaboration with external parties using Microsoft 365. We enable organizations to engage their clients, partners and suppliers easily and securely using the productivity tools and workflows users are already familiar with – Teams, SharePoint Online,...