21 days old
2018-07-31 2018-08-28

Site Reliability Engineer

Waltham, MA 02451
  • Job Code
    636446
  • Payrate
    $45 To $65
job summary:

Duties:

  • Assume the responsibilities and perform the duties of a Site Reliability Engineer (SRE) to support and deliver SaaS / IaaS solutions.
  • Design and implement future state SaaS / IaaS architecture.
  • Enable and implement an OpenShift / Kubernetes platform and associated services.
  • Enable platform services that support continuous delivery and continuous integration.
  • Analyze a variety of approaches to Site Reliability Engineering and provide the pros and cons of each to enable the team to arrive at an agreed-upon direction.
  • Develop and administer tools to enable rapid Micro-Service based software deployments.
  • Create an operations handbook as required so that others can assist in the administration.
  • Collaborate to incorporate automated unit, integration, functional, and performance testing into the Continuous Integration process across multiple projects.
  • Collaborate with the Development, Project Management, and Product Management teams to align projects and other efforts.
  • Evolve and automate processes to increase flexibility within the development and testing of multiple simultaneous projects.
  • Provide Development Project level Support.
  • Build, maintain and deploy the application level software in our development and test regions.
  • Prioritize and troubleshoot development and test region issues.
  • Develop runbooks that detail building, deploying, and troubleshooting processes.
  • Promote and contribute to best practices.
  • Plan and execute tasks within an agile environment.
  • Provide Production Support and monitor Production Regions and Environments.
  • Provide first level support for application software issues in all environments.
  • Prioritize and rapidly troubleshoot issues to ensure maximum uptime and optimal performance for customers in our production environment.

Job Qualifications:

  • 7+ years of experience in SRE, DevOps, Release Engineering, System Operations, or Software Development
  • 3+ years of experience in operating/developing large-scale distributed services/applications
  • Excellent organizational, verbal, and written communication skills
  • Demonstrate strong collaboration skills, within function and across peer stakeholders.
  • Extensive experience with Linux and UNIX System Administration. Experience in using Windows.
  • Proficiency with Linux Containers, Docker, Container Solutions, associated Management Tools and challenges.
  • Hands-on experience with scripting languages, including Bash, Python, Groovy, etc.
  • Proficiency working within a Java Software Development Team.
  • Experience and deep commitment to the transformation to a DevOps culture focusing on continuous integration - full lifecycle of building, automated and performance testing, and automating deployment.
  • Experience with VMware provisioning of Virtual Machines, Virtual Networking and Storage Resources.
  • Experience with Ansible (preferred), Chef, Puppet or other Configuration Management tools.
  • Experience with Jenkins (preferred), TeamCity or other Continuous Integration tools.
  • Deep knowledge of build tools like Gradle (preferred), Maven, and Ant.
  • Hands-on experience with SQL and DB Release Management.
  • Usage of Jira (preferred), Rally, or other tracking tools.
  • Usage of Confluence (preferred), or other documentation tools.
  • Demonstrate strong problem analysis, problem resolution, and decision making and judgment skills.
  • Demonstrate understanding of complex software architecture, and ability to enhance, support, and troubleshoot same.
  • Demonstrate excellent and effective interpersonal and communication skills (written, verbal and listening), with ability to build positive working relationships with all levels of the organization.
  • Ability to leverage technical know-how to find viable compromises amidst competing business needs.
  • Demonstrate ability to plan and excel in a fast-paced and demanding environment.
  • Solid understanding of agile methodology and Release Engineering and able to leverage what has worked and adapt it to fit new situations.
  • Knowledge of cloud compute technologies, network monitoring, data processing and analytics.

Additional Qualifications:

  • Knowledge of Site Reliability Engineering.
  • Network monitoring, networking protocols, SNMP, syslog, network telemetry, REST API.
  • Exposure to Grafana, Prometheus, Alertmanager, Kafka, Elasticsearch, and other platforms.
  • Prior experience with Datacenter Monitoring, Service Oriented Systems, and Micro-Services.
  • Contributions to Open Source projects are a big plus.
  • Knowledge of and practice with Scrum & Agile Methodologies.
  • Master's degree in Computer Science or related field.

Other Knowledge, Skills, Abilities or Certifications:

  • Agile scrum master experience.
  • Able to plan and execute projects as part of a collaborative team.
 
location: Waltham, Massachusetts
job type: Contract
salary: $45 - 65 per hour
work hours: 9 to 5
education: Bachelors
 
qualifications:

Top 3 skills:

Experience and high-level understanding of DevOps patterns for build, release, and deployment engineering.

Experience with system and web administration on Linux and high-level understanding of network administration.

Fast learner and proactive self-starter, especially regarding process automation using programming/scripting languages like Python, Ruby, and shell scripting.

 

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

Site Reliability Engineer

Randstad Technologies
Waltham, MA 02451
