5 days old
2017-12-13 to 2018-01-10

Site Reliability Engineer (TechOps)

Washington, DC 20036
  • Job Code: 589268

Title: Site Reliability Engineering (SRE), Engineer (TechOps)


Team Mission


This team currently consists of passionate engineers who strive to demonstrate excellence in the field of DevOps. We are responsible for the uptime of the various .COM websites and backend services, but a large portion of the job is also to innovate. Historically, once infrastructure is configured and working, it is mandated that there be no more changes to the system. Not here. We frequently and deliberately rebuild our entire system in an automated fashion. These activities not only help us discover pain points in our system but also give us the opportunity to improve continually. How do we guarantee geographic redundancy? How do we get the code to production faster? It takes 15 minutes to rebuild a system; how can we get it down to 5? Instead of Mongo, should we use Cassandra or DynamoDB? Do we even need a database for this application? How can we orchestrate a cloud failover from one provider to another?


Who are we looking for?



  • The SRE Engineer is part of an innovative team on a continuous mission to build bulletproof, scalable, secure private and public cloud environments for our customers and users.
  • If you think hard is fun and get bored easily when you aren't challenged, this might be the place for you. We want someone who has an insatiable thirst for technology and a desire to learn and grow individually, with the team, and with the business; someone who has a passion to lead, architect, design, document, and implement comprehensive platform solutions using security best practices.
  • This is an extremely challenging position but would be the perfect fit for someone who wants to contribute and grow.


The Challenge:



  • The SRE Engineer is responsible for any and all tasks related to the performance, stability, reliability, efficiency, and security of both the sites and the general team operations. Responsibility also extends to how incidents are managed and operated.
  • Proactive relationship building and communication are essential in this role. This includes engagements with SROs, clients, and third parties to ensure continuous improvements in system architecture, deployments, automation, and configuration management.
  • Establish the service delivery culture for our business, building best-in-class service engineering capabilities in the SRE team.
  • Work across the engineering team to influence software development to meet cloud needs, and influence product and cloud engineering to improve the manageability and supportability of the cloud products.
  • Design and develop a complete end-to-end automation environment using configuration and auto-scaling tools.
  • Lead architecture, monitoring, performance optimization and capacity planning of new infrastructure services to support a high-performance computing environment and ensure 99.9%+ uptime.
  • Respond to off-hours and weekend emergency alerts, alarms, and requests, in keeping with the team's on-call rotation schedule.
  • Work closely with Architects, Security Engineers, Product Managers, SRO and other clients and partners of the SRE team to meet the needs of the organization to stay competitive - from the infrastructure up to the highest level of applications.
  • Strategize with the teams to develop new technology initiatives with a primary focus on availability, supportability, scalability, security, and performance.
  • Configure and tune enterprise monitoring and instrumentation systems to efficiently detect existing issues and predict future issues based on trends.
  • Stay up to date with technology. Continually advance your technical skills.
  • Improve continuously by taking justifiable risks and not being afraid to fail.
  • Be flexible and at the same time push back respectfully to ensure we are doing what is best for the company in the long run.
  • Hold vendors accountable, set the bar high, and ensure they deliver above expectations.



  • 5+ years of hands-on experience as an individual contributor in a systems administration/development or DevOps role working on highly scalable distributed systems.
  • Experience supporting mission-critical platforms in both physical and virtualized environments, using CentOS, Red Hat, and Ubuntu.
  • Experience designing, building and managing large scale infrastructure in AWS and Rackspace, including experience leveraging one or more coding languages for automation.
  • Strong knowledge and experience using Python to build and automate.
  • Proven experience leading positive change, cultivating product technology visions and innovative solutions, and fostering effective engineering practices and culture.
  • Experience in driving process improvements, with a strong focus on leveraging technology for the establishment of fluid interactions and interfaces between teams.
  • Ability to communicate and transfer knowledge clearly and effectively in both technical and non-technical manners.
  • Strong ability to prioritize and multi-task in a fast-paced environment.


List of technologies (we tend to be technology-agnostic: whatever is the best fit for the job):



  • Automation: Ansible, Puppet, Jenkins, Bamboo, Rundeck
  • Repositories: Git
  • Web Architectures: Node.js, LAMP Stack, Java, JBoss, Tomcat, AEM
  • Scripting: Python, Bash
  • Cloud Providers: AWS (CF, AWS CLI, Botocore, Lambda, ECS, Beanstalk, etc.)
  • CDN: Akamai, CloudFront
  • Database: MySQL, Postgres, MongoDB, Redshift, DynamoDB
  • Containerization/disposable environments: Docker, Vagrant
  • Network Operation Tools: Icinga2, New Relic, Logstash, Elasticsearch, Nagios
  • Operating Systems: CentOS, RHEL, Ubuntu
  • Collaboration Tools: Jira, Confluence, Slack

Categories

  • Information Technology

Randstad utilizes a technology-driven focus with a human touch to provide better staffing and business solutions to organizations around the world. Our team of experts matches professionals with available career opportunities in a variety of fields.
