Kensho - Individual Contributor - Site Reliability Engineering

Technical Excellence
  • Tools and Tech
  • Technical Problem Solving
  • Architecture
  • Implementation and Forethought
  • Code/Design Review
  • Collaboration
  • Debugging & Support
Site Reliability Engineer I | Site Reliability Engineer II | Senior Site Reliability Engineer I | Senior Site Reliability Engineer II | Staff Site Reliability Engineer | Principal Site Reliability Engineer
Leadership
  • Intellectual Curiosity
  • Innovation
  • Decision Scope
  • Operational Capacity
  • Collaboration Sphere
  • Conflict Resolution
  • Self-Development
  • Adaptability
  • Judgement
  • General Problem Solving
Site Reliability Engineer I | Site Reliability Engineer II | Senior Site Reliability Engineer I | Senior Site Reliability Engineer II | Staff Site Reliability Engineer | Principal Site Reliability Engineer
Autonomy
  • Bias to Action
  • Influence on Work
  • Resourcefulness
  • Delegation
  • Prioritization
Site Reliability Engineer I | Site Reliability Engineer II | Senior Site Reliability Engineer I | Senior Site Reliability Engineer II | Staff Site Reliability Engineer | Principal Site Reliability Engineer
Teamwork
  • Feedback
  • Communication
  • Participation
  • Go Team
Site Reliability Engineer I | Site Reliability Engineer II | Senior Site Reliability Engineer I | Senior Site Reliability Engineer II | Staff Site Reliability Engineer | Principal Site Reliability Engineer

Site Reliability Engineer I

Technical Excellence

Tools and Tech: Understands most processes and principles and is familiar with the essential technologies for their work (e.g. git, JavaScript, Python); relies on reference documentation as needed to develop their understanding of core technologies. Understands the tools used to run a service in production (e.g. k8s, jsonnet, KD, Jenkins).
Technical Problem Solving: Independently develops well-scoped features requiring a single deploy. Able to deploy a service in all environments independently (e.g. stage and prod, with unit/smoke tests). Understands where underlying data is stored (e.g. requests and responses, models, model metadata). Understands the process for software version updates and model updates.
Implementation and Forethought: Identifies and uses the standard patterns across applications. Improvises as needed (e.g. library/modules). Implementations perform as specified under typical conditions; observables (e.g. logs and metrics) capture concrete actions.
Code/Design Review: Actively participates in own and teammates' code and design reviews; responds to comments and change requests in a timely manner.
Collaboration: Understands the architecture of a few of the services in production. Participates in SRE design reviews and postmortems.
Debugging: Able to use the necessary tools to identify the root cause of issues with services in production. Debugs reproducible issues in their own code with full context.

Leadership

Intellectual Curiosity: Exercises curiosity in all things. Seeks understanding of company and engineering philosophy and principles. Seeks alternative ways to solve the same problem; proactively asks for feedback or input.
Innovation: Shares new ideas and perspectives for solving common problems.
Decision Scope: Accountable for decisions made regarding completion of own daily tasks and projects as directed by manager.
Operational Capacity: Contributes mainly in a tactical capacity.
Collaboration Sphere: Works to build stable internal relationships with peers in department and immediate supervisor.
Conflict Resolution: With manager support, resolves conflict.
Self-Development: Partners with manager to set personal goals.
Adaptability: Establishes a growth mindset by setting and acting upon personal skills goals; discusses goals with manager. Shifts focus and priorities as directed by manager and/or company leadership.
Judgement: Exercises sound judgement in all things; seeks counsel from manager, aligned Senior Leader, and/or HR as needed. Seeks to be a positive force for Kensho.
General Problem Solving: Learns to use professional concepts. Applies company policies and procedures to resolve routine issues.

Autonomy

Bias to Action: Self-starter, but does receive detailed direction on most work.
Influence on Work: Primary day-to-day body of work. Limited influence beyond immediate team.
Resourcefulness: Seeks existing documentation. Takes full advantage of all training opportunities. Attempts to unblock self; consults teammates when stuck.
Delegation: Manager drives most day-to-day work. Proactively seeks to add to current body of work.
Prioritization: Priorities are usually determined by manager.

Teamwork

Feedback: Open to giving and receiving feedback on code review, design review, and during quarterly performance evaluations. Regularly seeks feedback from peers and managers. With manager support, takes action on areas of development. Provides constructive and actionable feedback to peers and managers.
Communication: Practices open and transparent communication with all levels of the organization. Clearly and concisely communicates status updates (state of work, timelines, estimates, blockers, progress, knowledge gaps, PTO/sick time) with stakeholders. Updates documentation when they identify deficiencies or gaps. Shares ideas with teammates, manager, and company leadership.
Participation: Attends and actively participates in all team meetings; pays close attention to details.
Go Team: Understands their work contributes to Kensho's overall success and does whatever is necessary to see it through, without doing harm.

Site Reliability Engineer II

Technical Excellence

Tools and Tech: Experienced with essential technologies for their role; may occasionally reference documentation for technologies. Proficient with developer tools (e.g. IDE, shell) to increase workflow efficiency. Proficient with all the tools needed to run a cluster (e.g. autoscaler, linkerd). Able to provision services in AWS (RDS, Redis, S3, Terraform).
Technical Problem Solving: Independently develops features scoped to entire application components; may involve multiple releases. Independently migrates/launches a new service in production (e.g. SRE migration of a service, which includes S3, ALB, traffic splits, envoy, alerts, etc.). Understands how models are created, updated, and used.
Architecture: Evaluates technology choices within a specific application component; considers tradeoffs between complexity, memory usage, performance, and speed of implementation.
Implementation and Forethought: Implementations perform as specified under typical conditions. Abnormal behavior from other components of the project is managed and resolved. Observables capture the state of the application. Independently provisions all dependencies for a service (e.g. database, load balancer, ingress, dashboards).
Code/Design Review: Provides thorough, timely code reviews. Provides feedback on design docs about specific areas of expertise.
Collaboration: Can evaluate the customer-facing reliability of a service (SLO, SLA, SLI, etc.). Proficient with the continuous maturity model. Conducts blameless postmortems.
Debugging & Support: Debugs reproducible issues in products with which they have familiarity even with less than full context. Increases efficiency by shortening debugging loops. Can provide access and onboard users for our services. Independently resolves customer issues and mitigates production issues. Participates in OPS and SRE rotations.

Leadership

Intellectual Curiosity: Exercises curiosity in all things; encourages peers to seek answers. Demonstrates strong understanding of company and engineering philosophy and principles. Provides evidence and counter proposals to support ideas.
Innovation: Takes an appropriate amount of risk and tries new ways of doing old things. Is learning to be comfortable with failure and seeks to learn from it. Shares lessons with peers and teammates.
Decision Scope: Independently determines completion path of own daily tasks and projects.
Operational Capacity: Supports the tactical initiatives for the relevant area; seeks opportunities to lead certain initiatives.
Collaboration Sphere: Builds productive internal/external working relationships with department and functional leadership.
Conflict Resolution: Resolves most conflict independently; immediately makes manager aware of the issue and proposed solution.
Self-Development: Independently sets and achieves personal goals; assesses own progress regularly.
Adaptability: Regularly asks for and acts upon feedback to continually improve and adjust personal skills goals. Assesses and shifts priorities with minimal disruption.
Judgement: Exercises sound judgement in all things; seeks counsel from manager, aligned Senior Leader, and/or HR as needed. Behaviors and actions demonstrate good intent and positive impact on Kensho.
General Problem Solving: Develops professional expertise; applies company policies and procedures to resolve a variety of issues.

Autonomy

Bias to Action: Mostly independent. Normally receives little instruction on day-to-day work and general instructions on new assignments. Seeks to independently achieve results as often as possible. May act as a more experienced mentor to junior teammates.
Influence on Work: Primary day-to-day body of work; regularly assesses own strengths, scope of work, and level of effectiveness. May influence team-based projects and tasks.
Resourcefulness: Seeks out and maintains existing documentation. Regularly attends all training opportunities offered by Kensho. Actively seeks additional training/learning opportunities outside of Kensho. Usually unblocks self; foresees when assistance is required.
Delegation: Seeks to take level-appropriate tasks from manager; offers opportunities to teammates.
Prioritization: Usually determines own priorities; able to manage multiple priorities. Manager may help sort out priorities if conflicts arise.

Teamwork

Feedback: Proactively provides clear, actionable feedback to peers and managers in code review, design review, and during quarterly performance evaluations. Regularly seeks feedback from peers and managers. Independently takes action on areas of development.
Communication: Proactively shares knowledge with teammates. Proactively communicates current state of the work (and all other related matters) to stakeholders. Proactively communicates concerns and blockers to peers and/or managers as appropriate. Proactively updates documentation when updating code.
Participation: Actively participates in retrospectives; provides detailed estimates and, if necessary, detailed explanations of technical decisions. Observes gaps in current processes and offers suggested solutions. Offers new ideas to enhance experience of team members.
Go Team: Acts as an internal cheerleader to peers and teammates. Seeks out and actively engages in opportunities to further Kensho's mission (e.g. career fair ambassador, lightning talks, interview panelist).

Senior Site Reliability Engineer I

Technical Excellence

Tools and Tech: Deeply experienced with all essential technologies for their role. Recognized as a team expert on particular tools and/or technologies. Employs unfamiliar reference documentation efficiently and judiciously for atypical use of technologies. Deep experience with all essential technologies involved in production.
Technical Problem Solving: Independently identifies future needs and follows up with implementation. Independently develops features across an entire application/product/tool; deployment of these features potentially takes place across a significant number of deploys. Independently identifies and upgrades the technology involved in production (e.g. k8s, linkerd, prometheus). Able to recognize the model update/upgrade process (e.g. where the models are created, the framework used, and the storage and serving of input data and models).
Architecture: Evaluates tradeoffs between complexity, memory usage, performance, reliability, and suitability for product objectives.
Implementation & Forethought: Abnormal behavior from dependencies is handled when possible; observables highlight actionable problems with the system. Scope is primarily within own project. Observability and migration expert (e.g. alerts/graphs). Identifies and implements features to reduce toil. Implements tools to track SLIs, error budgets, and mean-time-* metrics (e.g. MTTR).
Code/Design Review: Provides thorough, timely code reviews. Sought out as a reviewer for more complicated changes affecting an application. Provides feedback on all areas of design docs.
Collaboration: Suggests improvements/enhancements to the technologies and also introduces better/advanced technology. Covers application dependencies and also monitoring/alerting tools. Trains L1 engineers and delegates more responsibilities to them. Trains fellow SREs on handling pages (e.g. wheel-of-misfortune exercises). Able to convince stakeholders of the performance of our services and to build trust and confidence.
Debugging & Support: Debugs issues in products with which they have familiarity even with incomplete context. Offers mitigating or partial solutions when appropriate. Can identify the root cause of issues (both production and customer issues) and resolves them independently. Owns postmortems. Able to prepare services for disasters (e.g. documentation, creating and restoring from backups).
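
As an illustration of the SLI, error budget, and mean-time tracking named under Implementation & Forethought above, here is a minimal Python sketch. It is not Kensho's tooling: the `Incident` type, the function names, and the request-count inputs are all hypothetical, and it assumes a simple request-based availability SLI.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record: when it was detected and when it was resolved."""
    detected_at: datetime
    resolved_at: datetime


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests served successfully (1.0 when there was no traffic)."""
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(slo_target: float, good_requests: int, total_requests: int) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic, so no budget has been spent
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures


def mean_time_to_resolve(incidents: list[Incident]) -> timedelta:
    """Average of (resolved_at - detected_at) over all incidents."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved_at - i.detected_at for i in incidents), timedelta(0))
    return total / len(incidents)
```

For example, with a 99.9% SLO and 500 failed requests out of 1,000,000, the allowed failure count is about 1,000, so roughly half of the budget remains. A team acting on error budgets might slow releases as this value approaches zero.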

Leadership

Intellectual Curiosity: Exercises curiosity in all things. Builds good relationships with teammates and stakeholders. Reinforces and shares company and engineering philosophy and principles.
Innovation: Assesses associated risk of failure and learning trade-offs, and seeks to fail and learn frequently and rapidly to ultimately achieve desired results. Often partners with junior teammates to increase failure/learning opportunities.
Decision Scope: Determines completion path of own daily tasks and projects; may influence how peers do their work. Considers ripple effects and impact (present and future) of decisions. May create alternative scenarios to help undo a bad decision.
Operational Capacity: Leads tactical initiatives related to an important business area or process.
Collaboration Sphere: Networks with key contacts outside own area of expertise. Seeks out regular opportunities to collaborate strategically with functional senior staff and leadership.
Conflict Resolution: Resolves most conflicts independently. May identify conflicts within the team in which they are not directly involved; understands when to seek escalation paths to management and/or HR.
Self-Development: Seeks to work beyond own level of comfort. Demonstrates active interest in learning more about other areas outside immediate focus.
Adaptability: Focuses primarily on what is within one's own control to change or influence. Seeks to understand why company or org decisions are made in order to better adapt team decisions. Follows up on team decisions by focusing on the facts and adjusts accordingly and empathetically.
Judgement: Exercises sound judgement in all things. Behaviors and actions set positive examples internally and proudly represent Kensho externally.
General Problem Solving: Has wide-ranging experience; uses professional concepts and company objectives to resolve complex issues in creative and effective ways.

Autonomy

Bias to Action: Mostly independent. Often acts as experienced mentor. Often consults with peers to make suggestions on critical next-step decisions.
Influence on Work: Determines methods and procedures on new assignments. May coordinate activities of other individual team members.
Resourcefulness: Prunes and maintains documentation for a project even when not actively making changes. Regularly attends internal and external training/learning opportunities; may coordinate team-wide opportunities.
Delegation: Assigns or seeks opportunities for teammates. Managers are consulted, but remain mostly hands off based on what makes sense.
Prioritization: Remains focused in the face of interruptions and distractions. Proactively communicates to team/manager if timelines will slip. Runs efficient, effective, and succinct meetings. Able to accurately assess own and teammates' progress and difficulties; may assist more junior teammates in prioritizing their work.

Teamwork

Feedback: Regularly provides clear, actionable, direct, and compassionate feedback to peers and managers to help them thrive. Is often sought out to provide feedback to peers; is becoming comfortable delivering constructive feedback. Constantly seeks and acts upon feedback from peers and managers.
Communication: Practices open and transparent communication with all levels of the organization. Regularly shares knowledge and ideas with the team; collaborates on and discusses technical topics across teams; communicates frankly with all project stakeholders; has a firm grasp of presentation skills, etiquette, and slide creation (in-person and remote).
Participation: Consistently involved in all team activities and conversations, often leading them. Suggests ideas to improve the team and company culture and enhance existing processes. Hosts, coordinates, and attends various team events (social or otherwise).
Go Team: Invests ample time to better Kensho culture and progress Kensho's vision by acting both internally and externally, and is willing to make sacrifices in order to do so.

Senior Site Reliability Engineer II

Technical Excellence

Tools and Tech: Is sought out / seen as an expert in essential technologies for the role. Provides guidance regarding best practices. Deep experience with all essential technologies involved in production. Independently identifies future needs and follows up with implementation. Evaluates and articulates the impacts of tradeoffs in complexity, memory usage, performance, reliability, and suitability for product goals across projects. Automates technology for easy or no-effort adoption (e.g. cluster-wide metrics and alerts).
Technical Problem Solving: Comes up with innovative ways to automate operational work. Understands the model validation and graduation process (e.g. graduating a model from stage to production). Improves the efficiency of our resources (e.g. node/replica usage percentage, efficient and automated upgrades).
Code/Design Review: Provides guidance to colleagues in their system designs.
Collaboration: Develops customized SRE policies and guidelines (e.g. database migration/upgrades, app migrations, model updates, caching of models/images). Works with application teams and advocates for adopting common technologies (e.g. testing/validation methodologies).
Debugging & Support: Is sought out by teammates to help debug issues in products with which they have familiarity even with incomplete context. Offers mitigating or partial solutions when appropriate. Recognizes common patterns of problems and creates a plan for resolution. Automates testing of disaster recovery scenarios and the gathering of data for issues.

Leadership

Innovation: Assesses associated risk of failure and learning trade-offs, and seeks to fail and learn frequently and rapidly to ultimately achieve desired results. Frequently partners with junior teammates to increase failure/learning opportunities.
Collaboration Sphere: Proactively builds strong relationships with key contacts outside own area of expertise. Creates regular opportunities to collaborate strategically with functional senior staff and leadership.
Conflict Resolution: Is sought as a mediator to resolve technical conflicts on the team.

Autonomy

Bias to Action: Mostly independent. Often acts as experienced mentor. Often consulted by peers for suggestions on critical next-step decisions.
Influence on Work: Sets expectations for new projects. Coordinates activities of other individual team members.
Delegation: Understands and takes advantage of teammates' strengths.

Teamwork

Feedback: Is often sought out by managers to provide upward feedback.

Staff Site Reliability Engineer

Technical Excellence

Tools and Tech: Experienced with all essential technologies for their role. Recognized as a company expert on particular tools/technologies. May contribute to technologies via bug reports/fixes or performance improvements when possible. Recognized as the company expert for SRE. Has mastered most SRE concepts.
Technical Problem Solving: Works on and initiates technical projects that span multiple applications/products at Kensho. Solutions may involve coordinating deploys across many services. The scope of technical projects typically requires a collaborative interaction mode in contrast to x-as-a-service. Detects system- or project-level estimation and timeline issues. Notices bugs in cross-team behavior or ability to deliver. Improves velocity on large, months-long projects. Recognizes new technology necessary for the reliability, observability, and scalability of all services, and for model management and inference in production (e.g. collection of metrics; serving, storage, and syncing of models).
Architecture: Evaluates technology tradeoffs between complexity, memory usage, performance, reliability, and suitability for long-term sustainability at Kensho. Articulates how those trade-offs manifest across teams and monitors suitability of team structures for the desired architecture(s). Understands architectures of all the running services and dependencies. Can suggest changes to a service to make it available across multiple regions. Innovates and develops new tools/tech to eliminate most of the toil and reduce operational workload.
Implementation & Forethought: Consistently builds software which is resilient to unexpected failures. Observability extends beyond project boundaries. Can adapt software to fit our needs. Carves a path for usage without disrupting any service. Takes actions based on error budgets.
Code/Design Review: Provides thorough, timely code reviews. Sought out as a reviewer for more complicated changes affecting multiple applications, or changes which are identified as requiring more thorough review. Provides design feedback on integrating applications into the broader Kensho ecosystem.
Collaboration: Handles SRE migrations completely. Recognizes the need for re-reviews. Keeps the rest of the SRE team updated on changes to services, subtle implications, and the effects on customers. Can migrate applications from Kensho to any S&P division.
Debugging & Support: Debugs issues even in unfamiliar products with incomplete context. Offers mitigating or partial solutions; may drive and/or take ownership of solutions. Can reduce the duration and frequency of failures. Can identify and fix all production issues. Provides an executive summary of every incident to executives and customers. Owns disaster management for all services. Creates environments to simulate disasters and regularly improves the disaster recovery process.

Leadership

Intellectual Curiosity: Exercises curiosity in all things. Builds good relationships with managers and those leading products and initiatives. Understands when to be pragmatic and helps managers and those leading products and initiatives to mitigate situations where conditions are not ideal. Informs engineering philosophy and principles.
Innovation: Accustomed to the risks and benefits of failing and learning fast and does so frequently. Supports junior teammates through the learning process to achieve desired results.
Decision Scope: Makes decisions that influence and drive the work of their peers and teammates. Plans always include multiple, well-thought-out contingencies.
Operational Capacity: Works on significant and unique issues where analysis of situations or data requires an evaluation of intangibles that is critical to the strategic planning and execution of important business processes.
Collaboration Sphere: Creates formal networks involving coordination among groups and builds mostly strategic relationships with peers and company leadership.
Conflict Resolution: Addresses and resolves/defuses conflict proactively, before escalation is required. Adept at identifying conflict within own team; aids in resolving/defusing it promptly and professionally.
Self-Development: Demonstrates ability to take on responsibilities outside their immediate area of focus. Consistently creates opportunities to challenge themselves in new ways.
Adaptability: Consistently reframes obstacles as opportunities. Solicits input from teams on ideas for mitigating obstacles and change. Seeks to understand historical decisions to avoid repeating mistakes and/or reassess decisions.
Judgement: Exercises sound, independent judgement in all things, especially with methods, techniques, and evaluation criteria for obtaining results. Work/business-related decisions often reflect the notion of putting company before self.
General Problem Solving: Having broad expertise or unique knowledge, uses skills to contribute to development of company objectives and principles and to achieve goals in creative and effective ways.

Autonomy

Bias to Action: Acts independently to determine methods and procedures on new or special assignments. May supervise the activities of others in some capacity. Makes swift decisions based on knowns, with consideration given to the unknowns, and accepts consequences of those decisions as long as desired outcomes are achieved.
Influence on Work: Leverages own experience to successfully manage and direct cross-team commitments and timelines. Influences work and team in a broad way, often at the departmental level.
Delegation: Consistently provides broader context. Increases own technical leverage by delegating responsibility over outcomes, not tasks.
Prioritization: Often determines the priorities and timelines of own and/or other teams. Understands and implements time effectiveness principles to keep self and others on track.

Teamwork

Feedback: Is a primary source of feedback to peers, teammates, and managers. Versed and comfortable delivering constructive feedback. Understands which elements of feedback are critical to their own success; implements changes as needed.
Communication: Practices open and transparent communication with all levels of the organization. Communicates deeply technical topics appropriately to a variety of audiences. Skilled in persuasive presentation (e.g. pitching the Eng function on a new technology). Demonstrable ability to utilize rhetoric and prose to inspire, build engagement, and foster high-trust teams. Competent presenting to enterprise and Kensho executives; also capable of external presentations with support.
Participation: Models the behavior of involvement and interaction expected within the team. Proactively shares ideas and opportunities where the team members can participate and develop their skills. Consolidates ideas from team members and brainstorms on ways to implement them at a larger scale across the organization. Supportive of and encourages participation outside of the day-to-day work wherever it makes sense.
Go Team: Embodies the team player mindset, infusing it into all actions and behaviors. Seeks to constantly better Kensho's culture and brand.

Principal Site Reliability Engineer

Technical Excellence

Tools and Tech: Experienced with all essential technologies for their role. Recognized as a company expert on many tools/technologies. Potentially recognized outside Kensho as an expert on particular tools/technologies.
Technical Problem Solving: Works on technical projects that span all applications at Kensho and potentially include applications developed with partners. Solutions may involve deploys coordinated across multiple organizations. Works on bringing in new workflows and policies (e.g. changes to the maturity model, migration policies). Advocates SRE mindset and principles throughout the company.
Architecture: Shares engineering lessons learned from own experience and/or from networks and industry knowledge. Delivers public talks and publishes blog posts on unique adoption of technologies.
Implementation & Forethought: Oversees the deployment of new concepts/technology. Has the forethought to know if a choice is going to be a failure even before implementation. Open sources SRE tools developed at Kensho.
Code/Design Review: Provides design feedback to multiple application teams. Understands the architecture of third-party/open source products well enough to fix issues arising from them.
Debugging & Support: Debugs issues in unfamiliar products with limited context. Provides mitigations at multiple levels of the stack. Can diagnose and fix issues in third-party software. Provides patches to open source software. Able to handle all aspects of production issues (e.g. technology, communication, management of incidents, resolution). Performs and evaluates disaster management strategies via mock exercises.

Leadership

Intellectual Curiosity: Exercises curiosity in all things. Builds good relationships with cross-functional leaders.
Innovation: Assesses risk and reward of failure velocity at a larger, departmental or even company-wide scale. Suggests opportunities for more frequent failure and learning. Operates as an innovation champion within their field or function.
Decision Scope: Decisions often drive functional, if not company, direction. Consistently demonstrates awareness of when to be directive or decisive versus when to build consensus.
Operational Capacity: Leads important business areas and/or processes. Accountable for the business decisions related to their area of focus and beyond. Works on significant, highly visible, and highly impactful issues where analysis of situations or data requires an evaluation of intangibles and an anticipation of the unforeseen.
Collaboration Sphere: Builds and nurtures strategic relationships with executive management and senior business leadership. May offer suggestions and advice to company leaders to arrive at solutions.
Conflict Resolution: Supports peers and teammates in resolving and defusing conflict. May be the point of escalation.
Adaptability: Consistently persists through obstacles to achieve desired objectives. Balances confidence in convictions with open-mindedness. Often balances multiple priorities, successfully pivoting to and from each one.
Judgement: Exercises sound, independent judgement in all things, especially for obtaining results. Decisions are almost always made through the lens of benefiting Kensho.

Autonomy

Bias to Action: Acts independently, often making critical business-impacting decisions, sometimes with limited information.
Influence on Work: Sets goals for department and/or function based on company objectives; may also contribute to setting company objectives. Provides guidelines for organizational documentation best practices. Significant influence over meta-software issues such as the choice of languages, frameworks, serverless, on-prem, etc. Anticipates and addresses future org-wide needs based on company direction.
Delegation: Delegates larger-scoped responsibilities across the Engineering function, and to stakeholders outside of Engineering. Understands when and how to decline requests while motivating others to be directly involved.

Teamwork

Communication: Practices open and transparent communication with all levels of the organization. Exemplary rhetorical skill across technical and non-technical topics. Actively shares knowledge within the industry. Collaborates on marketing material (e.g. blog posts). Tone of internal and external presentations is often visionary, inspirational, and motivational.
Participation: Models the behavior of involvement and interaction expected within the team. Proactively shares ideas and opportunities where team members can participate and develop their skills. Consolidates ideas from team members and brainstorms on ways to implement them at a larger scale across the organization. Supportive of and encourages participation outside of the day-to-day work wherever it makes sense.
Go Team: Operates as a pillar of exemplary behavior. Recognized as an active partner in shaping the future vision of Kensho.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.