Site Reliability Engineer
Chicago, IL, USA
Cohesion is a leading Intelligent Buildings software solution that is disrupting how buildings operate and how people engage with buildings – from real estate owners/investors, operators, and building engineers to tenants and visitors. Our cutting-edge, converged, IoT-enabled platform brings together building systems, building software, and business applications into a single portal on web and mobile platforms to forge the path to autonomous buildings.
We are searching for a Site Reliability Engineer to join our rapidly growing team in our Chicago office. This role will report to the Program Director. We are small and our people wear many hats; this role is no different. The Site Reliability Engineer builds, maintains, analyzes, and revises the telemetry, alerts, and visualizations that help everyone from developer to C-suite executive know what they need to know about the health of the platform. The Site Reliability Engineer partners with Platform and DevOps engineers to harden the platform. Success looks like a platform we do not have to talk about – it just works. We all see a dashboard of green lights and can go to sleep at night knowing the system is happy and healthy. The right candidate will enjoy a fast-paced environment, working through challenging problems, and enabling several delivery teams to operate with world-class efficiency, sustainability, and quality.
You might be the perfect fit for this role if you:
•Get stuff done
•Are a servant leader
•Never give up
•Can figure stuff out on your own but love to collaborate with others
•Love tinkering around to find the system’s weaknesses and engineering them away
•Have a knack for getting to know everyone, anticipating what they want, and tactfully educating them about why change for the common good is a good thing
•Know how to spot the mistakes that will hurt long term and get ahead of them in the short term
•Can use data to tactfully and clearly articulate improvement areas and inefficiencies
•Thrive in chaos but move progressively toward consistent best practices in the right places at the right times
What You'll Do
Whatever it takes: Day 1 is kind of a freebie because of onboarding, but from Day 2 on, what you will do is mostly whatever it takes to get stuff done. We have a ton of things going on, and what we need most out of this role is a person who finds the key things that will make a huge difference to the overall platform and gets them knocked out fast and well – without a whole lot of direction or specific guidance. Here are some bullets that help explain the role:
•Design, implement, and tune the Cohesion platform’s telemetry to provide observability across the system.
•Tune alerting to send the right signals to achieve proactive response without burying teams in noise.
•Create clear visualizations of Cohesion’s observability signals so that engineers and management can easily understand the state of the system.
•Work with DevOps and test engineers to provide observability into the state of deployment pipelines and automated test suites.
•Conduct analysis of incidents, bugs, and defects to understand how to catch, remediate, and prevent them in the future.
•Monitor systems capacity and health indicators and trends; provide analytics and forecasts for added or reduced capacity as required.
•Work with Product to systematically and efficiently route bugs according to priority, team ownership, and subject matter expertise.
•Participate in and optimize the on-call process.
•Define and document standard run books and operating procedures. Create and maintain system information and architecture diagrams.
•Empower and support development teams to stay agile with automated Ops processes.
•Present audience-appropriate concepts and insights at all levels of a software organization (CxO, architect, technical team).
•7+ years of hands-on experience in development and systems operations.
•Bachelor’s level education or higher in computer science, information technology, or related field.
•Expert in monitoring infrastructure health (Application Insights, ELK, Datadog)
•Experience with log aggregation tools (e.g., Azure Monitor)
•Experience with Agile methodologies (Scrum and Kanban) and tools (Azure DevOps)
•Strong knowledge of ITIL incident management and how to improve the process through tools and automation
•Experience with various Git flows and tools (e.g., GitHub)
•Knowledge of implementing automated test integration, code analysis tools, and security vulnerability and network availability checks.
•Experience supporting IoT systems on edge and in the cloud.
•Experience working with different architecture patterns such as microservices, SOA, and event-driven systems.
•Relevant industry experience.
•Graduate level education in computer science, information technology, or related field.
•Understanding of the IoT domain.
•Early-stage startup experience.
What You Can Expect From Cohesion
Cohesion is proud to offer a comprehensive benefits package to eligible, full-time employees in the United States. Our benefits are designed to invest in the health, happiness, and well-being of our employees and their families.
• Receiving a competitive compensation package, including bonus, medical/dental/vision insurance, and 401k match
• Receiving a monthly cell phone reimbursement
• Comprehensive wellness stipend program (eligible after 6 months of employment)
• Enjoying a responsible unlimited PTO program (eligible after 3 months of employment) to help employees maintain work-life balance
• 2-way flexibility of work schedules
• Dressing for your day
• Working in an open environment with creative optional brainstorming sessions for all employees
• Participating in one all-employee lunch per month and one all-employee breakfast per month
• Enjoying family leave benefits
At Cohesion, we see diversity and inclusion as a source of strength. We believe trust and innovation are best built through diverse thought and practice. Individuals seeking employment at Cohesion are considered without regard to race, religion, color, national origin, sexual orientation, gender identity and expression, age, marital status, veteran status, or disability status.