Observability and Monitoring Engineer

Job Category: Information Technology
Requisition Number: OBSER016982


  • Posted: March 14, 2025
  • Full-Time
  • Locations


    Remote Texas: Texas, USA
    REM TN 37203 Nashville (Home Office): Nashville, TN 37203, USA
    Remote Florida: Florida, USA
    Remote Pennsylvania: Pennsylvania, USA
    Remote (Anywhere in USA): United States

Job Details

Description

Leidos QTC Health Services is hiring an Observability and Monitoring Engineer to be responsible for designing, implementing, and maintaining systems that provide insights into the performance, availability, and reliability of applications and infrastructure. This role involves working with monitoring tools, logging systems, distributed tracing, and alerting mechanisms to detect and resolve issues proactively.

This role combines hands-on technical work with soft skills, such as building technical project plans, design, and transitioning services to an operational steady state. The role is considered a driver of technology and forward thinking as LQTC continues to drive our enterprise.

 

Essential Duties and Responsibilities:

  • Partner with leadership and with Infrastructure and Development Service Owners to capture requirements and baselines.
  • Act as the Service Owner of Observability and Monitoring Operations and Engineering.
  • Create and maintain the Observability Strategy by defining and implementing strategic frameworks driven by logging, metrics, and tracing.
  • Engineer and implement Monitoring & Alerting (e.g., Splunk, SolarWinds, Dynatrace) from which configured alerts can be dispatched to appropriate engineers and consolidated into the current incident management solution.
  • Engineer and maintain Log Aggregation schemas to collect and analyze system logs to allow for proactive issue resolution.
  • Conduct Performance Analysis to identify and troubleshoot performance bottlenecks in infrastructure, applications, and networks.
  • Engineer and implement Automation & Scripting for monitoring, alerting, and remediation using industry standard toolsets (e.g., Ansible, Python, Bash, or Terraform).
  • Engineer and maintain a Single Pane of Glass solution with underlying Dashboarding & Reporting to provide real-time visibility into system health.
  • Provide incident response and collaborate with Infrastructure teams, SREs, and DevOps teams to resolve system outages and performance issues.
  • Proactively perform Capacity Planning and analyze trends in system usage to optimize resources and prevent downtime.
  • Ensure that the observability ecosystem is engineered and maintained to meet security and compliance requirements, working with our Cyber teams.

 

Competencies:

  • Ability to work effectively in a team environment.
  • Ability to utilize discretion and independent judgment to switch between priorities quickly without affecting quality or performance.
  • Excellent written and verbal communication skills.
  • Understanding of the audience level at which verbal or electronic communication is being delivered.
  • Superior customer service skills.
  • Ability to work with minimal supervision.
  • Solid organization and planning skills, with strong attention to detail.
  • Advanced-level knowledge of infrastructure, OS, and database technologies, including but not limited to Windows, Linux, Oracle, SQL, Active Directory, load balancing and firewall technologies, network switching and routing, and core infrastructure such as compute, storage, virtualization, data protection, and business continuity.
  • Working experience in observability and automation in a hybrid environment (on-premises / AWS cloud).
  • Must possess the ability and flexibility to work extra hours and weekends.
  • Ability and desire to take ownership of work assignments and drive tasks to completion.
  • Engineering mindset while designing and rolling out new services/products and providing T3 support.
  • Working proficiency in Monitoring and Logging tools (e.g., Splunk, Prometheus, SolarWinds, Dynatrace, and open source tools).
  • Working experience with scripting and automation tools to enhance operational support and service delivery tasks (e.g., Python, PowerShell, Terraform, Ansible).
  • Understanding of business drivers and metrics and the ability to translate them into measurable infrastructure metrics to drive proactive engagement and resolution of issues. The use of AI tools is a plus.
  • Engineering mindset in an enterprise environment.
  • Advanced working knowledge of cloud services, including SaaS, PaaS, and IaaS capabilities, across the main hyperscaler providers.
  • Working familiarity with Microservices in a hybrid ecosystem including Kubernetes, Docker, and other containerization solutions.
  • Working knowledge of ITIL and ITSM to define and maintain Service-Level Objectives (SLOs), Service-Level Indicators (SLIs), and Service-Level Agreements (SLAs).
  • Working knowledge of ServiceNow and related toolsets to consolidate and automatically create ITIL-based aspects.
  • Advanced knowledge of self-healing technologies and automation in an active software development environment is a plus.

 

Education and/or Experience: (includes certificate & licenses)

  • Bachelor’s degree in computer science, business administration, or a related field, or equivalent work experience
  • 10 to 15 years of experience designing and implementing infrastructure solutions
  • 15+ years of industry relevant experience
  • Relevant technical certifications a plus
  • Understanding/use of SDLC, Agile, and Six Sigma to drive fit-for-purpose technologies
  • Must be able to successfully pass a National Agency Check with Inquiries (NACI) background investigation

 

Pay and Benefits:

Salary range (Level 5): $135,000 - $173,000 with up to 9% bonus eligibility

Other level(s) that may be considered is/are as follows:

Level 6 (20+ years of industry-relevant experience): salary range $174,000 - $220,000 with up to 15% bonus eligibility

The Leidos QTC Health Services pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to): geographic location, responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.

Pay and benefits are fundamental to any career decision. That's why we craft compensation packages that reflect the importance of the work we do for our customers. As a result, we offer meaningful and engaging careers to support you and your career goals, all while nurturing a healthy work-life balance. Employment benefits include competitive compensation, Health and Wellness programs, Income Protection, Paid Leave, and Retirement. More details are available on the Leidos QTC Health Services careers page, "Join Our Team | Jobs & Career Opportunities" (qtcm.com).

Leidos QTC Health Services is a VEVRAA Federal contractor and an Equal Opportunity Employer. The company has an ongoing commitment to affirmative action and the creation of a workplace free of discrimination, harassment and retaliation. The company recruits, hires, trains, and promotes individuals in all job titles without regard to race, color, creed, religion, ancestry, national origin, age, sex, pregnancy, sexual orientation, gender identity, genetic information, people with disabilities protected under law, and protected veteran status.

* This job description supersedes all prior job descriptions and is intended to describe the general content and essential requirements for the position listed above. It is not to be construed as an exhaustive statement of requirements, duties and responsibilities. Management reserves the right to add or change the duties of this position as required at any time.



Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
This employer is required to notify all applicants of their rights pursuant to federal employment laws. For further information, please review the Know Your Rights notice from the Department of Labor.