[Remote] Sr Platform Engineer

Note: This job is remote and open to candidates in the USA.

Flexential, a company focused on building and operating IT platforms, is seeking a Senior Platform Engineer to join its platform development team. This role involves hands-on engineering responsibilities for developing and managing critical platform subsystems, ensuring high availability and operational resiliency while utilizing native-AI capabilities.

Responsibilities

Design, develop, and operationally manage automated, resilient, highly available, self-healing, secure platforms with native-AI capabilities for IT needs, serving both internal and customer business capabilities
Develop and manage the Observability OpenTelemetry Central Backend Stack (Grafana Enterprise, Mimir, Loki, Tempo, and Alertmanager) on Kubernetes/RKE2 via Helm and GitLab CI/CD
Build and manage IaC and CI/CD for automated provisioning and deployment, including Terraform modules for infra/VM/storage provisioning, Ansible AWX playbooks for OS/app bootstrap, and ArgoCD and Helm for Kubernetes configuration
Develop and manage the OpenTelemetry/Prometheus scrape profile library, including SNMP exporters, REST API exporters, and cloud provider exporters (CloudWatch, Azure Monitor, GCP) for multiple device classes
Develop AIOps capabilities on platforms, e.g. for observability use cases: anomaly detection integrations, event correlation rules in Alertmanager, and synthetic monitoring patterns to reduce alert noise
Configure and maintain Zabbix auto-discovery: network range scanning, device classification, and Prometheus service discovery integration
Build and harden Edge Stack deployments (Prometheus + OTel collector) per data center site using GitOps templates
Integrate Alertmanager with ServiceNow: webhook routing, ticket enrichment, auto-close logic, and escalation policy configuration
Maintain platform security: Conjur/CyberArk secret injection at runtime, mTLS between stack components, and RBAC in Grafana Enterprise
Author and maintain Grafana dashboards in JSON/GitLab: facility overview, network health, RED metrics, and application telemetry
Mentor mid-level engineers, lead code reviews, and establish engineering standards for the team
Represent platform engineering in cross-functional architecture reviews and executive-level program updates
Perform other duties as required and assigned

Skills

DevOps / Automation: 5+ years in a production environment; Kubernetes (RKE2/k3s), Helm chart deployment, system services, Docker/containers
LGTM Stack Development and Configuration: 4+ years; Grafana, Mimir, Loki, and Tempo configuration, tuning, dashboarding, and production operations; Prometheus required
Senior-level Python / Scripting frameworks: 5+ years; automation scripts, exporter development, GitLab pipeline scripting, REST API integrations
GitOps / CI/CD: 5+ years; GitLab CI/CD pipeline authoring; Terraform and Ansible as primary IaC tools; ArgoCD or Flux preferred
AIOps / Observability Engineering: 2+ years; Alertmanager rule authoring, anomaly detection integration, event correlation, noise reduction techniques
Working infrastructure (Linux/VM) management knowledge: 5+ years; Linux administration, VMware vCenter/VCF experience, NetApp storage management, network fundamentals (SNMP, TCP/IP)
Secrets Management: 2+ years; CyberArk/Conjur, HashiCorp Vault, or equivalent; runtime secret injection patterns
Minimal travel may be required
Experience with and/or knowledge of ITSM processes and workflow automation, e.g. Incident & Response Management (IRM), release management, ServiceNow ITSM integration, alert routing, escalation policy design, SLA-driven on-call workflows
Hands-on experience or working knowledge of Boomi integration PaaS (iPaaS) technologies
Experience working with BAS/BMS systems in a data center / OT environment
Hands-on experience working with AWS products in a Well-Architected Framework and multi-account model to develop various compute, storage, and network IaaS and PaaS services for IT applications

Benefits

Medical, Telehealth, Dental, and Vision
401(k)
Health Savings Accounts (HSA) and Flexible Spending Accounts (FSA)
Life and AD&D
Short-Term and Long-Term Disability
Flex Paid Time Off (PTO)
Leave of Absence
Employee Assistance Program
Wellness Program
Rewards and Recognition Program

Company Overview

Flexential provides IT solutions including integrated colocation, interconnection, cloud, data protection, and professional services. It was founded in 2000 and is headquartered in Charlotte, North Carolina, USA, with a workforce of 501-1000 employees. Its website is https://www.flexential.com/.
