Secure Snapshot Verification in Elasticsearch with Minimal Privileges

Posted on Sun 20 April 2025 in DevSecOps • Tagged with elasticsearch, snapshot, security, observability, prometheus, minimal-permissions

Learn how to securely verify Elasticsearch snapshots without manage_snapshot, relying instead on a minimal API key, a Prometheus-compatible script, and hardened monitoring practices. Includes a GitHub tools repo for automation.



Hardening Kubernetes Deployments

Posted on Sat 19 April 2025 in Kubernetes Security • Tagged with kubernetes, hardening, pod-security-standards

Hardening Kubernetes workloads goes beyond RBAC tweaks or image scans. This post shares field-tested pod-level guardrails, such as non-root containers, dropped Linux capabilities, and read-only filesystems, aligned with the Pod Security Standards (Restricted profile).
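As a rough illustration of the guardrails named above, the sketch below checks a single container spec (as a plain Python dict) for those settings. The function name `guardrail_violations` is hypothetical and not taken from the post. The field names follow the Kubernetes `securityContext` API. Note that `readOnlyRootFilesystem` is included because the summary mentions read-only filesystems; as far as I know it is not itself a requirement of the Restricted profile.

```python
def guardrail_violations(container: dict) -> list[str]:
    """Return the guardrails from the summary that a container spec is missing.

    Hypothetical helper for illustration only; it inspects a dict shaped
    like a Kubernetes container spec and does not talk to a cluster.
    """
    sc = container.get("securityContext", {})
    problems = []
    # Non-root containers
    if sc.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot should be true")
    # No privilege escalation (part of the Restricted profile)
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation should be false")
    # Dropped Linux capabilities
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        problems.append("capabilities.drop should include ALL")
    # Read-only root filesystem (assumed from the summary, see note above)
    if sc.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem should be true")
    return problems


hardened = {
    "name": "app",
    "securityContext": {
        "runAsNonRoot": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
        "readOnlyRootFilesystem": True,
    },
}
print(guardrail_violations(hardened))
print(guardrail_violations({"name": "default"}))
```

A check like this is only a local lint; in a real cluster, enforcement would normally come from Pod Security Admission or a policy engine.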



Taming the OOM Killer: Process Prioritization for Memory-Constrained Linux Systems

Posted on Fri 18 April 2025 in DevSecOps • Tagged with linux, oomkiller, memory, system-administration, devsecops, process-management, hardening

In memory-constrained environments, the Linux OOM Killer decides what lives and what gets killed. This guide shows how to protect critical processes like sshd and mysqld using oom_score_adj values, with a script that applies them reliably and securely. Make memory pressure predictable and survivable.
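A minimal sketch of the mechanism, assuming a Linux host: each process exposes `/proc/<pid>/oom_score_adj`, which accepts an integer from -1000 (never pick this process) to 1000 (pick it first). The helper names below are hypothetical and this is not the script from the post. Lowering a value typically requires root or `CAP_SYS_RESOURCE`.

```python
from pathlib import Path

# Valid range for oom_score_adj on Linux
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def oom_adj_path(pid: int) -> Path:
    """Path of the per-process OOM adjustment file."""
    return Path("/proc") / str(pid) / "oom_score_adj"


def validate_adj(value: int) -> int:
    """Reject values outside the range the kernel accepts."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    return value


def set_oom_score_adj(pid: int, value: int) -> None:
    """Write the adjustment; needs sufficient privileges to lower a value."""
    oom_adj_path(pid).write_text(f"{validate_adj(value)}\n")


# Example (hypothetical PID, run as root on Linux):
# set_oom_score_adj(1234, -1000)
```

For services managed by systemd, the `OOMScoreAdjust=` unit setting achieves the same effect and survives restarts, which a one-off write to `/proc` does not.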



Catching a Nation-State Proxy: OSINT Lessons from the Twitter Frontlines

Posted on Thu 17 April 2025 in Threat Intelligence • Tagged with osint, threat-intelligence, phishing, venezuela, twitter, surveillance, devsecops

In 2012, I uncovered a state-aligned Twitter proxy tied to Venezuela’s ruling party. It mimicked Twitter, redirected traffic, and exposed user credentials to phishing. This post breaks down the OSINT methods I used to uncover it, and why threat intel teams still need to watch for subtle, state-run infrastructure.



The 208.5-Day Kernel Bug: A Lesson in Uptime, Overflow, and Operational Risk

Posted on Wed 16 April 2025 in DevSecOps • Tagged with kernel, bug, Linux, uptime, overflow, devsecops, integer-overflow

A 2012 Linux kernel bug caused CPU lockups after 208.5 days of uptime due to an integer overflow in sched_clock(). Affecting RHEL 5 and 6, it exposed the risks of long uptimes, underscoring the importance of timely patching, uptime observability, and operational risk management in DevSecOps.
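The 208.5-day figure can be checked with simple arithmetic. The sketch below assumes the commonly cited explanation, which is not spelled out in this summary: the nanosecond conversion used a 10-bit scale shift, so the 64-bit intermediate value overflowed once the clock reached 2^64 / 2^10 = 2^54 nanoseconds.

```python
# Assumption: overflow threshold of 2**54 ns (64-bit value, 10-bit scale shift)
OVERFLOW_NS = 2**64 // 2**10  # equals 2**54

NS_PER_DAY = 24 * 60 * 60 * 10**9

days_until_overflow = OVERFLOW_NS / NS_PER_DAY
print(f"{days_until_overflow:.1f} days")  # prints "208.5 days"
```

The same calculation is a quick way to sanity-check any counter: divide the overflow threshold by the tick rate to see how long a host can stay up before the value wraps.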



The Chaos of the Leap Second (2012): When Time Broke Java and the Cloud

Posted on Tue 15 April 2025 in Incident Retrospectives • Tagged with leap-second, kernel, linux, java, ntp, distributed-systems, devops, sre, incident-retrospective

In 2012, a single leap second triggered global outages across Reddit, Yelp, and more. This retrospective unpacks how fragile timekeeping broke Java apps at scale, and what DevOps, SRE, and distributed systems teams can do today to avoid repeating history.

