Building an Autonomous SOC with TheHive, Cortex & Shuffle

The modern Security Operations Center is drowning. Alert fatigue is endemic. Mean time to respond (MTTR) is measured in hours when it should be measured in minutes. Analysts spend 60-70% of their time on repetitive, mechanical triage tasks that should be executed by machines.

Commercial SOAR platforms promise to solve this - but at $50,000 to $300,000 per year in licensing fees, they are inaccessible to the majority of organizations that need them most. And even those organizations that can afford them frequently discover that vendor lock-in, opaque pricing structures, and integration bias toward first-party tooling make the promise difficult to realize in practice.

There is a better architecture. It is open-source. It is battle-tested. And in my operational experience deploying it across mid-market and enterprise environments, it consistently outperforms commercial alternatives in flexibility, data sovereignty, and total cost of ownership.

This is the blueprint.

The Architecture: Three Platforms, One Autonomous SOC

The stack is composed of three distinct, purpose-built platforms that operate in concert to form a closed-loop incident response ecosystem.

Platform	Role	Primary Function
TheHive 5	Security Incident Response Platform (SIRP)	Case management, analyst collaboration, alert triage, evidence custody
Cortex	Observable Analysis & Active Response Engine	Automated IOC enrichment via TIPs, sandboxes, and OSINT; active containment actions
Shuffle	Security Orchestration, Automation & Response (SOAR)	Multi-step playbook execution, cross-platform data routing, complex conditional logic

Drawing the functional boundaries between these three platforms correctly is the single most important architectural decision you will make. Conflating their roles leads to redundant engineering, brittle integrations, and operational instability.

The mental model: Shuffle is the pipe. Cortex is the scalpel. TheHive is the brain.

When Wazuh detects a high-severity alert, Shuffle receives it, parses it, evaluates it, and routes it. Cortex enriches the suspicious observables with automated intelligence at machine speed. TheHive receives the pre-triaged, pre-enriched case and presents it to the human analyst with full context already loaded.

Platform Deep Dive: TheHive 5

TheHive is the operational command center for your SOC - the single pane of glass where every incident begins and ends.

Architectural Components

A production TheHive 5 deployment requires three infrastructure dependencies:

Component	Purpose	Production Alternative
Apache Cassandra	Primary data storage (HA, distributed)	ScyllaDB for higher throughput
Elasticsearch	Search indexing for rapid query across millions of alerts	OpenSearch (AWS-managed)
File Storage	Evidence, attachments, and observable files	MinIO (S3-compatible), NFS, or AWS S3

Data Model

TheHive enforces a strictly hierarchical data model that maps directly to real incident response workflows:

Alert (raw telemetry input)
  └── Case (escalated investigation container)
        ├── Tasks (specific analyst assignments)
        ├── Observables (IOCs: IPs, domains, hashes, emails)
        │     └── [Cortex analysis results attach here]
        └── TTPs (MITRE ATT&CK technique tags)
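
To make the hierarchy concrete, here is a minimal Python sketch of the same relationships. The class and field names are illustrative assumptions, not TheHive's actual schema; the point is only that observables travel from alert to case, and that Cortex reports hang off individual observables.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observable:
    data_type: str                                      # e.g. "ip", "domain", "hash", "mail"
    data: str
    reports: List[dict] = field(default_factory=list)   # Cortex results attach here


@dataclass
class Task:
    title: str
    assignee: Optional[str] = None


@dataclass
class Case:
    title: str
    tasks: List[Task] = field(default_factory=list)
    observables: List[Observable] = field(default_factory=list)
    ttps: List[str] = field(default_factory=list)       # MITRE ATT&CK technique IDs


@dataclass
class Alert:
    title: str
    source: str
    observables: List[Observable] = field(default_factory=list)

    def escalate(self) -> Case:
        """Promote the alert into a case, carrying its observables along."""
        return Case(title=self.title, observables=list(self.observables))


# Hypothetical walk-through: one alert becomes one case with a task, a TTP, and a report.
alert = Alert("Suspicious login", "wazuh", [Observable("ip", "203.0.113.7")])
case = alert.escalate()
case.tasks.append(Task("Verify alert is not a false positive"))
case.ttps.append("T1078")                               # Valid Accounts
case.observables[0].reports.append({"analyzer": "AbuseIPDB", "score": 90})
```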

Licensing Tiers

License	Cost	Max Users	HA Support	SSO	MSSP Multi-tenancy
Community	Free	Unlimited	No	No	Limited
Gold	€500/month	Unlimited	Yes	Yes	No
Platinum	€1,000/month	Unlimited	Yes	Yes	Limited
MSSP	Custom	Unlimited	Yes	Yes	Full (unlimited orgs)

For single-organization deployments without SSO requirements, the Community license handles production workloads effectively. Once you need Active Directory integration or Kubernetes-grade HA, a paid tier becomes necessary, and managing multiple client tenants calls for Platinum or MSSP.

Key Capabilities

Real-time collaboration via WebSockets - multiple analysts see case updates, task completions, and observable additions live without page refresh
TLP and PAP classification enforcement - Traffic Light Protocol and Permissible Actions Protocol tags control how intelligence can be shared and acted upon
Case templates - pre-define case structures with tasks and required observables for common scenario types (phishing, ransomware, insider threat)
Webhook notification framework - trigger external systems (Shuffle, Slack, PagerDuty) on any case state change via configurable filter expressions
REST API + thehive4py - full programmatic access to every object, enabling custom integrations and automated reporting pipelines
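
As a rough illustration of the REST API, the sketch below prepares (but does not send) an alert-creation request using only the Python standard library. The `/api/v1/alert` path, the Bearer header, and the field names reflect my understanding of the TheHive 5 API and should be checked against the official reference; the host, API key, and alert values are placeholders.

```python
import json
import urllib.request


def build_alert_request(base_url: str, api_key: str, alert: dict) -> urllib.request.Request:
    """Prepare (but do not send) a request that creates a TheHive alert."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/alert",
        data=json.dumps(alert).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


alert = {
    "type": "wazuh",
    "source": "wazuh-manager",
    "sourceRef": "rule-5712-0001",      # hypothetical unique reference per alert
    "title": "Wazuh - SSH brute force",
    "description": "Created by an automated triage pipeline.",
    "severity": 3,                      # 1 = low ... 4 = critical
    "tlp": 2,                           # TLP:AMBER
    "pap": 2,                           # PAP:AMBER
    "tags": ["automated", "wazuh"],
    "observables": [{"dataType": "ip", "data": "203.0.113.7"}],
}

request = build_alert_request("http://localhost:9000", "YOUR_API_KEY", alert)
# urllib.request.urlopen(request) would submit it to a running instance.
```

The thehive4py client wraps the same API; the raw request is shown here only to keep the example dependency-free.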

Platform Deep Dive: Cortex

Cortex is the automated analysis engine that eliminates the single most time-consuming task in incident response: manual IOC lookups.

How It Works

When an observable (IP address, domain, file hash, email address, URL) is attached to a TheHive case, analysts can trigger one or more Cortex Analyzers with a single click. Cortex routes the observable to the appropriate Docker-containerized script, executes the analysis, and returns structured JSON results that populate directly on the observable’s intelligence panel.

Analyzers query external intelligence sources:

VirusTotal (file hash, IP, domain, URL reputation)
Shodan (internet-exposed service fingerprinting)
AbuseIPDB (IP reputation and abuse reporting)
MalwareBazaar (malware sample database lookup)
URLhaus (malicious URL database)
PassiveTotal/RiskIQ (passive DNS, WHOIS history)
MISP (cross-reference against your local threat intelligence)
Hybrid Analysis / ANY.RUN (dynamic sandbox detonation results)
Spamhaus, Talos Intelligence, OTX, and 280+ more

Responders take active containment actions:

Block IP on Palo Alto NGFW or pfSense
Add domain to Pi-hole DNS blocklist
Send alert to Slack/Teams/Discord
Create JIRA/ServiceNow ticket
Isolate endpoint via CrowdStrike or SentinelOne API

Performance Architecture

┌─────────────┐     API Request     ┌──────────────────────────┐
│   TheHive   │ ──────────────────► │         Cortex           │
│  (Observable│                     │  ┌────────────────────┐  │
│   attached) │ ◄────────────────── │  │  Cache (10-minute) │  │
│             │   Enriched Result   │  └────────────────────┘  │
└─────────────┘                     │  ┌────────────────────┐  │
                                    │  │ Job runner (Docker)│  │
                                    │  │  ┌──────────────┐  │  │
                                    │  │  │ VT Analyzer  │  │  │
                                    │  │  │ (container)  │  │  │
                                    │  │  └──────────────┘  │  │
                                    │  └────────────────────┘  │
                                    └──────────────────────────┘

The 10-minute result cache is a critical operational feature. If 15 analysts independently analyze the same malicious IP within a 10-minute window, Cortex executes the external API call exactly once and serves cached results to subsequent requests - preserving API rate limits and dramatically accelerating the response workflow.
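
The caching behavior can be sketched as a small time-to-live cache keyed by analyzer and observable. This is an illustration of the idea, not Cortex's implementation; `AnalyzerCache` and `fake_lookup` are hypothetical names, and the 600-second default simply mirrors the 10-minute window described above.

```python
import time
from typing import Callable, Dict, Tuple


class AnalyzerCache:
    """Serve a stored report when the same lookup repeats within the TTL."""

    def __init__(self, ttl_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Tuple[str, str], Tuple[float, dict]] = {}

    def analyze(self, analyzer: str, observable: str,
                run: Callable[[str], dict]) -> dict:
        key = (analyzer, observable)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                 # cached: no external call
        report = run(observable)          # cache miss: call the external API
        self._store[key] = (now, report)
        return report


calls = []


def fake_lookup(ip: str) -> dict:
    """Stand-in for an external reputation API; records each real call."""
    calls.append(ip)
    return {"ip": ip, "malicious": True}


cache = AnalyzerCache()
for _ in range(15):                       # 15 analysts, same IP, same window
    cache.analyze("AbuseIPDB", "203.0.113.7", fake_lookup)
print(len(calls))                         # prints 1
```

Injecting the clock keeps the expiry logic testable without waiting ten minutes.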

Platform Deep Dive: Shuffle

Shuffle transforms your SOC from a reactive alert-processing machine into a proactive, self-operating response system.

Architecture

Component	Technology	Function
Backend	Golang	High-concurrency API server and workflow engine
Frontend	ReactJS + Cytoscape	Visual drag-and-drop workflow canvas
Worker	Orborus	Spawns Docker containers per workflow execution
Database	OpenSearch	Stores workflow execution logs and audit trails at scale
App Library	800+ OpenAPI integrations	Pre-built connectors for Wazuh, Splunk, CrowdStrike, ServiceNow, etc.

Trigger Types

# Illustrative summary of Shuffle trigger types. Triggers are configured in the
# workflow editor; this YAML is shorthand, not a file Shuffle loads.
triggers:
  - type: webhook
    name: "Wazuh High-Severity Alert"
    description: "Receives Wazuh alerts with rule.level >= 12"

  - type: schedule
    cron: "0 6 * * 1"
    description: "Monday 6AM: Weekly threat summary digest"

  - type: user_input
    prompt: "Analyst: Should this IP be blocked?"
    description: "Human-in-the-loop approval gate"

  - type: api
    endpoint: "/api/v1/workflows/<workflow_id>/execute"
    description: "Direct programmatic trigger"

The Hybrid Execution Model

Shuffle’s most architecturally significant feature is its hybrid cloud-on-prem execution model. Organizations can route webhook traffic through the Shuffler.io cloud relay directly to their on-premises Shuffle instance. This eliminates the requirement to open inbound ports on the corporate firewall - a significant security win for organizations operating in hardened network environments.

Why This Stack Over Commercial SOAR?

Head-to-Head: TCO Comparison

Platform	Year 1 Cost	Year 3 Cost	Vendor Lock-in	Data Sovereignty
TheHive + Cortex + Shuffle	$0-$12K (infra only)	$0-$36K	None	Full on-prem
Palo Alto Cortex XSOAR	$80K-$350K	$240K-$1M+	High	Cloud/hybrid
Splunk SOAR	$60K-$250K	$180K-$750K	Very High	Cloud/hybrid
IBM QRadar SOAR	$70K-$300K	$210K-$900K	High	On-prem/cloud
Microsoft Sentinel + Logic Apps	$30K-$150K	$90K-$450K	Medium	Azure cloud

Infrastructure costs assume a 3-server production deployment (4 vCPU / 16GB RAM each).

Operational Advantages

1. Multi-Vendor Integration Without Bias

Commercial SOAR platforms work best with their own ecosystem. Palo Alto XSOAR excels with Palo Alto tools. Splunk SOAR shines with Splunk telemetry. Shuffle doesn’t care what your stack is - it integrates via OpenAPI with anything that has an HTTP API.

2. Engineering Leverage for MSSPs

Commercial platforms typically require per-tenant engineering effort that scales linearly. Shuffle’s reusable playbook templates and Cortex’s multi-tenant analyzer configurations allow MSSPs to onboard new clients by cloning and parameterizing existing configurations - the marginal cost of the tenth client is dramatically less than the first.

3. Documented 77% MTTR Reduction

Organizations deploying automated enrichment and triage pipelines via this stack have documented a 77% reduction in Mean Time to Respond across common use cases. This translates directly to:

Fewer analyst FTEs required for L1/L2 triage
Faster containment of active threats
Reduced dwell time and blast radius for successful attacks
Improved SLA performance for MSSPs

Deployment Guide: Production Docker Compose

The fastest path to a functional production environment is Docker Compose. This configuration deploys the full stack on a single server (minimum 16GB RAM, 8 vCPU, 500GB SSD for production).

Prerequisites

# Server requirements: Ubuntu 22.04 LTS
# Docker Engine 24.x + Docker Compose v2

# Increase virtual memory for Elasticsearch
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf

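# Create directory structure (/opt is root-owned, so use sudo)
sudo mkdir -p /opt/soc-stack/{thehive,cortex,shuffle,elasticsearch/data,cassandra,misp}
# The Elasticsearch container runs as UID 1000 and must be able to write to its data directory
sudo chown -R 1000:1000 /opt/soc-stack/elasticsearch
cd /opt/soc-stack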

Docker Compose Configuration

version: "3.8"

services:
  # ─── Cassandra (TheHive Database) ───────────────────────────
  cassandra:
    image: cassandra:4.1
    container_name: cassandra
    hostname: cassandra
    environment:
      - CASSANDRA_CLUSTER_NAME=thp
      - MAX_HEAP_SIZE=1G
      - HEAP_NEWSIZE=200M
    volumes:
      - /opt/soc-stack/cassandra/data:/var/lib/cassandra
    healthcheck:
      test: ["CMD-SHELL", "nodetool status | grep UN"]
      interval: 30s
      timeout: 10s
      retries: 10

  # ─── Elasticsearch (TheHive Index) ──────────────────────────
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.1
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - /opt/soc-stack/elasticsearch/data:/usr/share/elasticsearch/data
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -v red"]
      interval: 30s
      timeout: 10s
      retries: 10

  # ─── TheHive 5 ──────────────────────────────────────────────
  thehive:
    image: strangebee/thehive:5.3
    container_name: thehive
    depends_on:
      cassandra: { condition: service_healthy }
      elasticsearch: { condition: service_healthy }
    ports:
      - "9000:9000"
    volumes:
      - /opt/soc-stack/thehive/config:/etc/thehive
      - /opt/soc-stack/thehive/data:/opt/thp/thehive/files
    command:
      # One list item per argument; confirm flag names against the TheHive 5 image entrypoint
      - --cql-hostnames
      - cassandra
      - --index-backend
      - elasticsearch
      - --es-hostnames
      - elasticsearch

  # ─── Cortex ─────────────────────────────────────────────────
  cortex:
    image: thehiveproject/cortex:3.1.7
    container_name: cortex
    depends_on:
      elasticsearch: { condition: service_healthy }
    ports:
      - "9001:9001"
    volumes:
      - /opt/soc-stack/cortex/config:/etc/cortex
      - /var/run/docker.sock:/var/run/docker.sock  # Required for analyzer containers
      - /tmp/cortex-jobs:/tmp/cortex-jobs  # Same path on host and in container so analyzer containers can read job files
    environment:
      - JOB_DIRECTORY=/tmp/cortex-jobs

  # ─── Shuffle SOAR ───────────────────────────────────────────
  shuffle-backend:
    image: ghcr.io/shuffle/shuffle-backend:latest
    container_name: shuffle-backend
    hostname: shuffle-backend
    ports:
      - "5001:5001"
    volumes:
      - /opt/soc-stack/shuffle:/shuffle-database
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DATASTORE_EMULATOR_HOST=shuffle-database:8000
      - SHUFFLE_APP_HOTLOAD_FOLDER=/shuffle-database/apps
      - SHUFFLE_FILE_LOCATION=/shuffle-database/files
      - SHUFFLE_DEFAULT_USERNAME=admin
      - SHUFFLE_DEFAULT_PASSWORD=changeme_on_first_login

  shuffle-frontend:
    image: ghcr.io/shuffle/shuffle-frontend:latest
    container_name: shuffle-frontend
    ports:
      - "3001:80"
    environment:
      - BACKEND_HOSTNAME=shuffle-backend

  shuffle-orborus:
    image: ghcr.io/shuffle/shuffle-orborus:latest
    container_name: shuffle-orborus
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - BASE_URL=http://shuffle-backend:5001
      - SHUFFLE_APP_SDK_VERSION=1.1.0

Post-Deployment Configuration Steps

# Step 1: Start the stack
docker compose up -d

# Step 2: Monitor startup (takes 3-5 minutes)
docker compose logs -f thehive cortex

# Step 3: Access TheHive at http://YOUR_SERVER:9000
# Default credentials: admin@thehive.local / secret

# Step 4: Access Cortex at http://YOUR_SERVER:9001
# Create a new org and generate an API key

# Step 5: Link Cortex to TheHive
# TheHive Admin → Platform Management → Cortex
# Paste the Cortex API key and server URL

# Step 6: Access Shuffle at http://YOUR_SERVER:3001
# Complete onboarding wizard

Integration Blueprint: Wazuh → Shuffle → TheHive

This is the most common production integration pattern. Here is the complete data flow and configuration.

Step 1: Configure Wazuh to Forward Alerts to Shuffle

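Add the following integration block to /var/ossec/etc/ossec.conf on your Wazuh Manager. A name with the custom- prefix tells Wazuh to run an executable of the same name from /var/ossec/integrations/, so a custom-shuffle script must exist there: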

<integration>
  <name>custom-shuffle</name>
  <hook_url>http://YOUR_SHUFFLE_SERVER:5001/api/v1/hooks/webhook_XXXXXXXX</hook_url>
  <level>12</level>  <!-- Only forward critical/high severity -->
  <alert_format>json</alert_format>
</integration>

Restart the Wazuh Manager after the configuration change:

sudo systemctl restart wazuh-manager
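
For reference, a custom-shuffle integration script can be very small. The sketch below is a hedged example rather than an official script: it assumes Wazuh invokes custom integrations with the alert file path as the first argument and the hook URL as the third, which is worth confirming against the Wazuh documentation for your version. The function names are my own.

```python
#!/usr/bin/env python3
import json
import sys
import urllib.request


def build_webhook_request(alert_path: str, hook_url: str) -> urllib.request.Request:
    """Read one Wazuh alert from disk and wrap it in a POST to the webhook."""
    with open(alert_path, encoding="utf-8") as handle:
        alert = json.load(handle)
    return urllib.request.Request(
        url=hook_url,
        data=json.dumps(alert).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def main(argv: list) -> int:
    # Assumed argument order: script name, alert file, API key (unused here), hook URL.
    if len(argv) < 4:
        print("usage: custom-shuffle <alert_file> <api_key> <hook_url>", file=sys.stderr)
        return 1
    request = build_webhook_request(argv[1], argv[3])
    with urllib.request.urlopen(request, timeout=10) as response:
        return 0 if 200 <= response.status < 300 else 1


# Only forward an alert when invoked with the full set of expected arguments.
if __name__ == "__main__" and len(sys.argv) >= 4:
    sys.exit(main(sys.argv))
```

Saved as /var/ossec/integrations/custom-shuffle and made executable, a script along these lines would forward each matching alert unchanged, leaving all parsing to the Shuffle workflow.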

Step 2: Build the Shuffle Alert Triage Playbook

[Webhook: Wazuh Alert Received]
         │
         ▼
[Condition: rule.groups contains "syscheck" or "web"]
         │
    ─────┴─────
    │          │
   YES         NO → [Skip/Log]
    │
    ▼
[Regex: Extract observables]
 • src_ip from data.srcip
 • file_hash from syscheck.sha256_after
 • domain from data.url
         │
         ▼
[Cortex: Analyze src_ip with AbuseIPDB]
[Cortex: Analyze file_hash with VirusTotal]
         │
         ▼
[Condition: vtScore > 5 OR abuseConfidence > 75]
         │
    ─────┴─────
    │          │
   YES         NO → [Close: Low Priority]
    │
    ▼
[TheHive: Create Alert]
 • title: "Wazuh - [rule.description]"
 • severity: Critical
 • observables: [ip, hash, Cortex results]
 • tags: ["automated", "wazuh", rule.groups]
         │
         ▼
[Slack/Teams: Notify SOC Channel]
 • "New critical alert created in TheHive"
 • Link to case

Step 3: TheHive Alert Template for Wazuh

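Create a case template in TheHive so that alerts promoted from the Wazuh pipeline become consistently structured cases. The outline below is illustrative; check field names and placeholder syntax against your TheHive version: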

{
  "title": "Wazuh SIEM - {{alert.title}}",
  "description": "## Automated Alert\n\n**Source:** Wazuh SIEM\n**Rule:** {{alert.sourceRef}}\n**Host:** {{alert.source}}\n\n## Observables\n\n{{#observables}}\n- {{dataType}}: `{{data}}`\n{{/observables}}",
  "tasks": [
    { "title": "Verify alert is not a false positive", "assignee": null },
    { "title": "Enrich all observables with Cortex analyzers", "assignee": null },
    { "title": "Determine scope - check for lateral movement", "assignee": null },
    { "title": "Execute containment if confirmed malicious", "assignee": null },
    { "title": "Document findings and close case", "assignee": null }
  ],
  "tags": ["wazuh", "automated-triage"]
}

Real-World Case Study: Phishing Campaign Triage at Scale

The Scenario

A regional financial services firm (1,200 employees, 3-person security team) was receiving 400-600 security alerts per day from Wazuh. The team was spending 6-7 hours daily on manual triage, leaving little capacity for proactive threat hunting or incident response.

The Problem (Before)

Metric	Before Deployment
Daily alert volume	450 average
Manual triage time per alert	8-12 minutes
True positive rate (pre-enrichment)	~12%
MTTR (confirmed incidents)	4.2 hours
Analyst hours on L1 triage	6.5 hours/day
Incidents missed (per quarter)	3-4 (analyst fatigue)

The Solution Architecture

The team deployed:

TheHive 5 Community on a dedicated Ubuntu 22.04 VM (8 vCPU, 32GB RAM)
Cortex 3.1 with 12 configured analyzers (VirusTotal, AbuseIPDB, MISP, Shodan, URLhaus, MalwareBazaar, Hybrid Analysis, PassiveTotal, Talos Intelligence, OTX, Spamhaus, Greynoise)
Shuffle with two primary playbooks: Phishing Email Triage and Malware Alert Enrichment
Integration: Wazuh Manager forwarding rules 85, 91xxx (phishing detection), and rule level ≥12 to Shuffle webhook

The Phishing-Specific Playbook

Wazuh fires on email gateway rule (rule group: office365 or exchange)
Shuffle extracts: sender domain, sender IP, attachment hash, embedded URLs
Cortex triggered in parallel:
- AbuseIPDB on sender IP (confidence threshold: 50%)
- VirusTotal on attachment hash (detection threshold: 3/72)
- URLhaus on embedded URLs
- Spamhaus on sender domain
Shuffle evaluates results - if any threshold exceeded, severity escalated to Critical
TheHive case created with pre-populated observables, all Cortex results attached
Slack message sent to #soc-alerts with case link and summary
If hash detected by VirusTotal with 20+ detections, Shuffle triggers CrowdStrike Falcon API to initiate custom IOC block across all endpoints
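
The decision logic in this playbook can be sketched as a single function. The thresholds come from the list above; treating them as inclusive (a score of exactly 50 counts as exceeded) is my assumption, and the action labels are hypothetical names rather than Shuffle or CrowdStrike identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Enrichment:
    abuse_confidence: int     # AbuseIPDB confidence for the sender IP (0-100)
    vt_detections: int        # VirusTotal engines flagging the attachment hash
    urlhaus_listed: bool      # any embedded URL is present in URLhaus
    spamhaus_listed: bool     # sender domain is present in Spamhaus


def plan_response(e: Enrichment) -> List[str]:
    """Return the ordered actions the playbook would take for one email."""
    exceeded = (
        e.abuse_confidence >= 50
        or e.vt_detections >= 3
        or e.urlhaus_listed
        or e.spamhaus_listed
    )
    actions = [
        "create_case:critical" if exceeded else "create_case:default",
        "notify_slack",
    ]
    if e.vt_detections >= 20:
        # High-consensus malware: push an IOC block to endpoints as well.
        actions.append("block_hash_on_endpoints")
    return actions


benign = plan_response(Enrichment(10, 0, False, False))
malware = plan_response(Enrichment(0, 25, False, False))
```

Keeping the containment step behind a much higher bar than the escalation step is the design choice that matters here: a false positive on escalation costs analyst minutes, while a false positive on an endpoint-wide block can disrupt the business.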

The Results (After 90 Days)

Metric	Before	After	Change
Daily alert volume	450	450	-
Automated triage rate	0%	78%	+78 pts
True positive escalation rate	12%	91%	+79 pts
MTTR (confirmed incidents)	4.2 hours	58 minutes	-77%
Analyst hours on L1 triage	6.5 hours/day	1.4 hours/day	-78%
Incidents missed (per quarter)	3-4	0	-100%

Those 5.1 hours of daily analyst time recovered translated directly into a meaningful investment in proactive threat hunting, detection rule tuning, and vulnerability management - the high-value work that prevents incidents from happening in the first place.
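
The headline figures follow directly from the before and after values in the table. A quick check, with MTTR converted to minutes so both values share a unit:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = reduction)."""
    return (after - before) / before * 100


mttr = percent_change(4.2 * 60, 58)        # 252 minutes down to 58 minutes
triage = percent_change(6.5, 1.4)          # analyst hours per day
hours_recovered = 6.5 - 1.4

print(f"MTTR: {mttr:.0f}%")                       # MTTR: -77%
print(f"L1 triage: {triage:.0f}%")                # L1 triage: -78%
print(f"Recovered: {hours_recovered:.1f} h/day")  # Recovered: 5.1 h/day
```

The two rate rows are different in kind: moving from 0% to 78% and from 12% to 91% are changes of 78 and 79 percentage points, not relative percentages.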

Who Should Deploy This Stack?

Ideal Candidates

Organizations that benefit most from this stack:

Profile	Why This Stack Wins
MSSPs (5-100+ clients)	Multi-tenancy in TheHive + reusable Shuffle playbooks = near-zero marginal cost per new client
Mid-market enterprises (200-5,000 employees)	Full SOAR capability at infrastructure-only cost vs. $80K+ commercial licensing
Government & regulated industries	Complete data sovereignty - no telemetry leaves your network
Security teams with Linux/Python skills	The engineering investment pays dividends in full stack control
Lean SOC teams (2-8 analysts)	Automation multiplies analyst capacity without hiring

When to Consider Commercial Alternatives

This stack is not the right answer for every organization:

Situation	Better Alternative
Zero in-house DevOps/Linux expertise	Splunk SOAR or Microsoft Sentinel + Logic Apps
Require vendor SLA for uptime	TheHive Gold/Platinum with StrangeBee support
Fully Microsoft-native environment	Microsoft Sentinel + Logic Apps (native integration depth is unbeatable)
Need pre-built playbooks for instant value	Palo Alto XSOAR with their playbook library

Security Hardening: Production Best Practices

Deploying these platforms securely requires specific hardening beyond default configurations.

Docker Security

# Run containers as non-root where possible
# Cortex requires Docker socket access - harden with socket proxy

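# Deploy docker-socket-proxy instead of exposing raw socket to Cortex/Shuffle
# POST=1 is needed because the proxy blocks write requests by default,
# and Cortex/Shuffle must create and start containers
docker run -d \
  --name dockerproxy \
  -e CONTAINERS=1 \
  -e IMAGES=1 \
  -e INFO=1 \
  -e POST=1 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -p 127.0.0.1:2375:2375 \
  tecnativa/docker-socket-proxy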

Network Segmentation

[Internet]
    │
[Wazuh Manager] ─── webhook ──► [Shuffle] (VLAN: Automation)
                                     │
                        ┌────────────┼────────────┐
                        ▼            ▼             ▼
                   [TheHive]    [Cortex]     [MISP]
                  (VLAN: SOC)  (VLAN: SOC) (VLAN: SOC)
                      │            │
                      └────────────┘
                           │
                   [Elasticsearch + Cassandra]
                   (VLAN: Data, no internet access)

API Key Rotation Policy

# TheHive API key rotation (run quarterly via cron)
# Renew the key via the TheHive API; the response body is the new key as plain text.
# USER_ID is the service account's login or ID (verify the path against your version's API reference).
NEW_KEY=$(curl -s -X POST "http://localhost:9000/api/v1/user/$USER_ID/key/renew" \
  -H "Authorization: Bearer $CURRENT_KEY")

# Update Shuffle with new TheHive API key
# The payload below is simplified; match it to the app authentication schema of your Shuffle version.
curl -X PUT http://localhost:5001/api/v1/apps/authentication \
  -H "Authorization: Bearer $SHUFFLE_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"thehive_api_key\": \"$NEW_KEY\"}"

Cortex Analyzers: Recommended Starter Configuration

For a new deployment, prioritize configuring these analyzers in sequence. Each requires an API key from the respective platform - most offer free tiers sufficient for small SOC volumes.

Priority	Analyzer	Data Types	Free Tier
🔴 Critical	VirusTotal	Hash, IP, Domain, URL	500 req/day
🔴 Critical	AbuseIPDB	IP	1,000 req/day
🔴 Critical	MISP	All	Self-hosted, unlimited
🟠 High	Shodan	IP	100 req/month
🟠 High	URLhaus	URL, Domain	Free
🟠 High	MalwareBazaar	Hash	Free
🟠 High	OTX (AlienVault)	Hash, IP, Domain, URL	Free
🟡 Medium	Hybrid Analysis	Hash, URL	200 req/day
🟡 Medium	PassiveTotal	IP, Domain	15 req/day
🟡 Medium	Talos Intelligence	IP, Domain	Free (web)
🟢 Optional	Spamhaus	IP, Domain	Free
🟢 Optional	Greynoise	IP	100 req/day (free)
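
Before relying on free tiers, it helps to estimate whether your daily lookup volume fits inside them. The sketch below encodes a subset of the table and flags analyzers whose daily quota would be exceeded; the quotas are copied from the table, while the lookup volumes in the example are hypothetical and the function names are my own.

```python
from typing import Dict, List

# Data types and daily free-tier quotas from the table (None = no stated daily cap).
ANALYZERS: Dict[str, dict] = {
    "VirusTotal": {"types": {"hash", "ip", "domain", "url"}, "daily_quota": 500},
    "AbuseIPDB": {"types": {"ip"}, "daily_quota": 1000},
    "URLhaus": {"types": {"url", "domain"}, "daily_quota": None},
    "MalwareBazaar": {"types": {"hash"}, "daily_quota": None},
    "Hybrid Analysis": {"types": {"hash", "url"}, "daily_quota": 200},
}


def analyzers_for(data_type: str) -> List[str]:
    """List the analyzers that accept a given observable data type."""
    return [name for name, spec in ANALYZERS.items() if data_type in spec["types"]]


def over_quota(lookups_per_day: Dict[str, int]) -> List[str]:
    """lookups_per_day maps data type to expected unique lookups per day."""
    flagged = []
    for name, spec in ANALYZERS.items():
        demand = sum(n for dtype, n in lookups_per_day.items() if dtype in spec["types"])
        if spec["daily_quota"] is not None and demand > spec["daily_quota"]:
            flagged.append(name)
    return flagged


ip_analyzers = analyzers_for("ip")
# Hypothetical volume: 450 unique IPs and 120 unique hashes per day.
flagged = over_quota({"ip": 450, "hash": 120})
```

With those example volumes, VirusTotal is the only analyzer pushed past its free quota, because it receives both the IP and the hash lookups. Result caching in Cortex reduces real demand further, since repeated observables do not trigger new API calls within the cache window.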

The Bottom Line

I have deployed commercial SOAR platforms with six-figure price tags. I have also built this open-source stack from scratch on a $150/month cloud server. The performance difference in analyst workflow and alert enrichment quality is not discernible. The cost difference is $80,000 to $300,000 per year.

What you pay for with commercial platforms is convenience, vendor-managed integrations, and support SLAs. If your team has the Linux and Docker expertise to operate this stack - and that is a meaningful “if” - the ROI calculation is straightforward.

The 77% MTTR reduction documented across real deployments is not marketing. It is the direct result of eliminating manual, repetitive enrichment work and replacing it with deterministic, machine-speed automation. That time goes directly into the threat hunting and proactive defense work that actually moves the security needle.

The blueprint exists. The tools are free. The only investment required is the willingness to build it right.

Additional Resources

Resource	Description
TheHive 5 Documentation	Official deployment and API guides
Cortex Documentation	Analyzer/Responder catalog and configuration
Shuffle Documentation	Playbook building and app integration guides
Cortex-Analyzers GitHub	300+ open-source analyzer scripts
MISP Integration Guide	Threat intelligence platform integration
Docker-Templates (StrangeBee)	Official TheHive + Cortex + Shuffle Compose templates
SOC Automation Lab (uruc)	Community-built end-to-end homelab guides
