Core AI is Apple's new Python-based framework for converting, compressing, and deploying machine learning models natively on Apple Silicon. It provides coreai-torch for PyTorch-to-CoreAI conversion, coreai-opt for config-driven quantization, and a standalone Core AI Debugger app for inspecting and validating model behavior.
• Enables developers to take any PyTorch model (including large LLMs) and deploy it on-device with a unified pip-installable Python toolchain
• coreai-opt's preset quantization configs (e.g. w4 for 4-bit) can shrink 3 GB+ models to ~430 MB in just a few lines of code, with per-layer control to protect quality-sensitive components
• Core AI Debugger provides a GUI for visualizing model graphs, running on-device inference, and comparing intermediate tensors against PyTorch reference runs without modifying model code
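The size reduction quoted above follows from simple bit-width arithmetic. The sketch below is illustrative only: the 800M parameter count, the group size of 32, and the 16-bit per-group scale are assumptions chosen for the example, not figures from Core AI or coreai-opt, and the helper names are hypothetical.

```python
def weight_size_mb(num_params, bits_per_weight):
    """Approximate weight storage in MB (10**6 bytes) at a given bit width."""
    return num_params * bits_per_weight / 8 / 1e6


def effective_bits(weight_bits, group_size, scale_bits=16):
    """Bits per weight once one scale value per group is counted (assumed layout)."""
    return weight_bits + scale_bits / group_size


# Hypothetical model: 800M parameters stored as float32.
params = 800_000_000
fp32_mb = weight_size_mb(params, 32)                  # 3200.0
w4_mb = weight_size_mb(params, effective_bits(4, 32))  # 4.5 bits per weight -> 450.0

print(f"float32: {fp32_mb:.0f} MB")
print(f"4-bit with per-group scales: {w4_mb:.0f} MB")
print(f"reduction: {fp32_mb / w4_mb:.1f}x")           # 7.1x
```

A pure 4-bit format is at most 8x smaller than float32. Scale metadata and any layers kept at higher precision push the real ratio below that, which is the same ballpark as the 3 GB+ to ~430 MB figure.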
Shows how to load a pre-converted .aimodel asset produced by the coreai-torch pipeline and run inference on it using the Core ML runtime, which is the Swift-side integration point for models authored with Core AI tooling.
import CoreML
import Foundation
import SwiftUI

// MARK: - Core AI Model Runner
// The .aimodel artifact is produced offline by the coreai-torch Python pipeline:
//   exported = torch.export.export(model, (example_input,))
//   core_ai_program = TorchConverter().convert(exported, inputs=[...], outputs=[...])
//   asset = core_ai_program.specialize(SpecializationOptions())
//   asset.save("SAM3Compressed.aimodel")
// Then add SAM3Compressed.mlpackage to your Xcode project.

struct CoreAIInferenceView: View {
    @State private var result: String = "Tap Run to execute model"
    @State private var isRunning = false

    var body: some View {
        VStack(spacing: 20) {
            Text("Core AI Model Demo")
                .font(.title2.bold())
            Text(result)
                .multilineTextAlignment(.center)
                .padding()
                // UIColor system colors must be wrapped in Color to act as a ShapeStyle
                .background(Color(uiColor: .secondarySystemBackground))
                .clipShape(RoundedRectangle(cornerRadius: 12))
            Button(isRunning ? "Running…" : "Run Inference") {
                Task { await runModel() }
            }
            .buttonStyle(.borderedProminent)
            .disabled(isRunning)
        }
        .padding()
    }

    @MainActor
    func runModel() async {
        isRunning = true
        defer { isRunning = false }
        do {
            // Load the compiled Core AI / Core ML model asset
            // (Xcode compiles a bundled Core ML model into a .mlmodelc directory at build time)
            guard let modelURL = Bundle.main.url(
                forResource: "SAM3Compressed",
                withExtension: "mlmodelc"
            ) else {
                result = "Model asset not found in bundle"
                return
            }

            // Configure compute units: Neural Engine preferred for Apple Silicon
            let config = MLModelConfiguration()
            config.computeUnits = .cpuAndNeuralEngine
            let model = try await MLModel.load(contentsOf: modelURL, configuration: config)

            // Build feature provider with input tensors
            // Inputs match the names registered during coreai-torch conversion
            // MLMultiArray(shape:dataType:) is a throwing initializer, so it needs `try`
            let imageData = try MLMultiArray(
                shape: [1, 3, 1024, 1024],
                dataType: .float16
            )
            // (populate imageData with pixel values in a real app)
            let inputProvider = try MLDictionaryFeatureProvider(dictionary: [
                "pixel_values": MLFeatureValue(multiArray: imageData)
            ])

            // Run inference asynchronously
            let output = try await model.prediction(from: inputProvider)

            // Read output feature by name registered at conversion time
            if let maskArray = output.featureValue(for: "pred_masks")?.multiArrayValue {
                result = "Mask shape: \(maskArray.shape), inference succeeded ✓"
            } else {
                result = "Output feature not found"
            }
        } catch {
            result = "Error: \(error.localizedDescription)"
        }
    }
}

#Preview {
    CoreAIInferenceView()
}

Core AI Python libraries (coreai-torch, coreai-opt) are part of the offline authoring/conversion toolchain, not an on-device Swift API. The Swift integration point is loading the resulting .aimodel artifact. The save_intermediates API used for debugger comparison is new in iOS 27 tooling and may have beta rough edges. Aggressive global quantization (e.g. flat w4 across all layers) can degrade model quality; per-layer overrides are recommended for sensitive components.
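To see why a flat 4-bit setting can hurt and why a per-layer override helps, here is a minimal pure-Python sketch. It does not use the coreai-opt API. The layer names, weight values, and the `overrides` dictionary are hypothetical, and symmetric per-tensor quantization is an assumed scheme picked for simplicity. The error metric is the kind of reference-versus-candidate comparison the debugger note refers to.

```python
def quantize_dequantize(values, bits):
    """Symmetric uniform quantization to a signed integer grid, then back to float."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def max_abs_error(reference, candidate):
    """Largest element-wise deviation of candidate from reference."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Hypothetical layers and weights, for illustration only.
layers = {
    "encoder.block0": [-1.0, -0.5, -0.13, 0.0, 0.07, 0.42, 0.9],
    "mask_head": [-0.8, -0.31, 0.05, 0.2, 0.77],
}
default_bits = 4
overrides = {"mask_head": 8}  # hypothetical: keep a sensitive layer at 8-bit

errors = {}
for name, weights in layers.items():
    bits = overrides.get(name, default_bits)
    errors[name] = max_abs_error(weights, quantize_dequantize(weights, bits))
    print(f"{name}: {bits}-bit, max abs error {errors[name]:.4f}")
```

With these made-up values the 8-bit layer reconstructs its weights far more closely than the 4-bit one. In a real model the same comparison would be made on intermediate activations rather than toy weight lists.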
Requires Apple Silicon (M-series or A-series chips); some compression modes and advanced specializations may require specific chip generations. Core AI Debugger is a macOS-only standalone application.