MLX Glossary — MLX Guide

MLX

Apple's open-source array framework for machine learning, built for Apple silicon.

Apple silicon

Apple's M-series chips, like M1, M2, M3, and newer.

Array

A grid of numbers in memory. In ML work, arrays hold inputs, activations, and weights.

Tensor

A general ML word for a multi-dimensional array. It is the same basic idea: MLX says "array" where frameworks such as PyTorch say "tensor".

Autograd

Automatic differentiation: the system that figures out gradients for training.

Gradient

The slope of the error with respect to each model weight. It tells training which direction, and roughly how far, to nudge each weight to reduce error.
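To make the idea concrete, here is a minimal sketch in plain Python. The one-weight model, the `loss` function, and the learning rate of 0.1 are made up for illustration, and the gradient is estimated numerically; an autograd system computes it exactly instead.

```python
def loss(w, x=2.0, y=6.0):
    """Squared error of a toy one-weight model whose prediction is w * x."""
    return (w * x - y) ** 2


def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of how f changes as w changes."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w = 0.0
grad = numerical_gradient(loss, w)  # negative here, so raising w lowers the loss
w_new = w - 0.1 * grad              # one gradient-descent step (learning rate 0.1)

print(loss(w), loss(w_new))         # the error shrinks after the step
```

Stepping against the gradient is what "change the weights to reduce error" means in practice.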

Lazy evaluation

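Operations are recorded but not run until the result is actually needed, so unused work can be skipped. In MLX, a computation runs when you evaluate it, for example by calling eval or printing the array.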
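The sketch below shows the general pattern in plain Python. The `Lazy` class and the `expensive` function are hypothetical and only illustrate deferral; they do not show how MLX implements it.

```python
class Lazy:
    """Hypothetical deferred value: stores a function and runs it only on demand."""

    def __init__(self, fn):
        self._fn = fn
        self._value = None
        self._done = False

    def __add__(self, other):
        # Adding two lazy values builds a new lazy value; nothing runs yet.
        return Lazy(lambda: self.eval() + other.eval())

    def eval(self):
        # Compute once, then reuse the cached result.
        if not self._done:
            self._value = self._fn()
            self._done = True
        return self._value


calls = []


def expensive(x):
    calls.append(x)  # record that real work happened
    return x * x


a = Lazy(lambda: expensive(3))
b = Lazy(lambda: expensive(4))
c = a + b                       # still no work done
computed_before = len(calls)    # 0
result = c.eval()               # work happens here
```

Only the call to `eval` triggers the two `expensive` calls.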

Dynamic graph

The computation graph forms as your code runs, which makes experimentation easier.

Unified memory

CPU and GPU share memory on Apple silicon, reducing expensive copying.

Inference

Running a trained model to get an output.

Fine-tuning

Training an existing model a bit more so it gets better at a specific job.

LoRA

A lightweight fine-tuning method that updates a small adapter instead of the whole model.
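As a rough sketch of the adapter idea, assume a tiny 4x4 weight matrix `W` that stays frozen and two small matrices `B` and `A` that would be the only trained parts. All names and values here are illustrative, and the plain-list `matmul` helper stands in for a real array library.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


d, r = 4, 1  # model width and adapter rank (r is much smaller than d in practice)

# Frozen base weights: a d x d identity matrix for simplicity.
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

# Trainable adapter: B is d x r, A is r x d.
B = [[0.5], [0.0], [0.0], [0.0]]
A = [[0.0, 2.0, 0.0, 0.0]]

delta = matmul(B, A)  # d x d update of rank r
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

full_params = d * d             # what full fine-tuning would update
adapter_params = d * r + r * d  # what the adapter updates
print(full_params, adapter_params)
```

The base matrix `W` is never modified; the model behaves as if it used `W_eff`, while only the much smaller `B` and `A` need to be stored and trained.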

Quantization

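Storing model weights at lower numeric precision (for example, 4-bit or 8-bit integers instead of 16-bit floats) so the model uses less memory and often runs faster, usually at a small cost in accuracy.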
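Here is a minimal sketch of one simple scheme, symmetric 8-bit quantization with a single scale for the whole list. The function names and weights are made up, and real libraries typically use more elaborate schemes, so treat this as an illustration of the trade-off only.

```python
def quantize(values, bits=8):
    """Map floats to signed integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1                            # 127 for 8 bits
    scale = (max(abs(v) for v in values) / qmax) or 1.0   # avoid a zero scale
    return [round(v / scale) for v in values], scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [x * scale for x in q]


weights = [0.12, -0.5, 0.33, 1.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# Each restored weight is close to, but not exactly, the original.
errors = [abs(a - b) for a, b in zip(weights, restored)]
print(q, max(errors))
```

Each weight now fits in one byte, and the rounding error per weight is at most half of `scale`.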

Tokenizer

The tool that breaks text into pieces a model can process.
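A toy version makes the idea concrete. This sketch splits on whitespace and maps each word to an integer ID; `build_vocab`, `encode`, and the `<unk>` placeholder are illustrative names, and real tokenizers usually split text into subword pieces instead of whole words.

```python
def build_vocab(texts):
    """Assign an integer ID to every distinct lowercase word."""
    vocab = {"<unk>": 0}  # ID 0 is reserved for unknown words
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn text into the list of IDs a model would actually see."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


vocab = build_vocab(["the model reads text"])
ids = encode("The model reads poetry", vocab)
print(ids)  # "poetry" is not in the vocabulary, so it maps to the unknown ID
```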

Weights

The learned numbers inside a model. Loading weights means loading what the model has learned.

Checkpoint

A saved model state you can reload later.

GPU

The graphics processor, often used to speed up ML computation.

Batch

A group of examples processed together for efficiency.
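In code, batching often comes down to slicing a dataset into fixed-size groups. A minimal sketch, with a hypothetical `batches` helper and a stand-in dataset of integers:

```python
def batches(examples, batch_size):
    """Yield consecutive groups of up to batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


dataset = list(range(7))            # stand-in for 7 training examples
grouped = list(batches(dataset, 3))
print(grouped)                      # the last batch is smaller when sizes do not divide evenly
```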

Context window

How much text a language model can consider at once.
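When the input is longer than the window, something has to be dropped. The sketch below assumes one simple policy, keeping only the most recent tokens; `fit_to_window` is a hypothetical helper, and real applications may summarize or select text differently.

```python
def fit_to_window(token_ids, window_size):
    """Keep only the most recent tokens that fit in the window."""
    if window_size <= 0:
        return []
    return token_ids[-window_size:]


history = list(range(10))            # stand-in for 10 token IDs
visible = fit_to_window(history, 4)  # the model would only see these
print(visible)
```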

Prompt

The instructions or input text you feed into a model.

VLM

A vision-language model: a model that can reason across images and text.

ONNX

Open Neural Network Exchange: an open model format used to move models between frameworks and runtimes.

Bindings

Wrappers that let you call the same core library from another language, such as Swift or C.

Local inference

Running the model on your own machine instead of calling a cloud API.
