The term “model” has many meanings in machine learning and adjacent fields. Depending on the context, “model” can mean:

  1. a general architecture or family of architectures, independent of any particular set of trained weights

    • e.g. “We embedded texts using a BERT model” might mean DistilBERT, RoBERTa, ModernBERT, etc., all of which differ from the architecture in the original 2018 BERT paper.
  2. a specific set of weights which are the result of a training process

    • e.g. “You can download the model from HuggingFace”
    • Confusingly, models can be versioned to fix bugs so the same model (by definition #1) can actually be different models (by definition #2).
  3. a product and the system around that product, which isn’t quite #2 because it likely includes additional components like guardrails and rate limiting. The specific weights behind the product might also change without the end user knowing, for example after additional fine-tuning.

    • e.g. “Our newest Claude model is the most powerful yet”
  4. in reinforcement learning, a “model” is the agent’s representation of how the world works, specifically how the environment responds to its actions, i.e. the “world model”.

    • This leads to the confusing situation of “model-free” systems that still include neural networks but don’t act based on any explicit prediction of what effects their actions will have on the world.
    • For example, AlphaZero is not a model-free algorithm because it uses Monte Carlo Tree Search, and each node in that tree is the result of an action taken in a simulated world. It doesn’t have to learn from scratch what the resulting chessboard looks like when it moves a piece.
  5. since it’s also possible to “model” a complex process with a simpler one, a “model” might be some heuristic or simpler algorithm which is implied to be imperfect.

    • This is the origin of the term language “modeling”, since early language models were simple bigram and trigram models.
    • This is also the connotation of climate models, economic models, and models of the spread of disease.
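
To make definition 5 concrete, here is a minimal sketch of a bigram language model of the kind mentioned above: it "models" language by counting which token follows which. The function names (`train_bigram`, `next_token_probs`) and the toy corpus are made up for illustration, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Normalize the follow-counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # unseen context: this simple model has nothing to say
    return {tok: c / total for tok, c in followers.items()}


# Hypothetical toy corpus, split on whitespace for simplicity.
corpus = "the cat sat on the mat and the cat slept".split()
bigram = train_bigram(corpus)
probs = next_token_probs(bigram, "the")
print(probs)
```

The count table is deliberately a crude stand-in for how language actually works: it ignores everything except the previous token, which is exactly the "implied to be imperfect" connotation of this sense of the word.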