MLflow Onboarding

MLflow supports two broad use cases that require different onboarding paths:

- GenAI applications and agents: LLM-powered apps, chatbots, RAG pipelines, tool-calling agents. Key MLflow features include tracing for observability, evaluation with LLM judges, and prompt management, among others.
- Traditional ML / deep learning models: scikit-learn, PyTorch, TensorFlow, XGBoost, etc. Key MLflow features include experiment tracking (parameters, metrics, artifacts), model logging, and model deployment, among others.

Determining which use case applies is the first and most important step. The onboarding path, quickstart tutorials, and integration steps differ significantly between the two.

Step 1: Determine the Use Case

Before recommending tutorials or integration steps, determine which use case the user is working on. Use the signals below, checking them in order. If the signals are ambiguous or absent, you MUST ask the user directly.

Signal 1: Check the Codebase

Search the user's project for imports and usage patterns that indicate the use case.

GenAI indicators (any of these suggest GenAI):

- Imports from LLM client libraries: openai, anthropic, google.generativeai, langchain, langchain_openai, langgraph, llamaindex, litellm, autogen, crewai, dspy
- Imports from MLflow GenAI modules: mlflow.genai, mlflow.tracing, mlflow.openai, mlflow.langchain
- Usage of chat completions, embeddings, or agent frameworks
- Prompt templates or prompt engineering code

Traditional ML indicators (any of these suggest ML):

- Imports from ML frameworks: sklearn, torch, tensorflow, keras, xgboost, lightgbm, catboost, statsmodels, scipy
- Imports from MLflow ML modules: mlflow.sklearn, mlflow.pytorch, mlflow.tensorflow
- Model training loops, .fit() calls, hyperparameter tuning code
- Dataset loading with tabular/image/time-series data
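The import-based part of this check can also be sketched in Python. The following is a minimal illustration, not part of MLflow: the function names (`imported_modules`, `matches`, `classify`) and the `"ask_user"` return value are hypothetical, and `llama_index` is added as an assumption because it is the Python import name for LlamaIndex. It inspects only import statements, so usage patterns such as `.fit()` calls or prompt templates are not detected.

```python
import ast

# Indicator module names taken from the lists above; "llama_index" is an
# illustrative addition (the Python import name for LlamaIndex).
GENAI_INDICATORS = {
    "openai", "anthropic", "google.generativeai", "langchain", "langchain_openai",
    "langgraph", "llamaindex", "llama_index", "litellm", "autogen", "crewai", "dspy",
    "mlflow.genai", "mlflow.tracing", "mlflow.openai", "mlflow.langchain",
}
ML_INDICATORS = {
    "sklearn", "torch", "tensorflow", "keras", "xgboost", "lightgbm", "catboost",
    "statsmodels", "scipy", "mlflow.sklearn", "mlflow.pytorch", "mlflow.tensorflow",
}


def imported_modules(source: str) -> set:
    """Return the absolute module names imported by a piece of Python source."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()  # skip files that do not parse
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


def matches(modules: set, indicators: set) -> bool:
    """True if any module equals an indicator or is one of its submodules."""
    return any(
        module == indicator or module.startswith(indicator + ".")
        for module in modules
        for indicator in indicators
    )


def classify(sources) -> str:
    """Return "genai", "ml", or "ask_user" when signals are mixed or absent."""
    modules = set()
    for source in sources:
        modules |= imported_modules(source)
    genai = matches(modules, GENAI_INDICATORS)
    ml = matches(modules, ML_INDICATORS)
    if genai and not ml:
        return "genai"
    if ml and not genai:
        return "ml"
    return "ask_user"
```

For a real project, one option is to pass it `(p.read_text(errors="ignore") for p in Path(".").rglob("*.py"))` with `Path` imported from `pathlib`. Returning `"ask_user"` for mixed or missing signals mirrors the rule that ambiguous cases go to the user rather than being guessed.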

    # Search for GenAI indicators
    grep -rl --include='*.py' -E '(import openai|from openai|import anthropic|from anthropic|from langchain|from langgraph|import litellm|import dspy|mlflow\.genai|mlflow\.tracing|mlflow\.openai|mlflow\.langchain)' .

    # Search for ML indicators
    grep -rl --include='*.py' -E '(from sklearn|import torch|import tensorflow|import keras|import xgboost|import lightgbm|mlflow\.sklearn|mlflow\.pytorch|mlflow\.tensorflow|\.fit\()' .

Signal 2: Check the Experiment Type Tag

If the codebase or project directory is the MLflow repository itself, skip to Signal 3: the MLflow repo contains code for all use cases and does not indicate the user's intent.

If the experiment ID is known, check its mlflow.experimentKind tag. This tag is set by MLflow to indicate the experiment type:

    mlflow experiments get --experiment-id <EXPERIMENT_ID> --output json > /tmp/exp_detail.json
    jq -r '.tags["mlflow.experimentKind"] // "not set"' /tmp/exp_detail.json

- genai_development → GenAI use case
- custom_model_development → Traditional ML use case
- Not set → Proceed to Signal 3

If the experiment ID is not known, skip to Signal 3.

Signal 3: Ask the User

If the codebase and experiment signals are inconclusive, ask directly:

    Are you building a GenAI application (e.g., an LLM-powered chatbot, RAG pipeline, or tool-calling agent) or a traditional ML/deep learning model (e.g., training a classifier, regression model, or neural network)?

Do not guess. The onboarding paths are different enough that starting down the wrong one wastes the user's time.

Step 2: Recommend Quickstart Tutorials

Once the use case is determined, recommend the appropriate quickstart tutorials from the MLflow documentation. Present them to the user and ask if they'd like to follow along or jump directly to integrating MLflow into their project.

GenAI Path

The MLflow GenAI documentation is at: https://mlflow.org/docs/latest/genai/getting-started/

Choose the most relevant tutorials based on the user's context and what they've told you. Available tutorials include:

- Tracing Quickstart (https://mlflow.org/docs/latest/genai/tracing/quickstart/): Enabling automatic tracing for LLM calls. Covers starting an MLflow server, creating an experiment, enabling autologging, and viewing traces in the UI.
    - Python + OpenAI variant: https://mlflow.org/docs/latest/genai/tracing/quickstart/python-openai/
    - TypeScript + OpenAI variant: https://mlflow.org/docs/latest/genai/tracing/quickstart/typescript-openai
    - OpenTelemetry (language-agnostic) variant: also linked from the quickstart page
- Evaluation Quickstart (https://mlflow.org/docs/latest/genai/eval-monitor/quickstart/): Evaluating GenAI application quality using LLM judges (scorers). Covers defining datasets, prediction functions, and built-in + custom scorers.
- Version Tracking Quickstart (https://mlflow.org/docs/latest/genai/version-tracking/quickstart/): Prompt management, application versioning, and connecting tracing to versioned prompts.

If none of these match the user's needs, look up the MLflow GenAI documentation for more relevant guides.

Traditional ML Path

The MLflow ML documentation is at: https://mlflow.org/docs/latest/ml/getting-started/

Choose the most relevant tutorials based on the user's context and what they've told you. Available tutorials include:

- Tracking Quickstart (https://mlflow.org/docs/latest/ml/tracking/quickstart/): Experiment tracking with scikit-learn: autologging, manual parameter/metric/model logging, and exploring results in the MLflow UI.
- Deep Learning Tutorial (https://mlflow.org/docs/latest/ml/getting-started/deep-learning/): Training a PyTorch model with MLflow logging: parameters, metrics, checkpoints, and system metrics (GPU utilization, memory).
- Hyperparameter Tuning Tutorial (https://mlflow.org/docs/latest/ml/getting-started/hyperparameter-tuning/): Running hyperparameter searches with Optuna + MLflow, comparing results, and selecting the best model.

If none of these match the user's needs, look up the MLflow ML documentation for more relevant guides.

Step 3: Integrate MLflow into the User's Project

After the user has reviewed the quickstart tutorials (or opted to skip them), offer to help integrate MLflow directly into their codebase. Always ask for the user's consent before making changes to their code.

GenAI Integration

The core integration for GenAI apps is tracing: capturing LLM calls, tool invocations, and agent steps automatically.

If asked to create an example project: Do not assume the user has LLM API keys (e.g., OpenAI, Anthropic). Instead, create traces with mock data using @mlflow.trace and mlflow.start_span() to demonstrate tracing without requiring external API access. For example:

    import mlflow

    mlflow.set_experiment("example-genai-app")


    @mlflow.trace
    def mock_chat(query: str) -> str:
        # Child span standing in for a retrieval step
        with mlflow.start_span(name="retrieve_context") as span:
            context = "Mock retrieved context for: " + query
            span.set_inputs({"query": query})
            span.set_outputs({"context": context})

        # Child span standing in for the LLM call
        with mlflow.start_span(name="generate_response") as span:
            response = "Mock response based on: " + context
            span.set_inputs({"context": context, "query": query})
            span.set_outputs({"response": response})

        return response


    mock_chat("What is MLflow?")

What to set up (for an existing project):

1. Autologging: If the user's code uses a supported framework, a single line automatically traces all calls to their LLM provider. See https://mlflow.org/docs/latest/genai/tracing/ for the full list of supported providers.

   If the provider is supported:

        import mlflow

        # Pick the one that matches the user's LLM provider:
        mlflow.openai.autolog()     # OpenAI SDK
        mlflow.anthropic.autolog()  # Anthropic SDK
        mlflow.langchain.autolog()  # LangChain / LangGraph
        mlflow.litellm.autolog()    # LiteLLM

   Add this call once at application startup (e.g., top of main.py, app.py, or the entry point module). It must execute before any LLM calls are made.

   If the provider is not supported by autologging, skip to step 3 (Custom tracing) and use @mlflow.trace to manually instrument the relevant functions.

2. Experiment configuration: Set the experiment so traces are organized:

        mlflow.set_experiment("my-genai-app")

   Or via environment variable:

        export MLFLOW_EXPERIMENT_NAME="my-genai-app"

3. Custom tracing (optional): For functions that aren't automatically traced (custom tools, business logic), use the @mlflow.trace decorator:

        @mlflow.trace
        def my_custom_tool(query: str) -> str:
            # ... tool logic ...
            return result

Where to add it: Find the application's entry point or initialization module and add the autologging call there. Search for the main LLM client instantiation (e.g., openai.OpenAI(), ChatOpenAI()) to find the right location.

Traditional ML Integration

The core integration for ML is experiment tracking: capturing parameters, metrics, and models from training runs.

What to set up:

1. Autologging: If the user's code uses a supported framework, a single line automatically logs parameters, metrics, and models during training. See https://mlflow.org/docs/latest/ml/ for the full list of supported frameworks.

   If the framework is supported:

        import mlflow

        # Pick the one that matches the user's ML framework:
        mlflow.sklearn.autolog()     # scikit-learn
        mlflow.pytorch.autolog()     # PyTorch / PyTorch Lightning
        mlflow.tensorflow.autolog()  # TensorFlow / Keras
        mlflow.xgboost.autolog()     # XGBoost
        mlflow.lightgbm.autolog()    # LightGBM

   Add this call once before training starts. It automatically captures model.fit() calls, logged metrics, and model artifacts.

   If the framework is not supported by autologging, skip to step 3 (Manual logging) and use mlflow.log_param(), mlflow.log_metric(), and mlflow.log_artifact() to log data explicitly.

2. Experiment configuration: Set the experiment so runs are organized:

        mlflow.set_experiment("my-ml-experiment")

   Or via environment variable:

        export MLFLOW_EXPERIMENT_NAME="my-ml-experiment"

3. Manual logging (optional): For metrics or parameters not captured by autologging:

        with mlflow.start_run():
            mlflow.log_param("custom_param", value)
            mlflow.log_metric("custom_metric", value)

Where to add it: Find the training script or module where model.fit() (or equivalent) is called. Add the autologging call before the training loop begins.

Verification

After integration, verify that MLflow is capturing data correctly.

GenAI Verification

1. Run the application and trigger at least one LLM call.
2. Check for traces:

        mlflow traces search \
          --experiment-id <EXPERIMENT_ID> \
          --max-results 5 \
          --extract-fields 'info.trace_id,info.state,info.request_time' \
          --output json > /tmp/verify_traces.json
        jq '.traces | length' /tmp/verify_traces.json

3. If traces appear, open the MLflow UI to inspect them visually.

ML Verification

1. Run the training script.
2. Check for runs:

        mlflow runs search \
          --experiment-id <EXPERIMENT_ID> \
          --max-results 5 \
          --output json > /tmp/verify_runs.json
        jq '.runs | length' /tmp/verify_runs.json

3. If runs appear, open the MLflow UI to inspect logged parameters, metrics, and artifacts.
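The same checks can be approximated from Python instead of the CLI. This is a rough sketch under stated assumptions: `count_captured` and `summarize` are hypothetical helper names, the tracking URI is assumed to be configured already, and `mlflow.search_runs` and `mlflow.search_traces` are assumed to accept `experiment_ids` and `max_results` as they do in recent MLflow releases (argument names may differ in other versions).

```python
def count_captured(experiment_id: str, max_results: int = 5) -> dict:
    """Count recent runs and traces in one experiment.

    Assumes MLflow is installed and a tracking server is reachable.
    """
    import mlflow  # imported lazily so summarize() works without MLflow

    runs = mlflow.search_runs(experiment_ids=[experiment_id], max_results=max_results)
    traces = mlflow.search_traces(experiment_ids=[experiment_id], max_results=max_results)
    return {"runs": len(runs), "traces": len(traces)}


def summarize(counts: dict) -> str:
    """Turn the counts into a one-line verdict."""
    captured = [kind for kind in ("traces", "runs") if counts.get(kind, 0) > 0]
    if not captured:
        return "nothing captured: check the tracking URI and experiment ID"
    return "captured " + ", ".join(f"{counts[kind]} {kind}" for kind in captured)
```

A call such as `print(summarize(count_captured("<EXPERIMENT_ID>")))` would then give a quick yes/no signal before opening the MLflow UI. Because `max_results` caps the search, the counts show that something was captured, not how much.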
