MLFlow Models
MLFlow is a popular tool for tracking your experiments to compare metrics, parameters, and much more. It helps streamline your job as a data scientist or machine learning engineer. MLFlow Models is a sub-project that makes deployments smooth and integrates with the Model Registry, which holds versioned and tagged models and ties together with MLFlow Experiments.
All in all, the MLFlow Models project helps you build self-contained models in a streamlined fashion that integrates very well with the MLFlow ecosystem.
MLFlow Model
MLFlow Models is MLFlow's “self-contained model” format that can automatically build a Docker container and run inference through the built-in mlflow models serve command.
It's an interesting concept that's not “phenomenal” or “innovative”, but it streamlines our lives, just like the “bread and butter” MLFlow Experiments. I love projects that make the average person's life easier. Advanced users, as I'd call myself, might find it “blocking”, but the PythonModel concept I explain later is likely helpful for anyone out there!
MLFlow really hits that sweet spot of keeping things simple without going too far, except perhaps in the current LLM tracing, which feels like a shotgun methodology.
Why?
Keeping it short:
- A “self-contained model” with all the code files and dependencies in a simple package
- Natively integrated into MLFlow, which is one of the biggest “MLOps” systems
- All the MLFlow goodies enabled, such as the Model Registry, model evaluation, and automatic Apache Spark UDF generation (see the sketch below)
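As an illustration of that last point, a logged model can be applied to a Spark DataFrame through mlflow.pyfunc.spark_udf. A minimal sketch, where the run ID, the DataFrame df, and the column names are placeholders:

import mlflow.pyfunc

# Wrap the logged model as a Spark UDF (assumes an active SparkSession `spark`).
predict = mlflow.pyfunc.spark_udf(spark, model_uri="runs:/<run_id>/model")
df = df.withColumn("prediction", predict("feature_a", "feature_b"))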
Flavours
MLFlow Models automatically supports multiple formats: Keras, PyTorch, scikit-learn, and many more (full list).
More interestingly, they support ANY model through their PythonModel, which is what I opt to use.
Why PythonModel
PythonModel gives you a streamlined format that supports custom models, including preprocessing and postprocessing. Quite excellent!
To keep it simple, you define a PythonModel as follows:
import mlflow
import numpy as np

class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input: np.ndarray, params: dict | None = None):
        # model_input can also be pd.DataFrame, dict[str, np.ndarray], ...
        return model_input
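As a sketch of what preprocessing and postprocessing can look like inside a PythonModel (the scaling constants and label mapping below are made up for illustration):

import mlflow
import numpy as np

class WrappedModel(mlflow.pyfunc.PythonModel):
    def __init__(self, model, mean: float, std: float):
        self._model = model
        self._mean = mean
        self._std = std

    def predict(self, context, model_input: np.ndarray, params: dict | None = None):
        x = (model_input - self._mean) / self._std          # preprocessing
        raw = self._model.predict(x)                        # inference
        return np.where(raw > 0.5, "positive", "negative")  # postprocessing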
There's additionally a load_context method that lets you define how to load your model and other artifacts; it runs when the model is “booting up”.
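A minimal sketch of load_context, assuming the weights were logged as an artifact named "weights" via log_model(..., artifacts={"weights": ...}):

import pickle

import mlflow

class MyLoadedModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # context.artifacts maps artifact names (declared at log time) to local paths.
        with open(context.artifacts["weights"], "rb") as f:
            self._model = pickle.load(f)

    def predict(self, context, model_input, params: dict | None = None):
        return self._model.predict(model_input)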
To log and load a model:
import mlflow

model_path = "my_model.py"

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model=MyModel(),
        artifact_path="my_model",
    )

my_model = mlflow.pyfunc.load_model(model_info.model_uri)
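The loaded object is a generic pyfunc wrapper, so inference is a one-liner (a sketch, reusing the MyModel defined above):

import numpy as np

# PyFuncModel.predict only takes the input data; context and params are handled internally.
predictions = my_model.predict(np.array([1.0, 2.0, 3.0]))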
A bugged infer_code_paths
If you find, like me, that infer_code_paths doesn't work well, see my fix in this blog post. This problem seems to be very common if you use sub-classing or have custom dependencies that are called outside load_context, but my simple script helps you out!
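For reference, the standard alternative (not the fix from that post) is to skip inference entirely and pass your code directories explicitly through code_paths; "src/" below is a placeholder for wherever your custom modules live:

import mlflow

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model=MyModel(),
        artifact_path="my_model",
        code_paths=["src/"],  # bundled with the model instead of inferred
    )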
Docker Containerization
It's easily containerized by calling the CLI or Python:
mlflow models build-docker -m runs:/<run_id>/model -n <image_name> --enable-mlserver
or from Python:
import mlflow

mlflow.models.build_docker(
    model_uri=f"runs:/{run_id}/model",
    name="<image_name>",
    enable_mlserver=True,
)
Smooth! Obviously this might not be an optimal image, but it'll be sufficient, and it makes it very easy to build good-enough images. All in all, a helpful feature!
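Once the image is running, it can be queried like any MLFlow scoring server. A sketch, assuming the container is started locally with the default scoring port published (e.g. docker run -p 8080:8080 <image_name>):

import requests

# The /invocations endpoint accepts JSON payloads such as {"inputs": ...}.
resp = requests.post(
    "http://localhost:8080/invocations",
    json={"inputs": [[1.0, 2.0, 3.0]]},
)
print(resp.json())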
Outro
MLFlow Models provides a simple way to deploy models in a self-contained fashion.
I hope you try out MLFlow Models, as they could end up helping you a lot.
~Hampus Londögård