Faster model inference with ONNX¶

A brief intro to ONNX followed by a demo! 🧪

By: Jeroen Overschie

(Image: the ONNX homepage)

ONNX libraries and targets

There's support for plenty of libraries, running on many different platforms.

In [2]:
from IPython.display import IFrame
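The embedded frame itself does not survive this export; presumably it showed the ONNX website with its supported libraries and deployment targets. A minimal reconstruction (the exact URL is an assumption):

# Hypothetical reconstruction: embed the ONNX homepage, which lists
# supported libraries and deployment targets (the URL is an assumption)
IFrame('https://onnx.ai/', width=1000, height=700)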

Benchmark ONNX conversion¶

Comparing three ways of running inference with the same model:

In [27]:
# 1) plain scikit-learn inference
import sklearn

# 2) skl2onnx to convert the model, onnxruntime to run it
import skl2onnx
import onnxruntime

# 3) mlprodict, an alternative ONNX runtime
import mlprodict
Out[5]:
VotingRegressor(estimators=[('gb', GradientBoostingRegressor(random_state=1)),
                            ('rf', RandomForestRegressor(random_state=1)),
                            ('lr', LinearRegression())])
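The cell that builds and fits this VotingRegressor is not part of the export. A minimal sketch of what it could have looked like; the synthetic dataset (make_regression), its dimensions and the train/test split are assumptions, while the three estimators match the repr above:

# Hypothetical reconstruction of the (missing) training cell.
import numpy
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# synthetic regression data; dataset choice and sizes are assumptions
X, y = make_regression(n_samples=20_000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = VotingRegressor(estimators=[
    ('gb', GradientBoostingRegressor(random_state=1)),
    ('rf', RandomForestRegressor(random_state=1)),
    ('lr', LinearRegression()),
])
model.fit(X_train, y_train)

# a single row to test single-observation inference latency on
one_observation = X_test[:1].astype(numpy.float32)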

Inference¶

Using sklearn's model.predict()

In [9]:
model.predict(one_observation)
Out[9]:
array([-118.94052453])

Using onnxruntime

In [26]:
import numpy
from skl2onnx import to_onnx
from onnxruntime import InferenceSession

# convert the fitted model to an ONNX graph, using one training row
# to declare the input type and shape
onx = to_onnx(model, X_train[:1].astype(numpy.float32),
              target_opset=14)

# run the graph with onnxruntime
sess = InferenceSession(onx.SerializeToString())
sess.run(None, {'X': one_observation})[0]
Out[26]:
array([[-118.94051]], dtype=float32)

Note the tiny difference from scikit-learn's -118.94052453: the model was exported with float32 inputs, so ONNX inference runs in single precision.

We can also save the .onnx model to disk:

In [30]:
with open("sklearn_model.onnx", "wb") as f:
    f.write(onx.SerializeToString())

!ls sklearn_model.onnx
sklearn_model.onnx
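The benchmark cell itself is missing from this export; only its progress bar and the resulting table survive below. A minimal sketch of what it plausibly did, timing per-observation prediction for each of the three runtimes across growing batch sizes (the batch sizes, the timing logic and the df variable are assumptions):

# Hypothetical reconstruction of the (missing) benchmark cell.
import timeit
import pandas
from tqdm import tqdm
from mlprodict.onnxrt import OnnxInference

oinf = OnnxInference(onx)  # mlprodict's own runtime for the same ONNX graph
# the file saved above would work just as well: InferenceSession("sklearn_model.onnx")

rows = []
for size in tqdm([1, 2, 5, 10, 250, 500, 750, 1000, 5000, 10000]):
    batch = X_test[:size].astype(numpy.float32)
    runtimes = {
        'scikit-learn': lambda: model.predict(batch),
        'onnxruntime': lambda: sess.run(None, {'X': batch}),
        'mlprodict': lambda: oinf.run({'X': batch}),
    }
    row = {'size': size}
    for name, predict in runtimes.items():
        # per-observation latency: total time / (runs * batch size)
        row[name] = timeit.timeit(predict, number=10) / (10 * size)
    rows.append(row)

df = pandas.DataFrame(rows)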
100%|███████████████████████████████████████████| 10/10 [00:47<00:00,  4.78s/it]
Out[15]:
   average  deviation  min_exec  max_exec  repeat  number   size  scikit-learn   onnxruntime  mlprodict
0  0.007386   0.000550  0.006989  0.009626      50      10      1      0.007386  2.229566e-05   0.000151
1  0.006992   0.000191  0.006660  0.007483      50      10      2      0.003496  1.496709e-05   0.000034
2  0.006990   0.000170  0.006744  0.007759      50      10      5      0.001398  9.139367e-06   0.000017
3  0.006981   0.000349  0.006703  0.008546      50      10     10      0.000698  8.288960e-06   0.000013
4  0.008790   0.000216  0.008502  0.009894      50      10    250      0.000035  1.145972e-06   0.000003
5  0.010951   0.000558  0.010300  0.013073      50      10    500      0.000022  1.050886e-06   0.000003
6  0.012817   0.000559  0.012026  0.014479      50      10    750      0.000017  1.103438e-06   0.000003
7  0.014354   0.000265  0.014124  0.015040      10      10   1000      0.000014  1.081331e-06   0.000003
8  0.040459   0.007147  0.037611  0.061870      10      10   5000      0.000008  9.428774e-07   0.000003
9  0.072818   0.003606  0.066701  0.077930       5      10  10000      0.000007  1.247680e-06   0.000003

Here size is the batch size; the last three columns give the prediction time per observation (in seconds) for each runtime.
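The plotting cell did not survive the export either; a minimal sketch, assuming the results live in a DataFrame df with the runtime columns shown above:

# Hypothetical plotting sketch: prediction time per observation versus
# batch size, one line per runtime, on log-log axes (df is assumed)
import matplotlib.pyplot as plt

ax = df.set_index('size')[['scikit-learn', 'onnxruntime', 'mlprodict']].plot(
    logx=True, logy=True)
ax.set_xlabel('batch size')
ax.set_ylabel('prediction time per observation (s)')
plt.show()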

(Plot: prediction speed differences between scikit-learn, onnxruntime and mlprodict)

💡 ONNX runtimes are much faster than scikit-learn at predicting a single observation.¶

  • scikit-learn is optimized for training (large batches)
  • scikit-learn and ONNX runtimes converge for big batches

Both rely on similar implementations, parallelization and languages (C++, OpenMP), which explains why they converge for big batches.

ONNX.js¶

ONNX in the browser ✨

In [22]:
IFrame('https://dunnkers.com/neural-network-backdoors/', width=1000, height=700)

Conclusion¶

ONNX might be useful for you! 🫵🏻

  • Fast inference
  • Many platforms
  • Edge devices

🙏🏻

Thanks!¶

(QR code to the repo)

Built by Jeroen Overschie @ GoDataDriven in 2022

→ inspired by a sklearn-onnx benchmark notebook