<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Londogard Blog</title>
<link>https://blog.londogard.com/</link>
<atom:link href="https://blog.londogard.com/index.xml" rel="self" type="application/rss+xml"/>
<description>A blog focused mainly on Data Science, Data Engineering and sometimes Kotlin / Scala.</description>
<generator>quarto-1.8.27</generator>
<lastBuildDate>Mon, 02 Mar 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Beating ChatGPT and Gemini at Brick Figures</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/</link>
  <description><![CDATA[ 





<p>This post is not only written by a real person but also, hopefully, short and to the point!</p>
<section id="the-what" class="level2">
<h2 class="anchored" data-anchor-id="the-what">The what</h2>
<p>Last year I blogged about <a href="../../posts/2025-01-09-img-2-lego/"><code>img2lego</code></a>, now renamed <code>brickportraits</code> to remove all connection to LEGO. The tool produces figures that aren’t part of the LEGO franchise and can be built from any type of brick.</p>
<p>Today there’s a website, <a href="https://brickportrait.londogard.com/">brickportrait.londogard.com</a>, which lets anyone run the generation, including a 2D mode (mosaic).</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/image.png" class="img-fluid figure-img"></p>
<figcaption>Front of Brickportraits</figcaption>
</figure>
</div>
</section>
<section id="the-comparison" class="level2">
<h2 class="anchored" data-anchor-id="the-comparison">The comparison</h2>
<p>So how good is the result? It depends on what you compare to.</p>
<p>Generalist LLMs (ChatGPT and Gemini) are strong at generating images of 3D Brick Figures, but they hallucinate a lot, including bricks that don’t exist! This includes the latest Nano Banana 2.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/nanobanana2.png" class="img-fluid figure-img" style="width:30.0%"></p>
<figcaption>Image-generated Brick Figure, with imaginary bricks (Gemini / Nano Banana 2)</figcaption>
</figure>
</div>
<p>Decent results, but not a fair comparison: the figure isn’t made of actual bricks, nor is it possible to get building instructions.</p>
<p>Next, I asked the competitors to generate an actual 3D figure brick by brick (ThreeJS). They break down completely. Let’s review!</p>
<div class="quarto-layout-panel" data-layout="[[1,1], [1,1,1]]">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/original_img.png" class="img-fluid figure-img"></p>
<figcaption><strong>Original</strong></figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/brickportrait_turn1.png" class="img-fluid figure-img"></p>
<figcaption><strong>Brickportrait</strong> <a href="https://brickportrait.londogard.com/job/17987446-e3ce-4df6-b376-f3b40690bf8f">link to demo</a> (mine)</figcaption>
</figure>
</div>
</div>
</div>
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/gemini_turn1.png" class="img-fluid figure-img"></p>
<figcaption><strong>Gemini</strong> #1</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/gemini_turn2.png" class="img-fluid figure-img"></p>
<figcaption><strong>Gemini</strong> #2</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/chatgpt_turn1.png" class="img-fluid figure-img"></p>
<figcaption><strong>ChatGPT</strong> #1</figcaption>
</figure>
</div>
</div>
</div>
</div>
<p>I’m not sure what you think, but to me the winner is as clear as day! 😉</p>
</section>
<section id="the-how" class="level2">
<h2 class="anchored" data-anchor-id="the-how">The how</h2>
<p>My results come from AI models built for 3D generation, similar to <a href="https://www.meshy.ai/">Meshy.ai</a>, whose output I voxelize and then apply algorithmic enhancements to. This type of model is trained on, and internally uses, 3D representations suited to the task, which makes it easier to work with.</p>
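<p>To make “voxelize” concrete, here is a minimal sketch, assuming the generated model is available as a list of 3D points. The function name <code>voxelize</code>, the <code>voxel_size</code> parameter and the sample points are hypothetical. This is not the actual brickportraits code, which does considerably more.</p>

```python
import math


def voxelize(points, voxel_size):
    """Map 3D points to the set of integer grid cells (voxels) they occupy."""
    if not voxel_size > 0:
        raise ValueError("voxel_size must be positive")
    return {
        tuple(math.floor(coord / voxel_size) for coord in point)
        for point in points
    }


# Hypothetical sample points, not real model output.
cells = voxelize([(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.2, 0.0, 0.0)], voxel_size=0.5)
print(sorted(cells))
```

<p>Points that fall into the same cell collapse into one voxel, which is what turns a smooth surface into something that can be built from bricks.</p>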
<p>Imagine writing a figure down block by block while reasoning about how it’ll look. Not easy, right? It’s the same for the <em>generalist LLMs</em>.<br>
With that said, <em>generalist LLMs</em> are capable and display good spatial ability in <a href="https://minebench.ai/">MineBench</a> when generating scenes <strong>from text</strong>. I’m not sure whether the LLMs actually reason or are able to generalize from Minecraft content online, but each new SotA model clearly moves things forward!</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/minebench.png" class="img-fluid figure-img"></p>
<figcaption>Prompt: A steampunk airship with a wooden hull, large brass propellers on each side, a balloon made of patchwork fabric above the deck, hanging ropes and ladders, and a glass-enclosed bridge at the front (minebench.ai)</figcaption>
</figure>
</div>
<p><strong>But!</strong> For now this capability is really only available when generating from text. If I add a reference image it fails utterly, as shown in my earlier examples.</p>
<p><strong>What do I think personally?</strong> Text is not as “rigid” and leaves more room for interpretation. That makes it hallucination-friendly, which can help the model generalize from internet content and produce great results.</p>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>Interested to learn more or discuss something? Please reach out to me!</p>


</section>

 ]]></description>
  <category>python</category>
  <category>LLM</category>
  <guid>https://blog.londogard.com/posts/2026-03-02-brick-figure-beat-openai-gemini/</guid>
  <pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>AWS Sagemaker Bring-Your-Own-Container (BYOC) with Pixi</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-10-29-til-pixi-aws-sagemaker-byoc/</link>
  <description><![CDATA[ 





<p>To run your own container in an AWS SageMaker Training Job, <code>mamba</code> has to be available in the image. So how would you use <code>pixi</code> instead?</p>
<p>It’s quite straightforward: <strong>first add <code>micromamba</code> as part of your dependencies in <code>pixi.toml</code></strong>. Then update your Dockerfile to something like this:</p>
<pre class="docker"><code>FROM --platform=linux/amd64 ghcr.io/pixi

COPY pixi.toml pixi.lock ./

ENV PIP_DEFAULT_TIMEOUT=360
ENV UV_HTTP_TIMEOUT=360
RUN pixi install --frozen # use .lock-file

# === IMPORTANT PART ===
# Link micromamba as mamba, to support AWS sagemaker
# Replace `default` with the environment name you use

# 1. Symlink the environment's `micromamba` as `mamba`
RUN ln -s /.pixi/envs/default/bin/micromamba /.pixi/envs/default/bin/mamba
# 2. Add the pixi environment's bin directory to PATH
ENV PATH="/.pixi/envs/default/bin:${PATH}"
# 3. Add MAMBA_ROOT_PREFIX
ENV MAMBA_ROOT_PREFIX=/.pixi

# This isn't used by AWS but nice-to-have for local testing
# RUN pixi shell-hook -e default &gt; /shell-hook.sh
# RUN echo 'exec "$@"' &gt;&gt; /shell-hook.sh
# ENTRYPOINT ["/bin/bash", "/shell-hook.sh"]</code></pre>
<p>This’ll create a Docker image that’s usable inside AWS Sagemaker and let you keep using <code>pixi</code> throughout your whole dev-cycle, remote and local!</p>
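<p>For local testing, a quick check that the symlink and <code>PATH</code> changes took effect can be sketched in Python. The helper name <code>missing_tools</code> is hypothetical, and the sketch assumes it is run inside the built image (for example through <code>docker run</code>).</p>

```python
import shutil


def missing_tools(tools, path=None):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    # Assumption: executed inside the container built from the Dockerfile above.
    missing = missing_tools(["mamba", "micromamba"])
    print("missing:", missing or "nothing")
```

<p>If <code>mamba</code> shows up as missing, the environment name in the symlink and <code>PATH</code> lines is the first thing to double-check.</p>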
<p>Perhaps there’s an easier way. If you know one, please share it in the comments below! 👇</p>
<p>That’s it for now,<br>
~Hampus Londögård</p>



 ]]></description>
  <category>python</category>
  <category>aws</category>
  <category>sagemaker</category>
  <guid>https://blog.londogard.com/posts/2025-10-29-til-pixi-aws-sagemaker-byoc/</guid>
  <pubDate>Wed, 29 Oct 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Pixi - A year later</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-10-22-pixi-2/</link>
  <description><![CDATA[ 





<p>I’ve been using <code>pixi</code> professionally in my team for a year now, and it has been a pleasure from the start.</p>
<p><strong>N.B.</strong> We’ve also been utilizing <code>uv</code> inside the <code>pixi</code> environment.</p>
<blockquote class="blockquote">
<p><strong>Brief Intro</strong> <a href="https://docs.astral.sh/uv/"><code>uv</code></a>: A faster &amp; better pip. <a href="https://pixi.sh/"><code>pixi</code></a>: A faster and better pip+conda (bonus: beats mamba and works for other languages). Uses <code>uv</code> under-the-hood for pip installs.</p>
</blockquote>
<p>From the moment I (we) started using uv/pixi something clicked. It’s simple and fast, and the developer experience (DX) is at a new level. The way pixi seamlessly integrates uv makes me happy about the ecosystem.<br>
The reasons I opted for pixi rather than uv a year ago:</p>
<ol type="1">
<li>Conda Universe</li>
<li>Able to include CLI tools in the environment, e.g.&nbsp;<code>ffmpeg</code> &amp; <code>s5cmd</code>.</li>
<li>Pixi Tasks - alias your common workloads so no one forgets the order.
<ul>
<li>Might sound lazy, but it helps structure.</li>
</ul></li>
</ol>
<p><strong>The best part?</strong> Everyone in the team has been pleased with the transition, and it’s been improvements all around.</p>
<p>Moving on, I’d like to share what we’ve really enjoyed, looking back.</p>
<section id="dependent-environments" class="level2">
<h2 class="anchored" data-anchor-id="dependent-environments">Dependent Environments</h2>
<p>This is not a pixi-only feature, but it’s great to be able to have multiple environments in our base. We use a mono-repo style of working where we combine everything from Training to Monitoring in the same repo.</p>
<p>We set up a base environment which is shared among the sub-environments. It includes dependencies like <code>polars</code>, <code>deltalake</code> and <code>s5cmd</code>, which are integral to our way of working.<br>
Then we extend base with <code>computer_vision</code> and add our CV-related dependencies, and for our <code>data_pipeline</code> we extend it with data utilities.</p>
<p>What’s great about this? We bump <code>polars</code> in one location and it’ll be validated across all environments at the same time.</p>
<p>What’s also great? We can separate dependencies based on platform, i.e.&nbsp;<code>osx-arm64</code> will have a separate dependency from <code>linux-64</code> which is important as <code>osx-arm64</code> doesn’t have CUDA.</p>
</section>
<section id="pixi-tasks" class="level2">
<h2 class="anchored" data-anchor-id="pixi-tasks">Pixi Tasks</h2>
<p>Another feature we’ve utilized more and more with time is <code>tasks</code>. They’re easy yet powerful.</p>
<p>What type of tasks do we have so far?</p>
<ol type="1">
<li><code>pixi run deploy_monitor</code>: deploys our Streamlit App to Snowflake</li>
<li><code>pixi run labelme</code>: opens our labeling program with the correct settings out-of-the-box, and refreshes the pre-signed s3-urls (yes, we’ve patched LabelMe to support URLs 😉)</li>
<li><code>pixi run docker_deploy</code>: builds our Docker containers and pushes them to AWS based on arguments</li>
</ol>
<p>In one word: <em>Awesome</em>.<br>
We’ve found it helpful for getting people set up and launching internal tools, as there’s no need to remember where a script lives or in what order to run things.</p>
</section>
<section id="temporary-environments" class="level2">
<h2 class="anchored" data-anchor-id="temporary-environments">Temporary Environments</h2>
<p>Both pixi and uv support temporary environments and global tools. To test something out, run:</p>
<section id="temporary-environments-1" class="level3">
<h3 class="anchored" data-anchor-id="temporary-environments-1">Temporary Environments</h3>
<ol type="1">
<li><code>pixi exec --with &lt;dep&gt; python main.py</code></li>
<li><code>uv run --with &lt;dep&gt; main.py</code></li>
</ol>
</section>
<section id="global-tools" class="level3">
<h3 class="anchored" data-anchor-id="global-tools">Global Tools</h3>
<ol type="1">
<li><code>pixi global install &lt;dep&gt;</code></li>
<li><code>uv tool install &lt;dep&gt;</code></li>
</ol>
</section>
</section>
<section id="final-thoughts-one-year-later" class="level1">
<h1>Final thoughts one year later</h1>
<p>Pixi has been a huge success internally in my team, and I’ve been using both pixi and uv privately. Privately I tend to gravitate towards <code>uv</code> as I only have my own environment and it’s a little bit “easier” (hard to explain).</p>
<p>Introducing a completely new tool, pixi, and hearing no complaints but rather positive remarks is incredible, and speaks volumes about how well pixi behaves in enterprise settings.<br>
To give some background: 2 years ago the team started out with broken pip-installs and “works on my computer”, and I transitioned the team to micromamba, which stabilized a lot. But it wasn’t a fully positive experience in the team. 1 year ago I moved the team to pixi, which has been a lot smoother, with happy faces all around.</p>
<p>Happy team, happy life!<br>
~Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <guid>https://blog.londogard.com/posts/2025-10-22-pixi-2/</guid>
  <pubDate>Wed, 22 Oct 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Resizing Images: PIL, cv2 and scikit-image</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-09-03-resizing-pil-cv2-skimage/</link>
  <description><![CDATA[ 





<p>ℹ️ This is a really small piece of “nothing”, but it might save you (and future me) some time!</p>
<hr>
<p>Recently I started playing around with the <a href="http://scikit-image.org/">scikit-image</a> library, which is really cool. I found that it had a decent resizing tool, but diving deeper it actually turned out to be <strong>really slow</strong>.<br>
I’d even go as far as to say that the anti-aliasing (AA) of scikit-image might be too aggressive, but luckily you can tune it.<br>
In comparison, PIL seems really performant with sane defaults, while OpenCV is a bit low-level and requires a manual Gaussian filter to achieve good results.</p>
<p>This is likely not a bottleneck in anyone’s pipeline, but I love bags of freebies, and when running a server on a Raspberry Pi it’s always nice to have some extra performance.</p>
<section id="quick-benchmarks" class="level2">
<h2 class="anchored" data-anchor-id="quick-benchmarks">Quick benchmarks:</h2>
<blockquote class="blockquote">
<p>Please note that this benchmark is not scientific, it’s a simple <code>timeit(number=100)</code>, but it’s quite telling anyhow!</p>
</blockquote>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th>Mode</th>
<th>Time (timeit, number=100)</th>
<th>Person</th>
<th>Chess Board</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Original</td>
<td>N/A</td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/94dbe4f6-7765-40d5-8dec-46cf479a2aea.jpg" class="img-fluid" alt="94dbe4f6-7765-40d5-8dec-46cf479a2aea.jpg"></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/ffa23487-4156-4fb5-a8dd-12e0a4eaad0a.png" class="img-fluid"></td>
</tr>
<tr class="even">
<td>PIL.resize(LANCZOS, reducing_gap=None)</td>
<td>9.18</td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/2277afbe-3a8b-4380-8a01-f4eff080f7b0.png" class="img-fluid" alt="2277afbe-3a8b-4380-8a01-f4eff080f7b0.png|160.03750610351562"></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/ec46b283-7c5b-4cef-ac33-e7c02cc79d9f.png" class="img-fluid" alt="ec46b283-7c5b-4cef-ac33-e7c02cc79d9f.png|441.0375061035156"></td>
</tr>
<tr class="odd">
<td>PIL.resize(LANCZOS, reducing_gap=2)</td>
<td><strong>0.0003</strong></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/b1959e2a-9ee3-47b0-8c94-42cf8a6eec84.png" class="img-fluid" alt="b1959e2a-9ee3-47b0-8c94-42cf8a6eec84.png|165.03750610351562"></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/4b5441bb-fa02-478d-a9f8-1c709d63fd13.png" class="img-fluid" alt="4b5441bb-fa02-478d-a9f8-1c709d63fd13.png|441.0375061035156"></td>
</tr>
<tr class="even">
<td>ski.resize(aa=True)</td>
<td>69.15</td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/8f9ab362-866b-42a4-9bd1-282f36240684.png" class="img-fluid" alt="8f9ab362-866b-42a4-9bd1-282f36240684.png|167.03750610351562"></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/188cf8f3-d157-4034-98cb-b81c68fa02a0.png" class="img-fluid" alt="188cf8f3-d157-4034-98cb-b81c68fa02a0.png|441.0375061035156"></td>
</tr>
<tr class="odd">
<td>cv2.resize(INTER_AREA)</td>
<td>5.88</td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/0d26dab6-389b-48e1-b354-d62deb862ae3.png" class="img-fluid" alt="0d26dab6-389b-48e1-b354-d62deb862ae3.png|168.03750610351562"></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/9693c723-e405-4db6-ba7c-c7716a2291c6.png" class="img-fluid"></td>
</tr>
<tr class="even">
<td>cv2.resize(INTER_LANCZOS4 + GaussBlur)</td>
<td>4.25</td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/1747bfaa-bf53-455d-ba32-f3535bf8585f.png" class="img-fluid" alt="1747bfaa-bf53-455d-ba32-f3535bf8585f.png|174.03750610351562"></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/3a715f42-ff02-465e-bca8-4290c16eeb3d.png" class="img-fluid"></td>
</tr>
<tr class="odd">
<td>cv2.resize(INTER_LANCZOS4)</td>
<td>0.033</td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/a8fa7498-da6c-4881-93fe-a34697b961e1.png" class="img-fluid" alt="a8fa7498-da6c-4881-93fe-a34697b961e1.png|178.03750610351562"></td>
<td><img src="https://images.amplenote.com/8a41f494-8709-11f0-a85e-7b379051e4bb/fb425821-e7a3-4e0c-8b77-11c66dc32245.png" class="img-fluid"></td>
</tr>
</tbody>
</table>
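<p>A minimal sketch of how such a <code>timeit(number=100)</code> comparison could be set up is shown below. The helper names <code>bench</code> and <code>pillow_resizers</code>, the target size and the image path are assumptions for illustration, not the exact script behind the table. Only the Pillow rows are covered, and Pillow must be installed to use <code>pillow_resizers</code>.</p>

```python
import timeit


def bench(resizers, number=100):
    """Time each zero-argument callable `number` times; returns total seconds per name."""
    return {name: timeit.timeit(fn, number=number) for name, fn in resizers.items()}


def pillow_resizers(path, size=(256, 256)):
    """Build resize callables for one image (requires Pillow; size is an arbitrary choice)."""
    from PIL import Image  # imported lazily so `bench` itself works without Pillow

    img = Image.open(path)
    img.load()  # decode up front so file I/O is not part of the timing
    lanczos = Image.Resampling.LANCZOS
    return {
        "reducing_gap=None": lambda: img.resize(size, lanczos),
        "reducing_gap=2": lambda: img.resize(size, lanczos, reducing_gap=2.0),
    }
```

<p>Calling <code>bench(pillow_resizers("person.jpg"))</code> with an image of your own returns a dict of total seconds per variant. The same <code>bench</code> helper accepts callables wrapping cv2 or scikit-image.</p>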
<p>~Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <category>cv</category>
  <guid>https://blog.londogard.com/posts/2025-09-03-resizing-pil-cv2-skimage/</guid>
  <pubDate>Wed, 03 Sep 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>TIL: s5cmd</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-08-29-til-s5cmd/</link>
  <description><![CDATA[ 





<p>When working with code, or anything really, you always make trade-offs. One example is <em>simplicity</em> versus <em>runtime efficiency</em>, often talked about as <em>CPU-cycles</em> versus <em>brain-cycles</em>, where the latter usually wins. But sometimes the choice is less obvious, and s5cmd is such a case: it’s a single new dependency with massive gains.</p>
<p><em><a href="https://github.com/peak/s5cmd">S5cmd</a> is a very fast S3 and local filesystem execution tool</em>. For those that care, it’s written in Go, a fast language from Google that builds small, simple binaries.</p>
<p>I can’t share specific numbers from work, but the speedup is <em>approximately 30x compared to a (simple) custom threading pool in Python</em>. That’s huge! Joshua Robinson found the same numbers in his <a href="https://joshua-robinson.medium.com/s5cmd-for-high-performance-object-storage-7071352cc09d">blog</a> when comparing with s3cmd / aws-cli.</p>
<p>In our case the single dependency addition was worth it because the efficiency and cost reduction outweigh the cons, especially as the dependency itself is very lean.</p>
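<p>Replacing a Python thread pool with s5cmd can be as small as shelling out to the binary. Below is a minimal sketch, assuming <code>s5cmd</code> is installed and on <code>PATH</code> and that AWS credentials are already configured. The helper names and the bucket/prefix layout are hypothetical.</p>

```python
import subprocess


def s5cmd_cp_command(source, destination):
    """Build the argument list for `s5cmd cp SOURCE DESTINATION`."""
    return ["s5cmd", "cp", source, destination]


def download_prefix(bucket, prefix, local_dir):
    """Copy every object under s3://bucket/prefix/ into local_dir using s5cmd."""
    cmd = s5cmd_cp_command(f"s3://{bucket}/{prefix}/*", f"{local_dir}/")
    # No shell is involved, so the wildcard reaches s5cmd untouched and is expanded there.
    # check=True raises CalledProcessError if s5cmd exits with an error.
    subprocess.run(cmd, check=True)
```

<p>The parallelism lives inside s5cmd, so the Python side stays a single blocking call.</p>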
<p>I hope someone who’s in need of a faster S3 download/upload tool reads this and manages to speed their tooling up! 😊</p>
<p>Thanks for this time, Hampus Londögård</p>



 ]]></description>
  <category>aws</category>
  <category>s3</category>
  <category>til</category>
  <guid>https://blog.londogard.com/posts/2025-08-29-til-s5cmd/</guid>
  <pubDate>Fri, 29 Aug 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Transformers.js.py accelerated on-device inference with Python WASM</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-03-25-trimesh-marimo-pr/</link>
  <description><![CDATA[ 





<p><a href="https://huggingface.co/docs/transformers.js/en/index">Transformers.js</a> is an ambitious project by HuggingFace to bring <em>transformers</em> to Web/JS and simplify inference on-device, running <code>onnxruntime-web</code> under-the-hood.<br>
I’ve written blogs and apps with <code>onnxruntime-web</code><sup>1</sup>, and I must say - I’m a sucker for efficient on-device inference!</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-03-25-trimesh-marimo-pr/image.png" class="img-fluid figure-img"></p>
<figcaption>Marimo WASM App: Before/After Prediction</figcaption>
</figure>
</div>
<p>Test it live <a href="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false">here</a>, and yes it runs on your device in the browser!</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Embedded Live Demo (running on device, in browser)
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>I’m not sure why the width can’t resolve itself. If you can’t use the embedded version, try it <a href="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false">here</a>.</p>
<iframe src="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false&amp;embed=true" frameborder="0" width="100%" height="600px"></iframe>
</div>
</div>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>ONNX Runtime and <code>onnxruntime-web</code>
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p><code>onnxruntime-web</code> helps run models efficiently directly in your browser (JS) through the default backend <em>WASM</em> (cpu).</p>
<p>Additionally, you can accelerate inference via GPU/NPU by swapping the backend to either <code>WebGL</code>, <code>WebGPU</code>, or <code>WebNN</code>. I think this is really cool as we can now develop Progressive Web Apps (PWA) that accelerate their inference using a smartphone’s NPU - crazy!</p>
<p>ONNX Runtime also supports many other ways to run inference, e.g.&nbsp;JVM, .NET, Python, and C++.</p>
</div>
</div>
</div>
<p>And recently I learned about a thin wrapper around <code>transformers.js</code>, namely <a href="https://github.com/whitphx/transformers.js.py">Transformers.js.py</a> that proxies the API to <a href="https://pyodide.org/en/stable/">Pyodide</a><sup>2</sup>.<br>
Right below 👇 I share how to run Object Detection.</p>
<section id="how-to-object-detection-inference" class="level2">
<h2 class="anchored" data-anchor-id="how-to-object-detection-inference">How to: Object Detection Inference</h2>
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>infer.py</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="annotated-cell-1" data-filename="infer.py" style="background: #f1f3f5;"><pre class="sourceCode python code-annotation-code code-with-copy code-annotated"><code class="sourceCode python"><a class="code-annotation-anchor" data-target-cell="annotated-cell-1" data-target-annotation="1" onclick="event.preventDefault();">1</a><span id="annotated-cell-1-1" class="code-annotation-target"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> transformers_js_py <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> import_transformers_js, as_url</span>
<span id="annotated-cell-1-2">transformers <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> import_transformers_js()</span>
<span id="annotated-cell-1-3">pipeline <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transformers.pipeline</span>
<span id="annotated-cell-1-4"></span>
<span id="annotated-cell-1-5">img <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;URL_OR_PATH_TO_AN_IMAGE&gt;"</span></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-1" data-target-annotation="2" onclick="event.preventDefault();">2</a><span id="annotated-cell-1-6" class="code-annotation-target">pipe <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> pipeline(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"object-detection"</span>)</span>
<span id="annotated-cell-1-7"></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-1" data-target-annotation="3" onclick="event.preventDefault();">3</a><span id="annotated-cell-1-8" class="code-annotation-target">pred <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> pipe(as_url(img))</span>
<span id="annotated-cell-1-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pred) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># list of predictions [{"score": float, "label": str, "box": dict[str, int]}, ...]</span></span><div class="code-annotation-gutter-bg"></div><div class="code-annotation-gutter"></div></code></pre></div></div>
</div>
<dl class="code-annotation-container-grid">
<dt data-target-cell="annotated-cell-1" data-target-annotation="1">1</dt>
<dd>
<span data-code-cell="annotated-cell-1" data-code-lines="1,2,3" data-code-annotation="1">Import library and “import” <code>pipeline</code></span>
</dd>
<dt data-target-cell="annotated-cell-1" data-target-annotation="2">2</dt>
<dd>
<span data-code-cell="annotated-cell-1" data-code-lines="6" data-code-annotation="2">Set up <code>pipeline</code> object, downloading model and making things ready to run</span>
</dd>
<dt data-target-cell="annotated-cell-1" data-target-annotation="3">3</dt>
<dd>
<span data-code-cell="annotated-cell-1" data-code-lines="8" data-code-annotation="3">Run inference. <code>as_url</code> converts local/virtual files to a URL, as the <code>pipeline</code> object requires URLs that can be opened in the JS context.</span>
</dd>
</dl>
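<p>Once the predictions are back on the Python side, ordinary Python tooling applies. Here is a minimal sketch, assuming <code>pred</code> is (or has been converted to) a list of plain dicts in the shape noted in the code comment above. The helper name, threshold and sample values are hypothetical.</p>

```python
from collections import Counter


def confident_labels(predictions, min_score=0.9):
    """Count labels among predictions whose score is at least min_score."""
    return Counter(p["label"] for p in predictions if p["score"] >= min_score)


# Hypothetical predictions in the shape described above, not real model output.
sample = [
    {"score": 0.98, "label": "cat", "box": {}},
    {"score": 0.95, "label": "cat", "box": {}},
    {"score": 0.40, "label": "dog", "box": {}},
]
print(confident_labels(sample))
```
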
<p>I share how to customize the inference in Section&nbsp;3 and a full-fledged WebApp with on-device inference using Marimo in Section&nbsp;4.</p>
</section>
<section id="why-transformer.js.py-pyodide" class="level2">
<h2 class="anchored" data-anchor-id="why-transformer.js.py-pyodide">Why <code>transformers.js.py</code> + <code>pyodide</code></h2>
<p>There’s a good question here: <em>why not run JS directly?</em><br>
I don’t have a great answer, it’s all about trade-offs.</p>
<p>JS enables “native” usage which likely works better in real-time applications, as it runs JS-&gt;WASM rather than WASM-&gt;JS-&gt;WASM.<br>
What JS doesn’t have is a robust data science ecosystem, unlike Python. “Merging” the two through Pyodide makes sense, and on top of that it’s fun! 🤓</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>A quick Pro/Con list of JS vs Pyodide
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p><strong>Why JS?</strong></p>
<ul>
<li><strong>Pros:</strong>
<ul>
<li>Faster / Realtime (as no WASM/JS communication)</li>
<li>Native integration in webapps</li>
</ul></li>
<li><strong>Cons:</strong>
<ul>
<li>Weak data science tooling</li>
</ul></li>
</ul>
<p><strong>Why Python (Pyodide)?</strong></p>
<ul>
<li><strong>Pros:</strong>
<ul>
<li>Great ecosystem (PIL.Image, numpy, altair, polars, …)</li>
<li>Familiarity</li>
<li>Simpler PoC UI tools available (marimo, streamlit, solara, jupyterlite)</li>
</ul></li>
<li><strong>Cons:</strong>
<ul>
<li>Overhead moving data from Pyodide (WASM) to JS
<ul>
<li>Hard to make realtime because of this</li>
</ul></li>
</ul></li>
</ul>
</div>
</div>
</div>
</section>
<section id="sec-usage" class="level2">
<h2 class="anchored" data-anchor-id="sec-usage">Inference Customization</h2>
<p>To select a specific model define the name as you build the pipeline.</p>
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>infer_options.py</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="annotated-cell-2" data-filename="infer_options.py" style="background: #f1f3f5;"><pre class="sourceCode python code-annotation-code code-with-copy code-annotated"><code class="sourceCode python"><span id="annotated-cell-2-1">pipe <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> pipeline(</span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-2" data-target-annotation="1" onclick="event.preventDefault();">1</a><span id="annotated-cell-2-2" class="code-annotation-target">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"object-detection"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># task_name</span></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-2" data-target-annotation="2" onclick="event.preventDefault();">2</a><span id="annotated-cell-2-3" class="code-annotation-target">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Xenova/yolos-tiny"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># model_name</span></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-2" data-target-annotation="3" onclick="event.preventDefault();">3</a><span id="annotated-cell-2-4" class="code-annotation-target">  {</span>
<span id="annotated-cell-2-5">    "dtype": <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"q4"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># default "q8" for WASM.</span></span>
<span id="annotated-cell-2-6">    "device": <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"webgpu"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># default "WASM" (cpu)</span></span>
<span id="annotated-cell-2-7">    "use_external_data_format": False, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># default False, set True to load &gt;= 2GB model</span></span>
<span id="annotated-cell-2-8">  }</span>
<span id="annotated-cell-2-9">)</span><div class="code-annotation-gutter-bg"></div><div class="code-annotation-gutter"></div></code></pre></div></div>
</div>
<dl class="code-annotation-container-grid">
<dt data-target-cell="annotated-cell-2" data-target-annotation="1">1</dt>
<dd>
<span data-code-cell="annotated-cell-2" data-code-lines="2" data-code-annotation="1">Find all available tasks and their linked model-list <a href="https://huggingface.co/docs/transformers.js/index#tasks">here</a>.</span>
</dd>
<dt data-target-cell="annotated-cell-2" data-target-annotation="2">2</dt>
<dd>
<span data-code-cell="annotated-cell-2" data-code-lines="3" data-code-annotation="2">Find all available Object Detection models <a href="https://huggingface.co/models?pipeline_tag=object-detection&amp;library=transformers.js">here</a>.</span>
</dd>
<dt data-target-cell="annotated-cell-2" data-target-annotation="3">3</dt>
<dd>
<span data-code-cell="annotated-cell-2" data-code-lines="4,5,6,7,8" data-code-annotation="3">Find all options <a href="https://huggingface.co/docs/transformers.js/en/api/utils/hub#utilshubmodelspecificpretrainedoptions--code-object-code">here</a> and <a href="https://huggingface.co/docs/transformers.js/en/api/utils/hub#utilshubmodelspecificpretrainedoptions--code-object-code">here</a>.</span>
</dd>
</dl>
<p>Simple, right?<br>
I’m continuously impressed by how far we’ve come. On-device inference, even with acceleration, is painless today. If you want simplicity I recommend <code>web</code>; otherwise use the <em>mobile</em>/<em>native</em> releases, or alternatively <a href="https://ai.google.dev/edge/litert">LiteRT</a> (previously TFLite).</p>
<p>What’s left?<br>
Improving the JS data science ecosystem. For now I prefer Pyodide because of Python’s vast ecosystem, though I’d like to congratulate <code>transformers.js</code> on successfully making inference simple for people who simply want a black box. Personally I usually want to work with the data before and after inference, which requires better tools, and Pyodide provides them.</p>
</section>
<section id="sec-marimo" class="level2">
<h2 class="anchored" data-anchor-id="sec-marimo">WASM App using Marimo</h2>
<p>If you’ve read my blog you know I recently discovered <a href="https://marimo.io/">Marimo</a>, and as always with new tools you try to use them whenever you can, perhaps a bit too much.<br>
I thought I’d try integrating it with <code>transformers.js.py</code> and run the inference fully on-device with WASM.<br>
It’s certainly not real-time, but ~5 seconds per image is OK I’d say.</p>
<div id="fig-infernce" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-infernce-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-video"><video id="video_shortcode_videojs_video1" class="video-js vjs-default-skin vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="on-device-preds.mp4"></video></div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-infernce-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Marimo WASM App running inference on two images.
</figcaption>
</figure>
</div>
<p>All in all I think this approach is quite neat and could prove very useful, especially for Proof-of-Concepts or Internal Tooling.</p>
<p>Run the app yourself via my <a href="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false">marimo.io WASM notebook</a>. Show the code by clicking the three dots in the top-right corner.</p>
<p>Thanks for this time,<br>
Hampus Londögård</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>from a JS / KotlinJS perspective↩︎</p></li>
<li id="fn2"><p>Pyodide is a port of CPython to WASM, enabling Python to run directly in the browser↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>python</category>
  <category>wasm</category>
  <category>js</category>
  <category>inference</category>
  <category>onnxruntime</category>
  <guid>https://blog.londogard.com/posts/2025-03-25-trimesh-marimo-pr/</guid>
  <pubDate>Wed, 19 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Transformers.js.py accelerated on-device inference with Python WASM</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-03-19-wasm-python-js-inference/</link>
  <description><![CDATA[ 





<p><a href="https://huggingface.co/docs/transformers.js/en/index">Transformers.js</a> is an ambitious project by HuggingFace to bring <em>transformers</em> to Web/JS and simplify inference on-device, running <code>onnxruntime-web</code> under-the-hood.<br>
I’ve written blogs and apps with <code>onnxruntime-web</code><sup>1</sup>, and I must say - I’m a sucker for efficient on-device inference!</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-03-19-wasm-python-js-inference/image.png" class="img-fluid figure-img"></p>
<figcaption>Marimo WASM App: Before/After Prediction</figcaption>
</figure>
</div>
<p>Test it live <a href="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false">here</a>, and yes it runs on your device in the browser!</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Embedded Live Demo (running on device, in browser)
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>I’m not sure why the width can’t resolve itself; if you can’t use it, try <a href="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false">here</a>.</p>
<iframe src="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false&amp;embed=true" frameborder="0" width="100%" height="600px"></iframe>
</div>
</div>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>ONNX Runtime and <code>onnxruntime-web</code>
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p><code>onnxruntime-web</code> helps run models efficiently directly in your browser (JS) through the default backend <em>WASM</em> (cpu).</p>
<p>Additionally you can accelerate inference via GPU/NPU by swapping the backend to either <code>WebGL</code>, <code>WebGPU</code>, or <code>WebNN</code>. I think this is really cool as we can now develop Progressive Web Apps (PWA) that accelerate their inference using a smartphone’s NPU - crazy!</p>
<p>ONNX Runtime also supports many other ways to run inference, e.g.&nbsp;JVM, .NET, Python, and C++.</p>
</div>
</div>
</div>
<p>And recently I learned about a thin wrapper around <code>transformers.js</code>, namely <a href="https://github.com/whitphx/transformers.js.py">Transformers.js.py</a> that proxies the API to <a href="https://pyodide.org/en/stable/">Pyodide</a><sup>2</sup>.<br>
Right below 👇 I share how to run Object Detection.</p>
<section id="how-to-object-detection-inference" class="level2">
<h2 class="anchored" data-anchor-id="how-to-object-detection-inference">How to: Object Detection Inference</h2>
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>infer.py</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="annotated-cell-1" data-filename="infer.py" style="background: #f1f3f5;"><pre class="sourceCode python code-annotation-code code-with-copy code-annotated"><code class="sourceCode python"><a class="code-annotation-anchor" data-target-cell="annotated-cell-1" data-target-annotation="1" onclick="event.preventDefault();">1</a><span id="annotated-cell-1-1" class="code-annotation-target"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> transformers_js_py <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> import_transformers_js, as_url</span>
<span id="annotated-cell-1-2">transformers <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> import_transformers_js()</span>
<span id="annotated-cell-1-3">pipeline <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transformers.pipeline</span>
<span id="annotated-cell-1-4"></span>
<span id="annotated-cell-1-5">img <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;URL_OR_PATH_TO_AN_IMAGE&gt;"</span></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-1" data-target-annotation="2" onclick="event.preventDefault();">2</a><span id="annotated-cell-1-6" class="code-annotation-target">pipe <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> pipeline(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"object-detection"</span>)</span>
<span id="annotated-cell-1-7"></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-1" data-target-annotation="3" onclick="event.preventDefault();">3</a><span id="annotated-cell-1-8" class="code-annotation-target">pred <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> pipe(as_url(img))</span>
<span id="annotated-cell-1-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pred) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># list of predictions [{"score": float, "label": str, "box": dict[str, int]}, ...]</span></span><div class="code-annotation-gutter-bg"></div><div class="code-annotation-gutter"></div></code></pre></div></div>
</div>
<dl class="code-annotation-container-grid">
<dt data-target-cell="annotated-cell-1" data-target-annotation="1">1</dt>
<dd>
<span data-code-cell="annotated-cell-1" data-code-lines="1,2,3" data-code-annotation="1">Import library and “import” <code>pipeline</code></span>
</dd>
<dt data-target-cell="annotated-cell-1" data-target-annotation="2">2</dt>
<dd>
<span data-code-cell="annotated-cell-1" data-code-lines="6" data-code-annotation="2">Set up <code>pipeline</code> object, downloading model and making things ready to run</span>
</dd>
<dt data-target-cell="annotated-cell-1" data-target-annotation="3">3</dt>
<dd>
<span data-code-cell="annotated-cell-1" data-code-lines="8" data-code-annotation="3">Run inference. <code>as_url</code> converts local/virtual files to a URL, as the <code>pipeline</code> object requires URLs that can be opened in the JS context.</span>
</dd>
</dl>
<p>I share how to customize the inference in Section&nbsp;3 and a full-fledged WebApp with on-device inference using Marimo in Section&nbsp;4.</p>
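<p>A minimal sketch of working with the prediction list after inference, assuming it has already been converted to plain Python lists and dicts in the shape printed above. The helper names (<code>keep_confident</code>, <code>box_area</code>), the box keys <code>xmin</code>/<code>ymin</code>/<code>xmax</code>/<code>ymax</code> and the sample data are assumptions for illustration, not output from a real model run.</p>

```python
from typing import TypedDict


class Box(TypedDict):
    # Assumed box keys; check what your model actually returns.
    xmin: int
    ymin: int
    xmax: int
    ymax: int


class Prediction(TypedDict):
    score: float
    label: str
    box: Box


def keep_confident(preds: list[Prediction], min_score: float = 0.9) -> list[Prediction]:
    """Keep predictions at or above min_score, highest score first."""
    kept = [p for p in preds if p["score"] >= min_score]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


def box_area(box: Box) -> int:
    """Area in pixels of an axis-aligned box (0 if the box is degenerate)."""
    width = max(0, box["xmax"] - box["xmin"])
    height = max(0, box["ymax"] - box["ymin"])
    return width * height


# Hand-written sample in the documented shape (hypothetical values).
sample = [
    {"score": 0.55, "label": "dog", "box": {"xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10}},
    {"score": 0.98, "label": "cat", "box": {"xmin": 5, "ymin": 5, "xmax": 25, "ymax": 15}},
]
confident = keep_confident(sample, min_score=0.9)
```

<p>This kind of filtering and sorting is where having Python next to the model pays off, since the predictions are ordinary data once they are on the Python side.</p>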
</section>
<section id="why-transformer.js.py-pyodide" class="level2">
<h2 class="anchored" data-anchor-id="why-transformer.js.py-pyodide">Why <code>transformers.js.py</code> + <code>pyodide</code></h2>
<p>There’s a good question here: <em>why not run JS directly?</em><br>
I don’t have a great answer, it’s all about trade-offs.</p>
<p>JS enables “native” usage which likely works better in real-time applications, as it runs JS-&gt;WASM rather than WASM-&gt;JS-&gt;WASM.<br>
What JS doesn’t have is a robust data science ecosystem, unlike Python. “Merging” the two through Pyodide makes sense, and furthermore it’s fun! 🤓</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>A quick Pro/Con list of JS vs Pyodide
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p><strong>Why JS?</strong></p>
<ul>
<li><strong>Pros:</strong>
<ul>
<li>Faster / Realtime (as no WASM/JS communication)</li>
<li>Native integration in webapps</li>
</ul></li>
<li><strong>Cons:</strong>
<ul>
<li>Not great data science tools</li>
</ul></li>
</ul>
<p><strong>Why Python (Pyodide)?</strong></p>
<ul>
<li><strong>Pros:</strong>
<ul>
<li>Great ecosystem (PIL.Image, numpy, altair, polars, …)</li>
<li>Familiarity</li>
<li>Simpler PoC UI tools available (marimo, streamlit, solara, jupyterlite)</li>
</ul></li>
<li><strong>Cons:</strong>
<ul>
<li>Overhead moving data from Pyodide (WASM) to JS
<ul>
<li>Hard to make realtime because of this</li>
</ul></li>
</ul></li>
</ul>
</div>
</div>
</div>
</section>
<section id="sec-usage" class="level2">
<h2 class="anchored" data-anchor-id="sec-usage">Inference Customization</h2>
<p>To select a specific model, pass its name as the second argument when you build the pipeline.</p>
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>infer_options.py</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="annotated-cell-2" data-filename="infer_options.py" style="background: #f1f3f5;"><pre class="sourceCode python code-annotation-code code-with-copy code-annotated"><code class="sourceCode python"><span id="annotated-cell-2-1">pipe <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> pipeline(</span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-2" data-target-annotation="1" onclick="event.preventDefault();">1</a><span id="annotated-cell-2-2" class="code-annotation-target">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"object-detection"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># task_name</span></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-2" data-target-annotation="2" onclick="event.preventDefault();">2</a><span id="annotated-cell-2-3" class="code-annotation-target">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Xenova/yolos-tiny"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># model_name</span></span>
<a class="code-annotation-anchor" data-target-cell="annotated-cell-2" data-target-annotation="3" onclick="event.preventDefault();">3</a><span id="annotated-cell-2-4" class="code-annotation-target">  {</span>
<span id="annotated-cell-2-5">    "dtype": <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"q4"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># default "q8" for WASM.</span></span>
<span id="annotated-cell-2-6">    "device": <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"webgpu"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># default "WASM" (cpu)</span></span>
<span id="annotated-cell-2-7">    "use_external_data_format": False, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># default False, set True to load &gt;= 2GB model</span></span>
<span id="annotated-cell-2-8">  }</span>
<span id="annotated-cell-2-9">)</span><div class="code-annotation-gutter-bg"></div><div class="code-annotation-gutter"></div></code></pre></div></div>
</div>
<dl class="code-annotation-container-grid">
<dt data-target-cell="annotated-cell-2" data-target-annotation="1">1</dt>
<dd>
<span data-code-cell="annotated-cell-2" data-code-lines="2" data-code-annotation="1">Find all available tasks and their linked model-list <a href="https://huggingface.co/docs/transformers.js/index#tasks">here</a>.</span>
</dd>
<dt data-target-cell="annotated-cell-2" data-target-annotation="2">2</dt>
<dd>
<span data-code-cell="annotated-cell-2" data-code-lines="3" data-code-annotation="2">Find all available Object Detection models <a href="https://huggingface.co/models?pipeline_tag=object-detection&amp;library=transformers.js">here</a>.</span>
</dd>
<dt data-target-cell="annotated-cell-2" data-target-annotation="3">3</dt>
<dd>
<span data-code-cell="annotated-cell-2" data-code-lines="4,5,6,7,8" data-code-annotation="3">Find all options <a href="https://huggingface.co/docs/transformers.js/en/api/utils/hub#utilshubmodelspecificpretrainedoptions--code-object-code">here</a> and <a href="https://huggingface.co/docs/transformers.js/en/api/utils/hub#utilshubmodelspecificpretrainedoptions--code-object-code">here</a>.</span>
</dd>
</dl>
<p>Simple, right?<br>
I’m continuously impressed by how far we’ve come. On-device inference, even with acceleration, is painless today. If you want simplicity I recommend <code>web</code>; otherwise use the <em>mobile</em>/<em>native</em> releases, or alternatively <a href="https://ai.google.dev/edge/litert">LiteRT</a> (previously TFLite).</p>
<p>What’s left?<br>
Improving the JS data science ecosystem. For now I prefer Pyodide because of Python’s vast ecosystem, though I’d like to congratulate <code>transformers.js</code> on successfully making inference simple for people who simply want a black box. Personally I usually want to work with the data before and after inference, which requires better tools, and Pyodide provides them.</p>
</section>
<section id="sec-marimo" class="level2">
<h2 class="anchored" data-anchor-id="sec-marimo">WASM App using Marimo</h2>
<p>If you’ve read my blog you know I recently discovered <a href="https://marimo.io/">Marimo</a>, and as always with new tools you try to use them whenever you can, perhaps a bit too much.<br>
I thought I’d try integrating it with <code>transformers.js.py</code> and run the inference fully on-device with WASM.<br>
It’s certainly not real-time, but ~5 seconds per image is OK I’d say.</p>
<div id="fig-infernce" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-infernce-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-video"><video id="video_shortcode_videojs_video1" class="video-js vjs-default-skin vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="on-device-preds.mp4"></video></div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-infernce-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Marimo WASM App running inference on two images.
</figcaption>
</figure>
</div>
<p>All in all I think this approach is quite neat and could prove very useful, especially for Proof-of-Concepts or Internal Tooling.</p>
<p>Run the app yourself via my <a href="https://marimo.io/p/@hlondogard/notebook-transformer-js-py-object-detection-wasm?show-code=false">marimo.io WASM notebook</a>. Show the code by clicking the three dots in the top-right corner.</p>
<p>Thanks for this time,<br>
Hampus Londögård</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>from a JS / KotlinJS perspective↩︎</p></li>
<li id="fn2"><p>Pyodide is a port of CPython to WASM, enabling Python to run directly in the browser↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>python</category>
  <category>wasm</category>
  <category>js</category>
  <category>inference</category>
  <category>onnxruntime</category>
  <guid>https://blog.londogard.com/posts/2025-03-19-wasm-python-js-inference/</guid>
  <pubDate>Wed, 19 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Mini: DeepSeek’s smallpond - a distributed duckdb</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-03-07-deepseek-smallpond/</link>
  <description><![CDATA[ 





<p>There has been a lot of buzz around <a href="https://www.deepseek.com/">DeepSeek</a> (R1) and their Open Source mission, and lately they released their full stack to train State-of-the-Art LLMs.<br>
One of the tools is a <em>Distributed Data Processing</em> framework named <a href="https://github.com/deepseek-ai/smallpond"><em>“smallpond”</em></a> built on top of <a href="https://duckdb.org/"><em>DuckDB</em></a> <em>&amp;</em> <a href="https://www.ray.io/"><em>Ray</em></a>.<br>
Mike made an excellent write-up on his <a href="https://www.definite.app/blog/smallpond">blog.</a></p>
<p><strong>The summary?</strong> It’s a tool that you can’t even <a href="https://x.com/suchenzang/status/1895437762427560236">buy with millions $</a>, insanely valuable Open Source code! Drawback? A lot of setup, and it’s early days with few (if any) guides.<br>
<strong>When should I use it?</strong> When you start to have more than 10 TB of data to query, especially above 1 PB.</p>
<p><strong>My thoughts</strong> are that</p>
<ol type="1">
<li><em>smallpond</em> brings the “modern data stack” closer to the end-user for truly Big Data, but not close enough.
<ul>
<li>We see <em>Apache Arrow</em> and <em>Ray</em> (a lot leaner than say <a href="https://airflow.apache.org/">Apache Airflow</a>) as key technologies, and the engines are interchangeable between DuckDB and Polars.</li>
</ul></li>
<li>There’s other competition trying something similar, e.g.&nbsp;<a href="https://pola.rs/posts/polars-cloud-what-we-are-building/">Polars Cloud</a> (albeit potentially not Open Source, it’s an exciting future)!</li>
<li>There’s other competition looking at vertical scaling instead, e.g.&nbsp;<a href="https://motherduck.com/">Motherduck</a>.</li>
</ol>
<p>At the end of the day Motherduck’s approach resonates a lot more with me: by storing data cleverly we can efficiently query huge amounts of data on a single machine through metadata scanning, especially with vertical scaling. <em>It’s also the simplest approach.</em></p>
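<p>To make “metadata scanning” a bit more concrete, here is a toy sketch of min/max pruning, the general idea behind per-chunk statistics in columnar formats such as Parquet. It is not how Motherduck or DuckDB implement it; <code>ChunkStats</code>, <code>chunks_to_scan</code> and the sample chunks are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkStats:
    """Metadata kept per chunk of data: just the value range of one column."""
    name: str
    min_value: int
    max_value: int


def chunks_to_scan(chunks: list[ChunkStats], low: int, high: int) -> list[ChunkStats]:
    """Return only the chunks whose [min, max] range overlaps [low, high]."""
    return [c for c in chunks if c.max_value >= low and high >= c.min_value]


# Hypothetical dataset split into three sorted chunks.
chunks = [
    ChunkStats("part-0", 0, 99),
    ChunkStats("part-1", 100, 199),
    ChunkStats("part-2", 200, 299),
]

# A query for values between 150 and 180 only needs to read one chunk.
hits = chunks_to_scan(chunks, low=150, high=180)
```

<p>The better the data is sorted or partitioned on the filtered column, the more chunks can be skipped without ever being read.</p>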
<p>But some days you might need that <em>brute-force</em> because there isn’t time or competence, or your problem simply requires loading and working with <em>insane amounts of data</em>, e.g.&nbsp;LLM training.</p>
<p>All in all <em>smallpond</em> and <em>3FS</em> are great additions to the open source community and extend “distributed truly big data processing”, which is a valuable target. Though I can’t help but think and hope that there’ll be even simpler tools moving forward.</p>



 ]]></description>
  <category>python</category>
  <category>data</category>
  <category>distributed</category>
  <guid>https://blog.londogard.com/posts/2025-03-07-deepseek-smallpond/</guid>
  <pubDate>Fri, 07 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Marimo WASM Apps</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-03-02-marimo-spreadsheets/</link>
  <description><![CDATA[ 





<p>This post will be short. I recently built a WASM app that allows you to 1) convert between Parquet/CSV/JSON and 2) explore the data using Marimo’s built-in tooling.</p>
<p>I shared an initial introduction to marimo in an <a href="../../posts/2025-02-17-marimo">earlier blog post</a>.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>In short <strong>marimo</strong> is an up-and-coming Notebook that also has an “App-mode” and can run using WASM.</p>
</div>
</div>
<section id="appconverter" class="level2">
<h2 class="anchored" data-anchor-id="appconverter">App:Converter</h2>
<p>There are multiple libraries supported in Pyodide (Python on WASM), among them: <code>polars</code>, <code>duckdb</code> and <code>pandas</code>.<br>
All these libraries are exceptional, with <code>pandas</code> as an exception 😉, for doing Data Science and working with tabular data. They also have read/write support for <code>JSON</code>, <code>CSV</code> and <code>parquet</code> files.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Parquet</strong> is an awesome format for reading data: it’s <em>small, fast &amp; efficient</em>, which in turn enables orders of magnitude better performance (10-100x easily).</p>
</div>
</div>
<p>These tools enable my simple <em>converter</em> that allows more people to easily move <em>from CSV/JSON to Parquet</em>, and in turn have faster plotting!</p>
<p>As always there were some problems during implementation:</p>
<ul>
<li><code>polars</code> doesn’t support parquet/JSON in WASM
<ul>
<li>–&gt; fall-back to <code>duckdb</code>.</li>
<li><code>duckdb</code> can’t read parquet from <code>io.BytesIO</code>
<ul>
<li>–&gt; fall-back to… <code>pandas</code> 🤦‍♂️.</li>
<li>Luckily we can quickly call <code>pl.from_pandas</code> to run <code>polars</code>!</li>
</ul></li>
</ul></li>
</ul>
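<p>The fall-back chain above boils down to “try each reader in order and keep the first one that works”. A minimal, stdlib-only sketch of that pattern follows; <code>read_with_fallback</code> and the two stand-in readers are hypothetical, and in the real app the readers would be the <code>polars</code>, <code>duckdb</code> and <code>pandas</code> calls.</p>

```python
import json
from typing import Any, Callable

Reader = Callable[[bytes], Any]


def read_with_fallback(data: bytes, readers: list[tuple[str, Reader]]) -> tuple[str, Any]:
    """Try each reader in order; return (reader name, result) for the first success."""
    errors = []
    for name, reader in readers:
        try:
            return name, reader(data)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all readers failed: " + "; ".join(errors))


def unsupported_reader(data: bytes) -> Any:
    """Stand-in for a library that lacks support for this format in WASM."""
    raise NotImplementedError("format not supported in this environment")


def stdlib_json_reader(data: bytes) -> Any:
    """Stand-in for the reader that finally works."""
    return json.loads(data.decode("utf-8"))


readers = [("preferred", unsupported_reader), ("stdlib-json", stdlib_json_reader)]
used, rows = read_with_fallback(b'{"a": 1}', readers)
```

<p>Returning the name of the reader that succeeded makes it easy to show in the UI which path was taken.</p>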
<p>I hope to add more formats moving forwards, such as <code>ndjson</code> and <code>xlsx</code>.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-03-02-marimo-spreadsheets/image.png" class="img-fluid figure-img"></p>
<figcaption>Convert (src: parquet-file) automatically infers the available targets</figcaption>
</figure>
</div>
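<p>Inferring the available targets can be as simple as looking at the source file’s extension. A small sketch, assuming only the three formats mentioned in this post; <code>available_targets</code> is a hypothetical name and not the app’s actual code.</p>

```python
from pathlib import PurePath

# The three formats the converter handles in this post.
SUPPORTED_FORMATS = ("csv", "json", "parquet")


def available_targets(filename: str) -> list[str]:
    """Return every supported format except the one the source file already uses."""
    source = PurePath(filename).suffix.lower().lstrip(".")
    if source not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported source format: {source!r}")
    return [fmt for fmt in SUPPORTED_FORMATS if fmt != source]


targets = available_targets("sales.parquet")
```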
</section>
<section id="appexplore" class="level2">
<h2 class="anchored" data-anchor-id="appexplore">App:Explore</h2>
<p>Data Exploration - an important part and initial step when working with data of any type.<br>
When exploring your dataset it’s good to have a streamlined way of working. Marimo has some excellent tooling to quickly structure your data. <strong>I’ve added all these tools in my simple WASM app</strong>, including:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-03-02-marimo-spreadsheets/image-1.png" class="img-fluid figure-img"></p>
<figcaption><strong>DataFrame with Statistics in the header</strong></figcaption>
</figure>
</div>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-03-02-marimo-spreadsheets/image-2.png" class="img-fluid figure-img"></p>
<figcaption><strong>“Click” Plotting - select X,Y, …</strong></figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-03-02-marimo-spreadsheets/image-3.png" class="img-fluid figure-img"></p>
<figcaption>Another example</figcaption>
</figure>
</div>
</div>
</div>
</div>
<p>These tools combine into a quite neat exploration app. If you run this notebook locally you can easily hit up the <em>code cells</em> and modify the DataFrames manually and keep utilizing the nifty UI features such as statistics in DataFrame columns or visualization.</p>
<p>All in all this is a simple quick-starter. I think this app can be helpful for those who want to explore their data, advanced or simple.</p>
</section>
<section id="result" class="level2">
<h2 class="anchored" data-anchor-id="result">Result</h2>
<div class="callout callout-style-simple callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>WASM Spreadsheet Explorer/Converter (CSV, JSON &amp; Parquet) App
</div>
</div>
<div class="callout-body-container callout-body">
<p><a href="../../pages/app_spreadsheet.qmd">Also available as stand-alone</a></p>

<iframe width="100%" height="720" src="../../assets/wasm/spreadsheet/app/index.html"></iframe>
</div>
</div>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>I’ll keep adding more WASM apps with time. I love Pyodide.</p>
<p>Combining WASM with Marimo, or <code>stlite</code>, feels like such a natural fit.<br>
Marimo to me combines the perfection of Notebook Exploration with Apps, hence I opted for Marimo now.</p>
<p>Moving on I’ll add more in-depth blogs about Marimo and why it’s awesome, embedding WASM snippets and more.</p>
<p>Thanks for this time,<br>
Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <category>wasm</category>
  <category>app</category>
  <guid>https://blog.londogard.com/posts/2025-03-02-marimo-spreadsheets/</guid>
  <pubDate>Sun, 02 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>TIL: Programmatically Fetch Python Class/File Dependencies</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-02-27-python-dependency-collector/</link>
  <description><![CDATA[ 





<section id="importcollector" class="level1">
<h1>ImportCollector</h1>
<p>It’s simple and requires 0 dependencies outside of the standard library.</p>
<p>This script will recursively traverse the dependencies of a <code>Class</code> or <em>python-script</em> and find all relevant dependencies from your local project.<br>
It’s useful in multiple types of projects, such as (remote) Machine Learning training jobs and serverless deployments, where you don’t want to include irrelevant files.</p>
<p><script src="https://gist.github.com/Lundez/097753678fe475a6bc30ca31f4624536.js"></script></p>
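<p>To give a feel for the idea, here is a minimal sketch of such a recursive traversal using only <code>ast</code> and <code>pathlib</code> from the standard library. It is not the code from the gist: the function names <code>local_module_path</code> and <code>collect_local_modules</code> are made up for illustration, it assumes all project files live under a single <code>root</code> folder, and it skips relative imports for brevity.</p>

```python
import ast
from pathlib import Path
from typing import Optional


def local_module_path(module: str, root: Path) -> Optional[Path]:
    """Return the file of a dotted module name if it lives under root, else None."""
    base = root.joinpath(*module.split("."))
    for candidate in (base.with_suffix(".py"), base / "__init__.py"):
        if candidate.is_file():
            return candidate
    return None


def collect_local_modules(entry: Path, root: Path) -> set:
    """Follow imports from entry and return the names of all project-local modules."""
    seen_files = set()
    modules = set()
    stack = [entry]
    while stack:
        path = stack.pop()
        if path in seen_files:
            continue
        seen_files.add(path)
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # Relative imports (level != 0) are ignored in this sketch.
                names = [node.module]
            else:
                continue
            for name in names:
                found = local_module_path(name, root)
                if found is not None:
                    # Only modules that resolve to a file under root are kept,
                    # so third-party and standard library imports drop out.
                    modules.add(name)
                    stack.append(found)
    return modules
```

<p>Calling <code>collect_local_modules(Path("main.py"), Path("."))</code> would then return the dotted names of every local module reachable from <code>main.py</code>, which is the kind of list you can hand to a packaging step.</p>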
</section>
<section id="why-i-built-this" class="level1">
<h1>Why I built this</h1>
<p>When I was deploying/building a <code>MLFlow Model</code> I found that their <code>infer_code_paths</code> functionality is bugged, as shared in <a href="https://github.com/mlflow/mlflow/issues/14071">mlflow/issues/14071</a> and my <a href="../../posts/2025-02-25-mlflow-model">blog about MLFlow Models</a>, and that I needed something better to really recursively fetch dependencies.</p>
<p>I found that through my nifty little script I could do this better than <code>mlflow</code> themselves. By updating the <code>load_context</code> function we could infer the <code>modules</code> by importing them, assisting <code>mlflow</code>’s <code>infer_code_paths</code> function.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> load_context(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, context):</span>
<span id="cb1-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># MLFlow bug where parent class is not added to `infer_code_paths`.</span></span>
<span id="cb1-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># https://github.com/mlflow/mlflow/issues/14071</span></span>
<span id="cb1-4">    imports <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_dependencies(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>))</span>
<span id="cb1-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> module <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> imports.modules:</span>
<span id="cb1-6">        importlib.import_module(module)</span></code></pre></div></div>
<p>This is it for this time,<br>
Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <category>dependencies</category>
  <guid>https://blog.londogard.com/posts/2025-02-27-python-dependency-collector/</guid>
  <pubDate>Thu, 27 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>MLFlow Models: Self-Contained ML Models with MLFlow</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-02-25-mlflow-model/</link>
  <description><![CDATA[ 





<section id="mlflow-models" class="level1">
<h1>MLFlow Models</h1>
<p>MLFlow is a popular tool to track your experiments and compare metrics, parameters and much more. It helps streamline your job as a <em>data scientist</em> or <em>machine learning engineer</em>. Their <em>MLFlow Models</em> is a sub-project that helps make deployments smooth and integrates with their <em>Model Registry</em>, which holds versioned and tagged models and ties together with MLFlow Experiments.</p>
<p>All in all the MLFlow Models project helps building self-contained models in a streamlined fashion that integrates very well in the MLFlow ecosystem.</p>
<section id="mlflow-model" class="level2">
<h2 class="anchored" data-anchor-id="mlflow-model">MLFlow Model</h2>
<p>MLFlow Models is MLFlow’s “self-contained model” that can automatically build a Docker Container and run <em>inference</em> through the built-in <code>mlflow models serve</code> command.<br>
It’s an interesting concept that’s not “phenomenal” or “innovative” but helps streamline our lives, just like the “bread and butter” MLFlow Experiments. I love projects that make the average person’s life easier. Advanced users, as I’d call myself, might find it “blocking” but the <code>PythonModel</code> concept I explain later is likely helpful for anyone out there!</p>
<p>MLFlow really hits that sweet spot of keeping things simple and not going too far, except perhaps in the current LLM tracing which feels like the shot-gun methodology.</p>
<section id="why" class="level3">
<h3 class="anchored" data-anchor-id="why">Why?</h3>
<p>Keeping it short:</p>
<ol type="1">
<li>A <strong>“self-contained model”</strong> with all the code files and dependencies in a simple package</li>
<li><strong>Natively Integrated in MLFlow</strong> which is one of the biggest “MLOps” systems</li>
<li>All the MLFlow goodies enabled, such as <code>ModelRegistry</code>, <code>model-evaluation</code>, and <code>auto-Apache Spark UDF</code>.</li>
</ol>
</section>
<section id="flavours" class="level3">
<h3 class="anchored" data-anchor-id="flavours">Flavours</h3>
<p>MLFlow Models automatically supports multiple formats: <em>Keras</em>, <em>PyTorch</em>, <em>scikit-learn</em>, and many more (<a href="https://mlflow.org/docs/latest/models.html#built-in-model-flavors">full list</a>).<br>
More interestingly they really support <em>ANY</em> model through their <code>PythonModel</code> which is what I opt to use.</p>
<section id="why-pythonmodel" class="level4">
<h4 class="anchored" data-anchor-id="why-pythonmodel">Why PythonModel</h4>
<p><code>PythonModel</code> allows you to get a streamlined format that supports custom models, including <em>Preprocessing</em> and <em>Postprocessing</em>. Quite excellent!</p>
<p>To keep it simple you define a <code>PythonModel</code> as follows:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> MyModel(mlflow.pyfunc.PythonModel):</span>
<span id="cb1-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> predict(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, context, model_input: np.ndarray, params: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb1-3">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># model_input can also be pd.DataFrame, dict[str, np.ndarray], ...</span></span>
<span id="cb1-4">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> model_input</span></code></pre></div></div>
<p>There’s additionally a <code>load_context</code> method which lets you write how to load your model and other things. It’s run when “booting up”.</p>
<p>To log and load a model:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> mlflow</span>
<span id="cb2-2"></span>
<span id="cb2-3">model_path <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"my_model.py"</span></span>
<span id="cb2-4"></span>
<span id="cb2-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> mlflow.start_run():</span>
<span id="cb2-6">    model_info <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mlflow.pyfunc.log_model(</span>
<span id="cb2-7">        python_model<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>MyModel(),</span>
<span id="cb2-8">        artifact_path<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"my_model"</span>,</span>
<span id="cb2-9">    )</span>
<span id="cb2-10"></span>
<span id="cb2-11">my_model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mlflow.pyfunc.load_model(model_info.model_uri)</span></code></pre></div></div>
</section>
<section id="a-bugged-infer_code_paths" class="level4">
<h4 class="anchored" data-anchor-id="a-bugged-infer_code_paths">A bugged <code>infer_code_paths</code></h4>
<p>If you find, like me, that <code>infer_code_paths</code> doesn’t work well, see my fix in this <a href="../../posts/2025-02-27-python-dependency-collector">blog-post</a>.</p>
<p>This problem seems to be very common if you use sub-classing or have custom dependencies that are called outside <code>load_context</code>, but my simple script helps you out!</p>
</section>
</section>
<section id="docker-containerization" class="level3">
<h3 class="anchored" data-anchor-id="docker-containerization">Docker Containerization</h3>
<p>It’s easily containerized by calling the CLI or Python:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">mlflow</span> models build-docker <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-m</span> runs:/<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>run_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>/model <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-n</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>image_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> --enable-mlserver</span></code></pre></div></div>
<p>and</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> mlflow</span>
<span id="cb4-2"></span>
<span id="cb4-3">mlflow.models.build_docker(</span>
<span id="cb4-4">    model_uri<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"runs:/</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>run_id<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">/model"</span>,</span>
<span id="cb4-5">    name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;image_name&gt;"</span>,</span>
<span id="cb4-6">    enable_mlserver<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb4-7">)</span></code></pre></div></div>
<p>Smooth! Obviously this might not be an optimal image, but it’ll be sufficient and it’s very easy for people to build <em>good enough</em> images. All in all a helpful feature!</p>
</section>
</section>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>MLFlow Models provide a simple way to deploy models in a self-contained way.</p>
<p>I hope you try out MLFlow Models as they could end up helping you a lot.</p>
<p>~Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <category>mlflow</category>
  <category>deployment</category>
  <category>machine-learning</category>
  <guid>https://blog.londogard.com/posts/2025-02-25-mlflow-model/</guid>
  <pubDate>Tue, 25 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Gradio Client - An intro</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-02-20-gradio-client/</link>
  <description><![CDATA[ 





<section id="gradio-client" class="level1">
<h1>Gradio Client</h1>
<p>Most people in the AI-sphere (Deep Learning, LLMs) are aware of the <a href="https://www.gradio.app/">Gradio</a> project (now under the <a href="https://huggingface.co/">huggingface</a> umbrella).</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Gradio in short
</div>
</div>
<div class="callout-body-container callout-body">
<p>Gradio is a simple Machine Learning App framework that provides easy components and reactivity. See for yourself:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> gradio <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> gr</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> greet(name):</span>
<span id="cb1-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello "</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"!"</span></span>
<span id="cb1-5"></span>
<span id="cb1-6">demo <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gr.Interface(fn<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>greet, inputs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, outputs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>)</span>
<span id="cb1-7">demo.launch()</span></code></pre></div></div>
<p>One drawback is that <em>input, output</em>, and <em>function</em> are separated which reduces type-safety and removes all intellisense. This is a big one for me and makes me highly prefer <code>streamlit</code> &amp; <code>solara</code>. Though I can’t deny all the greatness Gradio brings to the table.</p>
</div>
</div>
<p>Even though Gradio is well-known some people might’ve missed that they’ve added two new projects at their fast pace!</p>
<ol type="1">
<li><a href="https://www.gradio.app/guides/getting-started-with-the-python-client"><strong><code>gradio_client</code></strong></a>: A client that can call any Gradio Application (!)
<ol type="1">
<li>Gradio apps are easily deployed on HuggingFace, and there exist a ton of them. All are now accessible in a REST-like interface (including the compute)!</li>
</ol></li>
<li><a href="https://www.gradio.app/guides/gradio-lite"><strong><code>gradio_lite</code></strong></a>: A WASM version of Gradio.
<ol type="1">
<li>Very simple to embed a Gradio app inside an HTML file or JS app.</li>
<li>They’ve implemented smooth solutions for things like Multi File, PIP Requirements and more.</li>
<li>WASM brings <em>Serverless</em> deployment with <em>Low Latency</em> and high <em>Privacy</em>.</li>
</ol></li>
</ol>
<section id="how-do-i-use-gradio-client" class="level2">
<h2 class="anchored" data-anchor-id="how-do-i-use-gradio-client">How do I use Gradio Client?</h2>
<p><strong>Step 1: Connect to a Client</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> gradio_client <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Client, handle_file</span>
<span id="cb2-2"></span>
<span id="cb2-3">client <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Client(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"abidlabs/whisper"</span>)</span></code></pre></div></div>
<p><strong>Step 2: Predict</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">client.predict(</span>
<span id="cb3-2">    audio<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>handle_file(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"audio_sample.wav"</span>)</span>
<span id="cb3-3">)</span>
<span id="cb3-4"></span>
<span id="cb3-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This is a test of the whisper speech recognition model."</span> </span></code></pre></div></div>
<p>Easy right?!</p>
<p>If we have multiple boxes / steps in the Gradio App we can call each of the components. The client usage of any Gradio App is <strong>easily found at the bottom.</strong></p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://images.amplenote.com/627f2784-ef96-11ef-a1f4-4f6319174ddb/0c668288-7e63-4469-9d64-946abf665652.png" class="img-fluid figure-img"></p>
<figcaption>Gradio Client Button in Gradio Apps (to the left)</figcaption>
</figure>
</div>
<p>And that opens a new menu which shows how to use each box of the App.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://images.amplenote.com/627f2784-ef96-11ef-a1f4-4f6319174ddb/5dd37151-9c78-4e80-af48-94d9e78ca5a0.png" class="img-fluid figure-img"></p>
<figcaption>The API documentation and How-To-Guide after clicking</figcaption>
</figure>
</div>
</section>
<section id="available-clients" class="level2">
<h2 class="anchored" data-anchor-id="available-clients">Available Clients</h2>
<p>Anyone can run this as it supports <em>regular REST requests</em>! They supply a <code>curl</code> sample on how to query the App. On top of that they’ve got <em>native Python and JS clients.</em></p>
</section>
<section id="possibilities" class="level2">
<h2 class="anchored" data-anchor-id="possibilities">Possibilities</h2>
<p>Gradio implemented the whole thing in a way where you don’t need to do anything at all. In turn we now get a user-friendly App that is easily deployed with an <em>automatically included REST API</em>. It’s two birds with one stone!</p>
<p>Combining this “App+API” deployment with the free, or paid, <a href="https://huggingface.co/spaces">HuggingFace Spaces</a> creates a high-value package!</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>HuggingFace Spaces
</div>
</div>
<div class="callout-body-container callout-body">
<p>HuggingFace (HF) Spaces is as simple as deployments get:</p>
<ul>
<li>Free compute<sup>1</sup>
<ul>
<li><em>ZeroGPU available using Gradio SDK</em></li>
</ul></li>
<li>Natively supports <em>streamlit, Gradio, static (webapp)</em></li>
<li>Supports any app via Docker<sup>2</sup></li>
</ul>
<p>HF Spaces can deploy apps via <em>Client</em>, <em>git</em>, or <em>Drag n’ Drop</em>.<br>
You get a pretty public URL <code>huggingface.co/spaces/&lt;USER&gt;/&lt;APP_NAME&gt;</code> to share. All in all a great tool that puts the power into developers’ hands!</p>
</div>
</div>
</section>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>This blog is really short and to the point. The Gradio Client is sweet and I really wanted to share the experience.<br>
The combination of App+API written at the same time is exciting, and when you enhance it with HuggingFace Spaces it all becomes magical. Gradio’s “auto-share” when running locally, which can then provide an API too, is not too shabby either 😉</p>
<p>This is it for this time,<br>
Hampus Londögård</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Pay for GPU and/or better CPU↩︎</p></li>
<li id="fn2"><p>Multiple templates to get started fast↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>python</category>
  <category>app</category>
  <category>rest</category>
  <category>machine-learning</category>
  <guid>https://blog.londogard.com/posts/2025-02-20-gradio-client/</guid>
  <pubDate>Mon, 17 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Marimo - A new Notebook/App on the block!</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-02-17-marimo/</link>
  <description><![CDATA[ 





<section id="marimo" class="level1 page-columns page-full">
<h1>Marimo</h1>
<p>Marimo is the “new” kid on the block. Based on what Marimo tries to achieve you can’t help but compare it to other frameworks such as <em>Gradio, Jupyter, Streamlit, Solara &amp; Panel</em>.</p>
<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>A multitude of options
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>The fact that there’s a plethora of options</strong> to build WASM apps/tools and <em>“literate programming”</em> through notebook-style <strong>is nothing short of amazing.</strong><br>
We’re in for a great time!</p>
<p>Drawback? What do I choose!</p>
</div>
</div>
<p>I’ll put a little focus on comparing, especially their WASM usage via <a href="https://pyodide.org/">pyodide</a>. I first wrote about <a href="https://stlite.net/">stlite</a> in my <a href="https://blog.londogard.com/posts/2024-02-22-stlite/">blog</a> (Feb, 2024). Ever since I discovered WASM deployments via <em>base64-URLs</em> and <em>standalone-HTML-file</em> I’ve been amazed at the opportunity to deploy simple-to-use tools for your colleagues.</p>
<p>Marimo is a reactive notebook with built-in UI components that can be turned into an app easily. It also aims to be great in additional areas, such as WASM.</p>
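<p>To make “reactive” a bit more concrete, here is a toy sketch of the idea: each cell defines and reads variables, and changing one cell marks every cell that (transitively) reads its variables for re-execution. This is only an illustration and not Marimo’s implementation. The helper names <code>defs_and_refs</code> and <code>cells_to_rerun</code> are hypothetical, and the result is returned in notebook order rather than in a proper dependency order.</p>

```python
import ast


def defs_and_refs(code: str):
    """Return (variables assigned, variables read but not assigned) for one cell."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
    return defs, refs - defs


def cells_to_rerun(cells: dict, changed: str) -> list:
    """Return the changed cell plus all cells that depend on it, in notebook order."""
    info = {name: defs_and_refs(code) for name, code in cells.items()}
    dirty = {changed}
    frontier = [changed]
    while frontier:
        produced = info[frontier.pop()][0]
        for name, (_, refs) in info.items():
            # A cell is stale if it reads a variable that a stale cell defines.
            if name not in dirty and produced.intersection(refs):
                dirty.add(name)
                frontier.append(name)
    return [name for name in cells if name in dirty]


cells = {
    "a": "x = 1",
    "b": "y = x + 1",
    "c": "z = 10",
    "d": "print(y)",
}
stale = cells_to_rerun(cells, "a")  # cell "c" is left alone
```

<p>Compare this with a sequential notebook, where you decide yourself which cells to run again, or with a “rerun everything” model where cell <code>c</code> would be executed again too.</p>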
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>My Final Thoughts
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p><em>Marimo does great, it handles App use-case, Notebook and WASM phenomenally.</em> The UI components are sleek and combine in a smooth way. The one drawback? You’ll have to rewire your brain a bit with <em>reactivity</em> rather than sequential execution!</p>
<p>Interested to learn more? Read on!</p>
</div>
</div>
</div>
<section id="usage" class="level2">
<h2 class="anchored" data-anchor-id="usage">Usage</h2>
<p>I’ll start by sharing <strong>my current go-to tool(s)</strong> for each area of use, and then try to fit Marimo into this.</p>
<ul>
<li><strong>Heavy Applications</strong>: <a href="https://solara.dev/">Solara</a>
<ul>
<li>Pros: ‘React’ style of programming, very efficient bindings and updates</li>
<li>Cons: Not the most modern UI</li>
</ul></li>
<li><strong>Simple Applications</strong>: <a href="https://streamlit.io/">Streamlit</a>
<ul>
<li>Pros: Simple, Modern UI, Large Community</li>
<li>Cons: The “execute everything on each change” execution is quite inefficient, and once you add caching, reasoning about the app grows harder with time</li>
</ul></li>
<li><strong>WASM Apps</strong>: Streamlit via stlite</li>
<li><strong>ML Demos with API</strong>: Gradio (and Streamlit)
<ul>
<li>Pros: Simple, provides <code>gradio_client</code> REST API by default (AMAZING)</li>
<li>Cons: I hate building Gradio apps</li>
</ul></li>
<li><strong>Notebook</strong>: Jupyter</li>
</ul>
<p><em>Marimo could possibly replace most in the list</em>, but especially WASM Apps and Notebook. Potentially it can take on ML Demos and Simple Applications too, and why not Heavy?</p>
<p>Marimo is perhaps too bold at trying to achieve it all, let’s dive into it!</p>
</section>
<section id="quick-subjective-rankings" class="level2">
<h2 class="anchored" data-anchor-id="quick-subjective-rankings">Quick (Subjective) Rankings</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 12%">
<col style="width: 23%">
<col style="width: 46%">
<col style="width: 17%">
</colgroup>
<tbody>
<tr class="odd">
<td><strong>Action</strong></td>
<td>🥇</td>
<td>🥈</td>
<td>🥉</td>
</tr>
<tr class="even">
<td>WASM</td>
<td>stlite, Marimo</td>
<td>py.cafe (streamlit &amp; solara), gradio</td>
<td>jupyter lite</td>
</tr>
<tr class="odd">
<td>Notebook</td>
<td>Jupyter &amp; Marimo</td>
<td>Streamlit</td>
<td>Solara, gradio</td>
</tr>
<tr class="even">
<td>App</td>
<td>Streamlit &amp; Solara</td>
<td>Marimo, gradio</td>
<td>Jupyter</td>
</tr>
</tbody>
</table>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>WASM Details on each tool
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<ul>
<li><code>stlite</code> can be deployed as a single HTML file, but then depends on network access (fetching CDN resources).</li>
<li><code>marimo</code> can be deployed as an HTML folder that needs to be served, but requires no network.</li>
<li><code>stlite</code>, <code>marimo</code>, and <code>py.cafe</code> all enable “base64-url-apps”, i.e.&nbsp;you can have a single URL that contains the full application and can run on their webpage!</li>
<li><code>jupyterlite</code> is really good as a tool, but the share-ability is awful.</li>
<li><code>gradio</code> works decently in WASM, but it has a big pro which is its API when running the real app. The <code>gradio_client</code> is an amazing initiative.</li>
</ul>
<p>All in all I’m amazed regarding the tools that are available to run sandboxed in your browser (all based on <a href="https://pyodide.org/en/stable/">pyodide</a>). We’re programming in a really cool part of history!</p>
</div>
</div>
</div>
<p>If we put scores on each (3,2,1 for 1st, 2nd, and 3rd) we end up with the following:</p>
<ol type="1">
<li><p><strong>Marimo &amp; Streamlit:</strong> <strong>8pts</strong></p></li>
<li><p><strong>Solara:</strong> 6pts</p></li>
<li><p><strong>Jupyter &amp; Gradio:</strong> 5pts</p></li>
</ol>
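<p>The tally can be reproduced with a few lines of Python. This is just a sketch of my arithmetic, under the assumption that <code>stlite</code> counts as Streamlit’s WASM entry (so Streamlit is not counted again for py.cafe), that py.cafe counts as Solara’s WASM entry, and that jupyter lite counts for Jupyter. The names <code>RANKINGS</code>, <code>POINTS</code> and <code>tally</code> exist only for this example.</p>

```python
# Each category lists the tools on 1st, 2nd and 3rd place, taken from the table above.
RANKINGS = {
    "WASM": [["Marimo", "Streamlit"], ["Solara", "Gradio"], ["Jupyter"]],
    "Notebook": [["Jupyter", "Marimo"], ["Streamlit"], ["Solara", "Gradio"]],
    "App": [["Streamlit", "Solara"], ["Marimo", "Gradio"], ["Jupyter"]],
}
POINTS = [3, 2, 1]  # points for 1st, 2nd and 3rd place


def tally(rankings: dict) -> dict:
    """Sum the points of every tool across all categories."""
    totals = {}
    for places in rankings.values():
        for points, tools in zip(POINTS, places):
            for tool in tools:
                totals[tool] = totals.get(tool, 0) + points
    return totals


totals = tally(RANKINGS)
# Marimo: 8, Streamlit: 8, Solara: 6, Jupyter: 5, Gradio: 5
```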
<p>It seems Marimo ends up covering all needs quite well based on my initial research.<br>
Streamlit ends up at the top because of its strong community, and <code>stlite</code> really helps the WASM story-line.</p>
<p>But how do you actually use Marimo, <strong><em>and can it beat Streamlit by having a smarter execution system?</em></strong></p>
</section>
<section id="marimo-intro" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="marimo-intro">Marimo Intro</h2>
<p>Marimo has two execution “environments”, Python and WASM.</p>
<section id="python" class="level3">
<h3 class="anchored" data-anchor-id="python">Python</h3>
<p>This is essentially like running a Jupyter Notebook in your local Python environment. It starts a marimo kernel that handles your execution:</p>
<pre><code>marimo edit # open marimo editor and app</code></pre>
</section>
<section id="wasm" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="wasm">WASM</h3>
<p>Running in a sandboxed Python (pyodide) environment <em>inside</em> your browser! 🤯</p>
<p>I’m amazed at how easily you can share “tools” with internal teams today using WASM.</p>
<ol type="1">
<li>A single HTML file without serving needs (à la <code>stlite</code>, utilizing CDN assets)</li>
<li>A single URL that contains the code as a base64-encoded string in the URL (à la <code>stlite</code>, <code>py.cafe</code>, and <code>marimo</code>)</li>
<li>A stand-alone web app (folder with HTML file and assets) that you serve</li>
</ol>
<p>It’s such an easy way to deploy tools, and everything is sandboxed inside the browser. No need to go through IT security or have a deployment done - an amazing feat!</p>
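<p>To make the “single URL” option concrete, here is a minimal sketch of the idea in plain Python: the source code is base64-encoded and placed in the URL fragment. The host <code>https://example.invalid/app</code>, the <code>code</code> parameter, and both helper names are hypothetical; each real tool uses its own encoding (and may compress the code as well).</p>

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host; real tools (stlite, py.cafe, marimo) use their own URL schemes.
BASE_URL = "https://example.invalid/app"


def code_to_url(source: str) -> str:
    """Pack Python source into a URL fragment as URL-safe base64."""
    payload = base64.urlsafe_b64encode(source.encode("utf-8")).decode("ascii")
    return f"{BASE_URL}#{urlencode({'code': payload})}"


def url_to_code(url: str) -> str:
    """Recover the Python source from a URL produced by code_to_url."""
    fragment = urlsplit(url).fragment
    payload = parse_qs(fragment)["code"][0]
    return base64.urlsafe_b64decode(payload).decode("utf-8")


app_source = "print('hello from the browser')\n"
share_url = code_to_url(app_source)
# The round trip returns exactly the source we started with.
assert url_to_code(share_url) == app_source
```

<p>Because the code lives in the fragment (after <code>#</code>), it is not sent to the server when the page is requested; the page’s own JavaScript would decode it and hand it to pyodide.</p>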

<div class="no-row-height column-margin column-container"><div class="">
<p><strong>Resource Comparison</strong></p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>WASM App</th>
<th>RAM</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Marimo</td>
<td><strong>400 MB</strong></td>
</tr>
<tr class="even">
<td>stlite</td>
<td>600 MB</td>
</tr>
<tr class="odd">
<td>jupyterlite</td>
<td>&gt;1 GB</td>
</tr>
<tr class="even">
<td>gradio_lite</td>
<td>522 MB</td>
</tr>
<tr class="odd">
<td>pyodide (via <a href="https://pydantic.run/">pydantic.run</a>, no UI or dependencies)</td>
<td><strong>200 MB</strong></td>
</tr>
</tbody>
</table>
</div></div><section id="marimo-wasm" class="level4">
<h4 class="anchored" data-anchor-id="marimo-wasm">Marimo WASM</h4>
<p>Marimo solves WASM quite brilliantly. Their built-in package handler makes it a breeze to add dependencies.<br>
The app looks just the same, whereas <code>gradio</code>, for example, degrades quite a lot with <code>gradio_lite</code>.</p>
<p>Finally, as shown in the margin, Marimo’s memory use is at the lower end compared to similar apps.</p>
</section>
</section>
</section>
</section>
<section id="marimo-1" class="level1">
<h1>Marimo</h1>
<p>Marimo is easy: you simply define and use a component like the following:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># cell 1</span></span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> marimo <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> mo</span>
<span id="cb2-3">slider <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mo.ui.slider(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb2-4"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Select your step: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>slider<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb2-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">---</span></span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># cell 2</span></span>
<span id="cb2-7"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"You've selected </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>slider<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> which doubled is </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>slider<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>value <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span></code></pre></div></div>
<p>Here we define a slider component and display it using markdown (in a real notebook the f-string is wrapped in <code>mo.md(...)</code>); the second cell shows its value and updates automatically because of reactivity!<br>
Bonus? You can swap the order of the cells and the code will still be valid, because of said reactivity. This is also what enforces the <em>reproducibility</em>: the code follows a DAG based on the variables.<br>
Drawback? You can’t update a variable outside the cell that defines it.</p>
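<p>How can a notebook work out a run order from variables alone? Here is a rough sketch using only the standard library: parse each cell, record which names it defines and which it reads, and sort topologically. This illustrates the idea and is not Marimo’s actual implementation; <code>cell_order</code>, <code>defined_and_used</code>, and the <code>cells</code> dict are made up for the example.</p>

```python
import ast
from graphlib import TopologicalSorter


def defined_and_used(source: str) -> tuple[set[str], set[str]]:
    """Return (names assigned, names only read) in a cell's source."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used - defined


def cell_order(cells: dict[str, str]) -> list[str]:
    """Order cells so every cell runs after the cells defining its inputs."""
    info = {name: defined_and_used(src) for name, src in cells.items()}
    owner = {}
    for name, (defined, _) in info.items():
        for var in defined:
            if var in owner:
                # Mirrors the drawback above: one variable, one defining cell.
                raise ValueError(f"{var!r} is defined in two cells")
            owner[var] = name
    # Map each cell to the set of cells it depends on.
    graph = {
        name: {owner[var] for var in used if var in owner}
        for name, (_, used) in info.items()
    }
    return list(TopologicalSorter(graph).static_order())


# The cells are listed "out of order" on purpose.
cells = {
    "show": "message = f'doubled: {slider_value * 2}'",
    "input": "slider_value = 5",
}
print(cell_order(cells))  # ['input', 'show']
```

<p>The order in which the cells are written does not matter; only the variable dependencies do, which is why swapping cells keeps the notebook valid.</p>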
<section id="a-basic-app-example" class="level2">
<h2 class="anchored" data-anchor-id="a-basic-app-example">A basic App Example</h2>
<p>I’ll share examples from <a href="https://docs.marimo.io/#a-reactive-programming-environment">docs.marimo.io</a> which is a great resource to get started.</p>
<p><strong>UI Components:</strong><br>
<video src="https://docs.marimo.io/_static/readme-ui.mp4" class="img-fluid" controls=""><a href="https://docs.marimo.io/_static/readme-ui.mp4">Marimo Slider &amp; Reactivity</a></video></p>
<p><strong>DataFrame Explorer:</strong><br>
<video src="https://docs.marimo.io/_static/docs-df.mp4" class="img-fluid" controls=""><a href="https://docs.marimo.io/_static/docs-df.mp4">Marimo DataFrame Explorer - I love it!</a></video><br>
<strong>SQL Mixin</strong><br>
<img src="https://raw.githubusercontent.com/marimo-team/marimo/main/docs/_static/readme-sql-cell.png" class="img-fluid" alt="Mixing DataFrame’s and SQL in Marimo"></p>
<p><strong>Plotting Callbacks</strong><br>
<video src="https://cms.marimo.io/landing/3.mp4" class="img-fluid" controls=""><a href="https://cms.marimo.io/landing/3.mp4">Embedding Selection Callback</a></video></p>
<p><strong>Standouts:</strong></p>
<ol type="1">
<li>Deterministic Execution Order (annoying but helpful)
<ol type="1">
<li>It’s a tad confusing that cells can appear in any order, but <strong>at least it’s reproducible</strong> compared to Jupyter!</li>
</ol></li>
<li>Built-in Package Management (especially handy for WASM)</li>
<li>Pretty elements for a notebook (compared with ipywidgets)
<ol type="1">
<li>What’s even cooler is that you can easily combine UI components in a markdown string, making for a seamless flow!</li>
</ol></li>
</ol>
</section>
<section id="marimo-editor" class="level2">
<h2 class="anchored" data-anchor-id="marimo-editor">Marimo Editor</h2>
<p>While the editor is excellent, I found it quite poor at picking up local project files outside the script itself for auto-completion. This is where <a href="https://docs.marimo.io/guides/editor_features/watching/">watching</a> helps:</p>
<pre><code>marimo edit --watch </code></pre>
<p><br>
This lets you edit in your local IDE and watch the changes in the browser. I think this is a nice balance where you can opt to edit directly in the IDE or in the browser depending on your current need. But VS Code / PyCharm’s built-in notebooks are unbeatable in user experience (i.e.&nbsp;autocompletion + visualization)! 🤓</p>
<p><strong>If Marimo could pick up IntelliSense from the IDE that’d be a great improvement!</strong> Marimo handles “project” IntelliSense especially poorly.</p>
</section>
<section id="marimo-gotchas" class="level2">
<h2 class="anchored" data-anchor-id="marimo-gotchas">Marimo Gotchas</h2>
<p>There are a few things one needs to think about when developing a Marimo app/notebook.</p>
<section id="reactive-execution" class="level3">
<h3 class="anchored" data-anchor-id="reactive-execution">Reactive Execution</h3>
<p>The reactive nature of Marimo makes it reproducible, but building apps with reactive execution makes it easy to accidentally “trigger” actions you didn’t anticipate, especially if you’re used to Jupyter, which has no “auto-run”.</p>
<p><strong>Fixes:</strong></p>
<ol type="1">
<li>Put expensive/dangerous actions behind a button (define the action in a function)</li>
<li>Apply <code>mo.stop</code> to stop execution.</li>
<li>Disable the cell (<a href="https://docs.marimo.io/guides/reactivity/#disabling-cells">example</a>)</li>
<li>Make a cell, or all cells, execute lazily. This neatly grays out cells that are out of sync</li>
</ol>
<p><img src="https://images.amplenote.com/384cdeec-d88b-11ef-b6ab-a76117c9f257/ec4f09a0-44df-433b-a0b1-f49b1c295418.png" class="img-fluid"></p>
<p><img src="https://images.amplenote.com/384cdeec-d88b-11ef-b6ab-a76117c9f257/08977551-7517-40c8-bf57-829cf7167b9c.png" class="img-fluid"></p>
</section>
<section id="ui-vs-root-namespace" class="level3">
<h3 class="anchored" data-anchor-id="ui-vs-root-namespace">UI vs root namespace</h3>
<p>Marimo mixes the <code>mo.ui.*</code> and <code>mo.*</code> namespaces for different things.</p>
<p><code>mo.ui.*</code> includes reactive UI components, e.g.&nbsp;button &amp; slider, while <code>mo.*</code> includes display UI components such as image or video.</p>
<p>This is quite confusing, and I think the namespacing issue is a larger one than one might anticipate, as you tend to get lost looking for what you wish to draw.</p>
<p>What’s cool though is that, just like Jupyter, Marimo tries to auto-display elements using a nice visualization.</p>
</section>
<section id="only-final-element-is-visible" class="level3">
<h3 class="anchored" data-anchor-id="only-final-element-is-visible">Only final element is visible</h3>
<p>Only the final component in a cell is actually displayed. In my opinion all <code>mo.ui</code> components should be displayed if they’re added; it’d make more sense.</p>
<p>One can wrap elements inside a markdown text, an accordion, or another container that displays multiple elements.</p>
</section>
</section>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>I think Marimo all in all does really well. There are a few sharp edges to resolve, but I might replace Jupyter with this really soon. The DataFrame Explorer - Amazing. The callbacks for charts and more - Superb!</p>
<p>It’s like a harder-to-reason-about but better Streamlit, if that makes sense? With more components it’ll be golden!</p>
<p>Thanks for this time,<br>
Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <category>app</category>
  <category>notebook</category>
  <guid>https://blog.londogard.com/posts/2025-02-17-marimo/</guid>
  <pubDate>Mon, 17 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Image to Lego (Xmas Project)</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2025-01-09-img-2-lego/</link>
  <description><![CDATA[ 





<p>Merry Christmas, Happy Holidays and a Good New Year! 🥳</p>
<blockquote class="blockquote">
<p>Disclaimer: I’m not affiliated with LEGO and this is a personal project.</p>
</blockquote>
<p>Ever wished you could turn a photo into a LEGO masterpiece? This holiday season, while my son napped, I did!<br>
I’ve been passionately working on what might be my most entertaining project since I built my own Baby Monitor (<a href="../../posts/2022-11-06-babymonitor-pt-1">1</a>, <a href="../../posts/2023-02-06-baby-monitor-pt-p2">2</a>): <strong>a tool that transforms any image into a buildable 3D LEGO model, complete with layer-by-layer instructions, basically like a real LEGO set!</strong></p>
<p>The idea had been brewing in my mind for a few years, but a recent request from my friend Oscar J finally gave me the motivation to bring it to life!</p>
<section id="result-demo" class="level1">
<h1>Result / Demo</h1>
<p>I built a Gradio application, mainly as a learning exercise. Usually I prefer Streamlit or Solara for my applications, but Gradio is growing fast and I wanted to try it in a “real project”.</p>
<div class="quarto-layout-panel" data-layout="[[2,1], [1,1]]">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 66.7%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-01-09-img-2-lego/build_img.png" class="img-fluid figure-img"></p>
<figcaption>Visualization of the Image to LEGO Flow</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-01-09-img-2-lego/lego_build_instruct.gif" class="img-fluid figure-img"></p>
<figcaption>LEGO Build Instruction GIF</figcaption>
</figure>
</div>
</div>
</div>
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-01-09-img-2-lego/lego_bricks_1.png" class="img-fluid figure-img"></p>
<figcaption>Visualization of LEGO Bricks (notice how it’s bricks and not Voxels)</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2025-01-09-img-2-lego/lego_bricks_2.png" class="img-fluid figure-img"></p>
<figcaption>Visualization of LEGO Bricks (notice how it’s bricks and not Voxels)</figcaption>
</figure>
</div>
</div>
</div>
</div>
<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Complete walkthrough: From image input to 3D LEGO model with instructions
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div class="quarto-video"><video id="video_shortcode_videojs_video1" class="video-js vjs-default-skin vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="full_flow.mp4"></video></div>
</div>
</div>
</div>
<p>Not to brag, but the results are quite magical in my opinion.<br>
Intrigued? Keep reading to learn how I built this image-to-LEGO generator!</p>
</section>
<section id="project-layout" class="level1">
<h1>Project Layout</h1>
<p>My first step was to research existing solutions. While there are tools for image-to-3D model conversion and voxelization, I couldn’t find anything specifically for building LEGO models with accurate, buildable bricks and “instructions”. This meant I had to build most of the pipeline from scratch. Here are the steps I identified:</p>
<ol type="1">
<li><strong>Turn image into 3D object (Deep Learning):</strong> Use a deep learning model to generate a 3D mesh from a single 2D image.</li>
<li><strong>Voxelize Mesh (Algorithmic):</strong> Convert the 3D mesh into a voxelized representation, essentially breaking it down into cubes.</li>
<li><strong>Colorize Voxels with LEGO approved colors (Algorithmic):</strong> Map the colors of the voxels to the closest matching colors from a predefined LEGO color palette.</li>
<li><strong>Merge Voxels into LEGO sized bricks (Algorithmic):</strong> Combine adjacent voxels into standard LEGO brick shapes, optimizing for larger bricks while preserving the overall shape.
<ol type="a">
<li>This involves finding the right balance between using larger, more efficient bricks and accurately representing the details of the original model.</li>
</ol></li>
</ol>
<p>This list might make it sound easy, but it really is an interesting and challenging problem!</p>
<section id="image-to-mesh-from-2d-photo-to-3d-model" class="level2">
<h2 class="anchored" data-anchor-id="image-to-mesh-from-2d-photo-to-3d-model">Image to Mesh: From 2D Photo to 3D Model</h2>
<p>Getting from a single 2D image to a 3D model isn’t exactly a walk in the park. Traditionally, people have done this by algorithmically stitching together a bunch of photos taken from different angles, creating a 3D point cloud.</p>
<p>That multi-image approach might work for professionals, but for your average LEGO builder? I don’t think so.<br>
It’s a hassle to take all those photos, and getting them to stitch together correctly? Not fun.</p>
<p>I wanted a smoother experience, one where you can just use a <strong>single image</strong> to get your 3D model, and that’s why I went with deep learning. Basically, you train a model on tons of images and their corresponding 3D models. Then, when you give it a new image, it can make educated guesses about the 3D shape, even the parts you can’t see. We’re shifting the heavy lifting of data collection from the user during the model’s use to the training phase, which makes things much easier. For this project I’m using a two-step process:</p>
<ol type="1">
<li><strong>Generate Multi-View Images:</strong> A diffusion model takes the single input image and generates multiple views of the object as if it were photographed from different angles.</li>
<li><strong>Reconstruct 3D Mesh from Multi-View Images:</strong> Another deep learning model takes these generated views and creates a coherent 3D mesh, essentially filling in the gaps between the different perspectives.</li>
</ol>
<p>It’s not perfect, and the model might struggle with unusual objects or bad lighting. But, it’s a lot better than manually stitching images, and it gets us closer to that one-click LEGO dream. Now, with a 3D mesh in hand, we can move on to the next steps!</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>The perfect dataset easily available
</div>
</div>
<div class="callout-body-container callout-body">
<blockquote class="blockquote">
<p>A <em>sweet bonus</em> is that we can build our dataset inside Game Engines very easily! Combining real data and synthetic data built in a Game Engine is perfect and makes it easy to have a <em>great</em> dataset.</p>
</blockquote>
</div>
</div>
<section id="steps-for-a-deep-learning-approach" class="level3">
<h3 class="anchored" data-anchor-id="steps-for-a-deep-learning-approach">Steps for a Deep Learning approach</h3>
<p>How do we actually turn an image into a mesh through Deep Learning? The idea is quite straightforward, and even without prior knowledge you should be able to follow what happens under the hood, though not the mathematics.</p>
<section id="step-1-turn-image-into-multi-angle-images" class="level4">
<h4 class="anchored" data-anchor-id="step-1-turn-image-into-multi-angle-images">Step 1: Turn image into multi-angle images</h4>
<p>To get around the need for multiple input images, the first step in my approach is to generate them artificially. We use a Diffusion Model for this. This model is able to look at your single image and create new images of the same object, but from different angles, by generalizing from a large diverse dataset that it has trained on.</p>
</section>
<section id="step-2-generate-mesh-model-from-multi-angle-images" class="level4">
<h4 class="anchored" data-anchor-id="step-2-generate-mesh-model-from-multi-angle-images">Step 2: Generate mesh (model) from multi-angle images</h4>
<p>With the generated multi-angle images, e.g.&nbsp;6 of them, we stitch/generate a mesh structure to work with through another Deep Learning model! The model at hand needs to fill in the sparse data to form a 3D mesh. This can be done in multiple ways depending on how you’d like to define the resulting mesh.</p>
<p>Once we have our 3D mesh we can head to the next step, i.e.&nbsp;voxelization.</p>
</section>
<section id="step-3-mesh-to-voxels" class="level4">
<h4 class="anchored" data-anchor-id="step-3-mesh-to-voxels">Step 3: Mesh to Voxels</h4>
<p>This is a “solved” problem and it’s easy to voxelize using <em>trimesh</em>, a python library. Trimesh delivers the voxels, or cubes, in grey-scale and we need to add colors ourselves.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> voxelize(mesh_path: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> Path, resolution: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>):</span>
<span id="cb1-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Voxelize the mesh based on a resolution parameter. Resolution is how many bricks it should contain, i.e. 16 creates a 16x16 base plate."""</span></span>
<span id="cb1-3">    mesh <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> trimesh.load(mesh_path)</span>
<span id="cb1-4">    bounds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mesh.bounds</span>
<span id="cb1-5">    voxel_size <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (bounds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> bounds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> resolution  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># pitch</span></span>
<span id="cb1-6">    voxels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mesh.voxelized(pitch<span class="op" style="color: #5E5E5E;
background-color: null;
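font-style: inherit;">=</span>voxel_size)</span>
<span id="cb1-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> voxels</span></code></pre></div></div>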
</section>
<section id="step-4-colorize-voxels" class="level4">
<h4 class="anchored" data-anchor-id="step-4-colorize-voxels">Step 4: Colorize Voxels</h4>
<p>With the voxels in place, we need to color them. We start by assigning each voxel the ‘true’ color from the original mesh, found by identifying the nearest point on the mesh’s surface to the voxel’s center (approximated here by the nearest mesh vertex). That is, the color won’t be an “approved” LEGO color for now.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> tree_knearest_colors(mesh, voxels):</span>
<span id="cb2-2">    tree <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cKDTree(mesh.vertices)</span>
<span id="cb2-3">    _, vertex_indices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tree.query(voxels.points, k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb2-4"></span>
<span id="cb2-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> mesh.visual.vertex_colors[vertex_indices]</span></code></pre></div></div>
<p>The <code>cKDTree</code> makes the <em>nearest neighbour</em> lookup fast. Matching against mesh vertices rather than the full surface is an approximation, but it’s good enough.</p>
</section>
<section id="step-5-quantize-colors-into-lego-palette" class="level4">
<h4 class="anchored" data-anchor-id="step-5-quantize-colors-into-lego-palette">Step 5: Quantize Colors into LEGO palette</h4>
<p>As mentioned, our colors are based on the 3D model rather than the LEGO palette, so we need to map them onto the LEGO palette to be able to order LEGO bricks.<br>
This task could be done in multiple ways; I opted for the simple Euclidean distance between RGB values. A better approach would be to utilize the <a href="https://en.wikipedia.org/wiki/CIELAB_color_space">LAB</a> color space, as that’s closer to what humans perceive. LAB conversion wasn’t as smooth as I had hoped (possible via <code>colormath</code>), and I wanted to wrap up my application, hence the RGB Euclidean distance.</p>
<blockquote class="blockquote">
<p>I did try LAB and didn’t see a better result, but I didn’t put too much effort into it and there might be a better way to make it work.</p>
</blockquote>
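<p>The RGB variant can be sketched with NumPy as follows. The three-color palette and the name <code>quantize_to_palette</code> are placeholders for illustration, not the palette or code used in the project.</p>

```python
import numpy as np

# Placeholder palette (RGB, 0-255); a real one would list the available brick colors.
PALETTE = np.array(
    [
        [255, 255, 255],  # white
        [0, 0, 0],        # black
        [200, 30, 30],    # a red
    ],
    dtype=np.float64,
)


def quantize_to_palette(colors: np.ndarray, palette: np.ndarray = PALETTE) -> np.ndarray:
    """Map each RGB(A) row in `colors` to the index of the nearest palette color."""
    rgb = np.asarray(colors, dtype=np.float64)[:, :3]  # drop alpha if present
    # Pairwise differences via broadcasting: shape (n_colors, n_palette, 3).
    diff = rgb[:, None, :] - palette[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)  # squared Euclidean distance
    return dist2.argmin(axis=1)


voxel_colors = np.array([[250, 250, 240, 255], [10, 5, 0, 255], [180, 40, 50, 255]])
palette_idx = quantize_to_palette(voxel_colors)
quantized = PALETTE[palette_idx].astype(np.uint8)  # the palette colors to use
```

<p>Squared distances are enough here since only the ordering matters. Switching to LAB would mean converting both <code>rgb</code> and <code>palette</code> before computing <code>diff</code>; the rest stays the same.</p>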
</section>
<section id="step-6-merge-voxels-into-lego-bricks" class="level4">
<h4 class="anchored" data-anchor-id="step-6-merge-voxels-into-lego-bricks">Step 6: Merge Voxels into LEGO Bricks</h4>
<p>Our final step is to go from voxels (1x1 bricks) into bigger LEGO bricks. No sane person would enjoy building a LEGO with only 1x1 bricks! 😂</p>
<p>My current approach is dead simple: a greedy algorithm that starts with the largest-area bricks and iteratively tries smaller sizes. I’m utilizing a vectorized approach rather than the traditional “graph-based” one, e.g.&nbsp;Depth-First-Search (DFS).</p>
<p>Our problem has some constraints, which means it fits vectorized approaches very well. The constraints for merging voxels are:</p>
<ol type="1">
<li>Same color</li>
<li>Same z-level</li>
</ol>
<p>The code is quite simple when treating it as a vectorized problem and utilizing numpy indexing.</p>
<ol type="1">
<li>Treat each Z-level as a matrix</li>
<li>Treat each color as a number inside the matrix</li>
<li>Apply equality</li>
<li>Validate if we can fit our brick, say 2x6: if <code>np.all()</code> over the candidate area is true, it fits
<ol type="a">
<li>Only apply this validation starting from an existing coordinate of the voxels</li>
</ol></li>
<li>Place the brick when it fits, iteratively moving down in size</li>
</ol>
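<p>For a single Z-level, the steps above could look roughly like this. It is a simplified sketch and not the project’s code: <code>merge_layer</code>, the list of brick sizes, and the convention that <code>0</code> means “empty” are assumptions made for the example.</p>

```python
import numpy as np

# Brick footprints to try, largest area first (an illustrative subset).
BRICK_SIZES = [(2, 4), (2, 3), (2, 2), (1, 4), (1, 3), (1, 2), (1, 1)]


def merge_layer(layer: np.ndarray, sizes=BRICK_SIZES) -> list[tuple[int, int, int, int, int]]:
    """Greedily cover one Z-level with bricks.

    `layer` holds one integer color id per cell, 0 meaning empty.
    Returns (row, col, height, width, color) for every placed brick.
    """
    free = layer != 0  # cells that still need a brick
    bricks = []
    for h0, w0 in sizes:
        # Try each footprint in both orientations (the set drops duplicates for squares).
        for h, w in sorted({(h0, w0), (w0, h0)}):
            for r, c in np.argwhere(free):  # only start from existing voxels
                if not free[r, c]:
                    continue  # covered by a brick placed earlier in this pass
                window = layer[r:r + h, c:c + w]
                if window.shape != (h, w):
                    continue  # footprint would stick out of the grid
                same_color = window == layer[r, c]
                if np.all(same_color) and np.all(free[r:r + h, c:c + w]):
                    bricks.append((int(r), int(c), h, w, int(layer[r, c])))
                    free[r:r + h, c:c + w] = False
    return bricks


layer = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [2, 2, 0, 0],
])
print(merge_layer(layer))  # [(0, 0, 2, 4, 1), (2, 0, 1, 2, 2)]
```

<p>Since 1x1 is in the size list, every voxel ends up covered. Being greedy, the result depends on the scan order and is not guaranteed to use the fewest bricks.</p>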
</section>
</section>
</section>
</section>
<section id="future-work" class="level1">
<h1>Future Work</h1>
<p>There’s still a lot of work to do to improve the resulting LEGO model. The biggest flaws are that there are too many small bricks and that the colors are not great…</p>
<section id="improve-colors" class="level2">
<h2 class="anchored" data-anchor-id="improve-colors">Improve colors</h2>
<p>The first problem, and one whose fix would solve a lot, is the colors.</p>
<p>With better colors we’d be able to merge into bigger bricks, as we’re not allowed to put different colors into the same brick. The end-goal would be to reduce A) Shadows and B) Gradients, simplifying the model into something of lower color resolution.</p>
<p>I have a hard time seeing that I can solve this fully through algorithms (already having tried LAB color space) and believe that I’d need to train a new Deep/Machine Learning model to achieve better results. That model would be able to better opt for which color to use and potentially even reduce weird colors that don’t fit the greater picture. Additionally it could have higher resolution for small important things like eyes.</p>
<p>Potential fixes:</p>
<ul>
<li>Apply ML to predict fewer colors</li>
<li>Apply some type of kernel that removes gradients but keeps edges</li>
<li>…and probably something else not in my mind right now!</li>
</ul>
</section>
<section id="bigger-bricks" class="level2">
<h2 class="anchored" data-anchor-id="bigger-bricks">Bigger bricks</h2>
<p><strong>First fix: Better Colors</strong><br>
See previous section.</p>
<p><strong>Second fix: Filling Empty Space</strong><br>
Our 3D model is currently empty in the middle; this void could be filled to enable larger bricks. Sometimes we place two 1x1 bricks rather than one larger brick because we’re not allowed to extend it into the empty void.</p>
<p>This task is most likely simple to solve algorithmically, but some “voids” we wouldn’t want to fill. A void should only be filled if it’s hidden from most angles, i.e.&nbsp;we wouldn’t want to fill the space between the legs of the Eiffel Tower, as that’s supposed to be an empty void rather than filled.</p>
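<p>A possible starting point for the simple case, assuming SciPy is available (it already provides <code>cKDTree</code>): <code>scipy.ndimage.binary_fill_holes</code> fills every cavity that is fully enclosed by voxels. It does not implement the “hidden from most angles” rule; any void with an opening to the outside simply stays empty. The name <code>fill_enclosed_voids</code> and the toy grid are illustrative.</p>

```python
import numpy as np
from scipy import ndimage


def fill_enclosed_voids(occupied: np.ndarray) -> np.ndarray:
    """Return a boolean voxel grid with fully enclosed cavities filled."""
    return ndimage.binary_fill_holes(occupied)


# Toy example: a 5x5x5 cube with a sealed 3x3x3 cavity inside.
shell = np.ones((5, 5, 5), dtype=bool)
shell[1:4, 1:4, 1:4] = False

filled = fill_enclosed_voids(shell)
new_voxels = filled ^ shell  # voxels that were added by the fill
print(int(new_voxels.sum()))  # 27
```

<p>The added voxels have no color of their own, so they would need one assigned (for example from their nearest neighbours) before merging.</p>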
</section>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>This was (and is) a very exciting project that I’ll keep on working on whenever I get a short burst of time. There are some important improvements to add before the project is truly <em>User Friendly™</em>.</p>
<p>Researching the available approaches was fun, and implementing the algorithms brought me back to my time at Apple, but with the knowledge gained from later working in a more mathematical setting, i.e.&nbsp;applying vectorization.</p>
<p>Finally the result is impressive and something I can use already today. I can’t wait to order bricks for my first LEGO build!</p>
<p>Thanks for this time,<br>
Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <category>project</category>
  <category>packaging</category>
  <guid>https://blog.londogard.com/posts/2025-01-09-img-2-lego/</guid>
  <pubDate>Thu, 09 Jan 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Pixi - Real World Usage</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2024-12-17-pixi-real-usage/</link>
  <description><![CDATA[ 





<p>Managing dependencies and environments across multiple platforms can be a nightmare. That’s why I was thrilled to discover <a href="https://pixi.sh/dev/">Pixi</a>. I’ve previously talked about Pixi on LinkedIn/Twitter, but hadn’t used it in any “serious” project until recently, and so far it has worked exceptionally well!</p>
<p>Imagine a tool that combines the speed and efficiency of <a href="https://github.com/astral-sh/uv"><em>uv</em></a> with the robust package management of <a href="https://github.com/mamba-org/mamba"><em>mamba</em></a>. That’s Pixi in a nutshell. Built on the expertise of the mamba creators and utilizing <code>uv</code> for PyPI dependencies, Pixi offers a streamlined, powerful way to manage Python environments. Compared to <em>mamba</em>, <em>pixi</em> takes things one step further, as its PyPI dependencies are tested against the conda ones, on top of the additional tools brought by pixi, such as <em>tasks</em>.</p>
<p>Cherry on top? Pixi is lightning fast and enables multi-platform &amp; multi-environment inside a single file where everything is synced together.</p>
<blockquote class="blockquote">
<p>Multi-platform, multi-environment means that we can sync dependencies between osx-arm64, linux-64, CUDA, CPU, … - a standout feature!</p>
</blockquote>
<section id="pixi-docker-builds" class="level2">
<h2 class="anchored" data-anchor-id="pixi-docker-builds">Pixi Docker Builds</h2>
<p>After solving your local environment in an easy yet reproducible manner, the next step is to solve it for your cloud workloads - <em>containerization</em>.</p>
<p>Containerization is an important part of a developer’s toolkit in the modern world. Cloud workloads are very commonly deployed as containers; in Data Science this covers everything from Training and Inference to Data Pipelines.</p>
<p>With pixi it’s quite straightforward, and they provide ready-to-use images through the <a href="https://github.com/prefix-dev/pixi-docker">pixi-docker</a> registry. There are multiple base images, including CUDA, to get started - it can’t be any simpler!</p>
<section id="pixi-sample-docker-builds" class="level3">
<h3 class="anchored" data-anchor-id="pixi-sample-docker-builds">Pixi Sample Docker Builds</h3>
<p>Simple starter:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">docker</span> pull ghcr.io/prefix-dev/pixi:latest</span></code></pre></div></div>
<p>Find the different tags on <a href="https://github.com/prefix-dev/pixi-docker?tab=readme-ov-file#pulling-the-images">Pixi Docker tags page</a>.</p>
<p>Efficient production build using a Docker multi-stage build: <a href="https://github.com/prefix-dev/pixi-docker?tab=readme-ov-file#usage-with-shell-hook">pixi-docker/shell-hook</a>.</p>
</section>
<section id="pixi-docker-build-on-aws-sagemaker" class="level3">
<h3 class="anchored" data-anchor-id="pixi-docker-build-on-aws-sagemaker">Pixi Docker Build on AWS Sagemaker</h3>
<p>Sagemaker can be quite challenging to work with. While deploying custom Docker builds is easiest using their own base image, this image is often bloated with unnecessary dependencies. Additionally, to run <code>@remote</code> jobs on AWS, you need to include a <code>conda</code> or <code>mamba</code> environment - something that <code>pixi</code> doesn’t inherently use.</p>
<p><strong>So, how do we integrate Pixi with Sagemaker?</strong></p>
<p>Here’s a workaround to make them play nicely together:</p>
<ol type="1">
<li><strong>Include <code>micromamba</code>:</strong> Add <code>micromamba</code> (available on <code>conda-forge</code>) as a dependency in your <code>pixi.toml</code>. This will allow us to create a conda-like environment within our Pixi setup.
<ul>
<li>In the future this could be done using a simple shell script, which is a planned improvement in my own projects.</li>
</ul></li>
<li><strong>Add <code>micromamba</code> to <code>$PATH</code>:</strong> Ensure that the <code>micromamba</code> executable installed by Pixi is added to your system’s <code>$PATH</code>. This will make it accessible to Sagemaker.</li>
<li><strong>Set Environment Variables:</strong> Configure necessary environment variables like <code>CONDA_PREFIX</code> to point to the appropriate location where <code>micromamba</code> will manage your environment.</li>
</ol>
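<p>Steps 2 and 3 can be sketched in Python as below. This is a minimal sketch, not my exact setup: the helper <code>pixi_conda_env</code> is a hypothetical name, and it assumes the environment lives in Pixi’s default project location (<code>.pixi/envs/default</code>) with executables in its <code>bin</code> directory. Which variables Sagemaker actually reads may differ in your setup.</p>

```python
import os
from pathlib import Path


def pixi_conda_env(project_root, env_name="default", base_env=None):
    """Return a copy of the environment with the Pixi env exposed conda-style.

    Hypothetical helper: assumes the default .pixi/envs/NAME layout.
    """
    prefix = Path(project_root) / ".pixi" / "envs" / env_name
    env = dict(os.environ if base_env is None else base_env)

    # Step 2: put the env's bin directory (where micromamba is installed) first on PATH.
    bin_dir = str(prefix / "bin")
    env["PATH"] = os.pathsep.join(p for p in (bin_dir, env.get("PATH", "")) if p)

    # Step 3: point CONDA_PREFIX at the Pixi-managed environment.
    env["CONDA_PREFIX"] = str(prefix)
    return env


if __name__ == "__main__":
    env = pixi_conda_env("/app")
    print(env["CONDA_PREFIX"])
    print(env["PATH"].split(os.pathsep)[0])
```

<p>In a Docker image the same values would normally be set once with <code>ENV</code> instructions; the function form is just a compact way to show what ends up in the environment.</p>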
<p>With these steps, you’re ready to run your Pixi-managed projects on Sagemaker!</p>
<p>In my experiments, this approach significantly reduced the size of my CUDA images from around 12 GB down to 4.5 GB - a massive improvement in terms of storage and deployment speed!</p>
</section>
</section>
<section id="pixi-multi-platformenvironment" class="level2">
<h2 class="anchored" data-anchor-id="pixi-multi-platformenvironment">Pixi Multi-Platform/Environment</h2>
<p>One of Pixi’s standout features is its seamless support for multi-platform and multi-environment projects. While I initially planned to delve deeper into this, prefix.dev recently published an excellent guide on the topic. I highly recommend checking out their documentation on <a href="https://pixi.sh/dev/features/pytorch/#mixing-macos-and-cuda-with-pypi-dependencies">combining different OS’s and environments (CPU, CUDA) with PyTorch</a> for a comprehensive overview.</p>
<section id="some-personal-comments" class="level3">
<h3 class="anchored" data-anchor-id="some-personal-comments">Some Personal Comments</h3>
<p>Personally I find this part of pixi one of its biggest strengths, especially how easy it is to work with! To build a docker image you simply follow the basic example above, opting for <code>--feature=cuda</code>.<br>
Keeping lock-files for everything, while allowing certain OSes to skip some dependencies, makes it very practical in real-world scenarios!</p>
</section>
</section>
<section id="pixi-build-slimmming" class="level2">
<h2 class="anchored" data-anchor-id="pixi-build-slimmming">Pixi Build Slimming</h2>
<p>When containerizing your code, it’s crucial to keep builds slim. Here are a few tricks to help you minimize your Pixi-based Docker images:</p>
<ol type="1">
<li><strong>Leverage <code>.dockerignore</code>:</strong> Create a <code>.dockerignore</code> file to exclude unnecessary files and directories (e.g., <code>.git</code>, <code>__pycache__</code>, tests) from your Docker build context.</li>
<li><strong>Optimize Dependencies:</strong>
<ul>
<li>Carefully consider each dependency and remove any that are not strictly required for production.</li>
<li>Utilize multiple environments within your <code>pixi.toml</code>, e.g.&nbsp;<code>prod</code> and <code>dev</code> environments. This allows you to exclude dev-specific dependencies (test, lint, ..) from your production container.</li>
</ul></li>
<li><strong>Employ Multi-Stage Docker Builds:</strong> Multi-stage builds reduce the image size. Use a build stage to install dependencies and compile your application, and then copy only the necessary artifacts to a smaller, leaner final image. The <code>pixi-docker</code> project provides guidance on using <a href="https://github.com/prefix-dev/pixi-docker?tab=readme-ov-file#usage-with-shell-hook">multi-stage builds with shell-hook</a>.</li>
</ol>
</section>
<section id="pixi-vs-uv" class="level1">
<h1>Pixi vs uv</h1>
<p>While <em>uv</em> has gained significant traction in the Python community, I believe <em>Pixi</em> offers a more compelling solution for my specific needs, especially when it comes to complex, real-world projects.</p>
<p>Why?</p>
<ol type="1">
<li><strong><code>tasks</code> are awesome.</strong> They might not be perfect but they’re great to me!</li>
<li><strong>Multi-platform and Multi-environment projects</strong> (personal opinion) somehow end up easier in Pixi
<ul>
<li>I really tried to embrace the <code>uv</code> approach as I appreciate it as more lightweight. But Pixi is somehow “smoother”.</li>
</ul></li>
<li><strong>Pixi has base-images with CUDA</strong>
<ul>
<li>Both tools are easy to build from a raw base-image too, so it’s not a huge problem</li>
</ul></li>
<li><strong>Access to <code>conda</code> packages</strong>
<ul>
<li>Some hate it, but I like getting pre-built binaries.</li>
<li>It’s quite interesting to install shell tools via <code>conda</code> for container deployment.</li>
</ul></li>
<li><strong>Possible to work with other languages than Python</strong></li>
</ol>
<section id="what-is-the-one-big-uv-pro" class="level3">
<h3 class="anchored" data-anchor-id="what-is-the-one-big-uv-pro">What is the one big <code>uv</code> pro?</h3>
<p><a href="https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies">UV’s Inline Script Dependencies</a></p>
<p>I think this feature is really cool, <del>but as pixi utilize <code>uv</code> you can use it in <code>pixi</code> too! ;)</del> but it’s quite easy to replicate in <code>pixi</code> as well (including with <code>uv</code>)! ;)</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>How to run an ‘inline script’ in Pixi
</div>
</div>
<div class="callout-body-container callout-body">
<p>Simply call <code>pixi exec uv run a.py</code>. See the <a href="https://pixi.sh/latest/reference/cli/#exec">docs (cli/#exec)</a> where you’re able to also run shell-scripts with a <a href="https://pixi.sh/latest/advanced/shebang/">shebang</a>. This will actually install <code>uv</code> in a temporary env, and then use that <code>uv</code>.</p>
<p>A bonus of <code>exec</code> is that if you instead use the “pixi-native” <del><code>--scope</code></del> <code>--spec</code> it supports conda too, e.g.&nbsp;<code>pixi exec -s polars -s altair python</code> to run a temporary python venv with <code>polars</code> &amp; <code>altair</code>.</p>
<p><strong>Edit:</strong> Added this callout 2025-03-07 and updated 2025-03-10 based on feedback from <em>markusschlenker</em>.<br>
<strong>Edit 2:</strong> I want to add that you can also easily add <code>uv</code> as a global tool (like installing it), through <code>pixi global install uv</code>.</p>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># /// script</span></span>
<span id="cb2-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># dependencies = [</span></span>
<span id="cb2-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   "requests&lt;3",</span></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   "rich",</span></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ]</span></span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ///</span></span>
<span id="cb2-7"></span>
<span id="cb2-8"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> requests</span>
<span id="cb2-9"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rich.pretty <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pprint</span>
<span id="cb2-10"></span>
<span id="cb2-11">resp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://peps.python.org/api/peps.json"</span>)</span>
<span id="cb2-12">data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> resp.json()</span>
<span id="cb2-13">pprint([(k, v[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"title"</span>]) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k, v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data.items()][:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>])</span></code></pre></div></div>
</section>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>If you’re a Python developer struggling with dependency management, environment inconsistencies, or cumbersome container builds, I urge you to give Pixi a try. It’s a powerful tool that has the potential to streamline your workflow and make you a happier developer. Pixi has certainly made a significant difference in mine!</p>
<p>Thanks for this time, Hampus Londögård</p>


</section>

 ]]></description>
  <category>python</category>
  <category>dependencies</category>
  <category>packaging</category>
  <guid>https://blog.londogard.com/posts/2024-12-17-pixi-real-usage/</guid>
  <pubDate>Tue, 17 Dec 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Data Loading - Comparing Common Tooling</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2024-12-03-data-loading-comparison/</link>
  <description><![CDATA[ 





<p>This blog post was supposed to be more in-depth, but my enthusiasm dropped drastically and I felt like splitting it up into multiple smaller ones, of which the <a href="../../posts/2024-10-24-data-loading-daft/">daft one</a> is already uploaded.</p>
<blockquote class="blockquote">
<p>I started writing a “recipe-book” for <code>daft</code> where I realized it wasn’t as smoothly integrated as a lot of other tools. I believe that the <code>DataFrame</code> format is both a winning and a losing concept: it’s very helpful, but when you need to use two columns, the way <code>Ray</code>, <code>HuggingFace Datasets</code> and others map data using a <code>dict</code> is a winning concept for both <em>element by element</em> and <em>batch</em> mapping. With a <code>dict</code> way to map a <code>DataFrame</code> I think that <code>daft</code> might end up the perfect tool.</p>
<p>For now I believe Daft is better utilized as an ETL framework, but in the near future it might become great for ML too.</p>
</blockquote>
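<p>To make the <code>dict</code> point concrete, here is a small library-free sketch of the two mapping styles. The helpers <code>map_rows</code> and <code>map_batches</code> are hypothetical names written for this illustration only; they mimic the idea, not the actual API of Ray, HuggingFace Datasets or Daft.</p>

```python
def map_rows(columns, fn):
    """Element by element: call fn with one dict per row, then re-assemble columns."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    rows = [fn({name: columns[name][i] for name in names}) for i in range(n_rows)]
    return {key: [row[key] for row in rows] for key in rows[0]}


def map_batches(columns, fn, batch_size=2):
    """Batch: call fn with a dict of column slices and concatenate the results."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    out = {}
    for start in range(0, n_rows, batch_size):
        batch = {name: columns[name][start:start + batch_size] for name in names}
        for key, values in fn(batch).items():
            out.setdefault(key, []).extend(values)
    return out


data = {"width": [2, 4, 6], "height": [3, 5, 7]}

# Combining two columns is plain dict access in both styles.
per_row = map_rows(data, lambda row: {**row, "area": row["width"] * row["height"]})
per_batch = map_batches(
    data,
    lambda batch: {
        **batch,
        "area": [w * h for w, h in zip(batch["width"], batch["height"])],
    },
)

print(per_row["area"])
print(per_batch["area"])
```

<p>Both calls produce the same columns; the only difference is whether the mapped function sees single values or lists.</p>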
<p>Anyhow, today I’ll compare the developer experience and performance of different tools for data loading.</p>
<ol type="1">
<li><a href="https://huggingface.co/docs/datasets/index">HuggingFace Datasets</a></li>
<li><a href="https://docs.ray.io/en/latest/data/data.html">Ray Data</a></li>
<li><a href="https://getdaft.io/">Daft</a></li>
<li><a href="https://pytorch.org/">“PyTorch Native” (Dataset &amp; DataLoader)</a></li>
</ol>
<p>All of the chosen tools are quite awesome, but HuggingFace and Ray can additionally export to TensorFlow. However, Ray currently cannot handle <code>RaggedTensor</code>, which is required for models with variable output - a letdown!</p>
<section id="quick-introduction" class="level2">
<h2 class="anchored" data-anchor-id="quick-introduction">Quick Introduction</h2>
<p><strong>Hugging Face Datasets</strong> offers easy access to a vast library of datasets, with efficient memory handling through streaming and memory-mapping. Its API simplifies data loading and transformation for direct use with PyTorch and TensorFlow.</p>
<p><strong>Ray Data</strong> enables scalable, distributed data processing across multiple nodes, ideal for large datasets. It integrates with Ray’s ML tools for parallel training and distributed transformations. It’s a go-to tool for Large Language Model training, even embraced by OpenAI for ChatGPT (<a href="https://thenewstack.io/how-ray-a-distributed-ai-framework-helps-power-chatgpt/">source</a>).</p>
<p><strong>Daft</strong> is a high-performance data processing library with lazy evaluation, optimized for structured data formats like Parquet and Arrow. It’s a strong choice for single-node and multi-node data preparation with PyTorch compatibility. It utilizes Ray to achieve multi-node behavior.</p>
<p><strong>PyTorch’s Dataset and DataLoader</strong> offer a simple and flexible way to load data with minimal memory overhead, ideal for in-memory and custom datasets. It’s lightweight but lacks distributed and lazy loading features.</p>
<table class="caption-top table">
<caption>Summary Table</caption>
<colgroup>
<col style="width: 28%">
<col style="width: 24%">
<col style="width: 9%">
<col style="width: 4%">
<col style="width: 32%">
</colgroup>
<thead>
<tr class="header">
<th>Feature</th>
<th>Hugging Face Datasets</th>
<th>Ray Data</th>
<th>Daft</th>
<th>PyTorch Dataset + DataLoader</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Parallel Processing</td>
<td>+</td>
<td>+++</td>
<td>++</td>
<td>+</td>
</tr>
<tr class="even">
<td>Distributed Processing</td>
<td>0</td>
<td>+++</td>
<td>+++</td>
<td>0</td>
</tr>
<tr class="odd">
<td>Caching &amp; Memory Mapping</td>
<td>+++</td>
<td>+</td>
<td>+</td>
<td>0</td>
</tr>
<tr class="even">
<td>Lazy Loading</td>
<td>+++</td>
<td>++</td>
<td>+++</td>
<td>+++ (depends)</td>
</tr>
<tr class="odd">
<td>Simple to Use</td>
<td>+++</td>
<td>+</td>
<td>++</td>
<td>+++</td>
</tr>
<tr class="even">
<td>Built-in Dataset Access</td>
<td>+++</td>
<td>0</td>
<td>0</td>
<td>+++</td>
</tr>
<tr class="odd">
<td>Custom Transformations</td>
<td>++</td>
<td>+++</td>
<td>+++</td>
<td>+++</td>
</tr>
<tr class="even">
<td>ML Framework Support</td>
<td>+++</td>
<td>+++</td>
<td>++</td>
<td>++</td>
</tr>
</tbody>
</table>
</section>
<section id="mini-benchmark" class="level2">
<h2 class="anchored" data-anchor-id="mini-benchmark">Mini Benchmark</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 32%">
<col style="width: 13%">
<col style="width: 13%">
<col style="width: 6%">
<col style="width: 20%">
<col style="width: 12%">
</colgroup>
<thead>
<tr class="header">
<th>Tool</th>
<th>Num_worker</th>
<th>Pin_memory</th>
<th>Cache</th>
<th>Configuration</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>HF Element</strong></td>
<td>None</td>
<td>None</td>
<td>False</td>
<td>.map</td>
<td>6m48s</td>
</tr>
<tr class="even">
<td></td>
<td>None</td>
<td>None</td>
<td>True</td>
<td>.with_transform</td>
<td><strong>3m23s</strong></td>
</tr>
<tr class="odd">
<td><strong>HF Batched</strong></td>
<td>None</td>
<td>None</td>
<td>False</td>
<td>.map</td>
<td>7m14s</td>
</tr>
<tr class="even">
<td></td>
<td>None</td>
<td>None</td>
<td>True</td>
<td>.map</td>
<td><strong>3m22s</strong></td>
</tr>
<tr class="odd">
<td><strong>Torch Dataset/Loader</strong></td>
<td>None</td>
<td>None</td>
<td>-</td>
<td>Default</td>
<td><strong>3m20s</strong></td>
</tr>
<tr class="even">
<td><strong>Daft</strong></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>daft-default</td>
<td>14m55s</td>
</tr>
<tr class="odd">
<td></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>daft-native</td>
<td><strong>3m30s</strong></td>
</tr>
<tr class="even">
<td><strong>Ray</strong></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>Default</td>
<td>7m41s</td>
</tr>
</tbody>
</table>
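<p>To read the table as ratios rather than raw times, the durations can be converted to seconds and divided. A small sketch, where <code>to_seconds</code> is a hypothetical helper and the pairs are copied from the table above:</p>

```python
import re


def to_seconds(duration):
    """Convert a string such as '6m48s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


# (slow, fast) pairs copied from the benchmark table.
pairs = {
    "HF element: .map (no cache) vs .with_transform (cache)": ("6m48s", "3m23s"),
    "HF batched .map: no cache vs cache": ("7m14s", "3m22s"),
    "Daft: daft-default vs daft-native": ("14m55s", "3m30s"),
}

for label, (slow, fast) in pairs.items():
    ratio = to_seconds(slow) / to_seconds(fast)
    print(f"{label}: {ratio:.2f}x")
```

<p>The Hugging Face rows come out at roughly 2x and the Daft rows at a bit over 4x, which makes it clear that configuration matters more than the choice of tool in this mini benchmark.</p>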
<p>Running on full-sized images gives somewhat more interesting results:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 6%">
<col style="width: 17%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th>Tool</th>
<th>Num_worker</th>
<th>Pin_memory</th>
<th>Cache</th>
<th>Configuration</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Additional Tests</strong></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>torch</td>
<td>4m19s</td>
</tr>
<tr class="even">
<td></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>hf_with_transf</td>
<td>4m40s</td>
</tr>
<tr class="odd">
<td></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>hf_map</td>
<td>8m14s, cached: 7m21s</td>
</tr>
<tr class="even">
<td></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>daft</td>
<td><strong>3m49s</strong></td>
</tr>
</tbody>
</table>
</section>
<section id="developer-experience-dx" class="level2">
<h2 class="anchored" data-anchor-id="developer-experience-dx">Developer Experience (DX)</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">HuggingFace Datasets</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">PyTorch “native”</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false" href="">Daft</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-4" aria-controls="tabset-1-4" aria-selected="false" href="">Ray</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> test_elem_by_elem(num_workers: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, pin_memory: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, cache: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span>):</span>
<span id="cb1-2">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> cache:</span>
<span id="cb1-3">        datasets.disable_caching()</span>
<span id="cb1-4"></span>
<span id="cb1-5">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _preprocess(data: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>):</span>
<span id="cb1-6">        imgs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [utils.PREPROCESS_TRANSFORMS(x.convert(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RGB"</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>]]</span>
<span id="cb1-7">        data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> imgs</span>
<span id="cb1-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> data</span>
<span id="cb1-9"></span>
<span id="cb1-10">    ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> datasets.load_from_disk(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"./imagenette_full_size"</span>)</span>
<span id="cb1-11"></span>
<span id="cb1-12">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _augment(data: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>):</span>
<span id="cb1-13">        tensor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _preprocess(data)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>]</span>
<span id="cb1-14">        data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.AUGMENTATIONS(tensor)</span>
<span id="cb1-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> data</span>
<span id="cb1-16"></span>
<span id="cb1-17">    ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train"</span>].with_transform(_augment)</span>
<span id="cb1-18">    ds_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"validation"</span>].with_transform(_preprocess)</span>
<span id="cb1-19"></span>
<span id="cb1-20">    kwargs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(</span>
<span id="cb1-21">        num_workers<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>num_workers <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,</span>
<span id="cb1-22">        persistent_workers<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span>(num_workers),</span>
<span id="cb1-23">        pin_memory<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pin_memory,</span>
<span id="cb1-24">        batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>,</span>
<span id="cb1-25">    )</span>
<span id="cb1-26">    dls_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.utils.data.DataLoader(ds_train, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs)</span>
<span id="cb1-27">    dls_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.utils.data.DataLoader(ds_valid, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs)</span>
<span id="cb1-28">    </span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> ImagenetteDataset(Dataset):</span>
<span id="cb2-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, hf_dataset, preprocess<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, augment<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb2-3">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.hf_dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hf_dataset</span>
<span id="cb2-4">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.preprocess <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> preprocess</span>
<span id="cb2-5">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.augment <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> augment</span>
<span id="cb2-6"></span>
<span id="cb2-7">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__len__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb2-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.hf_dataset)</span>
<span id="cb2-9"></span>
<span id="cb2-10">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__getitem__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, idx):</span>
<span id="cb2-11">        data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.hf_dataset[idx]</span>
<span id="cb2-12"></span>
<span id="cb2-13">        image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>].convert(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RGB"</span>)</span>
<span id="cb2-14"></span>
<span id="cb2-15">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Apply preprocessing and augmentation if specified</span></span>
<span id="cb2-16">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.preprocess:</span>
<span id="cb2-17">            image: torch.Tensor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.preprocess(image)</span>
<span id="cb2-18">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.augment:</span>
<span id="cb2-19">            image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.augment(image)</span>
<span id="cb2-20"></span>
<span id="cb2-21">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> {</span>
<span id="cb2-22">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>: image,</span>
<span id="cb2-23">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>: data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>],</span>
<span id="cb2-24">        }</span>
<span id="cb2-25">train_dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ImagenetteDataset(</span>
<span id="cb2-26">    ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train"</span>], preprocess<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>utils.PREPROCESS_TRANSFORMS</span>
<span id="cb2-27">)</span>
<span id="cb2-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(train_dataset))</span>
<span id="cb2-29">valid_dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ImagenetteDataset(</span>
<span id="cb2-30">    ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"validation"</span>], preprocess<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>utils.PREPROCESS_TRANSFORMS</span>
<span id="cb2-31">)</span>
<span id="cb2-32"></span>
<span id="cb2-33"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create DataLoader instances</span></span>
<span id="cb2-34">kwargs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(</span>
<span id="cb2-35">    num_workers<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>num_workers <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,</span>
<span id="cb2-36">    persistent_workers<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span>(num_workers),</span>
<span id="cb2-37">    pin_memory<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pin_memory,</span>
<span id="cb2-38">    batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>,</span>
<span id="cb2-39">)</span>
<span id="cb2-40">dls_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataLoader(train_dataset, shuffle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs)</span>
<span id="cb2-41">dls_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataLoader(valid_dataset, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs)</span></code></pre></div></div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> load_imagenette_datasets_daft(dataset_path<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"./imagenette_full_size"</span>):</span>
<span id="cb3-2">    ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> datasets.load_from_disk(dataset_path)</span>
<span id="cb3-3">    extract_img_bytes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daft.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>).struct.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bytes"</span>).alias(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>)</span>
<span id="cb3-4">    ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daft.from_arrow(ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train"</span>].data.table).select(</span>
<span id="cb3-5">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>, extract_img_bytes</span>
<span id="cb3-6">    )</span>
<span id="cb3-7"></span>
<span id="cb3-8">    ds_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daft.from_arrow(ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"validation"</span>].data.table).select(</span>
<span id="cb3-9">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>, extract_img_bytes</span>
<span id="cb3-10">    )</span>
<span id="cb3-11"></span>
<span id="cb3-12">    img_decode_resize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb3-13">        daft.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>).image.decode(mode<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RGB"</span>).image.resize(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">224</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">224</span>)</span>
<span id="cb3-14">    )</span>
<span id="cb3-15"></span>
<span id="cb3-16">    ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_train.with_column(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>, img_decode_resize)</span>
<span id="cb3-17">    ds_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_valid.with_column(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>, img_decode_resize)</span>
<span id="cb3-18"></span>
<span id="cb3-19">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> to_f32_tensor(ds: daft.DataFrame):</span>
<span id="cb3-20">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> ds.with_column(</span>
<span id="cb3-21">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>,</span>
<span id="cb3-22">            daft.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(</span>
<span id="cb3-23">                <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x: (x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">255.0</span>).transpose(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb3-24">                return_dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>daft.DataType.tensor(daft.DataType.float32()),</span>
<span id="cb3-25">            ),</span>
<span id="cb3-26">        )</span>
<span id="cb3-27"></span>
<span id="cb3-28">    ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_f32_tensor(ds_train)</span>
<span id="cb3-29">    ds_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> to_f32_tensor(ds_valid)</span>
<span id="cb3-30"></span>
<span id="cb3-31">    ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_train.to_torch_iter_dataset()</span>
<span id="cb3-32">    ds_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_valid.to_torch_iter_dataset()</span>
<span id="cb3-33"></span>
<span id="cb3-34">    dls_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.utils.data.DataLoader(ds_train, batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>)</span>
<span id="cb3-35">    dls_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.utils.data.DataLoader(ds_valid, batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>)</span>
<span id="cb3-36">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> dls_train, dls_valid</span></code></pre></div></div>
</div>
<div id="tabset-1-4" class="tab-pane" aria-labelledby="tabset-1-4-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> load_imagenette_datasets_ray(dataset_path<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"./imagenette_full_size"</span>):</span>
<span id="cb4-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load the Arrow-backed dataset from disk with HuggingFace Datasets</span></span>
<span id="cb4-3">    ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> datasets.load_from_disk(dataset_path)</span>
<span id="cb4-4"></span>
<span id="cb4-5">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> extract_img_to_pil(data):</span>
<span id="cb4-6">        image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bytes"</span>]</span>
<span id="cb4-7">        data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> PIL.Image.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(io.BytesIO(image)).convert(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RGB"</span>)</span>
<span id="cb4-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> data</span>
<span id="cb4-9"></span>
<span id="cb4-10">    ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ray.data.from_huggingface(ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train"</span>]).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(extract_img_to_pil)</span>
<span id="cb4-11">    ds_val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ray.data.from_huggingface(ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"validation"</span>]).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(extract_img_to_pil)</span>
<span id="cb4-12"></span>
<span id="cb4-13">    preprocess_transforms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transforms.Compose(</span>
<span id="cb4-14">        [</span>
<span id="cb4-15">            utils.PREPROCESS_TRANSFORMS,</span>
<span id="cb4-16">        ]</span>
<span id="cb4-17">    )</span>
<span id="cb4-18">    augmentation_transforms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.AUGMENTATIONS</span>
<span id="cb4-19"></span>
<span id="cb4-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Apply transformations in Ray</span></span>
<span id="cb4-21">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> preprocess_image(batch):</span>
<span id="cb4-22">        batch[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [preprocess_transforms(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> batch[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>]]</span>
<span id="cb4-23"></span>
<span id="cb4-24">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> batch</span>
<span id="cb4-25"></span>
<span id="cb4-26">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> augment_image(elem):</span>
<span id="cb4-27">        elem[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> augmentation_transforms(elem[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>])</span>
<span id="cb4-28"></span>
<span id="cb4-29">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> elem</span>
<span id="cb4-30"></span>
<span id="cb4-31">    ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_train.map_batches(preprocess_image).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(augment_image)</span>
<span id="cb4-32"></span>
<span id="cb4-33">    ds_val <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_val.map_batches(preprocess_image)</span>
<span id="cb4-34"></span>
<span id="cb4-35">    d_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_train.to_torch(</span>
<span id="cb4-36">        label_column<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>,</span>
<span id="cb4-37">        batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>,</span>
<span id="cb4-38">        local_shuffle_buffer_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>,</span>
<span id="cb4-39">        prefetch_batches<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb4-40">    )</span>
<span id="cb4-41">    d_valid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_val.to_torch(label_column<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>, batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, prefetch_batches<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb4-42"></span>
<span id="cb4-43">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> d_train, d_valid</span></code></pre></div></div>
</div>
</div>
</div>
<p>I think most of the frameworks end up in a similar place when it comes to developer experience.</p>
<p><strong>Quick DX Ranking</strong></p>
<ol type="1">
<li>PyTorch &amp; HuggingFace Datasets</li>
<li>Daft</li>
<li>Ray (albeit I believe it to be the most scalable solution as you can truly tinker in detail)</li>
</ol>
<p>I enjoyed <em>Daft</em> a lot with its multi-modal syntax, inspired by polars with namespaces (e.g.&nbsp;<code>.image.decode()</code>), which can be phenomenal. Working with DataFrames is a cool addition, and you can drop into Python simply by using <code>apply</code>.<br>
The more I worked with <em>Daft</em>, the more I noticed that the DataFrame syntax sometimes becomes a big blocker, while the simplicity of HF Datasets and Ray using <code>dict</code>s in <code>.map</code> statements results in easier code and smoother integration with existing libraries.<br>
Additionally, HF Datasets / PyTorch DataLoaders feel more pythonic, with the latter being really simple. I can’t put my finger on it, but they just seem easier to debug and understand.</p>
<p>It’ll sure be interesting to follow the progress being made, and I’m happy the dust hasn’t settled yet!</p>


</section>

 ]]></description>
  <category>data</category>
  <category>loading</category>
  <guid>https://blog.londogard.com/posts/2024-12-03-data-loading-comparison/</guid>
  <pubDate>Tue, 03 Dec 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Data Loading - Daft</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2024-10-24-data-loading-daft/</link>
  <description><![CDATA[ 





<p>I know I’ve been praising <code>polars</code> a lot lately, and I’m still in love. <code>polars</code> will continue to be my go-to library for Data Analysis of tabular data, and for building ETL (data pipelines) in 99% of cases.</p>
<p>However, when you work with Deep Learning and multi-modal data you need something to take the data from your Delta Lake, or wherever you store your data, and supply it to the model. That’s where tools like <code>daft</code> can shine.</p>
<p>These Data Loading and Processing steps need to be highly optimized to make full use of the underlying compute, not wasting $$ on idle GPUs. The jobs should keep a high % utilization and not be bound by I/O or CPU. In other words: <em>you want to always have data ready when the GPU has time to process more data</em>.</p>
<p>There are a lot of tools for the job, and I’ll go through a few of them in upcoming blog posts using a simple yet common workload: <em>Image Classification</em>. Image Classification is simple, but the data can quickly grow too large to fit in memory.</p>
<ol type="1">
<li>Can’t fit in-memory =&gt; I/O needs to be optimized</li>
<li>Expensive transforms &amp; augmentations =&gt; CPU needs to be optimized</li>
</ol>
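<p>To make “always have data ready” concrete, here is a minimal sketch of prefetching in plain Python. It is not how Daft, Ray or PyTorch implement it; <code>prefetch</code> and <code>buffer_size</code> are hypothetical names, and the only idea shown is a background thread that keeps a small buffer filled while the consumer works.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import queue
import threading


def prefetch(iterable, buffer_size=4):
    """Yield items from `iterable`, loading up to `buffer_size` items ahead in a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks only if the producer has fallen behind
        if item is sentinel:
            return
        yield item


# Hypothetical stand-in for "load and augment one batch".
batches = list(prefetch(range(10), buffer_size=4))</code></pre></div>
<p>If the consumer rarely blocks on <code>buffer.get()</code>, loading is keeping up. A real loader adds error handling and several workers on top of this, and Image Classification is already enough to stress both bottlenecks above.</p>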
<p>An even better example would be <em>Object Detection</em> as it has ragged (variable) dimensions, i.e.&nbsp;one image has 2 people and another has 1, but the data processing is a bit more complex. I’ll include this in my next blog where I give “recipes” on how to use Daft.</p>
<p>Finally, this blog will be quite brief and not as in-depth as I hoped, but there’ll be more blogs coming later!</p>
<section id="daft" class="level1">
<h1>Daft</h1>
<p>Today I’ll introduce one of the newer alternatives in the field, <a href="https://getdaft.io/">daft</a>.</p>
<p>Daft is what you can only call a merger between <code>polars</code>, <code>spark</code> and Deep Learning. If they had been more inspired by <code>polars</code> in the Developer Experience (DX) I’d have called it a “lovechild”, but for now they don’t have nice-to-haves like the named <code>df.with_columns(new_col_name=pl.col("other_col") * 2)</code> syntax, <code>pl.col("col").replace(dict_to_replace)</code> and a lot of other things.</p>
<p>What <em>daft</em> does have is a <em>multi-modal</em> namespace, unlike <code>polars</code> which solely focuses on traditional data-types. This is <em>really</em> interesting albeit not that fleshed out yet. It’s enjoyable and has potential to grow!</p>
<p>Further, to quote <em>daft</em> themselves:</p>
<blockquote class="blockquote">
<p><em>Daft provides a snappy and delightful local interactive experience, but also seamlessly scales to petabyte-scale distributed workloads.</em></p>
</blockquote>
<p>The <em>petabyte-scale</em> comes from the fact that you can run <em>daft</em> on top of <em>Ray</em> which is a distributed framework that tries to take on Spark. It’s famously used at OpenAI while training their models.</p>
</section>
<section id="coding-with-daft" class="level1">
<h1>Coding with Daft</h1>
<p>Coding with <code>daft</code> is an experience. I only ran locally but it held up really well to “native” PyTorch, even surpassing it in one case!</p>
<p>I’ll share my experience and implementations below!</p>
<section id="reading-data" class="level2">
<h2 class="anchored" data-anchor-id="reading-data">Reading Data</h2>
<p>Like most modern projects, <em>daft</em> includes a smooth integration with <em>Apache Arrow</em>.</p>
<blockquote class="blockquote">
<p>Apache Arrow is “The universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics”</p>
</blockquote>
<p>The Arrow integration gives <em>daft</em> multiple ways to read a dataset, and the dataset doesn’t even have to be in-memory because of the Arrow data structure which can easily be streamed via “memory-map-mode” (<code>mmap</code>).</p>
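<p>As a rough illustration of what memory-mapping buys you, here is a sketch using only Python’s standard <code>mmap</code> module. It says nothing about Arrow’s or Daft’s internals; the file and its contents are made up, and the point is only that a slice can be read without loading the whole file first.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import mmap
import os
import tempfile

# Write a throwaway file standing in for a large on-disk dataset (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)) * 4096)  # 1 MiB

with open(path, "rb") as f:
    # Map the file into memory; pages are read from disk lazily when touched.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        total_size = len(mm)
        chunk = mm[1000:1004]  # only this slice is copied into a Python bytes object</code></pre></div>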
<p>To “read” an Arrow table you simply call <code>from_arrow</code>, as I do below reading a HuggingFace Datasets Arrow Table.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1">ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daft.from_arrow(ds[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train"</span>].data.table)</span></code></pre></div></div>
<p>To “read” other formats from disk you simply use <code>read_(delta|csv|...)</code>, as below.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daft.read_deltalake(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"some-table-uri"</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># read_(csv|parquet|json|...)</span></span></code></pre></div></div>
<p>Finally it has very tight integration with Ray, which is very neat when you wish to scale to Big Data.</p>
</section>
<section id="data-transforms---multi-modal-and-whatnot" class="level2">
<h2 class="anchored" data-anchor-id="data-transforms---multi-modal-and-whatnot">Data Transforms - multi-modal and whatnot</h2>
<p>To modify a DataFrame you work very similarly to <code>polars</code>. There are <code>Expression</code>s, which are lazy, non-evaluated expressions, like a SQL query before you run it. I’ve spoken about <code>Expression</code>s before and I really love them: they make code decoupling a lot easier and can simplify a query into something beautiful.</p>
<p>See my example of extracting the image from a struct that has a field with bytes.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># expression: lazy non-executed method</span></span>
<span id="cb3-2">extract_img_bytes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daft.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>).struct.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bytes"</span>).alias(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>)</span>
<span id="cb3-3"></span>
<span id="cb3-4">ds_train.select(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>, extract_img_bytes)</span></code></pre></div></div>
<blockquote class="blockquote">
<p>Select column <code>label</code> and <code>image</code>, where <code>image</code> extracts <code>image.bytes</code> into <code>image</code>.</p>
</blockquote>
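<p>If the “lazy” part feels abstract, the toy sketch below shows the idea in plain Python. It is not Daft’s (or polars’) implementation; <code>Expr</code>, <code>col</code> and <code>evaluate</code> are hypothetical names invented for the illustration. An expression only records what to do, and nothing runs until it is handed some data.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">class Expr:
    """Toy lazy expression: records steps now, runs them only in `evaluate`."""

    def __init__(self, column, steps=()):
        self.column = column
        self.steps = tuple(steps)

    def apply(self, fn):
        # Return a new expression; nothing is computed here.
        return Expr(self.column, self.steps + (fn,))

    def evaluate(self, row):
        value = row[self.column]
        for fn in self.steps:
            value = fn(value)
        return value


def col(name):
    return Expr(name)


# Defined once, far away from any data ...
get_bytes = col("image").apply(lambda struct: struct["bytes"])
byte_count = get_bytes.apply(len)

# ... and reused on any row that has the right shape.
row = {"label": 3, "image": {"bytes": b"\x89PNG", "path": None}}
result = byte_count.evaluate(row)</code></pre></div>
<p>Because <code>get_bytes</code> is just a value, it can live in a shared module and be reused by both the train and validation pipelines, which is the decoupling I like.</p>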
<p>From here I’d like to decode the image into something which we can work with, unlike bytes, and that’s easy using the multi-modal namespace (<code>.image</code>).</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">img_decode_resize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> daft.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>).image.decode(mode<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RGB"</span>).image.resize(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">224</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">224</span>)</span>
<span id="cb4-2"></span>
<span id="cb4-3">ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_train.with_column(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>, img_decode_resize)</span></code></pre></div></div>
<blockquote class="blockquote">
<p>Transforms <code>image</code> by decoding it into <code>RGB</code> and then resizing to <code>224x224</code>.</p>
</blockquote>
<p>Quite cool right? There’s some great potential here!</p>
<p>How do we apply more complex operations? UDFs! It’s just as easy as in <code>polars</code>: simply call <code>apply</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> rescale_transpose(x: np.ndarray):</span>
<span id="cb5-2">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> (x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">255.0</span>).transpose(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb5-3"></span>
<span id="cb5-4">ds_train.with_column(</span>
<span id="cb5-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>,</span>
<span id="cb5-6">    daft.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>).<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(</span>
<span id="cb5-7">        rescale_transpose,</span>
<span id="cb5-8">        return_dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>daft.DataType.tensor(daft.DataType.float32()),</span>
<span id="cb5-9">    ),</span>
<span id="cb5-10">)</span></code></pre></div></div>
<blockquote class="blockquote">
<p>Applying a custom transformation. Images are represented as <code>np.ndarray</code> and you need to define <code>return_dtype</code>.</p>
</blockquote>
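<p>To see what the rescale-and-transpose step does to a single image, here is a small NumPy-only sketch. The input array is a made-up stand-in for a decoded 224x224 RGB image. One detail worth knowing: dividing a <code>uint8</code> array by <code>255.0</code> gives <code>float64</code> in NumPy, so an explicit cast is one way to make sure the data really is <code>float32</code> (I’m not making any claim here about what Daft does with <code>return_dtype</code> internally).</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np

# Hypothetical decoded image: height x width x channels, uint8 as after decode + resize.
image = np.full((224, 224, 3), 255, dtype=np.uint8)

# Same operation as in the UDF: scale to [0, 1] and move channels first.
scaled = (image / 255.0).transpose(2, 0, 1)

shape = scaled.shape  # (3, 224, 224): channels first, as PyTorch conv layers expect
dtype = scaled.dtype  # float64: uint8 divided by a Python float is promoted

# An explicit cast avoids relying on a later conversion to float32.
scaled_f32 = scaled.astype(np.float32)</code></pre></div>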
<p>With all this available we’re good to go for a Deep Learning training pipeline!</p>
</section>
<section id="producing-a-pytorch-dataset" class="level2">
<h2 class="anchored" data-anchor-id="producing-a-pytorch-dataset">Producing a PyTorch Dataset</h2>
<p>The final part of our pipeline is to move the data into <code>torch.Tensor</code>. There’s one big gotcha - don’t set <code>num_workers</code>, as <em>daft</em> already applies multi-threading/multi-processing optimizations!</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">ds_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ds_train.to_torch_iter_dataset()</span>
<span id="cb6-2"></span>
<span id="cb6-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">NOTE</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: don't apply num_workers even though PyTorch warns!</span></span>
<span id="cb6-4">dls_train <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.utils.data.DataLoader(ds_train, batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>)</span></code></pre></div></div>
<p>And that’s a wrap! We’ve got all the steps needed for a full pipeline. How about a comparison?</p>
</section>
</section>
<section id="mini-benchmark" class="level1">
<h1>Mini Benchmark</h1>
<p>Comparing speeds with “native” PyTorch DataLoaders is interesting and shows that Daft is on par when using its new <em>native execution engine</em> (<em>swordfish</em>). When I increase the image size, i.e.&nbsp;more data to process, I see Daft even surpassing PyTorch DataLoaders (!).</p>
<p><strong>N.B.</strong> I’m running the full training from a HuggingFace Dataset backed by Arrow. It’s the same underlying data structure for all tests except the “Image Folder &amp; Files” one, but things might well be different if we start discussing file-loading (rather than loading from bytes) or even remote data.</p>
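<p>For reference, wall-clock timings of this kind can be gathered with a small helper. This is only a sketch and not the benchmark code used for this post: <code>time_epoch</code> and <code>fmt_duration</code> are hypothetical names, and <code>batches</code> stands in for any iterable such as a <code>DataLoader</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import time


def fmt_duration(seconds):
    """Format a number of seconds as e.g. '3m20s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m{secs}s"


def time_epoch(batches, step=lambda batch: None):
    """Iterate once over `batches`, call `step` on each batch, return elapsed seconds."""
    start = time.perf_counter()
    for batch in batches:
        step(batch)  # hypothetical hook: plug in the real training step here
    return time.perf_counter() - start
</code></pre></div></div>
<p>With the default <code>step</code> this only measures one pass over the data; passing the actual training step includes the model work as well. <code>fmt_duration(time_epoch(dls_train))</code> then gives a string in the <code>3m20s</code> style.</p>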
<section id="numbers" class="level2">
<h2 class="anchored" data-anchor-id="numbers">Numbers</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 13%">
<col style="width: 13%">
<col style="width: 6%">
<col style="width: 18%">
<col style="width: 13%">
</colgroup>
<thead>
<tr class="header">
<th>Tool</th>
<th><code>num_workers</code></th>
<th><code>pin_memory</code></th>
<th>Cache</th>
<th>Configuration</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Torch Dataset/Loader</strong></td>
<td>None</td>
<td>None</td>
<td>-</td>
<td>Default</td>
<td><strong>3m20s</strong></td>
</tr>
<tr class="even">
<td></td>
<td>None</td>
<td>None</td>
<td>-</td>
<td>Default</td>
<td>3m26s</td>
</tr>
<tr class="odd">
<td></td>
<td>4</td>
<td>True</td>
<td>-</td>
<td>Default</td>
<td>4m9s</td>
</tr>
<tr class="even">
<td></td>
<td>2</td>
<td>True</td>
<td>-</td>
<td>Default</td>
<td>3m44s</td>
</tr>
<tr class="odd">
<td><strong>Daft</strong></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>daft-default</td>
<td><strong>14m55s</strong></td>
</tr>
<tr class="even">
<td></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>daft-native</td>
<td><strong>3m30s</strong></td>
</tr>
</tbody>
</table>
<p>Running on full-sized images, and on images read from a folder of files, we get somewhat more interesting results:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 39%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 6%">
<col style="width: 16%">
<col style="width: 11%">
</colgroup>
<thead>
<tr class="header">
<th>Data</th>
<th><code>num_workers</code></th>
<th><code>pin_memory</code></th>
<th>Cache</th>
<th>Tool</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Full Size</strong></td>
<td>4</td>
<td>True</td>
<td>-</td>
<td>torch</td>
<td>4m19s</td>
</tr>
<tr class="even">
<td><strong>Full Size</strong></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>daft</td>
<td><strong>3m49s</strong></td>
</tr>
<tr class="odd">
<td><strong>Image Folder &amp; Files (160p)</strong></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>torch</td>
<td><strong>3m31s</strong></td>
</tr>
<tr class="even">
<td><strong>Image Folder &amp; Files (160p)</strong></td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>daft</td>
<td><strong>3m26s</strong></td>
</tr>
</tbody>
</table>
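<p>As a quick sanity check on the full-size rows, the reported times can be converted to seconds and compared. The sketch below uses a hypothetical <code>parse_duration</code> helper; the two durations are copied from the table above.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import re


def parse_duration(text):
    """Convert a string such as '4m19s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


torch_s = parse_duration("4m19s")  # full-size images, torch with 4 workers
daft_s = parse_duration("3m49s")   # full-size images, daft
saving = 1 - daft_s / torch_s      # fraction of the torch time that daft saves
print(f"torch={torch_s}s daft={daft_s}s saving={saving:.1%}")
</code></pre></div></div>
<p>That is 259s versus 229s, so on full-size images the daft run needed roughly 11.6% less time than the torch one.</p>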
<p>To read local files with <em>daft</em> you do the same as you’d do with remote ones: download the contents from a column of paths.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">df.with_column(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"image"</span>, daft.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"path"</span>).url.download())</span></code></pre></div></div>
</section>
</section>
<section id="remote-data" class="level1">
<h1>Remote data</h1>
<p>Working with remote data is a common and interesting use-case. Based on this research, I think <em>daft</em> has a good chance of performing really well there, as it also did great on local files.</p>
</section>
<section id="final-thoughts" class="level1">
<h1>Final Thoughts</h1>
<p>Even if <em>daft</em> has a way to go for Deep Learning training, it holds great promise. If they make exporting to PyTorch easier and perhaps add TensorFlow, I believe it could grow into a valuable competitor to HuggingFace Datasets et al.</p>
<p>As Ray, which Daft can use as its distributed runner, is what drives OpenAI’s training, I believe Daft stands on some really good, scalable underlying tech and can perhaps be what joins Data Engineering and Data Science together as one, for real - a big leap forward!</p>
<p>Thanks for this time, Hampus</p>
<p><strong>Extra:</strong> all code is available on the git-repo for this blog, see <code>code/data_loading</code>.</p>


</section>

 ]]></description>
  <category>TIL</category>
  <category>daft</category>
  <category>data</category>
  <guid>https://blog.londogard.com/posts/2024-10-24-data-loading-daft/</guid>
  <pubDate>Tue, 19 Nov 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>ZenML or ClearML? Which MLOps tool strikes best?</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2024-05-05-zenml-vs-clearml/</link>
  <description><![CDATA[ 





<p>Keeping it to as few words as possible.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th><strong>Tool</strong></th>
<th><strong>Pro</strong></th>
<th><strong>Con</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>ClearML</strong></td>
<td>Simple &amp; Everything “fits”</td>
<td>Locked into ClearML, i.e.&nbsp;cannot use the best tool for the job</td>
</tr>
<tr class="even">
<td><strong>ZenML</strong></td>
<td>Composable &amp; Extendable</td>
<td>Multiple tools needed to get the job done (e.g.&nbsp;MLFlow not visualized in UI)</td>
</tr>
</tbody>
</table>
<section id="similarities" class="level2">
<h2 class="anchored" data-anchor-id="similarities">Similarities:</h2>
<p>There are a lot of similarities, and both are quite easy to get started with.</p>
<section id="building-pipelines" class="level3">
<h3 class="anchored" data-anchor-id="building-pipelines">Building Pipelines</h3>
<p>They both let you use decorators, which makes the code very simple to read. Alas, the ClearML way of doing things is not quite as smooth as ZenML’s.</p>
<blockquote class="blockquote">
<p><strong>ZenML</strong> builds <em>pipelines</em> and <em>tasks/components</em> in a simpler, better way.</p>
</blockquote>
</section>
<section id="tracking-experiments" class="level3">
<h3 class="anchored" data-anchor-id="tracking-experiments">Tracking Experiments</h3>
<p>To track experiments I believe both solutions have got you covered. ClearML’s experiment tracker is quite good and works as you’d expect, while with ZenML you decide which tool you want to use (I opted for MLFlow).</p>
<p>ZenML supports: Comet, MLFlow, Neptune, WandB, &amp; Custom. ClearML supports: ClearML.</p>
<blockquote class="blockquote">
<p>It’s a <strong>draw</strong>, ZenML supports “better” trackers BUT <strong>ClearML has a native integration which makes things a lot easier.</strong></p>
</blockquote>
</section>
<section id="orchestrators" class="level3">
<h3 class="anchored" data-anchor-id="orchestrators">Orchestrators</h3>
<p>Both have a simple-to-use orchestrator. Once again ZenML leans on the established giants, while ClearML uses a built-in native orchestrator that binds everything together.</p>
<blockquote class="blockquote">
<p>It’s a <strong>draw</strong>.</p>
</blockquote>
</section>
</section>
<section id="ui" class="level2">
<h2 class="anchored" data-anchor-id="ui">UI</h2>
<p>One of the more important parts of a tool is the UI. Here I believe ZenML is strong in a way, as they “off-load” each component’s UI to the component itself, e.g.&nbsp;MLFlow tracing is shown in the MLFlow UI.</p>
<p>The UI of each specialized tool, e.g.&nbsp;WandB, is much better than ClearML’s offering in my opinion.<br>
<strong>But</strong> the integration of ClearML as a “solve-all” tool is a HUGE timesaver and I think it could outweigh using the “better” tooling. Integrating everything from Experiment Comparison to Report Building is quite an amazing feat that I think is worth applauding.</p>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>First and foremost, I see both Open Source offerings moving more and more towards SaaS. This is clearly visible in the locking of certain features in the UI (ZenML’s new UI is beautiful but locked down without their Cloud offering). It’s also shown by additional superb features being reserved for paying customers, even when self-hosted. I do understand the need to pay the bills, but it’s sad to see Open Source moving in this direction either way.</p>
<p>See <a href="https://www.zenml.io/open-source-vs-cloud">ZenML comparison (Open Source &lt;&gt; Cloud)</a> and <a href="https://clear.ml/pricing">ClearML one</a>.</p>
<p>Sometimes the best option is to opt for the “cloud-native” one, i.e.&nbsp;AWS/Azure/GCP tools. But I love open source… :)</p>
<p><strong>Anyhow, to finalize here’s my judgement:</strong></p>
<ul>
<li>If you prefer to keep your stack as simple as possible: ClearML.</li>
<li>If you prefer to keep your stack customized, with the best tool for each part: ZenML.</li>
</ul>
<p>I cannot pick a winner. ZenML enables a simpler transition and better tooling all in all, but the full-on integration of ClearML with “everything working together” is quite magical and similar to the cloud-native options (AWS SageMaker/Azure ML Studio/GCP Vertex AI).</p>
<p>Find the code for each framework running MNIST: ….</p>
<p>Thanks for this time, Hampus</p>
<section id="clearml" class="level3">
<h3 class="anchored" data-anchor-id="clearml">ClearML</h3>
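<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span><span class="co" style="color: #5E5E5E; background-color: null; font-style: inherit;"># Imports assumed for this snippet (not shown in the original code)</span></span>
<span><span class="im" style="color: #00769E; background-color: null; font-style: inherit;">import</span> pytorch_lightning <span class="im" style="color: #00769E; background-color: null; font-style: inherit;">as</span> pl</span>
<span><span class="im" style="color: #00769E; background-color: null; font-style: inherit;">import</span> torch</span>
<span><span class="im" style="color: #00769E; background-color: null; font-style: inherit;">from</span> clearml <span class="im" style="color: #00769E; background-color: null; font-style: inherit;">import</span> Task</span>
<span><span class="im" style="color: #00769E; background-color: null; font-style: inherit;">from</span> torch <span class="im" style="color: #00769E; background-color: null; font-style: inherit;">import</span> nn, optim</span>
<span><span class="im" style="color: #00769E; background-color: null; font-style: inherit;">from</span> torchvision <span class="im" style="color: #00769E; background-color: null; font-style: inherit;">import</span> datasets, transforms</span>
<span></span>
<span id="cb1-1">task <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Task.init(</span>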
<span id="cb1-2">    project_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MNIST Digit Recognition"</span>,</span>
<span id="cb1-3">    task_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Simple NN model with PyTorch Lightning"</span>,</span>
<span id="cb1-4">    task_type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>Task.TaskTypes.training,</span>
<span id="cb1-5">    output_uri<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,</span>
<span id="cb1-6">)</span>
<span id="cb1-7"></span>
<span id="cb1-8"></span>
<span id="cb1-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> SimpleNN(pl.LightningModule):</span>
<span id="cb1-10">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb1-11">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>(SimpleNN, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>).<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>()</span>
<span id="cb1-12">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">784</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>)</span>
<span id="cb1-13">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.dropout <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Dropout(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>)</span>
<span id="cb1-14">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb1-15"></span>
<span id="cb1-16">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> forward(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, x):</span>
<span id="cb1-17">        x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.flatten(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-18">        x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc1(x)</span>
<span id="cb1-19">        x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.relu(x)</span>
<span id="cb1-20">        x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.dropout(x)</span>
<span id="cb1-21">        x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc2(x)</span>
<span id="cb1-22">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> torch.log_softmax(x, dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-23"></span>
<span id="cb1-24">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> training_step(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, batch, batch_idx):</span>
<span id="cb1-25">        data, target <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> batch</span>
<span id="cb1-26">        output <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>(data)</span>
<span id="cb1-27">        loss <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.functional.cross_entropy(output, target)</span>
<span id="cb1-28">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.log(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train_loss"</span>, loss)</span>
<span id="cb1-29">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> loss</span>
<span id="cb1-30"></span>
<span id="cb1-31">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> test_step(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, batch, batch_idx):</span>
<span id="cb1-32">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>(batch[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb1-33"></span>
<span id="cb1-34">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> configure_optimizers(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb1-35">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> optim.Adam(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.parameters(), lr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span>)</span>
<span id="cb1-36"></span>
<span id="cb1-37"></span>
<span id="cb1-38">params_dictionary <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"epochs"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>}</span>
<span id="cb1-39">task.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>(params_dictionary)</span>
<span id="cb1-40"></span>
<span id="cb1-41">transform <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> transforms.Compose(</span>
<span id="cb1-42">    [transforms.ToTensor(), transforms.Normalize((<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,), (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,))]</span>
<span id="cb1-43">)</span>
<span id="cb1-44"></span>
<span id="cb1-45">train_dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> datasets.MNIST(</span>
<span id="cb1-46">    root<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"./data"</span>, train<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, transform<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>transform, download<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb1-47">)</span>
<span id="cb1-48">test_dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> datasets.MNIST(root<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"./data"</span>, train<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>, transform<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>transform)</span>
<span id="cb1-49"></span>
<span id="cb1-50">train_loader <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.utils.data.DataLoader(</span>
<span id="cb1-51">    dataset<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>train_dataset, batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">128</span>, shuffle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb1-52">)</span>
<span id="cb1-53">test_loader <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.utils.data.DataLoader(</span>
<span id="cb1-54">    dataset<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>test_dataset, batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">128</span>, shuffle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb1-55">)</span>
<span id="cb1-56"></span>
<span id="cb1-57">model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> SimpleNN()</span>
<span id="cb1-58">trainer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pl.Trainer(max_epochs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params_dictionary[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"epochs"</span>])</span>
<span id="cb1-59">trainer.fit(model, train_loader)</span>
<span id="cb1-60">trainer.test(dataloaders<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>test_loader)</span></code></pre></div></div>
</section>
<section id="zenml" class="level3">
<h3 class="anchored" data-anchor-id="zenml">ZenML</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@zenml.step</span></span>
<span id="cb2-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> load_mnist() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Tuple[</span>
<span id="cb2-3">    Annotated[torch.utils.data.DataLoader, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train_loader"</span>],</span>
<span id="cb2-4">    Annotated[torch.utils.data.DataLoader, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"test_loader"</span>],</span>
<span id="cb2-5">]:</span>
<span id="cb2-6">    ...</span>
<span id="cb2-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> train_loader, test_loader</span>
<span id="cb2-8"></span>
<span id="cb2-9"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@zenml.step</span></span>
<span id="cb2-10"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> train_model(</span>
<span id="cb2-11">    train_loader: torch.utils.data.DataLoader, test_loader: torch.utils.data.DataLoader</span>
<span id="cb2-12">):</span>
<span id="cb2-13">    model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> SimpleNN()</span>
<span id="cb2-14">    trainer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pl.Trainer()</span>
<span id="cb2-15">    trainer.fit(model, train_loader)</span>
<span id="cb2-16">    trainer.test(dataloaders<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>test_loader)</span>
<span id="cb2-17"></span>
<span id="cb2-18"></span>
<span id="cb2-19"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@zenml.pipeline</span></span>
<span id="cb2-20"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> train_pipeline():</span>
<span id="cb2-21">    train_loader, test_loader <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_mnist()</span>
<span id="cb2-22">    train_model(train_loader, test_loader)</span>
<span id="cb2-23"></span>
<span id="cb2-24"></span>
<span id="cb2-25"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">__name__</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"__main__"</span>:</span>
<span id="cb2-26">    train_pipeline()</span></code></pre></div></div>


</section>
</section>

 ]]></description>
  <category>mlops</category>
  <guid>https://blog.londogard.com/posts/2024-05-05-zenml-vs-clearml/</guid>
  <pubDate>Sun, 05 May 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Streamlit Fragments - Make the Dashboard Dream come true</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2024-04-17-streamlit-fragments/</link>
  <description><![CDATA[ 





<p>An old coworker gave me a heads-up that <a href="https://docs.streamlit.io/develop/quick-reference/changelog">Streamlit’s latest release (1.33.0)</a> added Fragments.</p>
<p>Fragments, simply put, enable the creation of <em>independently updated</em> fragments inside your Streamlit application. They also add a simple <code>run_every</code> parameter, which simplifies dashboards (continuously fetching data).</p>
<p>As always, the <a href="https://docs.streamlit.io/develop/api-reference/execution-flow/st.fragment">documentation</a> explains a lot of how it works.</p>
<section id="play-around" class="level2">
<h2 class="anchored" data-anchor-id="play-around">Play Around</h2>
<p>First I played around with fragments, testing the simplest use-case – and I’m sold!</p>
<blockquote class="blockquote">
<p><strong>N.B.</strong> this is already possible in other tools such as Solara, which has a better reactive approach, but Streamlit has a bigger user-base and I love to see a solution to this long-standing problem!</p>
</blockquote>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>The code
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> streamlit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> st</span>
<span id="cb1-3"></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> main():</span>
<span id="cb1-6">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"# Main Function"</span>)</span>
<span id="cb1-7">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello, World! (main)"</span>)</span>
<span id="cb1-8">    st.toggle(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Toggle me!"</span>)</span>
<span id="cb1-9"></span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@st.experimental_fragment</span>()</span>
<span id="cb1-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> first_fragment():</span>
<span id="cb1-13">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"## First Fragment"</span>)</span>
<span id="cb1-14">    random_choice <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.choice([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"c"</span>])</span>
<span id="cb1-15">    st.write(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Random choice: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>random_choice<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb1-16">    st.toggle(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Toggle me! (1st Fragment)"</span>)</span>
<span id="cb1-17"></span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@st.experimental_fragment</span>(run_every<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2s"</span>)</span>
<span id="cb1-20"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> second_fragment():</span>
<span id="cb1-21">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"## Second Fragment"</span>)</span>
<span id="cb1-22">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello, World! (2nd Fragment)"</span>)</span>
<span id="cb1-23">    random_choice <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.choice([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"c"</span>])</span>
<span id="cb1-24">    st.write(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Random choice: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>random_choice<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb1-25">    st.toggle(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Toggle me! (2nd Fragment)"</span>)</span>
<span id="cb1-26"></span>
<span id="cb1-27"></span>
<span id="cb1-28"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">__name__</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"__main__"</span>:</span>
<span id="cb1-29">    main()</span>
<span id="cb1-30">    c1, c2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> st.columns(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-31"></span>
<span id="cb1-32">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> c1:</span>
<span id="cb1-33">        first_fragment()</span>
<span id="cb1-34">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> c2:</span>
<span id="cb1-35">        second_fragment()</span></code></pre></div></div>
</div>
</div>
</div>
<p>This enables the following behavior:</p>
<ol type="1">
<li>Toggling “main” will refresh everything</li>
<li>Toggling a fragment will <em>only</em> refresh that fragment</li>
<li>Second fragment will refresh every 2 seconds</li>
</ol>
<blockquote class="blockquote">
<p><strong>What is refreshed?</strong> The <em>Random choice</em> letter is updated to a random letter (a, b, or c).</p>
</blockquote>
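<p>The three rules above can be sketched as a toy model in plain Python. This is not how Streamlit implements fragments; <code>full_rerun</code>, <code>fragment_rerun</code> and the <code>runs</code> counter are hypothetical names, made up purely to count how often each function body would execute.</p>

```python
# Toy model of the rerun rules listed above; NOT Streamlit's real implementation.
# All helper names here are hypothetical and exist only for illustration.
from collections import Counter

runs = Counter()  # how many times each function body has executed


def main():
    runs["main"] += 1


def first_fragment():
    runs["first_fragment"] += 1


def second_fragment():
    runs["second_fragment"] += 1


def full_rerun():
    """A widget outside any fragment changed: the whole script runs again."""
    main()
    first_fragment()
    second_fragment()


def fragment_rerun(fragment):
    """A widget inside a fragment changed, or its run_every timer fired."""
    fragment()


full_rerun()                     # initial page load
full_rerun()                     # toggling "main" refreshes everything
fragment_rerun(first_fragment)   # toggling inside the first fragment
fragment_rerun(second_fragment)  # 2-second timer tick
fragment_rerun(second_fragment)  # another timer tick

print(dict(runs))
```

<p>In this model <code>main</code> only runs on full reruns, while each fragment also runs on its own triggers. That is why only the <em>Random choice</em> letter inside the affected fragment changes.</p>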
<p>All in all this is what we’d probably do in a Dashboard. See the following GIF:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2024-04-17-streamlit-fragments/assets/toggle_fragments.gif" class="img-fluid figure-img"></p>
<figcaption>Streamlit Fragments and Toggling (simplest use-case) - note the ‘Random Choice’ changing.</figcaption>
</figure>
</div>
</section>
<section id="adding-complexity" class="level2">
<h2 class="anchored" data-anchor-id="adding-complexity">Adding Complexity</h2>
<p>As always it’s a lot more fun to test these things in scenarios that are closer to real-life, and that’s what I intend to do!</p>
<ol type="1">
<li>Fetching data from a data storage</li>
<li>Displaying different graphs</li>
<li>Sharing state from main</li>
</ol>
<p>In this example we have an <em>Amplitude Multiplier</em> slider (main) that affects both fragments. The first fragment draws a sine wave whose frequency is editable; changing it only re-renders (re-computes) that fragment. The second fragment shows stock prices and refreshes automatically every 2 seconds: unless locked, it randomly selects a stock on each refresh, and if locked we can still change the stock manually, which again only re-renders that fragment.</p>
<p>See the GIF below! 👇</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.londogard.com/posts/2024-04-17-streamlit-fragments/assets/complex_fragments.gif" class="img-fluid figure-img"></p>
<figcaption>Sine wave and Stocks, with automatic Stock Refresh</figcaption>
</figure>
</div>
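<p>To make the roles of the two sliders concrete, here is a minimal sketch of just the sine-wave computation, without any Streamlit. It assumes the same formula as the fragment code (100 samples over <code>[0, 2π · frequency]</code>, scaled by the multiplier) but uses the standard library’s <code>math</code> instead of NumPy; <code>sine_wave</code> is a hypothetical helper name.</p>

```python
import math


def sine_wave(multiplier, frequency, n_points=100):
    """Sample multiplier * sin(t) at n_points evenly spaced t in [0, 2*pi*frequency].

    Hypothetical helper: `multiplier` plays the role of the main slider,
    `frequency` the role of the slider inside the first fragment.
    """
    t_max = 2 * math.pi * frequency
    step = t_max / (n_points - 1)
    t = [i * step for i in range(n_points)]
    y = [multiplier * math.sin(v) for v in t]
    return t, y


t, y = sine_wave(multiplier=2.0, frequency=1.0)
peak = max(abs(v) for v in y)
print(f"samples={len(t)}, t_max={t[-1]:.3f}, peak={peak:.3f}")
```

<p>The multiplier only scales the height of the curve, while the frequency stretches the sampled range and so adds more periods to the plot.</p>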
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Code
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> streamlit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> st</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> polars <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pl</span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> plotly.express <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> px</span>
<span id="cb2-5"></span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> main() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>:</span>
<span id="cb2-8">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"# Main Function"</span>)</span>
<span id="cb2-9">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello, World! (main)"</span>)</span>
<span id="cb2-10">    multiplier <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> st.slider(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Amplitude Multiplier"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">10.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)</span>
<span id="cb2-11"></span>
<span id="cb2-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> multiplier</span>
<span id="cb2-13"></span>
<span id="cb2-14"></span>
<span id="cb2-15"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@st.cache_resource</span></span>
<span id="cb2-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_stocks() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> pl.DataFrame:</span>
<span id="cb2-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> pl.read_csv(</span>
<span id="cb2-18">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://raw.githubusercontent.com/vega/datalib/master/test/data/stocks.csv"</span></span>
<span id="cb2-19">    )</span>
<span id="cb2-20"></span>
<span id="cb2-21"></span>
<span id="cb2-22"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@st.experimental_fragment</span>()</span>
<span id="cb2-23"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> first_fragment(multiplier: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>):</span>
<span id="cb2-24">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"## First Fragment"</span>)</span>
<span id="cb2-25">    sine_frequency <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> st.slider(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sine Frequency"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">10.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)</span>
<span id="cb2-26">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create sine wave with multiplier height and sine_frequency as frequency</span></span>
<span id="cb2-27">    t <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.pi <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> sine_frequency, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb2-28">    y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> multiplier <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.sin(t)</span>
<span id="cb2-29"></span>
<span id="cb2-30">    df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pl.DataFrame({<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"t"</span>: t, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>: y})</span>
<span id="cb2-31">    st.plotly_chart(</span>
<span id="cb2-32">        px.line(df, x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"t"</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>, title<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sine wave"</span>), use_container_width<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb2-33">    )</span>
<span id="cb2-34"></span>
<span id="cb2-35"></span>
<span id="cb2-36"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@st.experimental_fragment</span>(run_every<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2s"</span>)</span>
<span id="cb2-37"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> second_fragment(multiplier: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>):</span>
<span id="cb2-38">    st.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"## Second Fragment"</span>)</span>
<span id="cb2-39">    c1, c2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> st.columns(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb2-40"></span>
<span id="cb2-41">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> c1:</span>
<span id="cb2-42">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> st.checkbox(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Lock company"</span>):</span>
<span id="cb2-43">            st.session_state[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ticker_select"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.choice(</span>
<span id="cb2-44">                [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AAPL"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GOOG"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AMZN"</span>]</span>
<span id="cb2-45">            )</span>
<span id="cb2-46">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> c2:</span>
<span id="cb2-47">        ticker <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> st.selectbox(</span>
<span id="cb2-48">            <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Company (symbol)"</span>, [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AAPL"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GOOG"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AMZN"</span>], key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ticker_select"</span></span>
<span id="cb2-49">        )</span>
<span id="cb2-50">    stocks <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_stocks()</span>
<span id="cb2-51">    stocks <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stocks.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(pl.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"symbol"</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> ticker).with_columns(</span>
<span id="cb2-52">        pl.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"price"</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> multiplier</span>
<span id="cb2-53">    )</span>
<span id="cb2-54"></span>
<span id="cb2-55">    st.plotly_chart(</span>
<span id="cb2-56">        px.line(stocks, x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"date"</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"price"</span>, title<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Stock price (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ticker<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>),</span>
<span id="cb2-57">        use_container_width<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb2-58">    )</span>
<span id="cb2-59"></span>
<span id="cb2-60"></span>
<span id="cb2-61"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">__name__</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"__main__"</span>:</span>
<span id="cb2-62">    multiplier <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> main()</span>
<span id="cb2-63">    c1, c2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> st.columns(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb2-64"></span>
<span id="cb2-65">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> c1:</span>
<span id="cb2-66">        first_fragment(multiplier)</span>
<span id="cb2-67">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> c2:</span>
<span id="cb2-68">        second_fragment(multiplier)</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="drawbacks" class="level2">
<h2 class="anchored" data-anchor-id="drawbacks">Drawbacks</h2>
<p>This solution doesn’t fit every scenario, and as usual with Streamlit, the complexity shows up in state management. Fragments add another layer on top of the existing <code>st.session_state</code>, potentially introducing more intricacies and headaches.</p>
<p>Other solutions such as Solara and Panel have this built in more natively, but then again their entry threshold is a lot higher!</p>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>Any other questions? Please go ahead and ask!</p>
<p>This development is exciting and will for sure give Streamlit new life when it comes to efficiency. I, for one, am happy to see all the new Data App frameworks competing!</p>
<p>Finally, all the code is available on this blog’s <a href="https://github.com/londogard/londogard">GitHub</a> under <em>code_snippets</em>.</p>
<p>/ Hampus Londögård</p>


</section>

 ]]></description>
  <category>streamlit</category>
  <guid>https://blog.londogard.com/posts/2024-04-17-streamlit-fragments/</guid>
  <pubDate>Wed, 17 Apr 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>TIL: Pixi by prefix.dev</title>
  <dc:creator>Hampus Londögård</dc:creator>
  <link>https://blog.londogard.com/posts/2024-03-20-til-pixi.html</link>
  <description><![CDATA[ 





<p>This is a very short one. Keeping it for myself!</p>
<p>For my recent minor projects I’ve been utilizing Pixi to run my virtual environments and it actually works great!</p>
<p>It’s simple to start and keep going. What’s even better?</p>
<ol type="1">
<li>Supports <a href="https://prefix.dev/blog/introducing_multi_env_pixi">multiple environments</a> (e.g.&nbsp;CUDA + CPU)</li>
<li>Supports <a href="https://pixi.sh/latest/advanced/multi_platform_configuration/">multiple platforms</a> (e.g.&nbsp;osx-arm64 and linux-64)!</li>
<li>Fast (3x faster than micromamba, 10x faster than conda!)</li>
<li>Integrates better with PyPI</li>
<li>Has tasks (e.g.&nbsp;<code>pixi run test</code> or <code>pixi run inference</code>) that you define yourself</li>
<li>Lockfiles: micromamba’s lockfiles are painful to use, so a two-file setup (manifest plus lockfile) as in Poetry/Node.js is great!</li>
</ol>
<p>Helpful right? Indeed!</p>
<section id="simple-get-started" class="level1">
<h1>Simple get-started</h1>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pixi</span> init <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create toml and lock files</span></span>
<span id="cb1-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pixi</span> add python polars <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add python and polars as dependencies</span></span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pixi</span> shell <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># activates the virtual environment</span></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Alternatively, "pixi run python ..."</span></span></code></pre></div></div>
</section>
<section id="add-task" class="level1">
<h1>Add Task</h1>
<p>Tasks are really awesome for lowering the threshold to enter a project. It’s simpler than a spread of bash scripts or other things.</p>
<p>One standardized way to do things! :)</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pixi</span> task add gui <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"solara run app.py"</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adds task</span></span>
<span id="cb2-2"></span>
<span id="cb2-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pixi</span> run gui <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># runs Solara App</span></span></code></pre></div></div>
</section>
<section id="outro" class="level1">
<h1>Outro</h1>
<p>Please read the <a href="https://pixi.sh/latest/">docs</a> to learn more!</p>


</section>

 ]]></description>
  <category>TIL</category>
  <category>python</category>
  <guid>https://blog.londogard.com/posts/2024-03-20-til-pixi.html</guid>
  <pubDate>Wed, 20 Mar 2024 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
