January 23rd, 2025

Edge AI Survival Guide: How not to get left behind (or commit too early)

The movie Apollo 13 opens with Jim Lovell (played by Tom Hanks) telling an audience about the developments of the space program:

It’s the best part of each one of us: the belief that anything is possible. Things like a computer that can fit into a single room and hold millions of pieces of information.

This miniaturization has continued in computing, leading to increasingly capable devices with smaller and smaller form factors. This trend has empowered the modern Edge, and we expect it to power the future of AI at the Edge.

This is an area that is still very early and has a rapidly changing landscape. Few organizations have large-scale, time-proven deployments today. However, there are some questions you can ask yourself if your organization is contemplating the pursuit of AI at the Edge.

What is AI at the Edge?

First, let’s be clear about what we mean by AI at the Edge for the purposes of this article.

We define AI at the Edge as machine learning model inference taking place in [near] real-time at a potentially disconnected edge location. This post will focus on the challenges of inference at the Edge.

One consensus from our early experience is that round-trip inference to the cloud will not work for use cases that require high volumes of data (frequent images, video) and low latency. Add to that the frequent offline operational needs for critical business applications that drove the industry to the traditional edge in the first place, and you quickly see that AI inference at the Edge will be an eventual reality.

With today’s technology, training large models that require significant GPU capacity at the Edge is likely a non-starter due to budget constraints. There are certainly exceptions for simple or lightweight models where all training data is site-specific. Still, for most organizations, large-scale model training will need to occur in the cloud due to GPU cost and benefits from larger, multi-site consolidated datasets. We’ll touch on training data collection and egress further below.

Considerations for those getting started

The Business Case

First things first: What are you trying to do? We’d recommend deploying the minimal hardware assets necessary to achieve your business objectives. Some emerging use cases that seem promising are Computer Vision for operational telemetry, predictive models (maintenance, goods production forecasts), or, more generally, enabling faster and higher-quality business decisions at the edge. We expect to see Large Language Models at the Edge in the future as well (some can already run on constrained devices), though our group has yet to see any at-scale deployments.

In short, don’t get caught up in the hype cycle. Make sure you have a clear business case and invest just enough to achieve it.

Cost at Scale

Model inferences that require Edge GPUs have the potential to scale cost models beyond reason, particularly for large-scale edge deployments targeting thousands of sites or more.

Do you need GPU capability at the Edge? If so, how many GPUs per site? Can you manage GPU slicing to share them across use cases, or do you need to pin GPUs to particular use cases? If you pin them, how will you scale to use case n+1? If the use case is truly worth doing, will it be okay if it disappears for a time due to a single point of failure? If not, are you willing to deploy two or more devices per site for HA? How does this scale to future use cases?

As you can imagine, the Edge GPU story can get expensive very fast. This is not to say that GPU at the edge is an anti-pattern… rather, we would underscore the extreme importance of a business case where the value derived exceeds the total cost of ownership. Even if you find these use cases, think carefully about how you will scale future use cases. Do you have physical space to add more devices? Network ports? Will you be okay with a wide variety of hardware solutions and GPUs over time?
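To make the scaling math concrete, here is a back-of-envelope sketch. Every figure in it (site count, node cost, support rate, horizon) is a hypothetical placeholder rather than a quote; plug in your own numbers.

```python
# Rough fleet-cost sketch for Edge GPUs. All figures below are hypothetical
# placeholders -- substitute your own hardware quotes, site counts, and horizon.

SITES = 2_000                # number of edge locations
GPUS_PER_SITE = 1            # GPU nodes needed for the initial use case
HA_FACTOR = 2                # duplicate hardware per site for high availability
GPU_NODE_COST = 6_000        # USD per GPU-capable node (hypothetical)
ANNUAL_SUPPORT_RATE = 0.15   # yearly support/maintenance as a fraction of hardware
YEARS = 5                    # planning horizon

hardware = SITES * GPUS_PER_SITE * HA_FACTOR * GPU_NODE_COST
support = hardware * ANNUAL_SUPPORT_RATE * YEARS
total = hardware + support

print(f"Hardware outlay:          ${hardware:,}")
print(f"Support over {YEARS} years:     ${support:,.0f}")
print(f"Total cost of ownership:  ${total:,.0f}")
```

With these placeholder numbers, the fleet lands at roughly $42M over five years, and that is before a second use case asks for its own GPU. The value per site has to clearly exceed figures like these for the business case to hold.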

Local Edge vs. Device Edge

Can you push some of the initial inference to the device vs. centralizing it and avoid the need to scale up your edge compute capacity to run many different models? For example, imagine computer vision inference on the camera at the device edge, with metadata sent to the local edge for synthesis with other data and downstream applications.
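As a rough sketch of this pattern (the endpoint, field names, and camera identifiers below are hypothetical, not any specific vendor's API), a smart camera could run detection locally and forward only compact metadata to the local edge:

```python
import json
import time
import urllib.request

# Hypothetical local-edge ingestion endpoint; in practice this might be an MQTT
# broker, a site message bus, or a local API -- anything reachable without the WAN.
LOCAL_EDGE_URL = "http://local-edge.site.internal/api/v1/detections"

def publish_detection(camera_id: str, label: str, confidence: float, bbox: list) -> None:
    """Send only the inference result upstream -- never the raw video frame."""
    payload = {
        "camera_id": camera_id,
        "label": label,
        "confidence": confidence,
        "bbox": bbox,              # pixel coordinates [x, y, w, h]
        "timestamp": time.time(),
    }
    req = urllib.request.Request(
        LOCAL_EDGE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=2)

# Example: the on-camera model spotted a pallet at the dock door.
publish_detection("dock-cam-03", "pallet", 0.91, [120, 64, 220, 180])
```

The raw frames never leave the camera; the local edge only has to synthesize a few kilobytes of metadata per event, which is what keeps centralized compute (and bandwidth) requirements in check.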

The tradeoff with this approach is highly distributed model management across an order of magnitude more devices. This works well when models are fairly homogeneous from site to site and device to device, but it quickly becomes a nightmare when models become heterogeneous (which is likely).

Training Data Collection and Egress

Another consideration before starting an Edge AI journey is training data collection. Most organizations developing edge models will benefit from consolidating training data from many edge sites to improve model accuracy.

This is challenging due to storage constraints at the edge (which limit retention) and limited bandwidth for shipping data to the cloud for above-site training (which makes it hard to get the data out).

  • If your sites are mostly connected, consider shipping data out during times of low bandwidth utilization (perhaps nights, weekends, holidays, and non-peak times).
  • If your sites are often disconnected, consider using a physical device like an AWS Snowball or Google Transfer Appliance to store data for a season and then physically move it to the cloud for initial training. This activity should become increasingly rare as a model matures.
  • Consider techniques to capture and egress only the data that is useful for model-refinement feedback loops (inference failures, low-confidence predictions, etc.); see the sketch after this list.
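Here is a minimal sketch combining the last two ideas: retain only low-confidence samples, and ship them during an off-peak window. The threshold, window, and staging path are assumptions to tune per site, and the upload step is left as a stub.

```python
import datetime
import json
import pathlib

# Hypothetical tuning knobs -- adjust for your own sites.
CONFIDENCE_THRESHOLD = 0.6            # keep samples the model was unsure about
STAGING_DIR = pathlib.Path("/var/edge/egress-staging")
OFFPEAK_START, OFFPEAK_END = 1, 5     # ship data between 01:00 and 05:00 local time

def stage_for_training(sample_id: str, confidence: float, metadata: dict) -> None:
    """Retain only samples likely to improve the model (low-confidence inferences)."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return  # confident prediction: little to learn, don't spend storage on it
    STAGING_DIR.mkdir(parents=True, exist_ok=True)
    record = {"sample_id": sample_id, "confidence": confidence, **metadata}
    (STAGING_DIR / f"{sample_id}.json").write_text(json.dumps(record))

def in_offpeak_window(now=None) -> bool:
    """Only egress staged data to the cloud during the low-bandwidth window."""
    hour = (now or datetime.datetime.now()).hour
    return OFFPEAK_START <= hour < OFFPEAK_END

if in_offpeak_window():
    pass  # upload the contents of STAGING_DIR to your cloud training bucket here
```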

Vertical Solutions

Some organizations may have a single solution where AI at the Edge is necessary and may be able to acquire a full-stack solution from a vendor to meet this need. While the ideal architecture would house all components within the primary site Edge infrastructure, there may be business value in diverging from this pattern in rare cases.
If you are procuring a vertical solution to augment your core edge infrastructure, consider the following.

  • Physical space: Do you have a place to put this thing (or things)? Presumably, there is some degree of HA present in the vendor solution.
  • Interoperability: Can you treat the vertical solution as a client of the edge ecosystem, much like you would an IoT device or other connected component? How will integrations work?
  • Disconnected States: Can you achieve your business outcomes in online and offline states? How does the vertical solution integrate with the rest of the technology ecosystem at the site? Can data pass between the vertical solution and the site edge without a WAN connection? If not, is this tolerable? (See the sketch after this list.)
  • Security and Support Models: Can the vertical solution comply with your edge network requirements, especially if the vendor needs to provide remote support for their solution?
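To make the interoperability and disconnected-state questions concrete, a minimal sketch (the hostnames, paths, and payload shape are all hypothetical) might poll the vendor appliance's local API and hand the results to the site edge without ever touching the WAN:

```python
import json
import urllib.error
import urllib.request

# Hypothetical local addresses: the vendor appliance and the site-edge ingest API
# both live on the site network, so this exchange still works with the WAN down.
VENDOR_APPLIANCE_URL = "http://vision-appliance.site.internal/api/latest-counts"
SITE_EDGE_INGEST_URL = "http://local-edge.site.internal/api/v1/ingest"

def relay_vendor_data() -> bool:
    """Pull the latest results from the vertical solution and feed the site edge."""
    try:
        with urllib.request.urlopen(VENDOR_APPLIANCE_URL, timeout=2) as resp:
            counts = json.load(resp)
    except urllib.error.URLError:
        return False  # appliance unreachable; surface this through site monitoring

    req = urllib.request.Request(
        SITE_EDGE_INGEST_URL,
        data=json.dumps({"source": "vision-appliance", "data": counts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=2)
    return True
```

If the vendor solution can only report to its own cloud, this kind of local loop is impossible, which is exactly the interoperability gap to catch during procurement.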

Data Foundation

Data management is critical to successful AI everywhere. How is your data quality at the edge? Do you have garbage that gets cleaned up in the cloud later? How will that work for your desired use cases? Do you understand how to gather the data you need in near-real-time within your site?
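As a trivial illustration of cleaning data at the edge rather than in the cloud later (the field names and ranges below are made-up examples), even basic validation at ingest matters when an inference depends on the reading seconds later:

```python
def validate_reading(reading: dict) -> bool:
    """Reject obviously bad telemetry on-site instead of shipping garbage upstream.

    The required fields and plausible ranges are hypothetical -- derive yours from
    the sensors and use cases at your own sites.
    """
    required = ("sensor_id", "timestamp", "temperature_c")
    if any(field not in reading for field in required):
        return False
    if not -40.0 <= reading["temperature_c"] <= 85.0:
        return False  # outside the sensor's physical operating range
    return True

# Example: a miswired sensor reporting an impossible value is dropped on-site.
assert validate_reading({"sensor_id": "s1", "timestamp": 1700000000, "temperature_c": 21.5})
assert not validate_reading({"sensor_id": "s1", "timestamp": 1700000000, "temperature_c": 999.0})
```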

Now is the time to get started on your data management foundation to ensure you are ready for AI use cases when they arrive.

Time Horizon

What is your time horizon for deploying a traditional Edge and for AI at the Edge?

Prepare for the future

If you are about to make an edge investment, don’t forget to consider AI / ML applications that may be coming in the future. How do you prepare for that future? There are a few approaches to consider:

  • Buy capable nodes today: Do you future-proof by buying capable nodes now that will be able to handle the workloads of the future? If you build it, they will come. Just know they might not, and you will probably waste a lot of money. When they do come, they might have totally different needs than what you can support if you invest today. In other words, we think this is a bad idea.
  • Timed Refresh: Deploy a traditional Edge infrastructure and plan to wait on any large-scale AI deployments until the next hardware refresh (3-7 years). This comes with the benefit of time, during which progress will be made. Things will likely get more capable, simpler, and cheaper over time. You can always pivot if a killer use case emerges for your business.
  • Split the Gap Strategy: Deploy nodes with something like an integrated GPU (iGPU) to provide some AI capability at marginal cost, enabling simpler Edge AI solutions. Work within those constraints until a timed refresh, or an early refresh of one or more nodes if a compelling use case demands it.

A lot of potential value from the Edge can be gained even without AI, so don’t wait for AI… but don’t forget about it either.

Building the Runway

So you have a plane you want to fly? Great. You’ll need a runway. If you are currently starting in a jungle, you have work to do to build your runway first. The same principle holds with AI at the edge.

AI at the Edge is a rocket ride you probably won’t want to miss, so start preparing today. Experiment and learn in the wild. At the same time, be careful about committing too much too early.

How will you know the time is right? When you start to see a line of business cases that add value, make sense at the edge, and are cost-viable… the time to go all-in has come. Until then, exercise caution and restraint.

If you want to survive the AI era at the Edge, you don’t want to get left behind… but don’t get too far ahead either.

In our next post, we will share the best practices for Kubernetes Edge deployments. Be sure to subscribe for updates and follow us on LinkedIn.

The Edge Monsters: Brian Chambers, Michael Henry, Erik Nordmark, Joe Pearson, Alex Pham, Jim Teal, Dillon TenBrink, Tilly Gilbert, Anna Boyle & Michael Maxey

Prepared by Anna Boyle, STL Partners
