What's hot at the edge this summer?
Do the differentiated heavy lifting
About a decade ago, Amazon Web Services popularized the phrase "undifferentiated heavy lifting" to refer to all of the difficult work that goes into running large-scale, production infrastructure. Today, the idea of letting hyperscale cloud providers do that heavy lifting so that you can focus on your business application is assumed.
Should it be? In many cases, the answer remains yes… but in some cases that heavy lifting may actually be differentiating for your business, whether in cost economics, privacy, ownership, or bringing compute to exactly the places you want it.
We are, of course, referring to edge computing. We've recognized that many industries, from retail to industrial automation to national defense, really benefit from differentiating their capabilities by building on top of on-prem edge environments or distributed edge clouds. In these cases, doing the differentiated heavy lifting is not wasted effort. Sometimes your organization is the only one that can execute on the vision and leverage it to differentiate in the market.
Monster Tips
- Love the cloud, but in some cases, consider doing the differentiated heavy lifting.
More Flavors of Edge
It likely goes without saying, but we've been hearing a lot about how the edge is not such a crazy idea after all. In fact, there is plenty of chatter this summer about everyone from enterprises to nation states wanting to do more at the edge and less in the cloud, especially in retail, industrial automation, and defense.
- On-prem edge: The on-prem edge is the modern approach to managing edge computing workloads that are "close to the action", often deployed in a large volume of identical or similar footprints. These footprints are typically geographically diverse, and workloads often serve a particular site or location. These environments are usually connected, but latency or reliability concerns drive the edge deployment. This is the edge we most often talk about.
- Distributed edge cloud: The distributed edge cloud is an emerging approach to workload placement and management, deploying a modern edge architecture across a highly geographically distributed network of points of presence to run workloads that would typically live in the cloud. You can think of this as a sort of repatriation play, but using a leaner, scrappier edge model instead of a full datacenter model.
- AI-Powered Edge: We brought traditional compute close to the action; AI workloads, often requiring GPUs, are next. Keep reading for a full section on this topic below.
We see a number of factors driving these behaviors:
- Cost concerns
- Bringing compute close to the data and close to the business
- Regulatory requirements
- Concerns about complete dependence on hyperscale cloud providers
- Preparation for a world with more AI, robotics, and automation at the edge
Monster Tips
- Now is a great time to start your edge journey, especially if you are in an area rich with opportunity (retail, automation, defense).
AI Models at the Edge
AI at the edge remains a very hot topic. While Large Language Models get a lot of attention (especially since gpt-oss:20b was released), Computer Vision (CV) model inference continues to be the dominant real-world edge AI deployment. Numerous organizations continue to experiment with, and find success in, deploying CV solutions to aid in data gathering and operational automation use cases.
Why consider AI at the edge? We see two primary reasons today:
- To create new operationally useful data (often via computer vision)
- To take action based on data within the edge environment (likely agentic in the future: make a plan, reason, call tools, take actions)
We see #1 happening today, often via Computer Vision solutions built on the NVIDIA Jetson family of devices, which is highly regarded among our group (a minimal inference sketch follows below). We see #2 just beginning, so let's turn our attention to language models next.
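First, the promised CV sketch: a minimal, hedged example of a single inference pass using ONNX Runtime. The model file name, input size, and single-output assumption here are hypothetical; a Jetson deployment would more likely run an NVIDIA-optimized stack such as TensorRT, but the shape of the loop is similar.

```python
# Minimal sketch: one computer-vision inference pass on an edge node.
# Assumes an ONNX object-detection model exported ahead of time; the
# model path, input size, and single-output shape are hypothetical.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("detector.onnx")  # hypothetical model file
input_name = session.get_inputs()[0].name

def detect(frame: np.ndarray) -> np.ndarray:
    """Run one inference pass on a preprocessed camera frame."""
    # Assumes the model expects NCHW float32 input, e.g., a 640x640 RGB frame.
    blob = frame.astype(np.float32)[np.newaxis].transpose(0, 3, 1, 2) / 255.0
    (detections,) = session.run(None, {input_name: blob})  # assumes one output
    return detections

# In production this loop would read from a camera and emit events
# (counts, alerts) to a local message bus rather than printing.
frame = np.zeros((640, 640, 3), dtype=np.uint8)  # placeholder frame
print(detect(frame).shape)
```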
Are LLMs the next wave? The most capable open source models still require fairly deep pockets, which is problematic at edge scale (thousands or more copies). Want to run gpt-oss:20b in your edge environment? You are likely looking at a 24GB+ GPU to achieve reliable performance at full context length. Here are some potential options:
NVIDIA
- RTX 5090 (32GB) – $2,500+
- RTX 4090 (24GB) – $2,500
- RTX A5000 (24GB) – $1,800
- RTX A4500 (20GB) – $1,200+
- A30 (24GB) – $4,500
- A10 (24GB) – $2,600
- L4 (24GB) – $2,400
AMD
- Radeon Pro W7800 (32GB) – $2,000
- Radeon RX 7900 XTX (24GB) – $950
Keep in mind these are just the prices for the GPU card and do not include the rest of the build-out: a CPU, motherboard, NVMe storage, fans, cooling, and more. Then add the cost of building, shipping, and replacing units over time.
Looking at these numbers and following the principle of "work on what is barely possible or a little too expensive", the edge LLM space is ripe for experimentation but likely too early for full-scale, cost-effective rollout.
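If you want to gauge feasibility before committing to hardware, a local experiment is cheap. Below is a minimal sketch that queries gpt-oss:20b through Ollama's HTTP API; it assumes Ollama is installed, the model has already been pulled (`ollama pull gpt-oss:20b`), and the default port is in use. The prompt is just an illustration.

```python
# Minimal sketch: querying a locally hosted LLM on an edge node via
# Ollama's HTTP API. Assumes the Ollama server is running on its
# default port and the gpt-oss:20b model has been pulled.
import json
import urllib.request

def generate(prompt: str, model: str = "gpt-oss:20b") -> str:
    """Send a single non-streaming generation request to the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Hypothetical edge use case: summarizing local operational data.
print(generate("Summarize today's store traffic anomalies in one sentence."))
```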
That brings us to the recent trend toward Small Language Models (SLMs), triggered by a position paper from researchers at NVIDIA. The paper posits that "small language models are the future of agentic AI". Small language models are loosely defined as those with fewer than 10B parameters. Given that agentic AI is a big opportunity at the edge, and that SLMs are smaller, faster, cheaper to fine-tune, and able to run on less expensive hardware, they may have a near-term home at the edge.
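As one way to start experimenting, here is a hedged sketch of trying an SLM with the Hugging Face transformers library. The model shown is just one example of a sub-10B option; substitute whichever model your organization approves. Note that `device_map="auto"` assumes the accelerate package is installed.

```python
# Minimal sketch: evaluating a small language model (<10B parameters)
# for an edge use case with Hugging Face transformers. The model name
# is one example; swap in whichever SLM your organization approves.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-3B-Instruct",  # example SLM, ~3B parameters
    device_map="auto",                 # use a GPU if one is present
)

# Hypothetical edge task: classifying a local event log entry.
out = generator(
    "Classify this event log entry as routine or urgent: 'door sensor 4 offline'",
    max_new_tokens=64,
)
print(out[0]["generated_text"])
```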
Monster Tips
- Figure out if you need AI models in your edge environment, whether CV or Language or otherwise (there are lots of ML use cases, but CV and Language push into more demanding compute environments, hence the focus here).
- Are you comfortable with open source models in your edge environment? Which open source models is your company open to? Experiment with those models on an array of hardware options. For some GPUs and models, you can do some initial testing in the cloud without buying and assembling lots of hardware components.
- Check out an array of models, from OpenAI's gpt-oss:20b to Mistral's ministral-3b-instruct to Google's Gemma models to Alibaba's Qwen models.
- Are you okay with having a single point of failure in an expensive GPU? If the workload is worth this sizable investment, can you really tolerate its absence, or do you need two?
- Now is the time to experiment with edge-deployed LLMs in labs or at a very small scale. They are too expensive for most organizations to deploy to thousands of footprints, but that is likely to change with time.
- Now is the time to learn about SLMs and start experimenting in labs and perhaps in the field if some GPU capacity is available.
- Edge Monsters have had a lot of success with the NVIDIA Jetson family for CV inference purposes.
Invest in the Solid Foundation
As edge compute solutions grow in usage and number of copies, running more and more business-critical workloads, the quality of the foundation matters more than ever. We continue to see companies with successful deployments invest in refining the "core" of their edge solutions.
What comprises the foundation?
- Networking: What paths do you have to your edge nodes? How do you get essential data in and out of your environment?
- Compute: What compute foundation are you building on? Are you planning for AI workloads now or at the next refresh? Do you have a hardware root of trust from which to drive secure onboarding?
- Workload management: Bare metal or VMs? Kubernetes or something else?
- Observability: Can you detect issues with nodes, clusters, applications, shared services, and other devices in your edge environment? Can you get that data out? (A minimal OpenTelemetry sketch follows this list.)
- App Deployment: How do applications get deployed reliably to the edge? How about rollbacks? Canary deploys?
- Cloud Interoperability: How does your edge environment co-exist with your datacenter or cloud footprint?
- Storage: Will you persist data at the edge? If so, how? Or will you go ephemeral?
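As one example of hardening the observability piece above, here is a minimal sketch that emits node heartbeat metrics with the OpenTelemetry Python SDK. The collector endpoint, metric name, and attributes are hypothetical; in practice a local OTel Collector would typically buffer data at the site and forward it when the uplink allows.

```python
# Minimal sketch: emitting node health metrics from an edge site with
# OpenTelemetry. The collector endpoint, metric name, and attribute
# values below are hypothetical placeholders.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Export over OTLP/gRPC to a (hypothetical) site-local collector.
exporter = OTLPMetricExporter(endpoint="http://collector.local:4317")
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=30_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("edge.site")
heartbeat = meter.create_counter("edge.node.heartbeats")

# Each node increments its own heartbeat; site/node labels are hypothetical.
heartbeat.add(1, {"site": "store-0042", "node": "node-a"})
```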
Monster Tips
- Invest in a solid foundation and keep refining the stack as the technology ecosystem evolves, whether that's improving your observability posture with OpenTelemetry or deploying a better storage solution.
- Keep following Edge Monsters!
We hope you have had an excellent summer full of healthy nodes, running clusters, and minimal network outages. Even if you had your share of the inevitable failures, we know your organization kept running without a blip thanks to your amazing edge architecture.
Be sure to subscribe for updates and follow us on LinkedIn.
The Edge Monsters: Jim Beyers, Colin Breck, Brian Chambers, Michael Henry, Chris Milliet, Erik Nordmark, Joe Pearson, Jim Teal, Dillon TenBrink, Tilly Gilbert, Anna Boyle & Michael Maxey