From Wahala to Workaround: Deploying AI-Powered Services with AWS Lambda
How we navigated SDK frustrations and serverless quirks, and built a cleaner AI integration path at Wowzi.
At Wowzi, we’re constantly exploring ways to improve the experience of creators and advertisers through intelligent, scalable infrastructure. As part of that journey, our engineering team set out to introduce AI-powered services that could support content workflows across our platform: from content generation to moderation and insights.
"Wahala" is a Nigerian Pidgin English word that primarily means trouble or problem. It can also refer to inconvenience, fuss, or bother. The meaning can vary depending on context, but it generally indicates a negative situation or difficulty.
The goal was to build a serverless service that could interface with LLMs behind the scenes and deliver fast, flexible responses at scale. The service is retrieval-augmented: responses are grounded in data we fetch at request time, not just the model's training memory. We opted for a Lambda-first architecture to maintain our commitment to elastic infrastructure and reduce operational overhead. The plan was simple: integrate a general-purpose AI SDK and swap out model providers as needed.
Or so we thought.
The Setup
To accelerate development and ensure long-term flexibility, we chose to start with the openai Python SDK. It had a clean abstraction, support for multiple model types, and offered the promise of provider-agnostic usage if you stayed close to the OpenAI-compatible API standards.
Given our serverless-first approach, the natural next step was to deploy it within an AWS Lambda Layer, targeting Python 3.11 with the ARM64 architecture for cost and performance benefits.
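With the layer in place, the handler itself was meant to stay tiny. A minimal sketch of what we were aiming for (the model name and event shape here are illustrative, not our exact configuration):

from openai import OpenAI  # expected to resolve from the layer at /opt/python

client = OpenAI()  # reads OPENAI_API_KEY from the Lambda environment

def handler(event, context):
    # Forward the caller's prompt to the model and return plain text.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat-capable model works
        messages=[{"role": "user", "content": event["prompt"]}],
    )
    return {"answer": response.choices[0].message.content}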
Where Things Got Complicated
Things unraveled fast.
First, the openai package brought in pydantic-core, a compiled dependency with tight coupling to specific versions of Python and system architecture. Installing it locally was no problem, but once deployed into Lambda, we started encountering runtime errors like:
“cannot import name 'from_json' from 'pydantic_core._pydantic_core'”

This pointed to a binary mismatch between the wheel we had bundled and Lambda's actual runtime. So we went back to basics and rebuilt everything inside Docker containers that mirrored Lambda's runtime, even targeting manylinux2014_aarch64 wheels for full compatibility.
That fixed the original import error, but it introduced a new one: Lambda could no longer find the openai module itself. Now you see the wahala.
We verified the zip structure, checked that python/ was at the correct level, ensured proper architecture matching, and even manually inspected every path Lambda was seeing during execution. The module was clearly there. Lambda simply refused to acknowledge it.
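For anyone debugging the same class of problem, a throwaway diagnostic handler is the fastest way to see exactly what the runtime sees; a sketch, not our production code:

import os
import sys

def handler(event, context):
    # Print the import search path exactly as the Lambda runtime resolves it.
    print("sys.path:", sys.path)
    # Layers unpack under /opt; a Python layer's packages should land in /opt/python.
    for root in ("/opt", "/opt/python"):
        if os.path.isdir(root):
            print(root, "->", sorted(os.listdir(root)))
    return {"ok": True}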
The Pivot
At this point, we paused and asked a deeper question:
Do we really need the SDK?
The value we were chasing was provider flexibility and a clean API interface. But that could just as easily be achieved by directly integrating with the REST APIs of each LLM provider.
And so, we pivoted.
Enter nemotalk: A Clean, Custom Abstraction
We built a small internal utility called nemotalk: a lightweight function that communicates directly with NVIDIA's REST APIs. It handles authentication and response parsing without depending on any SDK or heavyweight Python packages. A couple of other functions handle the retrieval of data for prompt augmentation.
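A minimal sketch of the pattern, assuming NVIDIA's OpenAI-compatible chat completions endpoint (the default model name, environment variable, and error handling are simplified for illustration):

import json
import os
import urllib.request

NVIDIA_BASE_URL = "https://integrate.api.nvidia.com/v1"

def nemotalk(prompt: str,
             model: str = "meta/llama-3.1-8b-instruct",
             base_url: str = NVIDIA_BASE_URL,
             api_key: str | None = None) -> str:
    """Call an OpenAI-compatible chat completions endpoint using only the stdlib."""
    api_key = api_key or os.environ["NVIDIA_API_KEY"]
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # Everything above ships with CPython, so there are no compiled
    # dependencies to break when the zip lands in Lambda.
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]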
What we gained:
Portability: Switching from NVIDIA to another provider (e.g., Perplexity, Mistral, or open-source LLMs) is as simple as changing a base URL and a model identifier, as the snippet after this list shows.
Zero SDK headaches: No more worrying about compiled dependencies breaking Lambda.
Control: We now own the interaction logic, which means better logging, failover strategies, and customization options going forward.
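To make the portability point concrete: with a function shaped like the sketch above, moving to another OpenAI-compatible provider is a matter of arguments (the Mistral endpoint, model name, and key variable below are illustrative):

# Default: NVIDIA's endpoint and model.
answer = nemotalk("Summarize this campaign brief for a creator.")

# Hypothetical swap to another OpenAI-compatible provider:
answer = nemotalk(
    "Summarize this campaign brief for a creator.",
    model="mistral-small-latest",
    base_url="https://api.mistral.ai/v1",
    api_key=os.environ["MISTRAL_API_KEY"],
)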
The end result is an AI-powered service that runs cleanly on AWS Lambda, aligns with our serverless infrastructure, and opens the door to faster iteration across multiple LLM platforms.
What’s Next
As our AI-assisted tooling matures, this foundation will allow us to deliver more intelligent features across the Wowzi platform. And as the LLM ecosystem evolves, we’re well-positioned to adapt quickly, with no need to rewrite large chunks of code or re-architect deployments.
This project is just one in a series of engineering efforts we’ll be sharing here on the Wowzi Tech blog, as we continue building scalable, modern infrastructure for Africa’s growing creator economy.
Stay tuned for more stories from the edge of AI, cloud, and creator tooling.