
How Postman built and launched their Gen AI app, Postbot: A technical case study

By focusing on familiar products, prompt and workflow management, and effectively handling non-determinism, Postman has successfully implemented LLMs into production with Postbot.

Generative AI is becoming popular among SaaS companies, finding uses in many areas from improving customer experience to automating processes and enhancing product discovery.

Building with Gen AI is easier than ever, whether you’re using something ready-made or creating your own AI stack. Regardless, you’ll have to navigate the unique challenges that come with developing a proof-of-concept Gen AI app and taking it to production — especially given the non-deterministic nature of this technology.

Working with generative AI can feel like chasing a moving target. The output isn’t always accurate, latencies can be unusually high, and there’s no straightforward way to troubleshoot issues. The journey from experimentation to production can be long and tedious, too.

However, companies like Postman have successfully navigated these challenges. Postman launched their Gen AI app, Postbot, which is designed to debug and interpret APIs, expedite test writing, and analyze large datasets. Rajaswa Patil, an AI researcher at Postman, shared the team’s experience of building Postbot and rolling it out to a user base of 30 million developers.

This article explores Postman’s experience in creating their first Gen AI product, highlighting both the challenges faced and the success achieved in taking it to production.

Postman’s foray into Gen AI

Postman, a major player in the global SaaS ecosystem with roots in India, is known for being at the forefront of technology. Even so, they’ve taken a thoughtful approach to integrating Gen AI, keeping their focus on solving user problems rather than chasing tech buzz.

They’ve weighed several considerations, such as model efficiency and stability, the balance between non-deterministic and deterministic functionality, and the distinction between AI-enabled and AI-native features.

The challenges they’ve faced aren’t unusual. Many firms I’ve encountered in the last year are grappling with the concept of transitioning potentially transformative proof-of-concepts to full production. Yet, what sets Postman apart is their approach and depth of adoption. The insights from their journey offer valuable lessons for many of us, so strap in!

Meet Postbot:

Postman’s AI assistant, Postbot, is designed to increase user productivity within Postman. The goal is to eventually build an autonomous agent for all things API, which is pretty lofty!

So, how did Postman manage to implement Postbot? What hurdles did they encounter while moving Postbot to production in the last nine months? What were the essential learnings from this experience? Let’s explore!

Postman’s initial focus was on two key areas for Postbot’s rollout:

  1. Task automation
  2. Feature discovery 

Task automation with Postbot involves using the assistant to simplify the scripting process for testing API requests, visualizing responses, and creating API documentation.

This means that Postbot must understand not only your API but also every aspect of the Postman platform.
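To make this concrete, here is a minimal sketch of what a test-generation step could look like, assuming an OpenAI chat model as the backend. The prompt wording and the `draft_tests` helper are illustrative assumptions, not Postman’s actual implementation.

```python
# Hypothetical sketch: asking an LLM to draft a Postman-style test script
# for a given request/response pair. Not Postman's actual pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You write Postman test scripts (JavaScript, pm.* API). "
    "Given an HTTP request and a sample response, produce concise tests "
    "for status code, response time, and key response fields."
)

def draft_tests(request_summary: str, sample_response: str) -> str:
    """Return a candidate test script for the user to review and edit."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Request:\n{request_summary}\n\n"
                                        f"Response:\n{sample_response}"},
        ],
        temperature=0.2,  # keep output close to deterministic
    )
    return resp.choices[0].message.content

print(draft_tests("GET https://api.example.com/users/42",
                  '{"id": 42, "name": "Ada"}'))
```

Note that the output is only a draft: the user still reviews and edits the generated script before it runs.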

Keeping up with the Generative AI Ecosystem:

Cost-effective models and longer context windows: Generating tests for API collections and visualizing API responses often demands longer prompts. The Gen AI ecosystem is now producing more affordable models with context windows long enough to handle them.

Better code-completion models: Code completion from models like Code Llama and Mistral 7B, along with stronger completion models such as GPT-3.5 Turbo, powers Postbot’s inline autocomplete feature.

Function calling, Assistants APIs, and OpenAI’s gradual discontinuation of legacy completion models: These signal a shift toward more autonomous interfaces. Aligning with this trend, Postman is revamping Postbot’s architecture. The upgraded version will maintain its own context, execute user actions, and handle multi-step workflows within the application on the user’s behalf.
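As a rough illustration of the function-calling pattern this shift enables, the sketch below defines a single hypothetical tool (`run_request`) and lets the model decide when to invoke it. The tool schema is an assumption for illustration, not part of Postbot.

```python
# Illustrative sketch of OpenAI function calling for an agent-style
# assistant that can act inside the app; the tool name and schema
# are hypothetical, not Postbot's.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "run_request",
        "description": "Send an API request from the user's collection.",
        "parameters": {
            "type": "object",
            "properties": {"request_id": {"type": "string"}},
            "required": ["request_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Re-run my 'Get user' request."}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model chose to invoke a tool
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```

The application, not the model, executes the tool call, which keeps the action itself deterministic and auditable.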

Challenges: 

1. Domain familiarity:

“Postman Flows” is a newly released no-code product that lets users build API workflows visually. Users drag and drop components to build “flows” for any use case. Being able to create these flows with AI would be awesome, but a considerable obstacle here is the lack of domain familiarity.

Since the models’ pre-training data doesn’t contain the syntax and semantics of the flow structure, it is relatively hard to use LLMs within the “Flows” product.

Contrast this with the “Collections” and “Testing” products, which have been in the open domain for a while now. They are widely adopted, and most LLMs have already been exposed to them. Adding effective AI capabilities to these two products bore fruit faster.

The lesson: to extract maximum value from LLMs, apply them to widely adopted products first, as Postman did with Collections and test script generation in Postbot, rather than trying to implement LLMs in brand-new ones.

2. Prompt and workflow management:

As more teams within Postman contribute to Postbot’s development, the integration of additional backend services and new use cases has made the backend increasingly complicated.

Postman is beginning to decouple prompt management from backend services while implementing a version-control mechanism for prompts and workflows. This reduces backend complexity and increases collaborative flexibility. Layering on real-time prompt observability further aids prompt management and smooths the development process. Ultimately, Postman’s focus on prompt management and observability has created a continual improvement cycle: every iteration of the product should only get better from here.
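One simple way to picture the decoupling is a file-based prompt registry, versioned alongside the code but loaded by name at call time. This is a minimal sketch under assumed names (`load_prompt`, the `prompts/` layout); Postman hasn’t published their actual mechanism.

```python
# Minimal sketch of a versioned prompt registry decoupled from backend
# code: prompts live as files under version control and are fetched by
# name and version at call time. Layout and names are assumptions.
from pathlib import Path

PROMPT_DIR = Path("prompts")  # e.g. prompts/generate_tests/v2.txt

def load_prompt(name: str, version: str = "latest") -> str:
    folder = PROMPT_DIR / name
    if version == "latest":
        # pick the highest-numbered version file (v1.txt, v2.txt, ...)
        candidate = sorted(folder.glob("v*.txt"),
                           key=lambda p: int(p.stem[1:]))[-1]
    else:
        candidate = folder / f"{version}.txt"
    return candidate.read_text()

# Backend code references prompts by name, never as inline strings.
# Assumes the template contains {api_name} and {method} placeholders.
template = load_prompt("generate_tests", "v2")
prompt = template.format(api_name="Get user", method="GET")
```

Because prompts are plain files, every change gets a diff, a review, and a rollback path, which is exactly what makes iteration safe.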

3. Overcoming Non-determinism:

The error-prone nature of natural language interfaces, with their potential for intent-identification mistakes, creates a trade-off between determinism and customization. Natural language lets users express custom intent, but it is highly non-deterministic, which causes issues downstream in the flow.

Postman chose to decrease non-determinism as much as possible at the start to prevent system-wide error propagation.

This was achieved with fixed UI calls to action (CTAs), linking each CTA to an AI workflow or prompt in the backend and thereby bypassing the intent-identification pipeline entirely.

Although customization takes a hit, more flexible CTAs such as suggestion trees, chat redirects, and slash commands help close the gap.
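A minimal sketch of this routing, with hypothetical CTA ids and handlers, might look like the following: fixed CTAs dispatch deterministically, and only free-form chat touches intent classification.

```python
# Sketch of routing fixed UI CTAs straight to backend workflows,
# bypassing intent identification; free-form chat falls back to a
# classifier. All handler and CTA names are illustrative.
def generate_tests(ctx: str) -> str:
    return f"tests for {ctx}"

def visualize_response(ctx: str) -> str:
    return f"chart for {ctx}"

def classify_and_dispatch(text: str, ctx: str) -> str:
    # placeholder for the intent classifier described later in this article
    return f"(classified and routed) {text!r} with {ctx}"

CTA_WORKFLOWS = {
    "cta.generate_tests": generate_tests,   # button: "Generate tests"
    "cta.visualize": visualize_response,    # button: "Visualize response"
}

def handle_event(event: dict, context: str) -> str:
    if event["type"] == "cta":
        # Deterministic path: the CTA id maps directly to a workflow.
        return CTA_WORKFLOWS[event["id"]](context)
    # Non-deterministic path: free text goes through intent classification.
    return classify_and_dispatch(event["text"], context)

print(handle_event({"type": "cta", "id": "cta.visualize"}, "GET /users"))
```

The design choice is simply that the deterministic path never depends on a model call, so a misclassification can only affect the chat path.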

Product & Implementation:

  1. Seemingly minor features can significantly enhance usage of the underlying feature, even if they don’t initially appear AI-focused. 

For example, embedding AI here led to a 200% increase in API response visualization: usage tripled within a month of introducing automated visualization, which parses JSON responses directly instead of requiring a complicated Handlebars script implementation.

  2. Design should cater to a broader user demographic with varying use cases, requiring a balanced combination of fixed UI CTAs and natural language interfaces. 

“20% of users use natural language, whereas 80% still use fixed UI CTAs. 28% of all events occur through natural language, whereas 72% occur through fixed UI CTAs.”

  3. A global chat UI gives users more control when expressing intent and customizing instructions to the AI assistant. It also helps mine user expectations and understand their workflows and usage habits. Product-wide support and fallback lead to efficient feature usage, whether or not the features are AI-automatable.
  4. AI should serve as a fallback, not an immediate solution to every problem. 

“Intent identification was achieved by initially adopting a zero-shot LLM strategy, which provided a natural language interface but resulted in a meager 28% accuracy with high latency. 

Postman then transitioned to a non-LLM static classifier, which increased accuracy to 85% and cut latency to 0.3 seconds. 

To make training possible, Postman leveraged LLMs to generate synthetic data, enabling the classifier to learn and adapt without the need for large volumes of real-world data.”
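The pattern generalizes: prompt an LLM offline to produce labeled training utterances, then serve a small classical classifier on the hot path. Here is a toy sketch with scikit-learn; the intents and example utterances are invented for illustration, not Postman’s.

```python
# Sketch of the pattern described above: train a small, fast intent
# classifier on LLM-generated synthetic utterances instead of calling
# an LLM at inference time. Intents and utterances are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# In practice these would be generated in bulk by prompting an LLM
# ("write 50 ways a user might ask to generate API tests", etc.).
synthetic = [
    ("write tests for this request", "generate_tests"),
    ("add assertions for the status code", "generate_tests"),
    ("plot this response as a bar chart", "visualize"),
    ("show the results as a table", "visualize"),
    ("document this endpoint", "write_docs"),
    ("create a description for this API", "write_docs"),
]
texts, labels = zip(*synthetic)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Inference is a few milliseconds, with no LLM call on the hot path.
print(clf.predict(["turn this response into a chart"]))  # -> ['visualize']
```

Because the classifier is static, its behavior is reproducible and its failure modes are easy to measure, which is exactly the determinism the team was after.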

  5. High-impact workflows can be managed effectively without defaulting to LLMs. Postman’s “Fix-Request” workflow ensures a smooth path for users to achieve a ‘200 OK’ status on their API requests, demonstrating strategic AI use in product development.

Rajaswa explains: “The objective of implementing the Fix-Request workflow was to ensure a smooth process for users to attain a ‘200 OK’ status on their API requests. 

Our team faced the challenge of handling various errors, ranging from authentication issues to parameter-related problems. We devised a two-pronged approach to address this challenge.”

  1. First, they incorporated built-in checks to identify common errors such as typos or incorrect environments. 
  2. Second, they reserved LLMs as a fallback for complex cases, providing users with relevant documentation or web-based solutions. By managing this high-impact workflow without defaulting to LLMs, they streamlined the user experience and significantly improved a crucial performance metric: time to ‘200 OK’. A minimal sketch of this pattern follows below. 
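Here is that minimal sketch of the two-pronged pattern. The specific checks and the `diagnose_with_llm` helper are hypothetical stand-ins for Postman’s internal logic.

```python
# Sketch of the two-pronged "fix request" approach: cheap deterministic
# checks run first, and the LLM is only consulted as a fallback.
# The checks and the diagnose_with_llm helper are hypothetical.
def builtin_checks(request: dict) -> str | None:
    if request.get("auth") is None:
        return "Add an auth token: this endpoint returned 401."
    if "{{" in request.get("url", ""):
        return "Unresolved environment variable in the URL."
    return None  # nothing obvious found

def diagnose_with_llm(request: dict, error_body: str) -> str:
    # Placeholder for an LLM call that surfaces docs or web-based fixes.
    return f"Ask the model about: {error_body[:80]}"

def fix_request(request: dict, error_body: str) -> str:
    hint = builtin_checks(request)
    if hint:
        return hint                                 # fast, deterministic, free
    return diagnose_with_llm(request, error_body)   # slow path, rare

print(fix_request({"url": "https://api.example.com/{{host}}/users"},
                  "400 Bad Request"))
```

Ordering the paths this way means most errors are resolved instantly and at zero model cost, directly improving the time-to-‘200 OK’ metric.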

In short: by focusing on familiar products, investing in prompt and workflow management, and handling non-determinism head-on, Postman successfully took Postbot from proof of concept to production.

This article is based on a talk hosted by Lightspeed SF and curated by Portkey on ‘LLMs in Production.’ Special thanks to Rohit Agarwal (Founder, Portkey) & Vrushank Vyas (GTM, Portkey) for helping put together this case study.

About the author

Tarun Kumar Davuluri

MBA Candidate, Berkeley Haas