Analyze and score every sales call with AI: an n8n + Whisper + GPT-4o pipeline

·9 min read
Updated on April 26, 2026

Every sales call contains signals: recurring objections, skipped script stages, opportunities never qualified. Most end up in an audio file no one ever replays.

What follows is a pipeline that transcribes each call, scores it against your sales script, and attaches the analysis to the right contact in your CRM — automatically, the moment the conversation ends. No more listening to 200 calls a month to coach the team: the scoring lands flat, ready to act on.

The context: a marketing agency in Moldova, calls in Russian, Romanian and English. The architecture transposes to any sales team that runs its own playbook.

The problem: a sales call = lost data

Without an analysis system, a sales call leaves three traces: the rep's memory, maybe a few notes, and an audio file no one opens. Three traces that evaporate within days.

The cost is invisible but real. Coaching is done on instinct, on the two or three calls a manager had time to listen to. Recurring objections never make it back to marketing. Hot leads that deserved an immediate follow-up slip through because no one re-read the transcript. And when a rep leaves, their knowledge of objections leaves with them.

Across 200 calls a month, that's mathematically several deals per quarter walking out the door. The pipeline below fixes exactly that.

The pipeline: from PBX to CRM in 7 steps

The architecture for analyzing a sales call fits in a single n8n workflow, triggered automatically on each new recording.

Calls originate from the client's PBX. An upstream workflow (which I didn't build) syncs the MP3 recordings and triggers my pipeline via an Execute Workflow Trigger. From there, everything is automated.

Step one: parse the call metadata. Phone number, start time, duration, recording URL. The phone number is cleaned and formatted for CRM lookup.

Step two: download the audio file from the PBX. A simple HTTP Request that fetches the MP3.

Step three: transcribe with OpenAI Whisper. The whisper-1 model receives the audio file and a context prompt indicating the expected languages and business domain. This prompt makes a real difference in transcription quality, especially for proper nouns and technical vocabulary. Without it, Whisper interprets company and product names creatively.

Step four: build the GPT-4o request. This is the core of the system. The JavaScript Code node assembles a system prompt containing three elements: the agency's context (services, pricing, client cases), the complete sales script (stages, objections, scripted responses), and the analysis instructions. GPT-4o must produce a structured JSON with a summary, a script compliance score (1 to 10), script stages followed or missed, concrete recommendations, and next actions.

One detail that took time to calibrate: the analysis depth adapts to call duration. A call under 3 minutes gets a brief summary. Between 3 and 6 minutes, the analysis is detailed with quotes. Beyond 6 minutes, it's exhaustive with a review of each script stage. This was a client request after the first tests, and it noticeably improved the relevance of the feedback.

Step five: search for the contact in Twenty CRM by phone number, via the GraphQL API. If the contact exists, update it. If not, create it with the call metadata.

Steps six and seven: create a markdown-formatted note (with summary, score, recommendations) and attach it to the contact via Twenty's Note Targets. The result, for each call, is an up-to-date contact record with the complete analysis history.

Transcribing multilingual calls: what works with Whisper

This agency's calls switch between Russian, Romanian and sometimes English, often within the same conversation. Whisper handles this natively, without needing to specify the language upfront. Detection is automatic.

What makes the difference is the prompt field in the Whisper API. It's not a prompt in the GPT sense, it's more of a context guide. By indicating the expected languages, the company name, and the business domain, transcription quality improves measurably. Proper nouns are better recognized, domain vocabulary is more faithful. Without this prompt, Whisper produces technically correct transcriptions but with errors on the terms that matter.

The other thing to handle is the analysis language. I added an instruction in the GPT-4o system prompt so that the analysis is written in the same language as the conversation. A Russian call produces a Russian analysis. Romanian in Romanian. It seems obvious, but without this explicit instruction, GPT-4o systematically responds in English.

Why Twenty CRM (not HubSpot or Pipedrive) for this project

The client had no CRM. One had to be chosen. I ruled out HubSpot and Pipedrive and proposed Twenty, a relatively recent open-source CRM, for several reasons.

HubSpot has a solid API but the plans that expose useful automations quickly start at several hundred euros per month for an SME. Pipedrive is more affordable on price, but its REST API is more rigid and attaching Notes to a contact via webhook is messier than on Twenty.

On Twenty, three arguments. First, I use it myself. I know its API, its strengths and its limitations. As a freelancer, using a tool you master on a client project is the difference between delivering with confidence and debugging blind.

Second, Twenty exposes clean, well-documented REST and GraphQL APIs with simple header authentication. For an n8n integration project, that's ideal. CRUD operations on contacts and notes work as expected, which isn't always the case with older CRMs whose APIs were bolted on after the fact.

Third, Twenty is available as a hosted service ($9/user/month) and self-hosted (free). The client chose the hosted version to avoid managing a server. If you want to self-host, Twenty installs cleanly on the same stack as the one described in this article.

Connecting WhatsApp, Instagram and Messenger to the CRM via Meta API

The initial scope was the call pipeline. But scope evolved. The agency also wanted to receive WhatsApp messages, Instagram DMs and Facebook Messenger messages directly in the CRM.

This plunged me into the Meta Business Manager ecosystem for the first time. Configuring a Facebook App, WhatsApp Cloud API with Embedded Signup, linking a Facebook Page and Instagram Business account, submitting to Meta's App Review for Messenger and Instagram production permissions.

What I learned: the Meta configuration is an administrative maze, not a technical one. The APIs work. The documentation is accurate. But the path to a "production" state goes through verification forms, review wait times (5 to 15 business days for Messenger/Instagram), and prerequisites that aren't always obvious on first reading.

I built three additional n8n workflows, one per channel (WhatsApp, Instagram DM, Facebook Messenger), that receive incoming messages via webhook and push them into Twenty CRM. The architecture is the same for all three: Meta webhook, payload parsing, contact lookup or creation, note creation.

What this project taught me (and what you can save)

Managing a project through an intermediary (a project manager between the integrator and the end client) adds a non-trivial communication layer: filtered feedback, misaligned expectations, drawn-out back-and-forth. For an SME outsourcing an automation project, the right move is to keep a direct channel with the integrator on technical questions, and reserve the PM for scope decisions.

Scope creep on a fixed-price project is a classic trap. Each "quick additional question" that accumulates ends up doubling the time spent. From the client side, it's tempting — from the delivery side, that's often where quality drops. The fix: a contract that explicitly lists what's included and a simple protocol for additions (a costed amendment, not a quick message in the team chat).

What I delivered

Summary of what's running today at this client's:

  • A configured and operational Twenty CRM
  • One n8n call analysis workflow (AI transcription, scoring against the sales script, automatic CRM push, no human in the loop)
  • Three n8n messaging reception workflows (WhatsApp, Instagram DM, Facebook Messenger)
  • A configured Meta Business Manager with linked accounts and App Review passed
  • Complete technical documentation

The call pipeline processes each recording end-to-end, from raw MP3 to analysis note attached to the right contact in the CRM. On the same API integration principle, I wrote a field report on Pennylane invoicing automation with n8n — different client, same pipeline logic.

Takeaways if you want the same setup

The pipeline transposes to any sales team that records its calls (most modern PBXs do, sometimes without anyone noticing). What depends on your context: the exact PBX, the target CRM, the granularity of your sales script. What's standard: the Whisper + GPT-4o + n8n workflow + CRM API chain, and the cost ballpark (a few dozen euros per month in API fees for SME volume).

If you want the same kind of system for your team, that's exactly the kind of project I take on under the Automation & Workflows page. A first 30-minute call to scope the project, no commitment: book a slot.

PS on Upwork

I took this project on Upwork without expecting any income from it. The logic: with no track record or reviews, a freelance profile shows up in zero searches. A clean first delivery unlocks the first 5-star review, which unlocks visibility, which unlocks the next missions. It's an investment in platform credibility, not income — useful to know if you're starting out as a freelancer.


FAQ

How do I automatically analyze phone calls with n8n?

The pipeline has three stages: fetch the audio recording from the PBX, transcribe it with OpenAI Whisper (whisper-1 model, with a context prompt to guide speech recognition), then send the transcript to GPT-4o with a structured system prompt containing the sales script and business context. GPT-4o returns a structured JSON with summary, script compliance score, and recommendations.

Can Whisper transcribe multilingual calls?

Yes. Whisper handles multilingual content natively. The API's prompt field lets you specify expected languages and business context, which measurably improves accuracy on proper nouns and domain-specific vocabulary. In my case, calls switched between Russian, Romanian and English, and Whisper handled it correctly without specifying the language upfront.

How do I connect n8n to Twenty CRM?

Twenty CRM exposes both a REST API and a GraphQL API. To search for a contact by phone number, the GraphQL API with a filter on phones.primaryPhoneNumber is most efficient. To create or update a contact and attach a note, the REST API works well. Authentication is done via API key header.

How much does an AI sales call analysis system cost?

For a typical SME volume of 200 calls per month (5-minute average duration), expect roughly $35-55 per month in OpenAI API (Whisper for transcription, GPT-4o for analysis), $5-25 per month to host n8n (self-hosted on a $5/month VPS or n8n Cloud around $20/month), plus the CRM cost (Twenty self-hosted is free, hosted starts at $9/user/month). Initial integration cost depends on scope — a full pipeline like the one described here is measured in days of work, not months.


Questions? Feel free to reach out on LinkedIn or use the contact form.

English version Read in French