This workflow automates the validation of AI-generated responses against source documents in PDF format. It fetches test cases from Google Sheets, retrieves corresponding PDFs from Google Drive, uses GPT-4 via OpenRouter as an expert judge to assess factual accuracy, and logs structured verdicts (Pass/Fail) with reasoning back into the spreadsheet. Designed for teams requiring consistent, auditable evaluation of LLM outputs.
## Who it´s for
- AI quality assurance specialists validating model responses
- LLM application developers needing automated output validation
- MLOps teams implementing generative model output controls
- Legal and compliance teams verifying AI alignment with official documents
## What the automation does
- Manually triggered to pull AI input/output pairs from Google Sheets
- Downloads and extracts text from referenced PDFs in Google Drive
- Uses GPT-4 via OpenRouter as a judge to compare AI output against source content
- Receives structured JSON verdict with decision and explanation
- Writes results back to the same Google Sheet
- Adds a half-second delay between iterations to avoid API rate limits
## What´s included
- Ready-to-use n8n workflow with LangChain agent logic
- Manual trigger handling and batch processing sequence
- Integrations with Google Sheets, Google Drive, OpenRouter, and LLM Evaluation API
- Basic setup and adaptation guide
## Requirements for setup
- n8n instance with workflow execution access
- Google account with enabled Google Drive and Google Sheets APIs
- OpenRouter API key for GPT-4 access
- Access to Google Sheet containing test cases and Google Drive folder with PDFs
## Benefits and outcomes
- Eliminates manual review of AI responses
- Ensures objective, repeatable accuracy assessment
- Centralizes test results in Google Sheets for trend analysis
- Enables benchmarking multiple LLMs on identical tasks
- Detects hallucinations, omissions, and inaccuracies in AI outputs
- Supports audit and regulatory compliance requirements
## Important: template only
Important: you are purchasing a ready-made automation workflow template only. Rollout into your infrastructure, connecting specific accounts and services, 1:1 setup help, custom adjustments for non-standard stacks and any consulting support are provided as a separate paid service at an individual rate. To discuss custom work or 1:1 help, contact via Telegram: @gleb923.
LLM accuracy evaluation
AI output validation
fact-check AI responses
compare AI output to source
LLM judge workflow
automated LLM testing
factual accuracy assessment
PDF document processing
Google Drive API integration
Google Sheets logging
manual workflow trigger
GPT-4 via OpenRouter
n8n automation
LangChain agent
detect AI hallucinations
batch evaluation of LLMs
No feedback yet