Content: 02177.zip (44.77 KB)
Uploaded: 15.01.2026

$7
# Evaluate LLM accuracy against PDFs from Google Drive with results in Sheets

This automation validates the factual accuracy of LLM responses by comparing them to source documents stored as PDFs in Google Drive. It pulls test cases from Google Sheets, downloads and extracts text from referenced PDFs, then sends the context to a judge LLM via OpenRouter. The model evaluates correctness, returns a structured Pass/Fail decision with reasoning, and the result is logged back in the spreadsheet. Designed for systematic, repeatable audits of LLM performance.
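The structured verdict returned by the judge might look like the following (a hypothetical shape; the exact field names depend on the judge prompt configured in the workflow):

```json
{
  "verdict": "Fail",
  "reasoning": "The response cites a figure that does not appear in the source PDF.",
  "hallucinations": ["claimed revenue figure not present in the document"],
  "omissions": []
}
```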

## Who it's for
- AI evaluation specialists testing text generation accuracy
- Machine learning engineers automating LLM audits
- Chatbot development teams validating response quality on benchmark data
- Legal and compliance teams ensuring reliability of AI outputs

## What the automation does
- Triggered manually upon user request
- Fetches test cases from Google Sheets: input prompt, LLM output, PDF link
- Downloads the referenced PDF from Google Drive via the Drive API and extracts its text
- Sends the query, response, and document content to a judge LLM via OpenRouter
- Receives a structured JSON verdict: Pass/Fail, reasoning, detected hallucinations or omissions
- Logs results back into the same Google Sheet for audit and analysis
- Includes a half-second delay between calls to respect API rate limits
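Outside n8n, the per-row evaluation step can be sketched roughly like this in Python. This is a minimal sketch: `pypdf`, the prompt wording, and the verdict field names are assumptions for illustration, not what the workflow ships with; the OpenRouter chat-completions endpoint and the half-second delay follow the description above.

```python
import json
import time


def extract_pdf_text(path):
    """Extract plain text from a downloaded PDF (pypdf is an assumed
    stand-in here for n8n's own extraction node)."""
    from pypdf import PdfReader  # non-stdlib; pip install pypdf
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_judge_prompt(question, answer, source_text):
    """Assemble the context handed to the judge model (illustrative wording)."""
    return (
        "You are a strict fact-checking judge. Compare the answer to the "
        "source document and reply ONLY with JSON of the form "
        '{"verdict": "Pass"|"Fail", "reasoning": "...", '
        '"hallucinations": [], "omissions": []}.\n\n'
        f"Question: {question}\n\nLLM answer: {answer}\n\n"
        f"Source document:\n{source_text}"
    )


def parse_verdict(raw):
    """Parse the judge's JSON reply; treat malformed output as a Fail."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"verdict": "Fail", "reasoning": "Judge returned malformed JSON"}


def judge_one_case(api_key, model, question, answer, source_text):
    """Send one test case to a judge LLM via OpenRouter and return the verdict."""
    import requests  # non-stdlib; pip install requests
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{
                "role": "user",
                "content": build_judge_prompt(question, answer, source_text),
            }],
        },
        timeout=120,
    )
    resp.raise_for_status()
    raw = resp.json()["choices"][0]["message"]["content"]
    time.sleep(0.5)  # half-second delay between calls, as in the workflow
    return parse_verdict(raw)
```

In the workflow itself, the verdict fields are written back to the originating row of the Google Sheet rather than returned to a caller.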

## What's included
- Ready-to-use n8n workflow with LangChain agent logic
- Manual trigger and batch processing setup
- Integrations with Google Sheets, Google Drive, and OpenRouter
- Basic textual guide for setup and adaptation

## Requirements for setup
- n8n instance with workflow execution capability
- Google account with enabled Google Drive and Google Sheets APIs
- OpenRouter account with access to a judge LLM model
- Access to source PDF documents and test case data in Google Sheets

## Benefits and outcomes
- Eliminates manual review of LLM outputs
- Reduces audit time from hours to minutes
- Ensures consistent, objective evaluation using a standardized judge model
- Enables scalable testing across multiple LLM versions or providers
- Transparent audit trail: all decisions saved with explanations
- Supports compliance with AI governance standards

## Important: template only
Important: you are purchasing a ready-made automation workflow template only. Deployment into your infrastructure, connection of specific accounts and services, 1:1 setup help, custom adjustments for non-standard stacks, and any consulting support are provided as a separate paid service at an individual rate. To discuss custom work or 1:1 help, contact us via chat.