RAG

RAG Is Evolving into GraphRAG

Apr 18, 2026 · 13 min read

Scenario

A legal-tech company has thousands of contracts, policies, and case notes. Classic vector RAG retrieves similar text chunks, but answers still miss cross-document relationships such as parties, obligations, jurisdiction links, and timeline dependencies.

Why Classic RAG Breaks at Scale

Classic RAG is strong for local semantic similarity. It is weaker when the question depends on explicit relationships.

Typical failure patterns:

  • entity ambiguity ("Acme Holdings" vs "Acme Holdings LLC")
  • long-range reasoning across many documents
  • missing temporal/causal chains
  • weak explainability for why a given answer was produced

For legal and compliance contexts, these gaps directly affect trust and auditability.
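Entity ambiguity in particular can be reduced with deterministic normalization before anything reaches the graph. A minimal sketch (the suffix list and function name are illustrative, not part of any specific pipeline):

```python
import re

# Corporate suffixes that frequently cause duplicate entity nodes.
_SUFFIXES = re.compile(r"\b(LLC|L\.L\.C\.|Inc\.?|Ltd\.?|Corp\.?)$", re.IGNORECASE)

def normalize_entity(name: str) -> str:
    """Return a canonical key: trimmed, suffix-stripped, lowercased."""
    key = name.strip().rstrip(".,")
    key = _SUFFIXES.sub("", key).strip()
    return " ".join(key.split()).lower()

print(normalize_entity("Acme Holdings LLC"))  # acme holdings
print(normalize_entity("Acme  Holdings"))     # acme holdings
```

With a canonical key like this, "Acme Holdings" and "Acme Holdings LLC" resolve to the same graph node while the original surface forms stay stored as properties.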

GraphRAG Concept

GraphRAG combines:

  • vector retrieval for semantic recall
  • graph retrieval for relationship-aware context

The graph layer stores entities and edges (for example, COMPANY -> HAS_OBLIGATION -> CLAUSE). Retrieval becomes a two-stage process: semantic candidate generation + graph neighborhood expansion.
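The graph-neighborhood-expansion stage can be sketched with a toy in-memory adjacency list standing in for Neptune; entity and edge names here are illustrative:

```python
from collections import deque

# Toy adjacency list standing in for the Neptune graph
# (COMPANY -> HAS_OBLIGATION -> CLAUSE style edges).
GRAPH = {
    "Acme Holdings LLC": [("HAS_OBLIGATION", "clause_45"), ("GOVERNED_BY", "section_9")],
    "clause_45": [("REFERENCES", "policy_7")],
    "section_9": [],
    "policy_7": [],
}

def expand(seeds: list[str], max_depth: int = 2) -> set[str]:
    """BFS neighborhood expansion with a bounded hop count."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop budget
        for _rel, nbr in GRAPH.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

print(sorted(expand(["Acme Holdings LLC"], max_depth=1)))
```

The semantic stage supplies the seed entities; the bounded BFS is what keeps expansion cost predictable at query time.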

AWS Architecture Options

Option A: Vector-only RAG (lowest complexity)

  • S3 + chunking + vector index
  • Lower effort, lower relationship fidelity

Option B: GraphRAG with Neptune + vector store (recommended for this scenario)

  • S3 for raw docs
  • extraction pipeline (Lambda/Step Functions)
  • graph in Neptune
  • vector index in OpenSearch Serverless vector collection
  • answer orchestration in FastAPI

Option C: Hybrid with Aurora pgvector + Neptune

  • useful when SQL joins are already central
  • slightly more operational tuning

Architecture diagram (Mermaid):

graph TD
  DOCS[Legal Documents in S3] --> ETL[Extraction Pipeline]
  ETL --> ENT[Entity + Relation Extractor]
  ENT --> G[(Amazon Neptune Graph)]
  ETL --> CHUNK[Text Chunker + Embeddings]
  CHUNK --> V[(OpenSearch Serverless Vector Collection)]
  API[FastAPI Retrieval API] --> V
  API --> G
  API --> LLM[LLM Inference Layer]
  API --> CW[CloudWatch Logs/Metrics]

Storage Design Choices

Graph store: Neptune

  • best for traversals, paths, and relationship-heavy retrieval
  • supports Gremlin/openCypher/SPARQL query styles

Vector store: OpenSearch Serverless vector collection

  • simple managed option for high-scale semantic lookup

Raw content: S3

  • source of truth for documents and extracted artifacts

Step-by-Step Tutorial

1) Create S3 bucket and prefixes

Bash:

export AWS_REGION=us-east-1
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export PROJECT=legal-graphrag
export DOC_BUCKET=${PROJECT}-${ACCOUNT_ID}-${AWS_REGION}

aws s3api create-bucket --bucket "$DOC_BUCKET" --region "$AWS_REGION"
aws s3api put-object --bucket "$DOC_BUCKET" --key raw/
aws s3api put-object --bucket "$DOC_BUCKET" --key extracted/
aws s3api put-object --bucket "$DOC_BUCKET" --key chunks/

PowerShell equivalent:

$env:AWS_REGION = "us-east-1"
$env:ACCOUNT_ID = (aws sts get-caller-identity --query Account --output text)
$env:PROJECT = "legal-graphrag"
$env:DOC_BUCKET = "$($env:PROJECT)-$($env:ACCOUNT_ID)-$($env:AWS_REGION)"

aws s3api create-bucket --bucket $env:DOC_BUCKET --region $env:AWS_REGION
aws s3api put-object --bucket $env:DOC_BUCKET --key raw/
aws s3api put-object --bucket $env:DOC_BUCKET --key extracted/
aws s3api put-object --bucket $env:DOC_BUCKET --key chunks/

The zero-byte prefix objects are optional placeholders; S3 creates prefixes implicitly when keys are written under them.

2) Create Neptune cluster (graph layer)

aws neptune create-db-subnet-group \
  --db-subnet-group-name legal-graphrag-subnets \
  --db-subnet-group-description "Subnet group for GraphRAG" \
  --subnet-ids subnet-aaaa1111 subnet-bbbb2222

aws neptune create-db-cluster \
  --db-cluster-identifier legal-graphrag-cluster \
  --engine neptune \
  --db-subnet-group-name legal-graphrag-subnets \
  --vpc-security-group-ids sg-0123456789abcdef0 \
  --backup-retention-period 7

aws neptune create-db-instance \
  --db-instance-identifier legal-graphrag-instance-1 \
  --db-instance-class db.r6g.large \
  --engine neptune \
  --db-cluster-identifier legal-graphrag-cluster

3) Create OpenSearch Serverless vector collection

aws opensearchserverless create-collection \
  --name legal-graphrag-vectors \
  --type VECTORSEARCH \
  --description "Vector collection for legal GraphRAG"

Then configure encryption, network, and data access policies for least privilege before indexing documents.
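As a sketch of what those policies can look like, the snippet below builds the policy documents as plain dicts; the boto3 calls are commented out so it runs offline, and the role ARN, account ID, and policy names are placeholders:

```python
import json

COLLECTION = "legal-graphrag-vectors"

# Encryption policy: AWS-owned key scoped to this collection.
encryption_policy = {
    "Rules": [{"ResourceType": "collection", "Resource": [f"collection/{COLLECTION}"]}],
    "AWSOwnedKey": True,
}

# Data access policy: grant index read/write to a single pipeline role.
access_policy = [{
    "Rules": [{
        "ResourceType": "index",
        "Resource": [f"index/{COLLECTION}/*"],
        "Permission": ["aoss:ReadDocument", "aoss:WriteDocument"],
    }],
    "Principal": ["arn:aws:iam::123456789012:role/legal-graphrag-pipeline"],
}]

# import boto3
# aoss = boto3.client("opensearchserverless")
# aoss.create_security_policy(name=f"{COLLECTION}-enc", type="encryption",
#                             policy=json.dumps(encryption_policy))
# aoss.create_access_policy(name=f"{COLLECTION}-access", type="data",
#                           policy=json.dumps(access_policy))

print(json.dumps(encryption_policy, indent=2))
```

Scoping the data access policy to one pipeline role (rather than a wildcard principal) is what makes the least-privilege claim auditable later.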

4) Entity and relationship extraction script

extract_entities.py

import json
import re
from dataclasses import dataclass

@dataclass
class Triple:
    source: str
    relation: str
    target: str


def naive_extract(text: str) -> list[Triple]:
    triples: list[Triple] = []
    orgs = re.findall(r"\b[A-Z][A-Za-z0-9& ]+(?:LLC|Inc|Ltd|Corp|Bank)\b", text)
    for i in range(len(orgs) - 1):
        triples.append(Triple(orgs[i].strip(), "RELATED_TO", orgs[i + 1].strip()))
    return triples


def main(in_path: str, out_path: str) -> None:
    with open(in_path, "r", encoding="utf-8") as f:
        docs = json.load(f)

    output = []
    for doc in docs:
        triples = naive_extract(doc["text"])
        output.append({
            "doc_id": doc["doc_id"],
            "triples": [t.__dict__ for t in triples]
        })

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(output, f, indent=2)


if __name__ == "__main__":
    main("sample_docs.json", "graph_triples.json")

In production, replace naive extraction with an LLM-assisted extractor plus deterministic validation rules.
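One way to pair an LLM extractor with deterministic validation is a closed relation vocabulary plus structural checks. A sketch (the relation names are illustrative):

```python
from dataclasses import dataclass

# Closed relation vocabulary agreed with domain experts (illustrative).
ALLOWED_RELATIONS = {"HAS_OBLIGATION", "GOVERNED_BY", "PARTY_TO", "AMENDS"}

@dataclass(frozen=True)
class Triple:
    source: str
    relation: str
    target: str

def validate(triples: list[Triple]) -> list[Triple]:
    """Keep only well-formed, deduplicated triples with a known relation."""
    out: list[Triple] = []
    seen: set[Triple] = set()
    for t in triples:
        if t.relation not in ALLOWED_RELATIONS:
            continue  # reject hallucinated relation names
        if not t.source.strip() or not t.target.strip():
            continue  # reject empty endpoints
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

raw = [
    Triple("Acme Holdings LLC", "HAS_OBLIGATION", "clause_45"),
    Triple("Acme Holdings LLC", "OWNS_SPACESHIP", "clause_45"),  # rejected
    Triple("Acme Holdings LLC", "HAS_OBLIGATION", "clause_45"),  # duplicate
]
print(len(validate(raw)))  # 1
```

The closed vocabulary is the deterministic backstop: whatever the LLM emits, only relations the domain model recognizes ever reach Neptune.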

5) Load graph triples into Neptune (example)

load_to_neptune.py

import json
from gremlin_python.driver import client

NEPTUNE_ENDPOINT = "wss://your-neptune-endpoint:8182/gremlin"

# "g" here is the remote traversal source name expected by Neptune.
gremlin_client = client.Client(NEPTUNE_ENDPOINT, "g")

with open("graph_triples.json", "r", encoding="utf-8") as f:
    items = json.load(f)

for item in items:
    for t in item["triples"]:
        # Naive quote-stripping to avoid breaking the interpolated query;
        # prefer parameterized traversals in production.
        src = t["source"].replace("'", "")
        rel = t["relation"].replace("'", "")
        tgt = t["target"].replace("'", "")

        # Upsert both vertices, then upsert the edge between them.
        q = f"""
        g.V().has('Entity','name','{src}').fold().coalesce(unfold(), addV('Entity').property('name','{src}')).as('a')
         .V().has('Entity','name','{tgt}').fold().coalesce(unfold(), addV('Entity').property('name','{tgt}')).as('b')
         .coalesce(__.select('a').outE('{rel}').where(inV().as('b')), __.addE('{rel}').from('a').to('b'))
        """
        gremlin_client.submit(q).all().result()

gremlin_client.close()
print("Loaded triples into Neptune")

6) Build retrieval orchestrator (FastAPI)

retrieval_api.py

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Legal GraphRAG API")

class Ask(BaseModel):
    question: str


def vector_search(question: str) -> list[str]:
    # Replace with OpenSearch vector query.
    return ["chunk_12", "chunk_87", "chunk_203"]


def graph_expand(seed_entities: list[str]) -> list[str]:
    # Replace with Neptune traversal query.
    return ["obligation_clause_45", "governing_law_section_9"]


@app.post("/ask")
def ask(req: Ask):
    chunk_ids = vector_search(req.question)
    # Replace with entity extraction over the question and top chunks.
    seed_entities = ["Acme Holdings LLC"]
    related_nodes = graph_expand(seed_entities)

    context = {
        "vector_chunks": chunk_ids,
        "graph_context": related_nodes,
    }
    # Pass combined context to your model layer.
    return {"context_used": context, "answer": "Generated answer placeholder"}

7) Retrieval flow in production

  1. Normalize user question and run policy checks.
  2. Run vector search for semantic candidates.
  3. Extract seed entities from question and top chunks.
  4. Expand graph neighborhood with bounded depth.
  5. Merge, deduplicate, and rerank context.
  6. Generate answer with citations to chunks and graph facts.
  7. Log retrieval path for auditability.
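
Step 5 (merge, deduplicate, rerank) can be sketched as a simple score merge that boosts chunks also reachable through the graph; the flat-bonus scoring scheme is an assumption for illustration, not a prescribed algorithm:

```python
def merge_and_rerank(vector_hits: list[tuple[str, float]],
                     graph_hits: list[str],
                     graph_bonus: float = 0.2,
                     top_k: int = 5) -> list[str]:
    """Merge vector scores with a flat bonus for graph-connected items."""
    scores: dict[str, float] = {}
    for chunk_id, score in vector_hits:
        # Deduplicate, keeping the best vector score per chunk.
        scores[chunk_id] = max(scores.get(chunk_id, 0.0), score)
    for chunk_id in graph_hits:
        # Graph-reachable items get a boost; graph-only items still enter.
        scores[chunk_id] = scores.get(chunk_id, 0.0) + graph_bonus
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked][:top_k]

ranked = merge_and_rerank(
    vector_hits=[("chunk_12", 0.91), ("chunk_87", 0.85)],
    graph_hits=["chunk_87", "obligation_clause_45"],
)
print(ranked)  # ['chunk_87', 'chunk_12', 'obligation_clause_45']
```

A production reranker would likely use a cross-encoder or learned weights, but even this simple merge makes the graph signal visible in the final ordering.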

Security and Governance

  • Use IAM roles for all services (no static keys).
  • Keep private graph/vector resources in VPC/private network policies.
  • Encrypt S3, Neptune snapshots, and vector data at rest.
  • Redact PII in extracted triples before indexing.
  • Maintain provenance fields (doc_id, section, version) in every node/chunk.
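
Redaction before indexing can start with deterministic patterns; the regexes below are illustrative and are not a substitute for a vetted PII detector:

```python
import re

# Illustrative patterns; a production system needs a vetted PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace matched PII spans with stable placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text

print(redact("Contact jane@acme.com, SSN 123-45-6789."))
```

Running this over triples and chunks before indexing keeps raw PII out of both the graph and the vector store, while S3 retains the unredacted source of truth under stricter access controls.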

Monitoring and Operations

  • CloudWatch metrics:
      • retrieval latency
      • vector recall hit rate
      • graph expansion breadth
      • answer citation coverage
  • Alarm on graph load failures and retrieval timeout spikes.
  • Track index freshness lag from source documents to retrievable context.
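
Custom metrics like these can be published with CloudWatch `put_metric_data`; the namespace and metric names below are assumptions, and the boto3 call is commented out so the sketch runs offline:

```python
# Build CloudWatch metric payloads for the retrieval path.
metric_data = [
    {"MetricName": "RetrievalLatencyMs", "Value": 182.0, "Unit": "Milliseconds"},
    {"MetricName": "GraphExpansionBreadth", "Value": 14, "Unit": "Count"},
    {"MetricName": "CitationCoverage", "Value": 0.92, "Unit": "None"},
]

# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="LegalGraphRAG", MetricData=metric_data)

print(len(metric_data))
```

Emitting graph expansion breadth per request is what lets you alarm on runaway traversals before they show up as latency spikes.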

Cost Optimization

  • Keep graph expansion depth small (for example depth 1-2 by default).
  • Cache frequent query subgraphs.
  • Use selective extraction for high-value documents first.
  • Batch ingestion pipelines with Step Functions and queue workers.
  • Prefer serverless/vector autoscaling where traffic is bursty.
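
Caching frequent query subgraphs can be as simple as memoizing per-entity expansions; the sketch below stands in for the Neptune call with a counter so the cache's effect is visible:

```python
from functools import lru_cache

CALLS = {"neptune": 0}

@lru_cache(maxsize=1024)
def expand_cached(entity: str, depth: int) -> tuple[str, ...]:
    """Memoize per-entity expansions (tuples are hashable, hence cacheable)."""
    CALLS["neptune"] += 1  # stands in for a real Neptune round trip
    return (f"{entity}:neighbor_a", f"{entity}:neighbor_b")

expand_cached("Acme Holdings LLC", 2)
expand_cached("Acme Holdings LLC", 2)  # served from cache
print(CALLS["neptune"])  # 1
```

In production you would add a TTL or explicit invalidation on graph writes, since `lru_cache` alone never expires entries.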

Pricing reminder: verify live pricing before estimating TCO.

  • Neptune: https://aws.amazon.com/neptune/pricing/
  • OpenSearch Service: https://aws.amazon.com/opensearch-service/pricing/
  • S3: https://aws.amazon.com/s3/pricing/

When GraphRAG Is Worth the Extra Complexity

Use GraphRAG when at least two of the following are true:

  • questions require explicit relationship reasoning
  • users demand explainable provenance chains
  • entity ambiguity is frequent
  • multi-hop retrieval materially improves decision quality

If questions are mostly local FAQ-style lookups, start with vector RAG and add graph components incrementally.

Production-readiness checklist

  • Data model for entities/relations approved by domain experts
  • Ingestion pipeline idempotent and replay-safe
  • Retrieval path logged with citations and graph hops
  • PII and legal sensitivity controls enforced
  • Quality eval suite includes relation-heavy queries
  • Cost dashboards and budgets configured
  • Incident runbooks for index lag and graph corruption tested

Final takeaway

GraphRAG is not a universal replacement for classic RAG. It is a targeted upgrade for relationship-heavy domains like legal-tech, where structure and traceability are part of the product value.