Building LLM Applications with LangChain: Go, Python, and AWS
I write most of my infrastructure code in Go and most of my AI prototypes in Python, and for a long time those two worlds did not talk to each other. Then I needed an LLM-powered feature inside a Go microservice — not a standalone chatbot, but a component embedded in a larger system — and I discovered LangChainGo. It is younger and leaner than its Python sibling, but it follows the same abstractions, and that consistency matters when your team spans both languages. What follows is a practical tour of both ecosystems: Go for the lightweight integrations, Python for the full RAG pipelines, and AWS Bedrock as the production backend behind both.
The LangChain Philosophy
LangChain provides composable building blocks for LLM applications:
- Models: Unified interface to various LLM providers
- Prompts: Templates and management for model inputs
- Chains: Sequences of calls to models and utilities
- Memory: State persistence across interactions
- Retrieval: Integration with vector stores and document loaders
The key insight: LLM applications are pipelines, not single API calls.
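To make that pipeline idea concrete, here is a minimal sketch in LangChainGo composing a prompt template and a model into a chain. The template wording and variable name are placeholders of my own; the packages (`prompts`, `chains`) are part of langchaingo, and the Ollama setup is the same one used in the next section.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/chains"
	"github.com/tmc/langchaingo/llms/ollama"
	"github.com/tmc/langchaingo/prompts"
)

func main() {
	// Model: any provider that satisfies the llms.Model interface.
	llm, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	// Prompt: a reusable template with a named input variable.
	prompt := prompts.NewPromptTemplate(
		"Summarise the following text in one sentence:\n\n{{.text}}",
		[]string{"text"},
	)

	// Chain: prompt formatting plus model call composed into one step.
	chain := chains.NewLLMChain(llm, prompt)

	out, err := chains.Run(context.Background(), chain,
		"LangChain treats LLM applications as pipelines of models, prompts and tools.")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

Nothing in the chain is provider-specific: swap the Ollama constructor for any other model and the rest of the pipeline is unchanged.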
Getting Started: LangChain Go with Ollama
The simplest entry point uses a local LLM through Ollama. This avoids API costs and latency while prototyping.
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	completion, err := llms.GenerateFromSinglePrompt(
		ctx,
		llm,
		"Human: Who was the first man to walk on the moon?\nAssistant:",
		llms.WithTemperature(0.8),
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			fmt.Print(string(chunk))
			return nil
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = completion
}
```
Key points:
- `ollama.New()` connects to a local Ollama instance
- `WithModel("llama2")` selects the model to use
- `WithStreamingFunc` enables real-time token streaming
- `WithTemperature(0.8)` controls randomness in responses
Scaling Up: AWS Bedrock Integration in Go
For production workloads, AWS Bedrock provides managed access to foundation models including Claude, Llama, and Titan.
```go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/bedrock"
)

func main() {
	var (
		prompt    = flag.String("prompt", "Summarize the novel 'Fairy Tale'", "Prompt to send")
		awsRegion = flag.String("region", "eu-west-1", "AWS region")
		verbose   = flag.Bool("verbose", false, "Enable verbose output")
	)
	flag.Parse()

	ctx := context.Background()

	// Create Bedrock LLM with Claude Haiku.
	// Note: `langchaingo`'s model constants track a snapshot of the
	// Bedrock catalogue; for newer Claude 4.x / Haiku 4.5 IDs you may
	// need to pass the raw model string via bedrock.WithModel("...").
	opts := []bedrock.Option{
		bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku),
	}
	llm, err := bedrock.New(opts...)
	if err != nil {
		log.Fatalf("Failed to create Bedrock LLM: %v", err)
	}

	if *verbose {
		fmt.Printf("AWS Region: %s\n", *awsRegion)
		fmt.Printf("Prompt: %s\n", *prompt)
	}

	// Simple Call method
	response, err := llm.Call(ctx, *prompt)
	if err != nil {
		log.Printf("Error calling model: %v", err)
	} else {
		fmt.Printf("Response: %s\n", response)
	}

	// GenerateContent with structured messages
	messages := []llms.MessageContent{
		{
			Role: llms.ChatMessageTypeSystem,
			Parts: []llms.ContentPart{
				llms.TextPart("You are a helpful assistant."),
			},
		},
		{
			Role: llms.ChatMessageTypeHuman,
			Parts: []llms.ContentPart{
				llms.TextPart(*prompt),
			},
		},
	}
	resp, err := llm.GenerateContent(ctx, messages)
	if err != nil {
		log.Printf("Error generating content: %v", err)
	} else if len(resp.Choices) > 0 {
		fmt.Printf("Response: %s\n", resp.Choices[0].Content)
	}
}
```
The Go Bedrock integration provides:
- Two calling patterns: simple `Call()` for basic prompts, `GenerateContent()` for structured conversations
- Message types: System, Human, and AI message roles
- AWS credential handling: uses the standard AWS SDK credential chain (see the sketch below for passing an explicitly configured client)
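The credential point deserves a concrete example. By default `bedrock.New()` builds its own client from the standard AWS SDK configuration (environment variables, shared config files, SSO, instance roles); if you need to pin a region or reuse an existing client, you can construct one yourself and hand it in. The sketch below assumes the `bedrock.WithClient` option available in the langchaingo version I used; check your version before copying.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/bedrockruntime"
	"github.com/tmc/langchaingo/llms/bedrock"
)

func newBedrockLLM(ctx context.Context, region string) (*bedrock.LLM, error) {
	// Resolve credentials via the standard chain: env vars, shared
	// config/credentials files, SSO, or an attached IAM role.
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
	if err != nil {
		return nil, err
	}

	// Hand the explicitly configured runtime client to langchaingo.
	client := bedrockruntime.NewFromConfig(cfg)
	return bedrock.New(
		bedrock.WithClient(client),
		bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku),
	)
}

func main() {
	llm, err := newBedrockLLM(context.Background(), "eu-west-1")
	if err != nil {
		log.Fatal(err)
	}
	_ = llm // use with llm.Call / llm.GenerateContent as above
}
```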
Full RAG Pipeline: Python with LangChain
For complex applications, Python’s LangChain offers the most mature ecosystem. Here’s a complete RAG implementation using AWS Bedrock, Titan embeddings, and Qdrant vector store.
Model IDs are version-specific and region-scoped. The Bedrock catalogue has moved on several times since this article was written: Claude 3.5 Sonnet, then 3.7, then Claude 4 and 4.5. Titan embeddings v2 remains current, but check regional availability. Always consult the AWS Bedrock console for the identifier that matches the region you are deploying in before copying the snippets below.
```python
from langchain.chat_models import init_chat_model
from langchain_aws import BedrockEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
import os

# LLM Setup - Claude 3.7 Sonnet via AWS Bedrock
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse"
)

# Embedding model - Amazon Titan
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

# Vector store - Qdrant Cloud
qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_CLOUD_URL"),
    api_key=os.getenv("QDRANT_CLOUD_KEY"),
)

vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="langchainpy-aws-poc",
    embedding=embeddings,
)
```
Document Loading and Chunking
RAG begins with ingesting documents:
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load web content with targeted parsing
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split into chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
all_splits = text_splitter.split_documents(docs)

# Index in vector store
_ = vector_store.add_documents(documents=all_splits)
```
Key considerations:
- `chunk_size=1000`: Balance between context and specificity
- `chunk_overlap=200`: Prevents information loss at boundaries
- Targeted parsing: BeautifulSoup filters relevant content
LangGraph Orchestration
LangGraph provides state management and workflow orchestration:
```python
from langchain import hub
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Pull a standard RAG prompt template
prompt = hub.pull("rlm/rag-prompt")

# Define application state
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Retrieval step
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

# Generation step
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({
        "question": state["question"],
        "context": docs_content
    })
    response = model.invoke(messages)
    return {"answer": response.content}

# Build and compile the graph
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

# Execute
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])
```
The pipeline:
- Retrieve: Vector similarity search finds relevant document chunks
- Generate: LLM synthesizes answer from retrieved context
Architecture Comparison
| Aspect | Go (LangChainGo) | Python (LangChain) |
|---|---|---|
| Maturity | Growing | Mature |
| Providers | Ollama, Bedrock, OpenAI | 50+ integrations |
| RAG Support | Basic | Full ecosystem |
| LangGraph | Not available | Full support |
| Performance | Low latency, single binary | Interpreter overhead, rarely the bottleneck |
| Use Case | Microservices, CLI tools | Complex AI apps |
When to Use Each
In practice, the choice is less about language preference and more about where the LLM sits in your architecture.
Choose Go when the LLM call is a feature inside a larger service — a summarisation step in a data pipeline, a classification call in a routing layer. Go gives you a single binary, predictable memory, and the concurrency model to handle thousands of in-flight requests. You do not need LangGraph for that.
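Here is a rough sketch of what "a feature inside a larger service" looks like in practice: an ordinary HTTP handler that wraps a single summarisation call, with the model injected like any other dependency. The route, prompt wording, and size limit are illustrative, not from a real service.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/bedrock"
)

// summarizeHandler embeds a single LLM call inside a plain HTTP handler:
// the model is just another dependency, like a database client.
func summarizeHandler(llm llms.Model) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // cap request size at 1 MiB
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}

		prompt := "Summarise the following text in two sentences:\n\n" + string(body)
		summary, err := llms.GenerateFromSinglePrompt(r.Context(), llm, prompt)
		if err != nil {
			http.Error(w, "upstream model error", http.StatusBadGateway)
			return
		}
		fmt.Fprint(w, summary)
	}
}

func main() {
	llm, err := bedrock.New(bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku))
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/summarize", summarizeHandler(llm))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```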
Choose Python when the LLM is the application — a RAG chatbot, an agent with tool use, anything that needs LangGraph orchestration, multi-step retrieval, or the full ecosystem of document loaders and vector store integrations. Python’s maturity here is not just ahead; it is a different category entirely.
Production Considerations
AWS Bedrock Setup
- Enable model access in the AWS Console
- Configure IAM permissions for `bedrock:InvokeModel`
- Use cross-region inference endpoints for newer models (a sketch follows this list)
- Monitor costs: embeddings and completions bill separately
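On the cross-region point: the inference profile is just a different model identifier with a geography prefix (`eu.` or `us.`) passed as a raw string instead of a catalogue constant. A minimal sketch, reusing the profile ID from the Python example above and assuming it is enabled in your account:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms/bedrock"
)

func main() {
	// Cross-region inference: pass the inference-profile ID as the model string.
	// The "eu." prefix lets Bedrock route the request within that geography.
	llm, err := bedrock.New(
		bedrock.WithModel("eu.anthropic.claude-3-7-sonnet-20250219-v1:0"),
	)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := llm.Call(context.Background(), "Reply with OK if you can read this.")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp)
}
```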
Vector Store Selection
| Store | Best For |
|---|---|
| Qdrant | Production, managed cloud option |
| Pinecone | Serverless, auto-scaling |
| pgvector | PostgreSQL integration |
| FAISS | Local development, in-memory |
Chunking Strategy
The chunk size affects retrieval quality (a splitter configuration sketch follows the list):
- Smaller chunks (500-1000): More precise retrieval, may lose context
- Larger chunks (1500-2000): Better context, noisier retrieval
- Overlap (10-20%): Ensures continuity across chunk boundaries
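The same trade-off can be expressed on the Go side, since LangChainGo ships a `textsplitter` package with equivalent knobs. A rough sketch, assuming the recursive character splitter and option names from the langchaingo version I used, with the overlap derived as 20% of the chunk size:

```go
package main

import (
	"fmt"
	"log"

	"github.com/tmc/langchaingo/textsplitter"
)

func main() {
	chunkSize := 1000
	overlap := chunkSize / 5 // 20% overlap, the upper end of the range above

	splitter := textsplitter.NewRecursiveCharacter(
		textsplitter.WithChunkSize(chunkSize),
		textsplitter.WithChunkOverlap(overlap),
	)

	chunks, err := splitter.SplitText("replace this with the document text to split")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("produced %d chunks\n", len(chunks))
}
```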
Reflection
The real value of LangChain — in either language — is not the framework itself but the abstraction boundary it enforces. Today I call Bedrock; tomorrow it might be a self-hosted model behind vLLM. The provider changes, the interface does not. In a landscape where model capabilities shift every few months and pricing changes overnight, that portability is not a nicety; it is an architectural requirement.
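In Go terms, that boundary is the `llms.Model` interface: the Ollama and Bedrock constructors above both return something that satisfies it, so the calling code never names a provider. A sketch of the swap, with the provider chosen by a hypothetical environment variable of my own:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/bedrock"
	"github.com/tmc/langchaingo/llms/ollama"
)

// newModel picks a provider at startup; everything downstream sees only llms.Model.
func newModel() (llms.Model, error) {
	// LLM_PROVIDER is a hypothetical switch for this sketch.
	if os.Getenv("LLM_PROVIDER") == "bedrock" {
		return bedrock.New(bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku))
	}
	return ollama.New(ollama.WithModel("llama2"))
}

func main() {
	model, err := newModel()
	if err != nil {
		log.Fatal(err)
	}

	answer, err := llms.GenerateFromSinglePrompt(context.Background(), model,
		"In one sentence, what does a provider-agnostic interface buy you?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(answer)
}
```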
If I were starting a new project today, I would prototype the RAG pipeline in Python, prove the retrieval quality, then ask a simple question: does this need to live inside an existing Go service, or can it stand alone? The answer to that question — not language loyalty — should determine the stack.