Building LLM Applications with LangChain: Go, Python, and AWS
I write most of my infrastructure code in Go and most of my AI prototypes in Python, and for a long time those two worlds did not talk to each other. Then I needed an LLM-powered feature inside a Go microservice — not a standalone chatbot, but a component embedded in a larger system — and I discovered LangChainGo. It is younger and leaner than its Python sibling, but it follows the same abstractions, and that consistency matters when your team spans both languages. What follows is a practical tour of both ecosystems: Go for the lightweight integrations, Python for the full RAG pipelines, and AWS Bedrock as the production backend behind both.
The LangChain Philosophy
LangChain provides composable building blocks for LLM applications:
- Models: Unified interface to various LLM providers
- Prompts: Templates and management for model inputs
- Chains: Sequences of calls to models and utilities
- Memory: State persistence across interactions
- Retrieval: Integration with vector stores and document loaders
The key insight: LLM applications are pipelines, not single API calls.
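To make that pipeline idea concrete, here is a minimal sketch in LangChainGo composing a prompt template and a model into a chain. The template wording and variable name are placeholders of my own; the packages (`prompts`, `chains`) are part of langchaingo, and the Ollama setup is the same one used in the next section.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/chains"
	"github.com/tmc/langchaingo/llms/ollama"
	"github.com/tmc/langchaingo/prompts"
)

func main() {
	// Model: any provider that satisfies the llms.Model interface.
	llm, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	// Prompt: a reusable template with a named input variable.
	prompt := prompts.NewPromptTemplate(
		"Summarise the following text in one sentence:\n\n{{.text}}",
		[]string{"text"},
	)

	// Chain: prompt formatting plus model call composed into one step.
	chain := chains.NewLLMChain(llm, prompt)

	out, err := chains.Run(context.Background(), chain,
		"LangChain treats LLM applications as pipelines of models, prompts and tools.")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

Nothing in the chain is provider-specific: swap the Ollama constructor for any other model and the rest of the pipeline is unchanged.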
Getting Started: LangChain Go with Ollama
The simplest entry point uses a local LLM through Ollama. This avoids API costs and latency while prototyping.
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	completion, err := llms.GenerateFromSinglePrompt(
		ctx,
		llm,
		"Human: Who was the first man to walk on the moon?\nAssistant:",
		llms.WithTemperature(0.8),
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			fmt.Print(string(chunk))
			return nil
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = completion
}
```
Key points:
- `ollama.New()` connects to a local Ollama instance
- `WithModel("llama2")` selects the model to use
- `WithStreamingFunc` enables real-time token streaming
- `WithTemperature(0.8)` controls randomness in responses
Scaling Up: AWS Bedrock Integration in Go
For production workloads, AWS Bedrock provides managed access to foundation models including Claude, Llama, and Titan.
```go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/bedrock"
)

func main() {
	var (
		prompt    = flag.String("prompt", "Summarize the novel 'Fairy Tale'", "Prompt to send")
		awsRegion = flag.String("region", "eu-west-1", "AWS region")
		verbose   = flag.Bool("verbose", false, "Enable verbose output")
	)
	flag.Parse()

	ctx := context.Background()

	// Create Bedrock LLM with Claude Haiku.
	// Note: `langchaingo`'s model constants track a snapshot of the
	// Bedrock catalogue; for newer Claude 4.x / Haiku 4.5 IDs you may
	// need to pass the raw model string via bedrock.WithModel("...").
	opts := []bedrock.Option{
		bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku),
	}
	llm, err := bedrock.New(opts...)
	if err != nil {
		log.Fatalf("Failed to create Bedrock LLM: %v", err)
	}

	if *verbose {
		fmt.Printf("AWS Region: %s\n", *awsRegion)
		fmt.Printf("Prompt: %s\n", *prompt)
	}

	// Simple Call method
	response, err := llm.Call(ctx, *prompt)
	if err != nil {
		log.Printf("Error calling model: %v", err)
	} else {
		fmt.Printf("Response: %s\n", response)
	}

	// GenerateContent with structured messages
	messages := []llms.MessageContent{
		{
			Role: llms.ChatMessageTypeSystem,
			Parts: []llms.ContentPart{
				llms.TextPart("You are a helpful assistant."),
			},
		},
		{
			Role: llms.ChatMessageTypeHuman,
			Parts: []llms.ContentPart{
				llms.TextPart(*prompt),
			},
		},
	}
	resp, err := llm.GenerateContent(ctx, messages)
	if err != nil {
		log.Printf("Error generating content: %v", err)
	} else if len(resp.Choices) > 0 {
		fmt.Printf("Response: %s\n", resp.Choices[0].Content)
	}
}
```
The Go Bedrock integration provides:
- Two calling patterns: simple `Call()` for basic prompts, `GenerateContent()` for structured conversations
- Message types: System, Human, and AI message roles
- AWS credential handling: uses the standard AWS SDK credential chain (see the sketch below for passing an explicitly configured client)
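The credential point deserves a concrete example. By default `bedrock.New()` builds its own client from the standard AWS SDK configuration (environment variables, shared config files, SSO, instance roles); if you need to pin a region or reuse an existing client, you can construct one yourself and hand it in. The sketch below assumes the `bedrock.WithClient` option available in the langchaingo version I used; check your version before copying.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/bedrockruntime"
	"github.com/tmc/langchaingo/llms/bedrock"
)

func newBedrockLLM(ctx context.Context, region string) (*bedrock.LLM, error) {
	// Resolve credentials via the standard chain: env vars, shared
	// config/credentials files, SSO, or an attached IAM role.
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
	if err != nil {
		return nil, err
	}

	// Hand the explicitly configured runtime client to langchaingo.
	client := bedrockruntime.NewFromConfig(cfg)
	return bedrock.New(
		bedrock.WithClient(client),
		bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku),
	)
}

func main() {
	llm, err := newBedrockLLM(context.Background(), "eu-west-1")
	if err != nil {
		log.Fatal(err)
	}
	_ = llm // use with llm.Call / llm.GenerateContent as above
}
```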
Full RAG Pipeline: Python with LangChain
For complex applications, Python’s LangChain offers the most mature ecosystem. Here’s a complete RAG implementation using AWS Bedrock, Titan embeddings, and Qdrant vector store.
Model IDs are version-specific and region-scoped. The Bedrock catalogue has moved on several times since this article was written: Claude 3.5 Sonnet, then 3.7, then Claude 4 and 4.5. Titan embeddings v2 remains current, but check regional availability. Always consult the AWS Bedrock console for the identifier that matches the region you are deploying in before copying the snippets below.
```python
from langchain.chat_models import init_chat_model
from langchain_aws import BedrockEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
import os

# LLM Setup - Claude 3.7 Sonnet via AWS Bedrock
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse"
)

# Embedding model - Amazon Titan
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

# Vector store - Qdrant Cloud
qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_CLOUD_URL"),
    api_key=os.getenv("QDRANT_CLOUD_KEY"),
)

vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="langchainpy-aws-poc",
    embedding=embeddings,
)
```
Document Loading and Chunking
RAG begins with ingesting documents:
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load web content with targeted parsing
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split into chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
all_splits = text_splitter.split_documents(docs)

# Index in vector store
_ = vector_store.add_documents(documents=all_splits)
```
Key considerations:
- `chunk_size=1000`: Balance between context and specificity
- `chunk_overlap=200`: Prevents information loss at boundaries
- Targeted parsing: BeautifulSoup filters relevant content
LangGraph Orchestration
LangGraph provides state management and workflow orchestration:
```python
from langchain import hub
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Pull a standard RAG prompt template
prompt = hub.pull("rlm/rag-prompt")

# Define application state
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Retrieval step
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

# Generation step
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({
        "question": state["question"],
        "context": docs_content
    })
    response = model.invoke(messages)
    return {"answer": response.content}

# Build and compile the graph
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

# Execute
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])
```
The pipeline:
- Retrieve: Vector similarity search finds relevant document chunks
- Generate: LLM synthesizes answer from retrieved context
Architecture Comparison
| Aspect | Go (LangChainGo) | Python (LangChain) |
|---|---|---|
| Maturity | Growing | Mature |
| Providers | Ollama, Bedrock, OpenAI | 50+ integrations |
| RAG Support | Basic | Full ecosystem |
| LangGraph | Not available | Full support |
| Performance | Low latency, single binary | Interpreter overhead, rarely the bottleneck |
| Use Case | Microservices, CLI tools | Complex AI apps |
When to Use Each
In practice, the choice is less about language preference and more about where the LLM sits in your architecture.
Choose Go when the LLM call is a feature inside a larger service — a summarisation step in a data pipeline, a classification call in a routing layer. Go gives you a single binary, predictable memory, and the concurrency model to handle thousands of in-flight requests. You do not need LangGraph for that.
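Here is a rough sketch of what "a feature inside a larger service" looks like in practice: an ordinary HTTP handler that wraps a single summarisation call, with the model injected like any other dependency. The route, prompt wording, and size limit are illustrative, not from a real service.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/bedrock"
)

// summarizeHandler embeds a single LLM call inside a plain HTTP handler:
// the model is just another dependency, like a database client.
func summarizeHandler(llm llms.Model) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // cap request size at 1 MiB
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}

		prompt := "Summarise the following text in two sentences:\n\n" + string(body)
		summary, err := llms.GenerateFromSinglePrompt(r.Context(), llm, prompt)
		if err != nil {
			http.Error(w, "upstream model error", http.StatusBadGateway)
			return
		}
		fmt.Fprint(w, summary)
	}
}

func main() {
	llm, err := bedrock.New(bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku))
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/summarize", summarizeHandler(llm))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```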
Choose Python when the LLM is the application — a RAG chatbot, an agent with tool use, anything that needs LangGraph orchestration, multi-step retrieval, or the full ecosystem of document loaders and vector store integrations. Python’s maturity here is not just ahead; it is a different category entirely.
Production Considerations
AWS Bedrock Setup
- Enable model access in the AWS Console
- Configure IAM permissions for `bedrock:InvokeModel`
- Use cross-region inference endpoints for newer models (a sketch follows this list)
- Monitor costs: embeddings and completions bill separately
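On the cross-region point: the inference profile is just a different model identifier with a geography prefix (`eu.` or `us.`) passed as a raw string instead of a catalogue constant. A minimal sketch, reusing the profile ID from the Python example above and assuming it is enabled in your account:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms/bedrock"
)

func main() {
	// Cross-region inference: pass the inference-profile ID as the model string.
	// The "eu." prefix lets Bedrock route the request within that geography.
	llm, err := bedrock.New(
		bedrock.WithModel("eu.anthropic.claude-3-7-sonnet-20250219-v1:0"),
	)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := llm.Call(context.Background(), "Reply with OK if you can read this.")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp)
}
```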
Vector Store Selection
| Store | Best For |
|---|---|
| Qdrant | Production, managed cloud option |
| Pinecone | Serverless, auto-scaling |
| pgvector | PostgreSQL integration |
| FAISS | Local development, in-memory |
Chunking Strategy
The chunk size affects retrieval quality (a splitter configuration sketch follows the list):
- Smaller chunks (500-1000): More precise retrieval, may lose context
- Larger chunks (1500-2000): Better context, noisier retrieval
- Overlap (10-20%): Ensures continuity across chunk boundaries
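The same trade-off can be expressed on the Go side, since LangChainGo ships a `textsplitter` package with equivalent knobs. A rough sketch, assuming the recursive character splitter and option names from the langchaingo version I used, with the overlap derived as 20% of the chunk size:

```go
package main

import (
	"fmt"
	"log"

	"github.com/tmc/langchaingo/textsplitter"
)

func main() {
	chunkSize := 1000
	overlap := chunkSize / 5 // 20% overlap, the upper end of the range above

	splitter := textsplitter.NewRecursiveCharacter(
		textsplitter.WithChunkSize(chunkSize),
		textsplitter.WithChunkOverlap(overlap),
	)

	chunks, err := splitter.SplitText("replace this with the document text to split")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("produced %d chunks\n", len(chunks))
}
```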
Reflection
The real value of LangChain — in either language — is not the framework itself but the abstraction boundary it enforces. Today I call Bedrock; tomorrow it might be a self-hosted model behind vLLM. The provider changes, the interface does not. In a landscape where model capabilities shift every few months and pricing changes overnight, that portability is not a nicety; it is an architectural requirement.
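In Go terms, that boundary is the `llms.Model` interface: the Ollama and Bedrock constructors above both return something that satisfies it, so the calling code never names a provider. A sketch of the swap, with the provider chosen by a hypothetical environment variable of my own:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/bedrock"
	"github.com/tmc/langchaingo/llms/ollama"
)

// newModel picks a provider at startup; everything downstream sees only llms.Model.
func newModel() (llms.Model, error) {
	// LLM_PROVIDER is a hypothetical switch for this sketch.
	if os.Getenv("LLM_PROVIDER") == "bedrock" {
		return bedrock.New(bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku))
	}
	return ollama.New(ollama.WithModel("llama2"))
}

func main() {
	model, err := newModel()
	if err != nil {
		log.Fatal(err)
	}

	answer, err := llms.GenerateFromSinglePrompt(context.Background(), model,
		"In one sentence, what does a provider-agnostic interface buy you?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(answer)
}
```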
If I were starting a new project today, I would prototype the RAG pipeline in Python, prove the retrieval quality, then ask a simple question: does this need to live inside an existing Go service, or can it stand alone? The answer to that question — not language loyalty — should determine the stack.