Building LLM Applications with LangChain: Go, Python, and AWS
I write most of my infrastructure code in Go and most of my AI prototypes in Python, and for a long time those two worlds did not talk to each other. Then I needed an LLM-powered feature inside a Go microservice — not a standalone chatbot, but a component embedded in a larger system — and I discovered LangChainGo. It is younger and leaner than its Python sibling, but it follows the same abstractions, and that consistency matters when your team spans both languages. What follows is a practical tour of both ecosystems: Go for the lightweight integrations, Python for the full RAG pipelines, and AWS Bedrock as the production backend behind both.
The LangChain Philosophy
LangChain provides composable building blocks for LLM applications:
- Models: Unified interface to various LLM providers
- Prompts: Templates and management for model inputs
- Chains: Sequences of calls to models and utilities
- Memory: State persistence across interactions
- Retrieval: Integration with vector stores and document loaders
The key insight: LLM applications are pipelines, not single API calls.
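To make that concrete, here is a minimal sketch of such a pipeline in Python, composing a prompt template, a model, and an output parser with LangChain's expression language; the model ID and provider are illustrative assumptions rather than part of any particular setup:

```python
from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Any provider works behind the same interface; Bedrock is used here only as an example.
model = init_chat_model(
    "anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
    model_provider="bedrock_converse",
)
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")

# prompt -> model -> parser: three building blocks composed into one pipeline
chain = prompt | model | StrOutputParser()
print(chain.invoke({"text": "LangChain composes LLM calls into pipelines."}))
```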
Getting Started: LangChain Go with Ollama
The simplest entry point uses a local LLM through Ollama. This avoids API costs and latency while prototyping.
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama2"))
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	completion, err := llms.GenerateFromSinglePrompt(
		ctx,
		llm,
		"Human: Who was the first man to walk on the moon?\nAssistant:",
		llms.WithTemperature(0.8),
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			fmt.Print(string(chunk))
			return nil
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = completion
}
```
Key points:
- `ollama.New()` connects to a local Ollama instance
- `WithModel("llama2")` selects the model to use
- `WithStreamingFunc` enables real-time token streaming
- `WithTemperature(0.8)` controls randomness in responses
Scaling Up: AWS Bedrock Integration in Go
For production workloads, AWS Bedrock provides managed access to foundation models including Claude, Llama, and Titan.
```go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/bedrock"
)

func main() {
	var (
		prompt    = flag.String("prompt", "Summarize the novel 'Fairy Tale'", "Prompt to send")
		awsRegion = flag.String("region", "eu-west-1", "AWS region")
		verbose   = flag.Bool("verbose", false, "Enable verbose output")
	)
	flag.Parse()

	ctx := context.Background()

	// Create Bedrock LLM with Claude Haiku
	opts := []bedrock.Option{
		bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku),
	}
	llm, err := bedrock.New(opts...)
	if err != nil {
		log.Fatalf("Failed to create Bedrock LLM: %v", err)
	}

	if *verbose {
		fmt.Printf("AWS Region: %s\n", *awsRegion)
		fmt.Printf("Prompt: %s\n", *prompt)
	}

	// Simple Call method
	response, err := llm.Call(ctx, *prompt)
	if err != nil {
		log.Printf("Error calling model: %v", err)
	} else {
		fmt.Printf("Response: %s\n", response)
	}

	// GenerateContent with structured messages
	messages := []llms.MessageContent{
		{
			Role: llms.ChatMessageTypeSystem,
			Parts: []llms.ContentPart{
				llms.TextPart("You are a helpful assistant."),
			},
		},
		{
			Role: llms.ChatMessageTypeHuman,
			Parts: []llms.ContentPart{
				llms.TextPart(*prompt),
			},
		},
	}
	resp, err := llm.GenerateContent(ctx, messages)
	if err != nil {
		log.Printf("Error generating content: %v", err)
	} else if len(resp.Choices) > 0 {
		fmt.Printf("Response: %s\n", resp.Choices[0].Content)
	}
}
```
The Go Bedrock integration provides:
- Two calling patterns: simple `Call()` for basic prompts, `GenerateContent()` for structured conversations
- Message types: System, Human, and AI message roles
- AWS credential handling: uses the standard AWS SDK credential chain
Full RAG Pipeline: Python with LangChain
For complex applications, Python’s LangChain offers the most mature ecosystem. Here’s a complete RAG implementation using AWS Bedrock, Titan embeddings, and a Qdrant vector store.
```python
import os

from langchain.chat_models import init_chat_model
from langchain_aws import BedrockEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

# LLM Setup - Claude 3.7 Sonnet via AWS Bedrock
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse",
)

# Embedding model - Amazon Titan
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

# Vector store - Qdrant Cloud
qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_CLOUD_URL"),
    api_key=os.getenv("QDRANT_CLOUD_KEY"),
)
vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="langchainpy-aws-poc",
    embedding=embeddings,
)
```
Document Loading and Chunking
RAG begins with ingesting documents:
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load web content with targeted parsing
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split into chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
all_splits = text_splitter.split_documents(docs)

# Index in vector store
_ = vector_store.add_documents(documents=all_splits)
```
Key considerations:
- `chunk_size=1000`: a balance between context and specificity
- `chunk_overlap=200`: prevents information loss at chunk boundaries
- Targeted parsing: BeautifulSoup filters only the relevant page content
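Before building the full pipeline, it is worth sanity-checking the index with a direct similarity search. A quick sketch, where the query string and `k` are arbitrary:

```python
# Retrieve the three closest chunks for an example query
results = vector_store.similarity_search("What is task decomposition?", k=3)
for doc in results:
    print(doc.metadata.get("source"), "->", doc.page_content[:120])
```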
LangGraph Orchestration
LangGraph provides state management and workflow orchestration:
```python
from langchain import hub
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Pull a standard RAG prompt template
prompt = hub.pull("rlm/rag-prompt")


# Define application state
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Retrieval step
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


# Generation step
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({
        "question": state["question"],
        "context": docs_content,
    })
    response = model.invoke(messages)
    return {"answer": response.content}


# Build and compile the graph
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

# Execute
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])
```
The pipeline:
- Retrieve: Vector similarity search finds relevant document chunks
- Generate: LLM synthesizes answer from retrieved context
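The compiled graph can also stream intermediate results instead of returning only the final state, which helps when you want to surface retrieval progress in a UI. A minimal sketch using LangGraph's `stream` API:

```python
# Each iteration yields the output of one node as it finishes
for step in graph.stream(
    {"question": "What is Task Decomposition?"},
    stream_mode="updates",
):
    print(step)
```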
Architecture Comparison
| Aspect | Go (LangChainGo) | Python (LangChain) |
|---|---|---|
| Maturity | Growing | Mature |
| Providers | Ollama, Bedrock, OpenAI | 50+ integrations |
| RAG Support | Basic | Full ecosystem |
| LangGraph | Not available | Full support |
| Performance | Lower latency, small footprint | Heavier runtime |
| Use Case | Microservices, CLI tools | Complex AI apps |
When to Use Each
In practice, the choice is less about language preference and more about where the LLM sits in your architecture.
Choose Go when the LLM call is a feature inside a larger service — a summarisation step in a data pipeline, a classification call in a routing layer. Go gives you a single binary, predictable memory, and the concurrency model to handle thousands of in-flight requests. You do not need LangGraph for that.
Choose Python when the LLM is the application — a RAG chatbot, an agent with tool use, anything that needs LangGraph orchestration, multi-step retrieval, or the full ecosystem of document loaders and vector store integrations. Python’s maturity here is not just ahead; it is a different category entirely.
Production Considerations
AWS Bedrock Setup
- Enable model access in the AWS Console
- Configure IAM permissions for `bedrock:InvokeModel`
- Use cross-region inference endpoints for newer models (see the sketch below)
- Monitor costs: embeddings and completions bill separately
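As a sketch of the last two points: the `eu.` prefix on the model ID routes requests through a cross-region inference profile, and the region can be set explicitly or left to the standard SDK configuration chain (the parameter values below are assumptions):

```python
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",  # "eu." = cross-region inference profile
    model_provider="bedrock_converse",
    region_name="eu-west-1",  # optional; otherwise resolved from the AWS credential chain
)
```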
Vector Store Selection
| Store | Best For |
|---|---|
| Qdrant | Production, managed cloud option |
| Pinecone | Serverless, auto-scaling |
| pgvector | PostgreSQL integration |
| FAISS | Local development, in-memory |
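Because every store sits behind the same vector store interface, swapping Qdrant for an in-memory FAISS index during local development is a small change. A sketch, assuming the `faiss-cpu` package is installed and reusing `all_splits` and `embeddings` from earlier:

```python
from langchain_community.vectorstores import FAISS

# Build a local, in-memory index from the same chunks and embedding model
vector_store = FAISS.from_documents(all_splits, embeddings)
```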
Chunking Strategy
The chunk size affects retrieval quality:
- Smaller chunks (500-1000 characters): more precise retrieval, but may lose context
- Larger chunks (1500-2000 characters): better context, but noisier retrieval
- Overlap (10-20%): Ensures continuity across chunk boundaries
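Expressed as code, the overlap guideline is just a fraction of the chunk size; a sketch with 15% chosen arbitrarily from that range:

```python
chunk_size = 1500
chunk_overlap = int(chunk_size * 0.15)  # ~225 characters of shared context per boundary

splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
)
```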
Reflection
The real value of LangChain — in either language — is not the framework itself but the abstraction boundary it enforces. Today I call Bedrock; tomorrow it might be a self-hosted model behind vLLM. The provider changes, the interface does not. In a landscape where model capabilities shift every few months and pricing changes overnight, that portability is not a nicety; it is an architectural requirement.
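That boundary shows up directly in code: the calling side stays the same while the backend changes underneath. A sketch, in which the vLLM endpoint, port, and model name are assumptions and rely on vLLM's OpenAI-compatible API:

```python
# Today: managed Bedrock
model = init_chat_model(
    "anthropic.claude-3-haiku-20240307-v1:0",
    model_provider="bedrock_converse",
)

# Tomorrow: a self-hosted model behind vLLM, same downstream code
model = init_chat_model(
    "meta-llama/Llama-3.1-8B-Instruct",
    model_provider="openai",
    base_url="http://localhost:8000/v1",  # assumed vLLM endpoint
    api_key="not-needed",                 # vLLM typically ignores the key
)
```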
If I were starting a new project today, I would prototype the RAG pipeline in Python, prove the retrieval quality, then ask a simple question: does this need to live inside an existing Go service, or can it stand alone? The answer to that question — not language loyalty — should determine the stack.