Commercial Hosted Architecture
Technical architecture for the ContextGraph OS commercial hosted product.
System Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ Customer Applications │
│ (LangChain · AutoGen · Custom Agents · Internal Tools) │
└───────────────────────────────────┬─────────────────────────────────────┘
│ SDK / REST API
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ ContextGraph Cloud Platform │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ API Gateway │ │
│ │ (Rate Limiting · Auth · Tenant Routing) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────────┼────────────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ Auth │ │ Core API │ │ Dashboard │ │
│ │ Service │ │ Service │ │ (React) │ │
│ └──────────┘ └──────────────┘ └──────────┘ │
│ │ │ │ │
│ │ ┌──────────────────────┼──────────────────────┐ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌────────┐ ┌──────────────┐ ┌────────┐ │ │
│ │ │ Policy │ │ Decision │ │ Report │ │ │
│ │ │ Engine │ │ Processor │ │ Worker │ │ │
│ │ └────────┘ └──────────────┘ └────────┘ │ │
│ │ │ │ │ │ │
│ └─────┼──────────────────────┼──────────────────────┼─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL (Multi-Tenant) │ │
│ │ Claims · Decisions · Policies · Provenance · Agents │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Redis Cluster │ │
│ │ (Sessions · Cache · Real-time Events) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Core Components
1. API Gateway
Technology: NGINX / Kong / AWS API Gateway
Responsibilities:
- Request routing to services
- Rate limiting per tenant
- API key validation
- Request/response logging
- SSL termination
# Rate limits per tier
rate_limits:
team:
requests_per_minute: 1000
burst: 100
enterprise:
requests_per_minute: 10000
burst: 500
2. Authentication Service
Technology: Node.js + Passport.js / Auth0
Features:
- API key management
- SSO/SAML integration (Team+)
- JWT token issuance
- Role-based access control
- Multi-factor authentication (Enterprise)
interface TenantAuth {
tenantId: string;
plan: 'team' | 'enterprise';
apiKeys: ApiKey[];
ssoConfig?: SSOConfig;
users: User[];
roles: Role[];
}
3. Core API Service
Technology: Node.js + TypeScript + Express/Fastify
Endpoints:
/api/v1/claims- CRUD for claims/api/v1/decisions- Decision lifecycle/api/v1/policies- Policy management/api/v1/agents- Agent registry/api/v1/provenance- Audit trail queries
Multi-tenancy:
// Every request is tenant-scoped
app.use((req, res, next) => {
const tenantId = extractTenantId(req);
req.context = { tenantId };
next();
});
// Queries are automatically filtered
const claims = await ckg.query({
tenantId: req.context.tenantId, // Enforced
entityId: 'report_123',
});
4. Policy Engine Service
Technology: Node.js + @contextgraph/policy
Features:
- Real-time policy evaluation
- Policy simulation/dry-run
- Template management
- Policy versioning
5. Decision Processor
Technology: Node.js + Bull Queue
Responsibilities:
- Async decision processing
- Human review queue management
- Webhook delivery
- Status transitions
// Decision processing flow
queue.process('decision', async (job) => {
const { decisionId, tenantId } = job.data;
// Evaluate policies
const result = await policyEngine.evaluate(decision);
if (result.effect === 'deny' && result.requiresApproval) {
await notifyReviewers(tenantId, decision);
return { status: 'needs_review' };
}
// Auto-approve
await dtg.transition(decisionId, 'approved');
return { status: 'approved' };
});
6. Report Worker
Technology: Node.js + PDFKit + Bull Queue
Features:
- Scheduled report generation
- On-demand compliance exports
- PDF/CSV/JSON formats
- Email delivery
7. Dashboard (Web UI)
Technology: React + TypeScript + TailwindCSS
Features:
- Real-time decision monitoring
- Policy editor (visual + YAML)
- Agent status & capabilities
- Compliance report generation
- Team management
- Audit log viewer
Database Schema (PostgreSQL)
-- Multi-tenant partitioning
CREATE TABLE tenants (
id UUID PRIMARY KEY,
name TEXT NOT NULL,
plan TEXT NOT NULL, -- 'team' | 'enterprise'
settings JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- All tables include tenant_id
CREATE TABLE claims (
id UUID PRIMARY KEY,
tenant_id UUID REFERENCES tenants(id),
entity_id TEXT NOT NULL,
attribute TEXT NOT NULL,
value JSONB NOT NULL,
confidence DECIMAL(3,2),
valid_from TIMESTAMPTZ NOT NULL,
valid_until TIMESTAMPTZ,
provenance_id UUID,
created_at TIMESTAMPTZ DEFAULT NOW(),
-- Partition by tenant for isolation
CONSTRAINT claims_tenant_fk FOREIGN KEY (tenant_id) REFERENCES tenants(id)
);
CREATE INDEX idx_claims_tenant_entity ON claims(tenant_id, entity_id);
CREATE INDEX idx_claims_valid_range ON claims(tenant_id, valid_from, valid_until);
-- Row-level security for tenant isolation
ALTER TABLE claims ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON claims
USING (tenant_id = current_setting('app.tenant_id')::uuid);
Infrastructure Requirements
Team Tier (Managed)
| Component | Specification |
|---|---|
| API Servers | 2x (load balanced) |
| Database | PostgreSQL RDS (db.t3.medium) |
| Redis | ElastiCache (cache.t3.micro) |
| Storage | S3 for reports |
| CDN | CloudFront for dashboard |
Estimated Cost: ~$200-400/month base infrastructure
Enterprise Tier (Self-Hosted)
| Component | Specification |
|---|---|
| Kubernetes | 3-node cluster minimum |
| Database | PostgreSQL (dedicated, encrypted) |
| Redis | 3-node cluster |
| Storage | Local or cloud storage |
| Monitoring | Prometheus + Grafana |
Deployment Options:
- AWS EKS / GCP GKE / Azure AKS
- On-premise Kubernetes
- Docker Compose (small scale)
Deployment Architecture
Kubernetes Deployment
# Helm chart structure
contextgraph-cloud/
├── Chart.yaml
├── values.yaml
├── templates/
│ ├── api-deployment.yaml
│ ├── api-service.yaml
│ ├── auth-deployment.yaml
│ ├── dashboard-deployment.yaml
│ ├── worker-deployment.yaml
│ ├── ingress.yaml
│ ├── configmap.yaml
│ └── secrets.yaml
# values.yaml
replicaCount:
api: 3
auth: 2
dashboard: 2
worker: 2
postgresql:
enabled: true
auth:
database: contextgraph
primary:
persistence:
size: 100Gi
redis:
enabled: true
architecture: replication
ingress:
enabled: true
className: nginx
hosts:
- host: api.contextgraph.io
paths:
- path: /
pathType: Prefix
Docker Compose (Development/Small Scale)
version: '3.8'
services:
api:
build: ./packages/cloud-api
ports:
- "3000:3000"
environment:
- DATABASE_URL=postgresql://postgres:password@db:5432/contextgraph
- REDIS_URL=redis://redis:6379
depends_on:
- db
- redis
dashboard:
build: ./packages/cloud-dashboard
ports:
- "3001:80"
worker:
build: ./packages/cloud-worker
environment:
- DATABASE_URL=postgresql://postgres:password@db:5432/contextgraph
- REDIS_URL=redis://redis:6379
depends_on:
- db
- redis
db:
image: postgres:15
volumes:
- pgdata:/var/lib/postgresql/data
environment:
- POSTGRES_DB=contextgraph
- POSTGRES_PASSWORD=password
redis:
image: redis:7-alpine
volumes:
- redisdata:/data
volumes:
pgdata:
redisdata:
Security Considerations
Data Isolation
- Row-Level Security: PostgreSQL RLS enforces tenant isolation
- Encryption at Rest: AES-256 for all stored data
- Encryption in Transit: TLS 1.3 for all connections
- API Key Hashing: bcrypt for stored API keys
Compliance
| Standard | Features |
|---|---|
| SOC 2 | Audit logging, access controls, encryption |
| HIPAA | BAA available, PHI handling procedures |
| GDPR | Data export, deletion, consent tracking |
| ISO 27001 | Security controls, incident response |
Audit Trail
// All operations are logged
interface AuditLog {
id: string;
tenantId: string;
userId: string;
action: string; // 'claim.create', 'decision.approve', etc.
resource: string;
resourceId: string;
metadata: Record<string, unknown>;
ip: string;
userAgent: string;
timestamp: Date;
}
Monitoring & Observability
Metrics (Prometheus)
# Key metrics
- contextgraph_decisions_total{tenant, status}
- contextgraph_policy_evaluations_total{tenant, effect}
- contextgraph_api_latency_seconds{endpoint, method}
- contextgraph_claims_count{tenant}
- contextgraph_agents_active{tenant}
Logging (Structured JSON)
{
"level": "info",
"timestamp": "2024-03-15T10:30:00Z",
"service": "api",
"tenantId": "tenant_123",
"requestId": "req_abc",
"message": "Decision approved",
"decisionId": "dec_xyz",
"duration_ms": 45
}
Alerting
| Alert | Threshold |
|---|---|
| API Error Rate | > 1% for 5 min |
| API Latency P99 | > 500ms for 5 min |
| Database Connections | > 80% pool |
| Queue Backlog | > 1000 jobs |
| Decision Failure Rate | > 5% for 10 min |
Next Steps
- Phase 1: Core API + PostgreSQL multi-tenancy
- Phase 2: Dashboard MVP + Basic auth
- Phase 3: SSO integration + Advanced policies
- Phase 4: Self-hosted Helm chart
- Phase 5: Compliance certifications