Provenance First

Provenance is the cornerstone of ContextGraph OS. Every piece of data in the system has a traceable origin, ensuring complete auditability and trust.

What is Provenance?

Provenance answers the question: "Where did this data come from?"

In ContextGraph, provenance tracking means:

  • Every claim has a documented source
  • Every change is recorded in an immutable ledger
  • The entire history can be verified cryptographically
  • No orphan data exists in the system

The Provenance Ledger

The provenance ledger is an append-only, hash-chained log of all data mutations:

interface ProvenanceEntry {
  id: ProvenanceId;
  hash: string;           // SHA-256 hash of this entry
  previousHash: string;   // Link to previous entry
  type: ProvenanceType;   // claim_created, entity_created, etc.
  subjectId: string;      // What this is about
  data: unknown;          // Entry-specific data
  source: ProvenanceSource;
  timestamp: Timestamp;
  agentId?: AgentId;      // Who created this
}

type ProvenanceType =
  | 'claim_created'
  | 'claim_revoked'
  | 'entity_created'
  | 'entity_updated'
  | 'decision_recorded'
  | 'policy_created'
  | 'execution_logged';

Hash Chain Verification

Each entry contains a hash of its contents and a reference to the previous entry's hash, creating an unbreakable chain:

Entry 1          Entry 2          Entry 3
┌─────────┐      ┌─────────┐      ┌─────────┐
│ hash: A │◄─────│prevHash:A│◄─────│prevHash:B│
│ prev: ∅ │      │ hash: B │      │ hash: C │
│ data... │      │ data... │      │ data... │
└─────────┘      └─────────┘      └─────────┘

To verify integrity:

const verification = await client.verifyProvenance();

if (verification.value.valid) {
  console.log('Chain integrity verified');
  console.log(`Entries: ${verification.value.entriesVerified}`);
} else {
  console.log('Chain corrupted!');
  console.log(`Broken links: ${verification.value.brokenLinks}`);
  console.log(`Invalid hashes: ${verification.value.invalidHashes}`);
}

Provenance Sources

Every entry has a source describing its origin:

interface ProvenanceSource {
  type: 'agent' | 'user' | 'system' | 'external' | 'inference';
  id: string;
  method?: string;
  metadata?: Record<string, unknown>;
}

// Example sources
const agentSource = {
  type: 'agent',
  id: 'agt_processor',
  method: 'data_extraction'
};

const externalSource = {
  type: 'external',
  id: 'weather_api',
  method: 'GET /current',
  metadata: { apiVersion: '2.0' }
};

const inferenceSource = {
  type: 'inference',
  id: 'reasoning_engine',
  method: 'transitive_closure',
  metadata: { confidence: 0.95 }
};

Automatic Provenance Tracking

When using the SDK, provenance is tracked automatically:

// When you add a claim...
await client.addClaim({
  subjectId: entityId,
  predicate: 'status',
  value: 'active',
});

// A provenance entry is automatically created:
// {
//   type: 'claim_created',
//   subjectId: entityId,
//   data: { predicate: 'status', value: 'active' },
//   source: { type: 'system', id: 'sdk' },
//   hash: '...',
//   previousHash: '...'
// }

Querying Provenance

Get Entry by ID

const entry = await ledger.get(provenanceId);

Query by Subject

const entries = await ledger.query({
  subjectId: entityId,
  limit: 100
});

Query by Type

const claimEntries = await ledger.query({
  type: 'claim_created',
  from: startTimestamp,
  to: endTimestamp
});

Query by Agent

const agentActions = await ledger.query({
  agentId: agentId,
  limit: 50
});

Provenance for Compliance

Provenance records support compliance requirements:

Audit Trails

// Get complete audit trail for an entity
const audit = await client.getAuditTrail({
  entityId: personId,
  format: 'detailed'
});

// Export for compliance review
const report = await compliance.generateAuditReport({
  from: '2024-01-01',
  to: '2024-12-31',
  format: 'pdf'
});

GDPR Support

// Find all data related to a person (data subject)
const subjectData = await compliance.getDataSubjectData(personId);

// Export for data portability
const exportData = await compliance.exportDataSubjectData(personId);

// Right to erasure (with provenance of deletion)
await compliance.deleteDataSubjectData(personId, {
  reason: 'GDPR Article 17 request',
  requestId: 'gdpr_req_123'
});

Best Practices

1. Always Provide Source Context

// Good - includes context
await ledger.record({
  type: 'claim_created',
  source: {
    type: 'external',
    id: 'crm_system',
    method: 'sync',
    metadata: { syncId: 'sync_456' }
  },
  // ...
});

// Avoid - missing context
await ledger.record({
  type: 'claim_created',
  source: { type: 'system', id: 'unknown' },
  // ...
});

2. Use Meaningful Agent IDs

// Good - descriptive agent
const agent = await client.createAgent({
  name: 'invoice-processor',
  description: 'Processes incoming invoices from suppliers'
});

// Avoid - generic agent
const agent = await client.createAgent({
  name: 'agent1',
  description: 'Does stuff'
});

3. Verify Regularly

// In your health checks
async function healthCheck() {
  const verification = await client.verifyProvenance();
  if (!verification.value.valid) {
    alertOps('Provenance chain integrity failure');
  }
}

Next Steps