Vector Metadata FAQ
Everything developers need to know about metadata sizing, performance impact, filtering, and best practices for Cachee vector search
**Is there a size limit on metadata values?**

No. Metadata is stored as a `HashMap<String, String>` with no enforced size limit. Each field is a key-value string pair. You can store 36-byte UUIDs or 200KB payloads; the practical limit is available memory.
```
VADD my_index vec_001 0.1 0.5 0.3 0.9
  META id "550e8400-e29b-41d4-a716-446655440000"
  META description "A very long description..."
  META full_document "<200KB of text>"
```
All three metadata fields are stored without truncation. The HashMap scales to any number of fields with any value length.
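A minimal Python sketch of this storage model (a plain dict standing in for the Rust `HashMap`, not the actual Cachee data structures) illustrates that neither field count nor value length is constrained:

```python
# Illustrative model of a vector entry: float dimensions plus a
# string-to-string metadata map, mirroring Rust's HashMap<String, String>.
# (Sketch only -- not the actual Cachee internals.)
entry = {
    "dims": [0.1, 0.5, 0.3, 0.9],
    "meta": {},
}

# A 36-byte UUID and a ~200KB payload coexist in the same map; nothing in
# the structure itself enforces a size limit.
entry["meta"]["id"] = "550e8400-e29b-41d4-a716-446655440000"
entry["meta"]["full_document"] = "x" * 200_000

print(len(entry["meta"]["full_document"]))  # 200000
```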
**Does large metadata slow down vector search?**

No. HNSW graph traversal only accesses vector dimensions (`Vec<f32>`), never metadata. Whether your metadata is 36 bytes or 200KB, VSEARCH completes in 0.0015ms. Metadata is only touched during two phases:
- Filter evaluation — only if you include a `FILTER` clause, and only the filtered field is read
- Result serialization — metadata for the top-K results is serialized into the response
The HNSW traversal that dominates latency never reads metadata. Graph edges point to vector slots, and distance computation uses f32 dimensions exclusively.
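To illustrate the two phases, here is a toy linear-scan search (a hypothetical in-memory model, not the real HNSW engine): distance computation reads only the float dimensions, and metadata is touched only when the top-K results are serialized.

```python
import math

# Toy index standing in for the real structure: vector_id -> {dims, meta}.
# (Linear scan for clarity; the real engine uses HNSW traversal.)
index = {
    "vec_001": {"dims": [0.1, 0.5, 0.3, 0.9], "meta": {"doc": "x" * 200_000}},
    "vec_002": {"dims": [0.9, 0.1, 0.2, 0.4], "meta": {"doc": "tiny"}},
}

def distance(a, b):
    # Distance computation reads only the float dimensions, never metadata.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vsearch(query, k):
    ranked = sorted(index, key=lambda vid: distance(query, index[vid]["dims"]))
    # Metadata is read only here, while serializing the top-K results.
    return [(vid, index[vid]["meta"]) for vid in ranked[:k]]

results = vsearch([0.1, 0.5, 0.3, 0.9], k=1)
print(results[0][0])  # vec_001
```

Because ranking never inspects `meta`, a 200KB payload and a 4-byte payload cost the same during the search phase.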
**Can I store large payloads, like ZK proofs, in metadata?**

Yes, but we recommend a two-tier approach for proofs >10KB:
Tier 1: Small metadata on the vector
```
VADD proofs proof_001 0.82 0.15 0.93 ...
  META proof_hash "sha3_abc123def456"
  META circuit_id "fibonacci_v2"
  META status "verified"
  META created_at "2026-03-27T10:00:00Z"
```
Tier 2: Full proof in a separate cache key
```
SET stark:sha3_abc123def456 <full proof blob>
```
Retrieval flow: VSEARCH finds matches by vector similarity, the metadata gives you the proof_hash, and GET stark:{hash} then retrieves the full proof in 1.5 microseconds. This keeps the HNSW index lean while the full proof stays instantly accessible.
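The pattern can be sketched with two in-memory dicts standing in for the vector index and the key-value cache (illustrative only; not a real Cachee client):

```python
# Two-tier pattern sketch: small metadata on the vector, full payload in a
# separate key-value entry addressed by hash.
vector_meta = {}   # tier 1: small metadata fields on the vector
blob_store = {}    # tier 2: full payloads under "stark:{hash}" keys

def store_proof(vector_id, proof_hash, proof_blob):
    # Tier 1 keeps only the lookup fields (mirrors the META lines above).
    vector_meta[vector_id] = {"proof_hash": proof_hash, "status": "verified"}
    # Tier 2 holds the large blob (mirrors SET stark:{hash} <blob>).
    blob_store[f"stark:{proof_hash}"] = proof_blob

def fetch_proof(vector_id):
    # VSEARCH would surface this metadata; GET then resolves the full blob.
    h = vector_meta[vector_id]["proof_hash"]
    return blob_store[f"stark:{h}"]

store_proof("proof_001", "sha3_abc123def456", b"<full proof bytes>")
```

The index only ever carries a few hundred bytes per vector, while the blob remains one hash lookup away.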
**How much memory does metadata consume?**

Memory = number_of_vectors × average_metadata_size. Here are common scenarios at 100K vectors:
| Avg Metadata Size | Memory at 100K Vectors | Use Case |
|---|---|---|
| 500 bytes | 50 MB | IDs + timestamps |
| 12 KB | 1.2 GB | Product catalogs |
| 200 KB | 20 GB | Full documents |
Plan your instance size accordingly. The two-tier pattern (small metadata + separate cache key) keeps vector memory under control while large payloads remain instantly accessible via GET.
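The sizing formula checks out in a few lines of Python (illustrative only; real deployments add per-entry overhead for keys and allocator bookkeeping that this ignores):

```python
def metadata_memory_bytes(num_vectors, avg_metadata_bytes):
    # Memory = number_of_vectors x average_metadata_size (formula above).
    return num_vectors * avg_metadata_bytes

# The table's scenarios at 100K vectors:
print(metadata_memory_bytes(100_000, 500) / 1e6)      # 50.0  (MB)
print(metadata_memory_bytes(100_000, 12_000) / 1e9)   # 1.2   (GB)
print(metadata_memory_bytes(100_000, 200_000) / 1e9)  # 20.0  (GB)
```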
**Does metadata size affect insertion speed?**

Minimally. VADD copies metadata into the vector entry and updates the HNSW graph. Larger metadata means more bytes copied, but the copy takes nanoseconds versus the microseconds spent on HNSW graph updates.
Where insertion time goes:
- ~95% — HNSW graph maintenance (finding neighbors, updating edges)
- ~4% — Distance computations against candidate vectors
- ~1% — Metadata copy + memory allocation
Insertion speed is dominated by graph maintenance, not metadata size. Even at 200KB metadata, the copy overhead is negligible compared to the HNSW layer search.
**Can I filter search results by metadata?**

Yes. Use the FILTER clause:
```
VSEARCH my_index 0.1 0.5 0.3 0.9 10
  FILTER category eq "electronics"
```
Filter evaluation is O(1) per vector checked (HashMap lookup). The number of metadata fields on a vector doesn't affect filter speed — only the filtered field is accessed.
Supported operators:
- `eq` — exact string equality
- `ne` — not equal
- `gt` / `lt` — lexicographic greater-than / less-than
Filtering happens during HNSW traversal: each candidate is checked against the filter before being added to the result set, so non-matching vectors are skipped efficiently.
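A sketch of how such a check might evaluate (hypothetical helper illustrating the semantics above, not Cachee source code):

```python
def passes_filter(meta, field, op, value):
    # O(1) dict lookup of the single filtered field; other fields untouched.
    actual = meta.get(field)
    if actual is None:
        return False          # vectors missing the field never match
    if op == "eq":
        return actual == value
    if op == "ne":
        return actual != value
    if op == "gt":
        return actual > value  # lexicographic string comparison
    if op == "lt":
        return actual < value
    raise ValueError(f"unknown operator: {op}")

meta = {"category": "electronics", "price": "29.99"}
print(passes_filter(meta, "category", "eq", "electronics"))  # True
print(passes_filter(meta, "price", "gt", "19.99"))           # True
```

Because only `meta[field]` is read, a vector carrying fifty other fields filters just as fast as one carrying two.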
**What data types can metadata store?**

All metadata values are strings. Store other types as string representations:
```
VADD my_index product_42 0.8 0.2 0.5 0.1
  META price "29.99"
  META in_stock "true"
  META tags "{\"colors\":[\"red\",\"blue\"]}"
  META quantity "42"
```
Type conventions:
- Numbers: `"42"`, `"3.14"`
- Booleans: `"true"` / `"false"`
- Complex objects: serialize to a JSON string
- Dates: ISO 8601 strings (e.g., `"2026-03-27T10:00:00Z"`)
Filter comparisons (gt, lt) use lexicographic ordering. For numeric sorting, zero-pad values (e.g., "00042") or use ISO dates for chronological filters.
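The pitfall and the fix are easy to demonstrate (`pad_numeric` is a hypothetical helper name, not part of the Cachee API):

```python
# Lexicographic comparison misorders unpadded numbers: "9" sorts after "10".
print("9" > "10")          # True -- numerically wrong

# Zero-padding restores numeric order under string comparison.
print("00009" < "00010")   # True -- numerically correct

def pad_numeric(n, width=5):
    # Hypothetical helper for formatting values used in gt/lt filters.
    return str(n).zfill(width)

print(pad_numeric(42))  # 00042
```

ISO 8601 dates need no padding because their fixed-width, most-significant-first layout already sorts chronologically as strings.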
**How do I update metadata on an existing vector?**

Currently, updating metadata requires a new VADD call with the same vector_id. The existing entry is replaced; the HNSW graph connections are preserved if the vector dimensions haven't changed. The replacement is atomic.
```
# Original insertion
VADD my_index doc_001 0.1 0.5 0.3 META status "draft"

# Update just the metadata (same vector_id, same dimensions)
VADD my_index doc_001 0.1 0.5 0.3 META status "published"
  META published_at "2026-03-27T12:00:00Z"
```
Because the vector dimensions are identical, the HNSW graph does not need to recompute neighbors. Only the metadata HashMap is replaced. The operation completes at the same speed as the original insertion.
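The replace-on-VADD semantics can be sketched with a dict standing in for the index (illustrative only; the real engine also maintains the HNSW graph alongside this map):

```python
# Upsert sketch: re-inserting an existing vector_id replaces the whole
# entry; nothing is merged field by field.
index = {}

def vadd(vector_id, dims, meta):
    # The entire entry (dims + metadata map) is swapped atomically; with
    # unchanged dims, graph neighbors would not need recomputation.
    index[vector_id] = {"dims": dims, "meta": meta}

vadd("doc_001", [0.1, 0.5, 0.3], {"status": "draft"})
vadd("doc_001", [0.1, 0.5, 0.3], {"status": "published",
                                  "published_at": "2026-03-27T12:00:00Z"})
print(index["doc_001"]["meta"]["status"])  # published
```

Note that because the second call replaces the metadata map wholesale, any field you want to keep must be re-sent in the update.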
**What metadata do typical workloads store?**

| Use Case | Typical Fields | Avg Metadata Size |
|---|---|---|
| RAG / Embeddings | document_id, source, timestamp, access_count | ~200 bytes |
| E-commerce | product_id, category, price, in_stock | ~300 bytes |
| Fraud Detection | transaction_id, risk_score, merchant_id, device_hash | ~400 bytes |
| ZK Proofs | proof_hash, circuit_id, prover, status, created_at (+ full proof in separate key) | ~500 bytes |

Keep inline metadata under 1KB per vector for optimal memory efficiency. For payloads exceeding 10KB, use the two-tier pattern: small metadata fields on the vector, large blobs in separate cache keys.
**How does eviction treat large metadata?**

Vectors are evicted as whole entries (vector plus all metadata). W-TinyLFU eviction considers the vector's access frequency and recency, not metadata size, so large-metadata vectors are not penalized in eviction decisions.
Key behaviors:
- Eviction unit: Entire vector entry (dimensions + metadata) is removed together
- Eviction signal: Access frequency (how often VSEARCH returns this vector) and recency (last access time)
- Size-blind: A vector with 50 bytes of metadata has the same eviction priority as one with 200KB, given equal access patterns
If you need cost-aware eviction (keep expensive-to-recompute entries longer), use the cost-aware eviction feature — this lets you assign a recomputation cost weight to vectors so high-value entries survive longer under memory pressure.
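A heavily simplified, size-blind victim-selection sketch makes the point (illustrative only; the real W-TinyLFU policy uses frequency sketches and an admission window rather than exact counters):

```python
# Size-blind eviction sketch: the victim is chosen by lowest access
# frequency, with ties broken by oldest access. meta_bytes is never read.
entries = {
    "hot_small":  {"meta_bytes": 50,      "freq": 120, "last_access": 900},
    "hot_large":  {"meta_bytes": 200_000, "freq": 120, "last_access": 950},
    "cold_small": {"meta_bytes": 50,      "freq": 2,   "last_access": 100},
}

def pick_victim(entries):
    # Sort key is (frequency, recency); metadata size plays no role.
    return min(entries, key=lambda k: (entries[k]["freq"],
                                       entries[k]["last_access"]))

print(pick_victim(entries))  # cold_small
```

The rarely accessed small entry is evicted before the frequently accessed 200KB one, matching the size-blind behavior described above.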
Ready to Build with Vector Search?
Start using VADD and VSEARCH with metadata in minutes. Free tier includes 10K vectors with full metadata support.