/nishio/Qdrant - Scrapbox Reader

generated at 2/11/2025, 6:16:26 AM
qdrant
Qdrant - Vector Database
https://qdrant.tech/

Collections - Qdrant
>A collection is a named set of points (vectors with a payload) among which you can search. Vectors within the same collection must have the same dimensionality and be compared by a single metric.

Payload - Qdrant
>One of the significant features of Qdrant is the ability to store additional information along with vectors. This information is called payload in Qdrant terminology.
> Qdrant allows you to store any information that can be represented using JSON.
If the project name is stored in the payload, it should be possible to search within one specific project or across all projects.

Points - Qdrant
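IDs are 64-bit integers, chosen by the side doing the PUT
Generate them UUID-style, or by some other mechanism such as sequential numbering...
PUTting with an ID that already exists overwrites that point
Given a condition on the payload, you can retrieve every ID that satisfies it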
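One way to settle the ID question is to derive the ID deterministically from a stable key such as the page title, so that PUTting the same page again overwrites the same point. A minimal sketch, assuming a SHA-256 digest truncated to 64 bits is acceptable; the function name `point_id_for` and the `project/title` key format are hypothetical and not something Qdrant prescribes.

```python
import hashlib


def point_id_for(project: str, title: str) -> int:
    """Derive a stable unsigned 64-bit ID from a project name and page title."""
    # Hypothetical key format: any string that uniquely names the page would do.
    key = f"{project}/{title}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep only the first 8 bytes so the value fits in an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


if __name__ == "__main__":
    print(point_id_for("nishio", "Qdrant"))
```

Compared with sequential numbering, this needs no stored counter. The trade-off is that a collision between two different keys is extremely unlikely but not impossible.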

Search - Qdrant
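You can also cut off results by similarity score, but there is no way to know a suitable threshold, so visualizing the similarity seems like the better option.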
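A rough sketch of what "visualizing the similarity" could look like in plain text, instead of picking a threshold. It assumes the scores fall roughly in the range 0 to 1 (out-of-range values are clamped); the function name `score_bars` and the sample hits are made up for illustration.

```python
def score_bars(hits, width=40):
    """Render (label, score) pairs as text bars, highest score first."""
    lines = []
    for label, score in sorted(hits, key=lambda h: h[1], reverse=True):
        # Clamp to [0, 1] so an out-of-range score cannot break the bar length.
        clamped = max(0.0, min(1.0, score))
        bar = "#" * round(clamped * width)
        lines.append(f"{score:5.2f} {bar} {label}")
    return lines


if __name__ == "__main__":
    # Illustrative values only, not real search results.
    sample = [("page A", 0.91), ("page B", 0.62), ("page C", 0.58)]
    print("\n".join(score_bars(sample)))
```

Seeing where the bars drop off gives a feel for the score distribution before committing to any cutoff.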

Filtering - Qdrant
"Match Any"
json
{
  "key": "project",
  "match": {
    "any": ["aaa", "bbb"]
  }
}
There is also a full-text match
json
{
  "key": "description",
  "match": {
    "text": "good cheap"
  }
}
If no full-text index has been created, this becomes a search for whether the field contains the query as a substring
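The two conditions above can be assembled into a search request body along these lines. This is a sketch under assumptions: the `filter` / `must` wrapper and the `vector` / `limit` fields reflect my understanding of the REST search body and should be checked against the Qdrant docs; the helper names are hypothetical. Leaving out the project list gives the cross-project search mentioned in the payload notes.

```python
import json


def match_any(key, values):
    """Condition: payload field `key` equals any one of `values`."""
    return {"key": key, "match": {"any": list(values)}}


def match_text(key, text):
    """Condition: payload field `key` matches `text` (full-text or substring)."""
    return {"key": key, "match": {"text": text}}


def build_search_body(vector, projects=None, text=None, limit=10):
    """Build a search body; with no conditions the filter is omitted entirely."""
    body = {"vector": list(vector), "limit": limit}
    must = []
    if projects:
        must.append(match_any("project", projects))
    if text:
        must.append(match_text("description", text))
    if must:
        # Every condition under "must" has to hold for a point to be returned.
        body["filter"] = {"must": must}
    return body


if __name__ == "__main__":
    body = build_search_body([0.1, 0.2], projects=["aaa", "bbb"], text="good cheap")
    print(json.dumps(body, indent=2))
```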

Storage - Qdrant
Vector Storage: In-memory / memmap
Payload Storage: InMemory / OnDisk
RocksDB RocksDB | A persistent key-value store | RocksDB
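The point here is that if the payload is large, keeping it in memory is not realistic
The minimum plan has 1 GB of RAM and this Scrapbox's JSON is 32 MB, so for now everything should just go in memory without worrying about it
Loading the 1,000 books I have cut apart and scanned would just barely overflow it
Well, a small start comes first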
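The back-of-the-envelope arithmetic behind this can be written out as follows. It assumes binary units (1 GB = 1024³ bytes, 1 MB = 1024² bytes) and counts payload only, ignoring vectors, indexes, and process overhead, so the real headroom is smaller. The variable names are only illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

ram_bytes = 1 * GIB        # minimum plan, from the note above
scrapbox_json = 32 * MIB   # size of this Scrapbox export, from the note above

# Fraction of RAM the current payload would occupy if held entirely in memory.
share = scrapbox_json / ram_bytes

# Average payload per book at which 1,000 books would exactly fill the RAM.
books = 1000
break_even_per_book = ram_bytes / books

if __name__ == "__main__":
    print(f"Scrapbox export uses {share:.2%} of RAM")
    print(f"1,000 books fill RAM at {break_even_per_book / MIB:.2f} MiB per book")
```

So the current export is only about 3% of the RAM, while "just barely overflowing" with 1,000 books corresponds to slightly more than 1 MB of text per book on average.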

Indexing - Qdrant
Judging from the description, it is doubtful whether the full-text index tokenizer works properly on Japanese
Trying it out
Quickstart - Qdrant
https://gist.github.com/nishio/111f11e078dd68768f5904ea879abe2f
Substring matches on strings hit without any particular problem
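A toy illustration of why tokenization is the worry for Japanese while substring matching is not. It uses a naive whitespace tokenizer as a stand-in, not Qdrant's actual tokenizers, and the sample strings are made up. Because Japanese is written without spaces, the whole sentence becomes one token, so token-level matching misses a word that substring matching finds.

```python
def whitespace_tokens(text):
    """Split on whitespace only (a naive stand-in for a word tokenizer)."""
    return text.split()


def token_match(query, text):
    """True if every query token appears as a whole token in the text."""
    tokens = set(whitespace_tokens(text))
    return all(q in tokens for q in whitespace_tokens(query))


def substring_match(query, text):
    """True if the query appears anywhere inside the text."""
    return query in text


english = "good and cheap vector database"
japanese = "安くて良いベクトルデータベース"

if __name__ == "__main__":
    print(token_match("good cheap", english))     # both words are separate tokens
    print(token_match("ベクトル", japanese))        # the sentence is a single token
    print(substring_match("ベクトル", japanese))    # plain containment check
```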
How the index works
>Qdrant currently only uses HNSW as a vector index. HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm.
HNSW (Hierarchical Navigable Small World Graph)
Pinecone uses the same mechanism: Hierarchical Navigable Small Worlds (HNSW) | Pinecone
1603.09320 Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
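To get a feel for the graph-based idea, here is a deliberately simplified sketch of greedy routing on a single-layer proximity graph. It is not full HNSW: the real algorithm adds a hierarchy of layers, a candidate list during search, and smarter neighbour selection. The brute-force graph construction and the function names are assumptions made for illustration.

```python
import math


def build_knn_graph(points, k):
    """Link every point to its k nearest neighbours by brute force (toy construction)."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph


def greedy_search(points, graph, query, entry=0):
    """Walk the graph, always moving to the neighbour closest to the query."""
    current = entry
    current_dist = math.dist(points[current], query)
    while True:
        best = min(graph[current], key=lambda j: math.dist(points[j], query))
        best_dist = math.dist(points[best], query)
        if best_dist >= current_dist:
            # No neighbour is closer: we are at a (possibly local) minimum.
            return current
        current, current_dist = best, best_dist


if __name__ == "__main__":
    pts = [(float(i), 0.0) for i in range(10)]
    g = build_knn_graph(pts, k=2)
    print(greedy_search(pts, g, (7.2, 0.0)))
```

The search only evaluates distances along the path it walks rather than against every point, which is where the speed comes from. The price is that it is approximate, since a greedy walk can stop at a local minimum.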

...