🎆P2P Search Network

Search engines in web2 are built in isolation. Every company builds its own solutions and the resulting search engines are not designed to be integrated or interoperable.

Sepana envisions a search infrastructure where individuals, projects, and companies can maintain ownership over their data and technology and still access engines throughout the network through a single search query. This is accomplished through the introduction of a decentralized search layer that hosts, deploys, and maintains the network.

History

Distributed search is not a novel concept. In fact, distributed architectures power most enterprise search engines. Millions of servers in data centers scattered across the globe power the magic we call Google.

In the past, distributed hash tables (DHTs) and the Kademlia protocol helped power novel p2p search engines such as FAROO and YaCy. Federated search engines have also been built to query multiple engines in parallel, yet they are plagued by scaling challenges, which dramatically limits the number of such engines.

The Gnutella gossip protocol pioneered unstructured search in p2p networks. Ultimately, this led to heavy redundancy for popular content and to high costs caused by inefficient network usage.

The recent innovations around consensus and tokenization allow p2p networks to introduce long-term incentives for both data consistency and coordination.

As a result, Sepana can build an internet-scale search infrastructure that is self-maintained, permissionless, and trustless.

Sepana's Contributions

Over the past months, we've designed and innovated on core p2p protocols in order to build a decentralized search network capable of internet-wide scale.

M-ary Gossip Protocol

In traditional p2p gossip protocols, each peer maintains only a limited local directory of other peers. In each epoch, peers check one another for updates and status. Each node periodically gossips to K random peers in the network; K is known as the infection factor. Once these K targets receive the gossip, they in turn randomly select another K targets to gossip to. This continues until all peers have forwarded the message to all of their known peers, or until the message expires.

Left unchecked, this process could continue forever, with nodes reading and forwarding the same gossiped messages to one another. To limit this, protocols introduce a form of internal memory: each message is assigned a universally unique identifier (UUID), and peers keep track of the UUIDs they have already forwarded. Peers can therefore ignore repeat messages, which prevents infinite message loops.
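The fan-out and de-duplication behavior described above can be sketched as a small in-memory simulation. This is an illustrative sketch only, not Sepana's implementation: the names `gossip`, `Peer`, `GossipStats`, and the `XorShift` helper are hypothetical, a `u128` stands in for a UUID, and targets are picked uniformly from the whole network rather than from a limited local directory.

```rust
use std::collections::{HashSet, VecDeque};

/// Tiny xorshift64 generator so the sketch needs no external crates (assumption).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// One simulated peer: the message ids it has already handled.
#[derive(Default)]
struct Peer {
    seen: HashSet<u128>,
}

/// Outcome of spreading one message through the simulated network.
struct GossipStats {
    reached: usize,    // peers that hold the message, including the origin
    sent: usize,       // total messages put on the wire
    duplicates: usize, // deliveries ignored because the id was already seen
}

/// Spread message `id` from `origin` among `n` peers; every peer that
/// receives the message for the first time forwards it to `k` random targets.
fn gossip(n: usize, k: usize, origin: usize, id: u128, seed: u64) -> GossipStats {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut peers: Vec<Peer> = (0..n).map(|_| Peer::default()).collect();
    let mut queue = VecDeque::new();
    let mut stats = GossipStats { reached: 1, sent: 0, duplicates: 0 };

    peers[origin].seen.insert(id);
    queue.push_back(origin);

    while let Some(_sender) = queue.pop_front() {
        for _ in 0..k {
            let target = (rng.next() % n as u64) as usize;
            stats.sent += 1;
            // `insert` returns false when the id is already in the peer's memory.
            if peers[target].seen.insert(id) {
                stats.reached += 1;
                queue.push_back(target);
            } else {
                stats.duplicates += 1;
            }
        }
    }
    stats
}
```

For example, `gossip(100, 3, 0, 42, 7)` spreads one message among 100 peers with K = 3. Each peer forwards only once, so the loop always terminates, and `sent` equals `k * reached`. Every delivery to a peer that already holds the id is counted in `duplicates`: the per-peer memory stops loops, but it does not stop redundant messages from being sent.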

In Gnutella, a prevailing p2p protocol at the core of previous attempts at federated search, a node forwards a given message only once, yet it can receive the same message from multiple peers. This happens because the memory of seen UUIDs is local to each peer. The result is a lot of unnecessary traffic and network congestion.

The M-ary spanning tree gossip protocol guarantees succinct and non-overlapping messaging paths. This improves protocol communication and speed by a factor of 1,000,000!
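This page does not describe how the spanning tree is constructed, so the following is only one possible construction, shown to illustrate what "non-overlapping paths" means. It assumes every peer knows the same ordered list of `n` peers (a strong assumption) and lays them out as a complete M-ary tree rooted at the sender. The names `children`, `broadcast`, and `Broadcast` are hypothetical.

```rust
/// Peers that `peer` forwards to when `origin` broadcasts among `n` peers,
/// with at most `m` children per peer.
fn children(peer: usize, origin: usize, n: usize, m: usize) -> Vec<usize> {
    // Position of this peer in the tree whose root (position 0) is `origin`.
    let pos = (peer + n - origin) % n;
    (1..=m)
        .map(|j| pos * m + j)                      // array layout of an M-ary tree
        .take_while(|&child_pos| child_pos < n)    // stop past the last peer
        .map(|child_pos| (child_pos + origin) % n) // map position back to a peer index
        .collect()
}

/// Result of one simulated broadcast.
struct Broadcast {
    deliveries: Vec<usize>, // how many times each peer received the message
    sent: usize,            // total messages put on the wire
    hops: usize,            // tree levels below the origin that were reached
}

/// Walk the tree level by level, starting from `origin`.
fn broadcast(n: usize, m: usize, origin: usize) -> Broadcast {
    let mut deliveries = vec![0usize; n];
    let mut sent = 0;
    let mut hops = 0;
    let mut frontier = vec![origin];

    while !frontier.is_empty() {
        let mut next = Vec::new();
        for &peer in &frontier {
            for child in children(peer, origin, n, m) {
                deliveries[child] += 1;
                sent += 1;
                next.push(child);
            }
        }
        if !next.is_empty() {
            hops += 1;
        }
        frontier = next;
    }
    Broadcast { deliveries, sent, hops }
}
```

Under these assumptions, `broadcast(13, 3, 5)` delivers the message exactly once to each of the other 12 peers, using 12 messages over 2 hops. Every position in the tree has a single parent, so no peer receives the message twice and no duplicate-detection memory is needed for the broadcast itself.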

Coming soon -> documentation about our innovations with modular search cores, novel Rust-based indexing algorithms, cluster management, a vector embedding search engine, consensus algorithms, and more!

Decentralized Search Network (DSN)

Sepana's DSN runs on a network of independent nodes that deploy and operate the protocol. There is no central administrator: all nodes are equal, and multiple stakeholders hold the network accountable.

The infrastructure was designed to handle internet-scale data and can grow as needed. Users can search through a single node (data provider) or perform a federated search across multiple nodes.
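The merge step of a federated search can be sketched as follows. This is a hedged illustration rather than the DSN's actual ranking mechanism: `Hit` and `merge_hits` are hypothetical names, each node is assumed to have already returned its own scored result list, and scores are assumed to be comparable across nodes, which a real system would have to ensure.

```rust
use std::collections::HashMap;

/// One search result as returned by a node (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    doc_id: String,
    score: f32,
}

/// Merge the result lists of several nodes into a single top-`limit` list.
/// A document returned by more than one node keeps its highest score.
fn merge_hits(per_node: Vec<Vec<Hit>>, limit: usize) -> Vec<Hit> {
    let mut best: HashMap<String, f32> = HashMap::new();
    for hit in per_node.into_iter().flatten() {
        let entry = best.entry(hit.doc_id).or_insert(f32::NEG_INFINITY);
        if hit.score > *entry {
            *entry = hit.score;
        }
    }

    let mut merged: Vec<Hit> = best
        .into_iter()
        .map(|(doc_id, score)| Hit { doc_id, score })
        .collect();

    // Highest score first; ties are broken by document id for stable output.
    merged.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then_with(|| a.doc_id.cmp(&b.doc_id))
    });
    merged.truncate(limit);
    merged
}
```

Searching a single node is then the special case where `per_node` contains one list.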

Improvements of DSN vs centralized search infrastructures

  • Multiple independent stakeholders -> Anyone can join the DSN, own and operate a node, contribute data, design search algorithms, help maintain consensus, and still own their data.

  • No gatekeepers or single points of failure -> Search or access data through any node. Data resiliency is improved through decentralization.

  • Discovery across diverse data sets -> Utilize the network effects of indexed data to build better recommendations, personalization, and results.

Technology

When building a search engine, latency and performance are vital. Therefore, we chose to build the DSN from the ground up in Rust.

This helped ensure:

  • High-performance capabilities

  • No garbage collection with its unpredictable delays

  • No intermediate language (IL) layer

  • Support for future SIMD hardware vector acceleration

What's been built

  • Designed a novel gossip protocol

  • Built P2P messaging layer

  • Integrated Rust-based search core

  • Federated search and ranking mechanisms

  • Multi-index node capacity

➡️ Fully functioning decentralized search engine with real-time results and indexing

Up Next:

📍 Consensus and data consistency engine

📍 Vector/embedding-space search engine

📍 Ranking optimizations

📍 Encrypted search (blind indexing)

📍 Cluster management

📍 and more!

Stay tuned for more documentation and details on the DSN roadmap.
