P2P systems

P2P systems (peer‑to‑peer computing/networking), or peer networks, operate without centralized control. Such networks are scalable: the P2P protocol works unchanged as the number of members grows, whereas a client‑server architecture typically needs additional backend resources to absorb the same growth. Peer networks also naturally tolerate failures or other disturbances of individual members. The member population itself forms the service‑providing infrastructure, so consumers of the service must also contribute resources. Even the original idea of the WWW was based on peer thinking, and since the late 1990s numerous P2P models have emerged. They typically follow five principles:

symmetry of interfaces, since members of the network can perform tasks both as servers and as clients. This is also reflected in terminology: network members, the peers, are also called nodes.
tolerance of disturbances in the underlying communication network and peer failures;
durability of information and services via replication methods;
resources located at the edge of the network, reducing infrastructure costs and promoting scalability and decentralization;
handling of variance in resource provisioning among peers, preventing load from accumulating on a single peer.

These principles make P2P an important foundation for various applications. P2P systems first gained notoriety through file‑sharing applications such as Napster (1999), eMule, and KaZaA, which were used in ways that violated copyrights. Today P2P is common in many applications: social networks, multimedia content distribution, online games, Internet telephony, instant messaging, IoT, vehicle‑to‑vehicle communication, SCADA (Supervisory Control and Data Acquisition) and wide‑area monitoring systems. As will become evident later, distributed ledgers also use some P2P functions.

The two most important P2P paradigms are unstructured and structured protocols. Unstructured protocols are best suited to broad information dissemination where scalability is needed, whereas structured protocols are typically used to improve information retrieval.

Hybrid models combine structured and unstructured elements. Hierarchical P2P systems also exist, although they partly conflict with the peer principle. A hierarchical system can be viewed as layered, for example as composed of several topologies with foreground and background members.

Regardless of the type of peer network, the fundamental P2P functions are built on three elements: (a) naming or otherwise identifying nodes, (b) routing schemes between nodes, and (c) discovering nodes as a function of their identifiers and routing.

Unstructured P2P protocols

Unstructured P2P protocols, such as Freenet and Gnutella, are used mainly for information dissemination, for example uncensored communication or file sharing. Although the nodes of the network have no prescribed topology, they often form structures resembling a tree or a mesh network, enabling low‑latency message exchange. Trees are found in single‑source streaming media cases. Mesh networks appear in situations with multiple sources and consumers, such as file sharing.

Unstructured P2P protocols typically search for resources by name or identifiers without using an addressing system. This scales well for dissemination but poorly for resource discovery or repeatable routing paths. Peers nevertheless maintain an identifier, allowing independence from the underlay network address. Resources are found using overlay‑network search algorithms, such as breadth‑ or depth‑first search, random walk, and expanding ring search, combined according to application requirements.
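As a rough illustration of overlay search, the following sketch combines a TTL‑limited breadth‑first flood with expanding ring search. The overlay layout (`OVERLAY`), the function names, and the message‑counting rule are assumptions made for this example and are not taken from any particular protocol.

```python
from collections import deque

# Hypothetical five-peer overlay: each peer knows some neighbors and holds some resources.
OVERLAY = {
    "A": {"neighbors": ["B", "C"], "resources": set()},
    "B": {"neighbors": ["A", "D"], "resources": set()},
    "C": {"neighbors": ["A", "D"], "resources": set()},
    "D": {"neighbors": ["B", "C", "E"], "resources": {"song.ogg"}},
    "E": {"neighbors": ["D"], "resources": set()},
}


def flood_search(overlay, start, wanted, ttl):
    """TTL-limited breadth-first flood. Returns (holder or None, messages sent)."""
    seen = {start}
    frontier = deque([(start, 0)])
    messages = 0
    while frontier:
        node, hops = frontier.popleft()
        if wanted in overlay[node]["resources"]:
            return node, messages
        if hops == ttl:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in overlay[node]["neighbors"]:
            messages += 1  # every forwarded query counts as one overlay message
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None, messages


def expanding_ring_search(overlay, start, wanted, max_ttl=4):
    """Repeat the flood with TTL 1, 2, ... until the resource is found."""
    total = 0
    for ttl in range(1, max_ttl + 1):
        holder, messages = flood_search(overlay, start, wanted, ttl)
        total += messages
        if holder is not None:
            return holder, ttl, total
    return None, max_ttl, total
```

Calling `expanding_ring_search(OVERLAY, "A", "song.ogg")` on this toy overlay finds the resource at peer `D` with a TTL of 2. The growing TTL keeps searches for nearby resources cheap, at the price of repeated floods when the resource is far away or absent.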

Communication between peers happens via messages. Forwarding may be direct, based on the underlay network connection, but this usually requires peers to know each other’s address and route explicitly. When the destination of the message is unknown, the message structure and its handling incorporate features related to search.

All members maintain lists containing the contact information of other members (or hash values thereof). This prevents the network from clogging with address‑lookup messages. The effectiveness of the lists depends on member liveness. Therefore, a query (ping) is periodically sent to the members of the list. The list is cleaned if no response is received. The interval between queries is adjusted dynamically based on observed churn.
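The list maintenance described above could be sketched as follows. The `PeerList` class, the injected `ping` callable, and the churn thresholds (halve the interval above 20 % losses, lengthen it by half when nobody was lost) are illustrative assumptions, not values prescribed by any protocol.

```python
class PeerList:
    """Contact list that drops silent peers and adapts its ping interval to churn."""

    def __init__(self, ping, interval=30.0, min_interval=5.0, max_interval=300.0):
        self.ping = ping              # callable(contact) -> bool, supplied by the caller
        self.peers = {}               # peer id -> contact information
        self.interval = interval      # seconds until the next refresh round
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, peer_id, contact):
        self.peers[peer_id] = contact

    def refresh(self):
        """Ping every listed peer once, remove non-responders, adjust the interval."""
        if not self.peers:
            return []
        dead = [pid for pid, contact in self.peers.items() if not self.ping(contact)]
        for pid in dead:
            del self.peers[pid]
        churn = len(dead) / (len(dead) + len(self.peers))
        if churn > 0.2:
            # Many peers vanished: probe more often (assumed threshold).
            self.interval = max(self.min_interval, self.interval / 2)
        elif churn == 0:
            # Stable neighborhood: probe less often to save messages.
            self.interval = min(self.max_interval, self.interval * 1.5)
        return dead
```

A real node would call `refresh()` from a timer every `interval` seconds. Here the scheduling is left to the caller so that the sketch stays deterministic.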

Structured P2P protocols

Structured P2P protocols, such as Chord, Pastry, Tapestry, Kademlia, and CAN, are typically used in information‑retrieval applications. The topologies usually have “small‑world” properties, meaning the path between any two members is relatively short. The topology is often ring‑like with shortcut links: the name Chord refers to these shortcuts, which cut across the ring like chords of a circle. CAN stands for Content Addressable Network. The most important properties are efficiency of node lookup and routing.

In structured P2P protocols, pointers to resources are stored in a data structure called a distributed hash table (DHT). The overlay address space is usually the integer set 0,…,2^w−1, where w is typically 128 or 160. A distance function d(a, b) is usually defined, enabling distance computation between any two addresses a and b. Computing the distance is crucial for the lookup mechanism and for determining storage responsibility. The distance function and its properties vary among protocols. Lookup is performed by computing a key from the resource name or another human‑readable identifier. The information corresponding to this key is then requested from a network member responsible for it.
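To make the address space and the distance function concrete, here is a minimal sketch assuming w = 160 and SHA‑1 as the hash. It shows two common choices of d(a, b): the XOR metric used by Kademlia and the clockwise ring distance used by Chord. The helper names (`key_for`, `responsible_node`) are invented for this example.

```python
import hashlib

W = 160            # bits in the overlay address space (SHA-1 yields 160 bits)
SPACE = 2 ** W     # addresses are the integers 0 .. 2^W - 1


def key_for(name: str) -> int:
    """Derive an overlay key from a human-readable resource name."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style metric: symmetric, d(a, b) == d(b, a)."""
    return a ^ b


def ring_distance(a: int, b: int) -> int:
    """Chord-style metric: clockwise distance from a to b, not symmetric."""
    return (b - a) % SPACE


def responsible_node(key: int, node_ids, distance=xor_distance) -> int:
    """Pick the node whose identifier is closest to the key under the given metric."""
    return min(node_ids, key=lambda node_id: distance(key, node_id))
```

With `ring_distance`, `responsible_node` selects the first node at or after the key in clockwise order, which corresponds to Chord's successor rule. With `xor_distance` it selects the node sharing the longest identifier prefix with the key. For key 7 and nodes 5 and 8, the ring metric picks 8 while the XOR metric picks 5, which shows how storage responsibility depends on the metric.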

Messages are exchanged directly in most structured protocols, using an underlay connection between two nodes. If nodes do not know each other, a direct connection cannot be formed, and the destination must be determined to route the message. For this purpose, the overlay search mechanism attempts to reduce the distance to the target on each iteration of the search algorithm until the identifier can be resolved. This method is highly efficient and promotes scalability. Once the lookup has successfully retrieved the underlying network address of the target, messages can be exchanged. Lookup variants include iterative or recursive algorithms and parallel queries to a set of nearest neighbors. Routing tables maintain information only on live and reachable nodes, requiring regular queries (pings). Protocol messaging is also needed so newly joined nodes can be inserted into routing tables, while departed or unresponsive nodes are removed. Adjusting the topology makes the maintenance of structured P2P networks more costly than unstructured ones.
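The distance‑reducing lookup can be illustrated with a deliberately simplified, single‑path iterative search. It assumes an XOR metric, 4‑bit identifiers, and a `routing_tables` dictionary that stands in for asking a remote node for its contacts. Real protocols add parallel queries, timeouts, and larger tables.

```python
def iterative_lookup(routing_tables, start, target):
    """Greedy iterative lookup: move to a known node strictly closer to the target.

    routing_tables maps a node id to the ids that node knows about; reading it
    stands in for sending a lookup request to that node (an assumption of this sketch).
    Returns (closest node reached, list of nodes visited).
    """
    current = start
    path = [current]
    while current != target:
        contacts = routing_tables[current]
        best = min(contacts, key=lambda node_id: node_id ^ target, default=current)
        if (best ^ target) >= (current ^ target):
            break  # no contact is closer: current is the closest node we can reach
        current = best
        path.append(current)
    return current, path


# Hypothetical overlay with 4-bit identifiers.
ROUTING_TABLES = {
    1: [4, 8],
    4: [1, 6],
    6: [4],
    8: [1, 12, 14],
    12: [8, 14, 15],
    14: [8, 12, 15],
    15: [12, 14],
}
```

Looking up node 15 from node 1 visits 1, 8, 14, and 15, with the XOR distance to the target shrinking at every step. Looking up key 13, which is not a node identifier, stops at node 12, the closest node reachable along this path and therefore the natural candidate for holding that key.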

Hybrid P2P protocols

Hybrid P2P protocols integrate elements of both unstructured and structured schemes because they serve both lookup and dissemination. Well‑known hybrid examples include file‑sharing services such as Napster and BitTorrent. BitTorrent was originally a classic unstructured protocol but was extended with structured features to provide a fully distributed lookup mechanism. With this extension, BitTorrent no longer has to rely on the tracker server that previously facilitated search, which improves its availability.

Hierarchical P2P protocols

In some application scenarios hierarchical P2P has proven useful. In hierarchical models, network nodes are classified based on bandwidth, latency, storage, or computational capacity, and some “supernodes” take on a coordinating role. Typically, the class with fewer members forms the background layer of the system, while the majority forms the foreground, handling service requests first and forwarding requests to the background only if they cannot fulfill them themselves. This improves lookup efficiency and also produces fewer messages in the network. Furthermore, popular content can be cached to reduce download delay. Such a structure worked well e.g. in the eDonkey file‑sharing system or in Super P2P models such as KaZaA, where a chosen member functions as a server for a subset of members.
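The two‑layer request handling can be sketched as follows. The `SuperNode` and `ForegroundPeer` classes are hypothetical, and the cache here stores lookup results (which peer holds a resource) rather than the content itself, which is a simplification of the caching mentioned above.

```python
class SuperNode:
    """Background-layer member that indexes the resources of its attached peers."""

    def __init__(self):
        self.index = {}    # resource name -> id of a peer holding it
        self.queries = 0   # number of requests that reached the background layer

    def register(self, peer_id, names):
        for name in names:
            self.index[name] = peer_id

    def locate(self, name):
        self.queries += 1
        return self.index.get(name)


class ForegroundPeer:
    """Foreground member: answers locally first, then asks its supernode."""

    def __init__(self, peer_id, supernode, resources=()):
        self.peer_id = peer_id
        self.supernode = supernode
        self.local = set(resources)
        self.cache = {}    # resource name -> holder learned from earlier lookups
        supernode.register(peer_id, self.local)

    def find(self, name):
        if name in self.local:
            return self.peer_id          # served without any network message
        if name in self.cache:
            return self.cache[name]      # repeated lookups skip the supernode
        holder = self.supernode.locate(name)
        if holder is not None:
            self.cache[name] = holder
        return holder
```

In this sketch a request reaches the supernode only when the foreground peer can neither serve it locally nor answer it from its cache, which is the source of the reduced message count.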