What are Distributed Systems? A quick introduction

Dec 14, 2020 - 7 min read
Amanda Fawcett

In light of recent technological changes and advancements, distributed systems are becoming more popular. Many top companies have created complex distributed systems to handle billions of requests and upgrade without downtime.

Distributed designs may seem daunting and hard to build, but they are becoming more essential in 2021 to accommodate scaling at exponential rates. When beginning a build, it is important to leave room for a basic, high-availability, and scalable distributed system.

There’s a lot to go into when it comes to distributed systems. So today, we introduce you to distributed systems in a simple way. We will explain the different categories, design issues, and considerations to make.






What is a distributed system?

At a basic level, a distributed system is a collection of computers that work together to appear as a single computer to the end user. These machines share state and operate concurrently.

They can fail independently without bringing down the whole system, much like microservices. These interdependent, autonomous computers are linked by a network so they can communicate and exchange information easily.

Note: Distributed systems need a shared network to connect their components, whether over an IP network or even through direct physical cables.

Unlike a traditional database, which is stored on a single machine, a distributed system must let a user communicate with any machine without knowing that it is only one of many. Most applications today use some form of distributed database and must account for whether it is homogeneous or heterogeneous.

In a homogeneous distributed database, every node uses the same data model and the same database management system. These are generally easier to manage and scale by adding nodes. Heterogeneous databases, on the other hand, allow multiple data models or different database management systems, using gateways to translate data between nodes.
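To make the gateway idea concrete, here is a minimal Python sketch, assuming two hypothetical nodes that store the same customer data in different shapes (a dict on `node_a`, a tuple on `node_b`). The `Customer` model, node names, and translator functions are illustrative only, not part of any real database product.

```python
from dataclasses import dataclass

# Hypothetical common data model that all nodes agree to exchange.
@dataclass
class Customer:
    customer_id: int
    full_name: str

# Hypothetical node A stores customers as dicts with "id" and "name" keys.
def from_node_a(row: dict) -> Customer:
    return Customer(customer_id=int(row["id"]), full_name=row["name"])

# Hypothetical node B stores customers as (first_name, last_name, id_as_string) tuples.
def from_node_b(row: tuple) -> Customer:
    first, last, cid = row
    return Customer(customer_id=int(cid), full_name=f"{first} {last}")

# The gateway picks the right translator for each source node.
TRANSLATORS = {"node_a": from_node_a, "node_b": from_node_b}

def gateway(source: str, row) -> Customer:
    return TRANSLATORS[source](row)

customers = [
    gateway("node_a", {"id": "1", "name": "Ada Lovelace"}),
    gateway("node_b", ("Alan", "Turing", "2")),
]
print(customers)
```

Consumers only ever see `Customer` objects, so each node can keep its own storage format as long as the gateway knows how to translate it.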

Generally, there are three kinds of distributed computing systems with the following goals:

  • Distributed Information Systems: distribute information across different servers via multiple communication models
  • Distributed Pervasive Systems: use embedded computer devices (e.g. ECG monitors, sensors, mobile devices)
  • Distributed Computing Systems: computers in a network communicate via message passing
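Message passing means nodes share no memory and cooperate only by sending each other messages. The sketch below imitates that inside one Python process, assuming two hypothetical "nodes" that are really threads, each with its own `queue.Queue` inbox; a real system would send these messages over the network instead.

```python
import queue
import threading

# Each hypothetical node owns a private inbox; the nodes interact
# only by putting messages into each other's inboxes.
inbox_a = queue.Queue()
inbox_b = queue.Queue()

def node_b() -> None:
    # Node B handles requests in arrival order and replies with the square of the payload.
    while True:
        kind, value = inbox_b.get()
        if kind == "stop":
            break
        inbox_a.put(("result", value * value))

worker = threading.Thread(target=node_b)
worker.start()

# Node A sends three requests, then collects the three replies.
for n in (2, 3, 4):
    inbox_b.put(("square", n))
results = [inbox_a.get()[1] for _ in range(3)]

inbox_b.put(("stop", 0))  # ask node B to shut down
worker.join()
print(results)  # [4, 9, 16]
```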

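Note: An important part of distributed systems is the CAP theorem, which states that a distributed data store cannot simultaneously guarantee consistency, availability, and partition tolerance. When a network partition occurs, the system has to choose between staying consistent and staying available.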
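The trade-off is easier to see in a toy model. This Python sketch assumes a hypothetical store with two replicas and a `partitioned` flag; in "CP" mode it refuses writes it cannot replicate, while in "AP" mode it accepts them locally and lets the replicas diverge. It is deliberately simplified (only writes are affected by the partition), not a model of any real database.

```python
class TwoReplicaStore:
    """Toy store with two replicas; `mode` is "CP" or "AP" (hypothetical names)."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.replicas = [{}, {}]   # each replica is a plain dict
        self.partitioned = False   # True when the replicas cannot reach each other

    def write(self, replica_index: int, key: str, value: str) -> bool:
        if not self.partitioned:
            # Normal operation: the write reaches both replicas.
            for replica in self.replicas:
                replica[key] = value
            return True
        if self.mode == "CP":
            # Consistency first: reject a write that cannot be replicated.
            return False
        # Availability first: accept the write locally; replicas may now disagree.
        self.replicas[replica_index][key] = value
        return True

    def read(self, replica_index: int, key: str):
        return self.replicas[replica_index].get(key)

outcomes = {}
for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(0, "x", "1")
    store.partitioned = True            # simulate a network split
    accepted = store.write(0, "x", "2")
    outcomes[mode] = (accepted, store.read(0, "x"), store.read(1, "x"))
print(outcomes)  # {'CP': (False, '1', '1'), 'AP': (True, '2', '1')}
```

The CP store stays consistent but turns the client away; the AP store answers but two replicas now return different values for `x`.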


Decentralized vs distributed

There is quite a bit of debate about the difference between decentralized and distributed systems. At a technical level, a decentralized system is essentially a distributed one, but it is usually not owned by a single entity.

A decentralized system is harder to manage because you cannot control all the participants, unlike a distributed, single-owner design where one team or company owns all the nodes.


Benefits of a distributed system

Distributed systems can be challenging to deploy and maintain, but there are many benefits to this design. Let’s go over a few of those perks.

  • Scaling: A distributed system allows you to scale horizontally so you can handle more traffic.
  • Modular growth: You can keep adding nodes as demand grows, with few hard limits on how far you can scale.
  • Fault tolerance: Distributed systems are more fault tolerant than a single machine.
  • Cost effective: The initial cost is higher than a traditional system, but because of their scalability, they quickly become more cost effective.
  • Low latency: Nodes can be placed in multiple locations, so traffic hits the closest node.
  • Efficiency: Distributed systems break complex data into smaller pieces.
  • Parallelism: Distributed systems can be designed for parallelism, where multiple processors divide up a complex problem into pieces.
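The parallelism pattern above (split the problem, work on the pieces, combine the results) can be sketched in Python. This is a single-process illustration: worker threads from `concurrent.futures.ThreadPoolExecutor` stand in for separate machines, and the sum-of-squares task is a hypothetical stand-in for real work. Because of CPython's global interpreter lock, the threads here show the structure of the computation, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk: list[int]) -> int:
    # The work one hypothetical node would do on its own piece of the data.
    return sum(x * x for x in chunk)

def split(data: list[int], pieces: int) -> list[list[int]]:
    # Divide the input into roughly equal contiguous chunks.
    size = -(-len(data) // pieces)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

data = list(range(1, 101))
chunks = split(data, 4)

# Each worker processes one chunk independently.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum_of_squares, chunks))

total = sum(partials)  # combine step
print(len(chunks), total)  # 4 338350
```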

Scalability is the biggest benefit of distributed systems. Horizontal scaling means adding more servers into your pool of resources. Vertical scaling means scaling by adding more power (CPU, RAM, Storage, etc.) to your existing servers.

Horizontal scaling is easier to adjust dynamically, while vertical scaling is limited by the capacity of a single server.

Cassandra and MongoDB are good examples of horizontal scaling: they make it easy to scale out by adding more machines. MySQL is a common example of vertical scaling, since it is traditionally scaled by moving from smaller to bigger machines.
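Adding servers only helps if data can be spread across them without reshuffling everything. One common technique for this is consistent hashing; the Python sketch below is a minimal version of it, not the actual partitioning code of Cassandra or MongoDB. The server names (`db-1` … `db-4`), the use of MD5 for placement, and the 100 virtual points per server are all assumptions for illustration.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use MD5 for stable placement.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Give each server several points on the ring to spread keys more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash, wrapping around.
        idx = bisect.bisect(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["db-1", "db-2", "db-3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("db-4")  # scale out by one server
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), all(after[k] == "db-4" for k in moved))
```

Roughly a quarter of the keys move, and every moved key lands on the new server. With naive `hash(key) % number_of_servers` placement, going from three to four servers would remap about three quarters of the keys.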





Design issues with distributed systems

While there are many benefits to distributed systems, it’s also important to note the design issues that can arise. We’ve summarized the main design considerations below.

  • Failure Handling: Failure handling can be difficult with distributed systems because some components fail while others continue to function. This can help prevent large-scale failures, but it also leads to more complexity when it comes to troubleshooting and debugging.
  • Concurrency: A common issue occurs when several clients attempt to access a shared resource simultaneously. You must ensure that all resources are safe in a concurrent environment.
  • Security issues: Data security and sharing have increased risks in distributed computer systems. The network has to be secured, and users must be able to safely access replicated data across multiple locations.
  • Higher initial infrastructure costs: The initial deployment cost of a distributed system can be higher than that of a single system. This cost includes basic network setup and dealing with issues such as transmission, high load, and loss of information.
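One common way to keep a shared resource safe under concurrent access is optimistic concurrency control: every value carries a version number, and a write succeeds only if the version has not changed since the client read it. The Python sketch below shows the idea with a hypothetical in-memory `VersionedStore` and `compare_and_set` method; the lock stands in for whatever atomic update a real server would provide, and threads stand in for remote clients.

```python
import threading

class VersionedStore:
    """Toy key-value store with compare-and-set (hypothetical API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for the server's atomic update
        self._data: dict[str, tuple[int, int]] = {}  # key -> (version, value)

    def get(self, key: str) -> tuple[int, int]:
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, value: int) -> bool:
        with self._lock:
            current_version, _ = self._data.get(key, (0, 0))
            if current_version != expected_version:
                return False  # another client wrote first; the caller must retry
            self._data[key] = (current_version + 1, value)
            return True

def increment(store: VersionedStore, key: str) -> None:
    # Read-modify-write loop: retry until the write is based on the latest version.
    while True:
        version, value = store.get(key)
        if store.compare_and_set(key, version, value + 1):
            return

def client(store: VersionedStore, key: str, times: int) -> None:
    for _ in range(times):
        increment(store, key)

store = VersionedStore()
clients = [threading.Thread(target=client, args=(store, "counter", 1000)) for _ in range(8)]
for c in clients:
    c.start()
for c in clients:
    c.join()
print(store.get("counter"))  # (8000, 8000)
```

No increment is lost even though eight clients race on the same key: a client whose read is stale has its write rejected and simply tries again.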

Distributed systems aren’t easy to get up and running, and for many systems this powerful technology is overkill. Distributing data so that it still meets its requirements under unexpected circumstances poses many challenges.

Similarly, bugs are harder to detect in systems that are spread across multiple locations.


Cloud vs distributed systems

Cloud computing and distributed systems are different, but they use similar concepts. Distributed computing uses distributed systems by spreading tasks across many machines. Cloud computing, on the other hand, uses network-hosted servers for storage, processing, and data management.

Distributed computing aims to enable collaborative resource sharing and to provide both size and geographical scalability. Cloud computing is about delivering an on-demand environment using transparency, monitoring, and security.

Compared to distributed systems, cloud computing offers the following advantages:

  • Cost effective
  • Access to a global market
  • Encapsulated change management
  • Access storage, servers, and databases on the internet

However, cloud computing is arguably less flexible than distributed computing, as you rely on other services and technologies to build a system. This gives you less control overall.

Cloud computing makes priorities like load balancing, replication, auto-scaling, and automated backups easier. Tools and platforms like Docker, Amazon Web Services (AWS), Google Cloud Platform, or Azure make it possible to create such systems quickly, and many teams build distributed systems on top of these technologies.
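Managed load balancers hide the details, but the core idea is simple. Here is a minimal round-robin balancer in Python that skips servers marked unhealthy; the server names (`app-1` … `app-3`) and the `mark_down`/`mark_up` methods are hypothetical, and real cloud load balancers add health probes, weighting, and connection draining on top of this.

```python
import itertools

class RoundRobinBalancer:
    """Toy round-robin load balancer that skips unhealthy servers."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.healthy = set(servers)
        self._cycle = itertools.cycle(servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def next_server(self) -> str:
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_batch = [lb.next_server() for _ in range(4)]
print(first_batch)   # ['app-1', 'app-2', 'app-3', 'app-1']

lb.mark_down("app-2")  # e.g. a failed health check
second_batch = [lb.next_server() for _ in range(4)]
print(second_batch)  # ['app-3', 'app-1', 'app-3', 'app-1']
```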


Examples of distributed systems

Distributed systems are used in all kinds of things, everything from electronic banking systems to sensor networks to multiplayer online games. Many organizations utilize distributed systems to power content delivery network services.

In the healthcare industry, distributed systems are used for storing and accessing medical data and for telemedicine. In finance and commerce, many online shopping sites use distributed systems for online payments, and financial trading relies on distributed information dissemination systems.

Distributed systems are also used in transport, in technologies like GPS, route-finding systems, and traffic management systems. Cellular networks are also examples of distributed network systems because they rely on many cooperating base stations.

Google utilizes a complex, sophisticated distributed system infrastructure for its search capabilities. Some say it is the most complex distributed system out there currently.


What to learn next

You should now have a good idea of how distributed systems work and why you should consider building for this architecture. These systems are important for scaling in the future, and there is still a lot to learn. Next, you should check out these topics:

  • Microservices and applications
  • Load balancing and caching
  • Designing databases for your systems

To get hands-on practice with building systems, check out Educative’s learning path Scalability & System Design for Developers. In this learning path, you’ll cover everything you need to know to design scalable systems for enterprise-level software.

By the end, you’ll understand the concepts, components, and technology trade-offs involved in architecting a web application and microservices architecture. You’ll learn to confidently approach and solve system design problems in interview settings.

Happy learning!



