Home/Blog/System Design/Design TinyURL and Instagram: System design interview tutorial

Design TinyURL and Instagram: System design interview tutorial

16 min read

Nov 15, 2021

content

How to design TinyURL

What is TinyURL

System requirements and goals

Capacity estimation and constraints

System APIs

Database design

Basic system design and algorithm

Data partitioning and replication

Cache

Load balancing

Designing Instagram

What is Instagram?

System requirements and goals

Capacity estimation and constraints

High-level system design

Database schema

Data size estimation

Component design

Reliability and redundancy

Data sharding

Generating PhotoIDs

Ensuring reliability

Load balancing

Wrapping up and resources

Continue reading about system design

Note: This post was originally published in 2020 and has been updated as of Nov. 15, 2021.

Understanding system design is more important than ever in today’s tech industry. The larger an application gets, the more reliable and efficient architectural patterns need to be. System design is likely a requirement to land your dream software engineering job.

Today, we’ll apply system design principles by approaching two common system design interview questions: designing TinyURL and Instagram. You’re in the right place if you’re here to prepare for a system design interview.

Master system design beyond the interview
Our comprehensive system design course covers these design problems – and much more.

Grokking Modern System Design

How to design TinyURL#

What is TinyURL#

TinyURL is a URL shortening web service that creates shorter aliases for long URLs. Users select these shortened URLs and are redirected to the original URL. This service is useful because short links save space and allow users to easily type abbreviated URLs instead of long ones.

System requirements and goals#

Let’s consider some functional and non-functional requirements for designing TinyURL.

Non-functional requirements:

The system must be highly available. The short links will not function if the service fails.
URL redirection should happen in real-time with minimal latency.
Shortened links should not be predictable in any manner.

Functional requirements:

Our service will generate a shorter alias of the original URL that can easily be copied and pasted.
The short link should redirect users to the original link.
Users should have the option to pick a custom short link for their URL.
Short links will expire after a default timespan, which users can specify.

Capacity estimation and constraints#

Our system will be read-heavy. There will be a large number of redirection requests compared to new URL shortenings. Let’s assume a 100:1 read/write ratio. Our reads are redirection requests and writes are new URL shortenings.

Traffic estimates: Let’s assume we have 500 million new URL shortenings per month. Based on our 100:1 read/write ratio, we can expect 50 billion redirections during the same period:

$100 * 500 million = 50 billion$

We’ll now determine our system’s Queries Per Second (QPS). We’ll take the monthly amount of 50 billion to calculate the new URLs shortenings per second:

$500 million / (30 days * 24 hours * 3,600 seconds) = ~200 URLs/second$

Applying the 100:1 read/write ratio again, URL redirections per second will be:

$100 * 200 URLs/second = 20,000/second$

Storage estimates: Let’s assume we store every URL shortening request and its associated link for five years. Since we expect to have 500 million new URLs every month, the total number of objects we expect to store over five years will be 30 billion:

$500 million * 12 months * 5 years = 30 billion$

For now, let’s assume that each stored object will be approximately 500 bytes. This means we’ll need 15TB of total storage for the five year period:

$30 billion * 500 bytes = 15 TB$

Bandwidth estimates: We expect 200 new URLs per second for write requests. This makes our service’s total incoming data 100KB per second:

$200 * 500 bytes = 100 KB/s$

For read requests, we expect approximately 20,000 URL redirections every second. Then total outgoing data for our service would be 10MB per second:

$20,000 * 500 bytes = ~10 MB/s$

Memory estimates: We’ll need to determine how much memory we’ll need to store the frequently accessed hot URLs’ cache. If we follow the 80-20 rule, then 20% of URLs will generate 80% of traffic. We would like to cache this 20% of hot URLs.

Since we have 20,000 requests per second, we will be getting 1.7 billion requests per day:

$20,000 * 3,600 seconds * 24 hours = ~1.7 billion$

To cache 20% of these requests, we will need 170GB of memory.

$0.2 * 1.7 billion * 500 bytes = ~170GB$

It’s worth noting that there will be a lot of duplicate requests for the same URL. This means our actual memory usage will be less than 170GB.

System APIs#

We can use REST APIs to expose our service’s functionality. This example shows the API’s definition for creating and deleting URLs without service:

The url_key is a string that delineates the shortened URL that needs to be deleted. A successful deletion will return URL Removed.

Master system design beyond the interview
Our comprehensive system design course covers these design problems – and much more.

Grokking Modern System Design: Design TinyURL

Database design#

Let’s look at some observations about the data we are storing:

The service needs to store billions of records.
The service is read-heavy.
Each object is small (<1,000).
There are no relationships between each record, except for storing which user created the short link.

Schema:

We’ll now consider database schemas. We need a table for storing information on URL mappings as well as a database for data on users who created short links.

Basic system design and algorithm#

Our service’s most crucial concern is how we’ll generate a short and unique key when given a URL. For this example, we’ll go about this by encoding the URL.

We can compute a given URL’s unique hash, such as MD5 or SHA256. The hash can then be encoded for display. This encoding could be base36, base62, or base64.

We’ll use base64 encoding. Now we’ll consider whether the short key should be six, eight, or 10 characters long.

Using base64 encoding, a six-character key would yield 64^6 = ~68.7 billion possible strings.
Using base64 encoding, an eight-character key would result in 64^8 = ~281 trillion possible strings

Let’s assume the six-character key is sufficient for our system. Our hash function will produce a 128-bit hash value if we use the MD5 algorithm. Each base64 character encodes six bits of the hash value. This means we’ll get a string of over 21 characters after encoding. We’ll need to take a different approach for choosing our key because we only have space for 8 characters per short key.

We can take the first six (or eight) letters for the key. We’ll need to take steps to avoid key duplication. We might swap some characters or choose characters not already in the encoding string.

There are some potential obstacles when taking this approach:

If multiple users enter the same URL, they will get the same short link.
Parts of the URL can be URL-encoded.

Workarounds: One solution to the aforementioned problems would be appending a number from a sequence to each short link URL. This would make each link unique, even if multiple users provide the same URL. Another potential solution is to append the user ID to the input URL. However, this would only work if the user was signed in. We would also need to generate a uniqueness key.

Learn how to design generative AI systems!

Grokking the Generative AI System Design

This course will prepare you to design generative AI systems with a practical and structured approach. You will begin by exploring the foundational concepts, such as neural networks, transformers, tokenization, embedding, etc. This course introduces a 6-step SCALED framework, a systematic approach to designing robust GenAI systems. Next, through real-world case studies, you will immerse into the design of GenAI systems like text-to-text (e.g., ChatGPT), text-to-image (e.g., Stable Diffusion), text-to-speech (e.g., ElevenLabs), and text-to-video (e.g., SORA). This course describes these systems from a user-focused perspective, emphasizing how user inputs interact with backend processes. Whether you are an ML/software engineer, AI enthusiast, or manager, this course will equip you to design, train, and deploy generative AI models for various use cases. You will gain confidence to approach new challenges in GenAI and leverage advanced techniques to create impactful solutions.

3hrs 30mins

Intermediate

1 Quiz

116 Illustrations

Data partitioning and replication#

Our database will store information about billions of URLs. We’ll need to partition it to make it scalable.

Range-based partitioning: We can store the URLs in separate partitions based on the first letter of its hash key. So, we save all the URLs with the first letter of their hash key being A in one partition, and so on.

This approach is problematic because it leads to unbalanced database servers. This in turn will create unequal load.

Hash-based partitioning: We can take the hash of a stored object and calculate which partition to use. The hashing function will randomly distribute data into different partitions.

This approach sometimes leads to overloaded partitions. If it does, you could use consistent hashing to resolve it.

Cache#

Our service should be able to cache URLs that are frequently accessed. We could use a solution like Memcached to store full URLs with respective hashes.

Cache memory requirements: We can start with around 20% of the daily traffic and adjust based on usage patterns. Our previous estimations tell us we’ll need 170GB of memory to cache 20% of the daily traffic.

Cache eviction policy: We want to replace a link with a more popular URL. We can use the Least Recently Used (LRU) policy for our system.

Load balancing#

We can add a load balancing layer to our system in three places:

Between clients and application servers
Between application servers and database servers
Between application servers and cache servers

We can start with a Round Robin approach to equally distribute incoming requests among servers. This is easy to implement and does not create any overhead.

It’s important to note that the load balancer won’t stop sending new requests to servers that become overloaded when using this approach. To avoid overload, a more intelligent load balancer solution can be developed to periodically adjust traffic based on load per server.

Designing Instagram#

What is Instagram?#

Instagram is a social media platform that allows users to share photos and videos with other users. Instagram users can share information either publicly or privately.

We’ll design a simple version of Instagram that enables users to share photos, follow each other, and access the news feed. Users will see the top photos of the people they follow on their news feed.

System requirements and goals#

Let’s consider some functional and non-functional requirements for designing Instagram.

Non-functional requirements:

The service should be highly available.
The system’s acceptable latency should be around 200 milliseconds for the news feed.
The system should so reliable that photos and videos in the system are never lost.

Functional requirements:

The user should be able to search based on photo or video titles.
The user should be able to upload, download, and view photos and videos.
The user should be able to follow other users.
The system should be able to generate a displayed news feed consisting of the top photos and videos of the people that the user follows.

Functions that are not within the scope of this project are adding tags to photos, searching for photos with tags, commenting on photos, and tagging users.

Master system design beyond the interview
Our comprehensive system design course covers these design problems – and much more.

Grokking Modern System Design for Software Engineers & Managers: Design Instagram

Capacity estimation and constraints#

We will base our approach off these numbers:

500 million total users
1 million daily active users
2 million new photos every day (at a rate of 23 new photos/second)
Average photo file size of 200KB
Total space required for 1 day of photos:

$2 million * 200KB => 400 GB$

Total space required for 10 years:

$400GB * 365 days * 10 years = ~1,425TB$

High-level system design#

The system should be able to support users as they upload and view each other’s media. This means our service needs servers to store media as well as another database server to store the media’s metadata.

We could take a straightforward approach by storing the above schema in a Relational Database Management System (RDBMS). However, challenges often arise when using a relational database for a scaling application.

We can store the above schema with key-value pairs utilizing a NoSQL database. The metadata for the photos and videos will belong to a table where the key would be the PhotoID, and the value would be an object containing PhotoLocation, UserLocation, CreationTimestamp, and so on.

We can use a wide-column datastore, Cassandra, to store the relationships between users and photos as well as a list of a user’s followed accounts. The UserPhoto table’s key would be UserID. The value would be the list of PhotoIDs the user owns. These will be stored in different columns. This pattern will be similar to the UserFollow table.

The photos and videos can be stored in a distributed file storage like HDFS.

Data size estimation#

Let’s estimate how much data will go into each table and how much total storage we’ll need for 10 years.

User: Each row in this table will be 68 bytes, assuming each int and dateTime is four bytes:

$UserID (4 bytes) + Name (20 bytes) + Email (32 bytes) + DateOfBirth (4 bytes) + CreationDate (4 bytes) + LastLogin (4 bytes) = 68 bytes$

If we have 500 million users, we’ll need 32GB of total storage:

$500 million * 68 =~ 32GB$

Photo: Each row in this table will be 284 bytes:

$PhotoID (4 bytes) + UserID (4 bytes) + PhotoPath (256 bytes) + PhotoLatitude (4 bytes) + PhotLongitude(4 bytes) + UserLatitude (4 bytes) + UserLongitude (4 bytes) + CreationDate (4 bytes) = 284 bytes$

If 2 million new photos get uploaded every day, we’ll need 0.5GB of storage for one day:

$2 million * 284 bytes ~= 0.5GB/day$

For 10 years we will need 1.88TB of storage.

UserFollow: Each row in this table will be eight bytes. We have 500 million users. If each user follows an average of 500 other users, we’ll need 1.82TB of storage for the UserFollow table:

$500 million users * 500 followers * 8 bytes ~= 1.82TB$

Total space required for all tables for 10 years will be 3.7TB:

$32GB + 1.88TB + 1.82TB ~= 3.7TB$

Component design#

Photo uploads are often slower than reads because they go to the disk. Uploading users will consume all the available connections because of how slow the process is. Reads cannot be served when the system is loaded with write requests. To handle this bottleneck, we can split reads and writes to separate servers so that the system is not overloaded.

This will allow us to optimize each operation efficiently.

Reliability and redundancy#

Creating redundancy in the system allows us to create a backup in the midst of a system failure. We can’t lose any files and need a highly reliable application.

We’ll achieve this by storing multiple copies of each photo and video so the system can retrieve media from a copy server in the even that a server dies. We’ll apply this design to the other components of our architecture. With multiple copies of services in the system, the system will run even if a service dies.

Data sharding#

One possible scheme for a metadata sharding service is to partition based on PhotoID.

To solve the above problems, we can generate unique PhotoIDs and then find a shard number through PhotoID % 10. We would not need to append ShardID with PhotoID in this case because PhotoID will be unique throughout the system.

Generating PhotoIDs#

We won’t be able to define PhotoID by having an auto-incrementing sequence in each shard. Wee need to know PhotoID in order to find the shard where it will be stored. One solution could be to dedicate a separate database instance to generate auto-incrementing IDs.

If our PhotoID can fit into 64 bits, we can define a table containing only a 64 bit ID field. Whenever we want to add a photo in our system, we can insert a new row in this table and take that ID as the PhotoID of our new photo.

Ensuring reliability#

This key-generating database could be a single point of failure. A workaround for that could be defining two such databases with one generating even-numbered IDs and the other odd-numbered. For MySQL, the following script can define such sequences:

Wrapping up and resources#

Well done! You should now have a good idea of how to design an application like Instagram and TinyURL. There are still many more system design patterns to learn.

To help you master system design for your interview and beyond, we’ve created Grokking Modern System Design for Software Engineers & Managers. This course dives deep into other design problems, such as Facebook Messenger, Twitter, and more. But going beyond your interview, it’s a comprehensive course covering the fundamentals system design for developers and engineering managers alike.

Happy learning!

Written By:

Aaron Xie

Free AI Mock Interviews

Coding Interview

Coding PatternsFree Interview

Gain insights and practical experience with coding patterns through targeted MCQs and coding problems, designed to match and challenge your expertise level.

System Design

YouTubeFree Interview

Learn to design a video streaming platform like YouTube by tackling functional and non-functional requirements, core components, and high-level to detailed design challenges.

Free Resources

Design TinyURL and Instagram: System design interview tutorial

Master system design beyond the interview
Our comprehensive system design course covers these design problems – and much more.

Grokking Modern System Design