Encoder for TinyURL

Understand the inner details of an encoder that is critical for URL shortening.

Introduction

We have discussed the overall design of a short URL generator (SUG) in detail, but two aspects need more clarification:

  1. How does encoding improve the readability of the short URL?
  2. How are the sequencer and the base-58 encoder in the short URL generation related?

Why Encoding?

Our sequencer generates a 64-bit ID in base-10, which can be converted to a base-64 short URL; base-64 is the most common encoding for generating alphanumeric strings. However, base-64 has some inherent problems for this design: the generated short URL might be hard to read because of look-alike characters. Characters like O (capital O) and 0 (zero), or I (capital I) and l (lowercase L), are easily confused, while + and / should be avoided because they have special meanings in URLs and in other system-dependent encodings.
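To make the problem with + and / concrete, here is a minimal sketch using Python's standard urllib.parse module. The host tiny.example and the token are made up for illustration; they are not part of the design above.

```python
from urllib.parse import quote, unquote_plus, urlsplit

# Hypothetical base-64-style token containing both problem characters.
token = "aB+c/9"

# '/' is the URL path separator, so the token is read as two path segments.
path = urlsplit("https://tiny.example/" + token).path
path_segments = path.strip("/").split("/")

# Form-style decoding (as used for query strings) turns '+' into a space.
decoded = unquote_plus(token)

# Percent-escaping makes the token safe but longer, which works against
# the goal of a short URL.
escaped = quote(token, safe="")

print(path_segments)  # ['aB+c', '9']
print(repr(decoded))  # 'aB c/9'
print(escaped)        # aB%2Bc%2F9
```

A token that avoids these characters needs no escaping and cannot be misread as a path or a space.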

We therefore remove these six characters and use base-58 instead of base-64 (which includes A-Z, a-z, 0-9, + and /) for better readability. Let's have a look at our base-58 definition.
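As a rough illustration, the following Python sketch derives a 58-character alphabet by dropping the six characters named above and converts a base-10 ID to base-58. The ordering of the symbols (digits, then uppercase, then lowercase) and the function names encode_base58 and decode_base58 are assumptions made for this example; the lesson's own definition may order the characters differently.

```python
import string

# The six characters removed from base-64: look-alikes plus '+' and '/'.
EXCLUDED = set("O0Il+/")

# Assumption: digits first, then uppercase, then lowercase letters.
BASE58_ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in EXCLUDED
)
BASE = len(BASE58_ALPHABET)  # 58


def encode_base58(number: int) -> str:
    """Convert a non-negative base-10 integer to a base-58 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE58_ALPHABET[0]
    digits = []
    while number > 0:
        # Repeated division: each remainder is the next least significant digit.
        number, remainder = divmod(number, BASE)
        digits.append(BASE58_ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base58(text: str) -> int:
    """Convert a base-58 string back to its base-10 integer."""
    number = 0
    for char in text:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * BASE + BASE58_ALPHABET.index(char)
    return number


if __name__ == "__main__":
    largest_id = 2**64 - 1  # largest value a 64-bit sequencer can produce
    short = encode_base58(largest_id)
    print(len(BASE58_ALPHABET), len(short))  # 58 11
```

Under these assumptions, even the largest 64-bit ID maps to an 11-character string, and decoding it returns the original ID.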
