Encoder for TinyURL
Understand the inner details of an encoder that is critical for URL shortening.
Introduction
We have discussed the overall design of a short URL generator (SUG) in detail, but two aspects need more clarification:
- How does encoding improve the readability of the short URL?
- How are the sequencer and the base-58 encoder related in short URL generation?
Why Encoding?
Our sequencer generates a 64-bit ID in base-10, which can be converted to a base-64 short URL; base-64 is the most common encoding for generating alphanumeric strings. However, sticking with base-64 has some inherent issues for this design problem: the generated short URL might have readability issues because of look-alike characters. Characters like O (capital o) and 0 (zero), or I (capital i) and l (lowercase L), can be confused, while characters like + and / should be avoided because they clash with other system-dependent encodings (for example, / separates path segments in a URL).
We therefore drop the six characters O, 0, I, l, +, and / and use base-58 instead of base-64 (which includes A-Z, a-z, 0-9, +, and /) for better readability. Let's have a look at our base-58 definition.
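The lesson's own base-58 definition is not reproduced here, so the following is only a sketch under stated assumptions. It derives a 58-character alphabet by removing O, 0, I, l, +, and / from the RFC 4648 base-64 order. The ordering is an assumption; for example, Bitcoin's Base58 uses a different order. It then encodes and decodes integer IDs with the hypothetical helpers `encode_base58` and `decode_base58`.

```python
import string

# The six characters dropped for readability and URL safety.
EXCLUDED = set("O0Il+/")

# Assumed ordering: base-64 order (A-Z, a-z, 0-9, '+', '/') minus EXCLUDED.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
BASE58_ALPHABET = "".join(c for c in BASE64_ALPHABET if c not in EXCLUDED)
assert len(BASE58_ALPHABET) == 58

# Reverse lookup table used for decoding.
INDEX = {c: i for i, c in enumerate(BASE58_ALPHABET)}


def encode_base58(n: int) -> str:
    """Encode a non-negative integer ID as a base-58 string (hypothetical helper)."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return BASE58_ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, 58)
        chars.append(BASE58_ALPHABET[remainder])
    # Remainders are produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base58(s: str) -> int:
    """Decode a base-58 string back to its integer ID (hypothetical helper)."""
    n = 0
    for c in s:
        if c not in INDEX:
            raise ValueError(f"character {c!r} is not in the base-58 alphabet")
        n = n * 58 + INDEX[c]
    return n


if __name__ == "__main__":
    largest_id = 2**64 - 1
    short = encode_base58(largest_id)
    print(short, len(short), decode_base58(short) == largest_id)
```

Because 58^10 < 2^64 < 58^11, any 64-bit ID still fits in at most 11 base-58 characters. Dropping the six characters therefore costs no extra length in the worst case compared with base-64.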