Nov 13, 2020 - 12 min read

Vincent Russo

*Vincent Russo is an Educative course contributor who is a full-time software developer and runs LucidProgramming, a YouTube channel to help individuals improve their software skills and value as developers.*

*LucidProgramming has content centered on Python development where the topics covered include data structures, algorithms, web scraping, natural language processing, and many more.*

On October 5, 2020, Python 3.9 was released, and with it, came a number of new features. One handy new feature is the addition of the ** graphlib module** that now comes standard with Python 3.9.

At the time of this writing, `graphlib`

only contains functionality for performing topological sorting, but it is planned to house a number of other graph algorithms to include in the Python standard library.

Some contributions have also been made pertaining to **string manipulation**.
Python already has extensive native functionality for manipulating strings.
In Python 3.9, there are further contributions and improvements that this new
version of Python offers for all your string tweaking needs.

We’ll be taking a deep dive into what these new contributions are and how to invoke them in your own Python projects.

In this post, we will cover the `TopologicalSorter`

component of `graphlib`

and how to use the updated string manipulation in Python 3.9.

**Today, we will go over:**

- Topological ordering and Topological sort
- Topological sorting in Python 3.9
- String Manipulation: Prefix and suffix removal
- Replace and empty strings
- Summary and wrap up

Review of the most common data structures and algorithms that you’ll see in interviews with hands-on, real-world questions.

**Data Structures and Algorithms in Python**

**Topological sort** is an algorithm that takes a directed acyclic graph (DAG) as input and returns an ordering where each vertex appears prior to all vertices that it points to. We call this ordering a *topological ordering*.

More formally, quoting from Wikipedia:

“…topological sort returns a linear ordering of its vertices such that for every directed edge $uv$ from vertex $u$ to vertex $v$, $u$ becomes before $v$ in the ordering.”

To make the definition more concrete, let’s take a look at an example. Consider the following DAG:

```
+---------------------+
| |
| v
+---+ +---+ +---+ +---+
| A | ---> | B | ---> | D | ---> | E |
+---+ +---+ +---+ +---+
| ^
| +---+ |
+------> | C | -------+
+---+
Figure-1: Example of a directed acyclic graph (DAG).
```

**One can see that the graph in Figure-1 is a DAG as it contains no cycles.**

The vertex labelled `A`

in Figure-1 is connected to vertices labelled with `B`

and `C`

, so a topological ordering would be one where `A`

comes before both `B`

and `C`

. Applying this same logic to the DAG for the remaining vertices, we find that one valid topological ordering is `[A, B, C, D, E]`

.

Topological orderings are not necessarily unique however, so it is possible to have more than one valid topological ordering. For instance the ordering `[A, C, B, D, E]`

is also valid.

Topological sort is a useful tool to have in ones arsenal. One application of topological sort is for job scheduling. If you encode the vertices as jobs that need to be completed and the edges as dependencies for the jobs, performing a topological sort will indicate an ordering of jobs that can be completed without encountering any dependency conflicts.

The way that we encode the data in a graph in Python is by using the **dictionary** where the keys correspond to the labels of the vertices and where the values are a dictionary of labels that correspond to what vertices the key vertex is connected to.

To make the above more concrete, let’s take the DAG from Figure-1 and encode it in a Python dictionary structure.

```
# Encode the DAG from Figure-1 in Python dictionary.
graph = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}, "E": {"D", "B"}}
```

Taking this `graph`

object that we constructed, we can perform a topological sort to see what the result yields

```
>>> import graphlib
>>>
>>> ts = graphlib.TopologicalSorter(graph)
>>> print(list(ts.static_order()))
('A', 'B', 'C', 'D', 'E')
```

We see that indeed, the result of performing topological sort on this graph agrees with our assessment in the previous section.

The usage of the `static_order()`

method returns an iterable of nodes in a topological order. Converting that result to a list allows us to print and see the topological ordering of the `graph`

object.

Note that if we define a graph that is not a DAG, say, a graph that contains a cycle, `graphlib`

will throw a `CycleError`

to indicate that the graph is cyclic and therefore cannot possess a topological ordering.

Consider the following cyclic graph.

```
+---+ +---+ +---+
| A | <--- | B | <--- | D |
+---+ +---+ +---+
| ^
| +---+ |
+------> | C | -------+
+---+
Figure-2: Example of a cyclic graph.
```

We can confirm that encoding this graph in a Python dictionary and attempting make use of the `TopologicalSorter`

will yield a `CycleError`

.

>>> import graphlib>>>>>> # Encode the graph from Figure-2 in Python dictionary.>>> graph = {"A": {"B"}, "B": {"D"}, "C": {"A"}, "D": {"C"}}>>>>>> ts = graphlib.TopologicalSorter(graph)>>> list(ts.static_order())Traceback (most recent call last):File "<stdin>", line 1, in <module>File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/graphlib.py", line 241, in static_orderself.prepare()File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/graphlib.py", line 103, in prepareraise CycleError(f"nodes are in a cycle", cycle)graphlib.CycleError: ('nodes are in a cycle', ['A', 'B', 'D', 'C', 'A'])

**As expected, we cannot obtain a topological ordering on a cyclic graph and seeing this CycleError indicates that.**

To see an application of where topological sort is useful and where we can make use of the `TopologicalSorter`

from `graphlib`

, consider the following problem “Course Scheduling” problem from LeetCode.

There are a total of

`num_courses`

courses you have to take, labelled from`0`

to`num_courses-1`

. Some courses may have prerequisites, for example to take course`0`

you have to first take course`1`

, which is expressed as a pair:`pre_reqs=[0,1]`

.Given the total number of courses and a list of prerequisite pairs, is it possible for you to finish all courses?

As an example, consider the input `num_courses = 2`

and `pre_reqs = [[1, 0]]`

. There does exist a way in which to finish all of the courses. In this case, there are a total of two courses that need to be taken.

In order to take course `1`

, it is required that you must have finished course `0`

, so this is possible to achieve.

Alternatively, consider the input `pre_reqs = [[1,0],[0,1]]`

. Here, there does *not* exist a way in which to finish all of the courses. One needs to take two courses. To take course `1`

, it is required that course `0`

is finished.

Paradoxically, to take course `0`

it is required that course `1`

is finished. We have a catch-22 situation on our hands here, and thus, it is not possible to satisfy this criterion.

The course scheduling problem lends itself nicely to a setting whose relationships can be **encoded in a graph**. The vertices of the graph represent the courses and the edges between the vertices represent the prerequisites that must be completed prior to taking.

Given such a graph encoded in this manner, if we are able to obtain a topological ordering then it must be possible for a student to complete the required number of courses.

Alternatively, if we have a cycle in our graph, then we know that such a topological ordering is not possible and therefore the answer in that case would be “no”.

```
import graphlib
def can_schedule(graph):
ts = graphlib.TopologicalSorter(graph)
try:
# The graph has a topological ordering.
list(ts.static_order())
return True
except:
# The graph does *not* have a topological ordering.
return False
```

```
# Example: A valid topological ordering exists, and therefore
# it is possible to schedule the courses.
>>> graph = {0: {1}}
>>> can_schedule(graph)
True
```

```
# Example: A valid topological ordering does *not* exist, and
# therefore it is *not* possible to schedule the courses.
>>> graph = {0: {1}, {1}: 0}
>>> can_schedule(graph)
False
```

This course is a detailed review of the most common data structures and algorithms that you’ll see in interviews. With implementation details, thorough explanations, and hands-on coding exercises, you’ll quickly gain the confidence you need to solve any problem, no matter the situation.

PEP 616 provides two new
methods,`removeprefix()`

and `removesuffix()`

, as **built-in methods** for Python
string objects. As the function name sugests, `removeprefix()`

removes a
specified prefix string from a string object and `removesuffix()`

removes a
specified suffix string.

Let’s take a look at some examples:

```
>>> s = "I think Educative is awesome!"
>>> s.removeprefix("I think ")
'Educative is awesome!'
```

As we can see, the argument to the

`removeprefix()`

method is the prefix from the string`s`

that we wish to remove.

If the prefix in question does not exist in the string, then the original
string is returned as a result instead of any `ValueError`

being thrown. For
instance:

```
>>> s = "I think Educative is awesome!"
>>> s.removeprefix("I know ")
'I think Educative is awesome!'
```

In a situation where you want to know that there was indeed a prefix and it was
removed, one could check that by comparing the length of the strings prior to
and after the call to `removeprefix()`

. For example:

```
>>> if len(s) == len(s.removeprefix(prefix)):
>>> print("No prefix was removed.")
>>> else:
>>> print(f"Prefix {prefix} was removed.")
```

Of course, in many cases it would be preferable for the function to “fail gracefully” and to only remove a prefix if one exists and to pass through if it does not.

The same basic idea applies to the `removesuffix()`

method operating on
strings, where of course the key difference for this method is that it removes
the suffix of a string as opposed to the prefix. A small example for
completeness:

```
>>> s = "Educative is awesome, I think!"
>>> s.removesuffix(", I think!")
'Educative is awesome'
```

One subtle point that should be mentioned is that if we did not provide the
*complete* suffix, that is, part of the string that does not extend all the way
to the end of the string, we would obtain the same initial string back as our
result. Observe that:

```
>>> s = "Educative is awesome, I think!"
>>> s.removesuffix(", I think")
'Educative is awesome, I think!'
```

That is, since the string in the argument to `removesuffix`

is not the complete
suffix of the string `s`

, it is ignored and instead, the initial string is
returned back to the user.

There are some direct benefits to preferring `removeprefix()`

and
`removesuffix()`

over doing something like making use of the `replace()`

function that Python provides.

```
>>> s = "Educative is awesome, I think!"
>>> s.replace(", I think", "")
'Educative is awesome'
```

While the behavior in this case gives us what we want, behind the scenes the
`replace()`

method is more computationally expensive. We certainly wouldn’t
notice any difference for examples of the size we have here.

But if, say, you
are performing **various string manipulations** on large sequences of genes for a
bioinformatics projects, the expensive nature of using `replace()`

over either
`removeprefix()`

or `removesuffix()`

could be much more noticeable, depending
on the size of your data.

Another arguable benefit to the `removeprefix()`

and `removesuffix()`

functions
are that they are more descriptive. Instead of using something opaque and fragile for the specific instance, it is more direct as to what the end result of running the computation on the
string should be.

If you are revamping an existing Python project to be compatible with Python
3.9, and it happens to make use of various string
manipulations, it may be worth considering where you could make use of the
now-standard `removeprefix()`

and `removesuffix()`

methods.

Further information about the `removeprefix()`

and `removesuffix`

methods can
be found in the respective *PEP 616 – String methods to remove prefixes and
suffixes*.

*Bug 28029 – Replace and empty strings*, which pertained to some confusing behavior of the aforementioned `replace()`

function for strings, was adapted in Python 3.9.

To understand the misleading behavior, consider the `replace()`

method
prototype according to the
Python docs.

```
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
```

We have that `s`

is the input string, the `old`

argument pertains to the string
to search for in `s`

, the `new`

argument is the string to replace the `old`

value with, and finally the `maxreplace`

is a
number specifying how many occurrences of the old value you want to replace.

If no

`maxreplace`

is specified, the`replace()`

method defaults to applying itself to all occurrences of`old`

to`new`

.

So for instance, a common usage of `replace()`

is to replace all of the spaces
in a string with no spaces. Take for example:

```
>>> "123 456 789".replace(" ", "")
'123456789'
```

We see that *all* of the occurrences of the `" "`

character (the space) was
replaced with the `""`

character (no space). Taking advantage of the
`maxreplace`

parameter, we can specify how many of the instances of the
character in question we wish to replace (starting from the beginning of the
string).

For instance, if we just want to replace just the first instance of a space in the example above, we could do the following:

```
>>> "123 456 789".replace(" ", "", 1)
'123456 789'
```

One surprisingly and perhaps counterintuitive behavior of the `replace()`

function was discovered in 2016, which is documented in
previously mentioned Bug 28029.

The gist of the confusing behavior is captured the following example:

```
>>> # First example:
>>> "".replace("", "prefix")
"prefix"
>>> # Second example:
>>> "".replace("", "prefix", 1)
""
```

The first line makes sense, as we want to replace the empty string with the
string `"prefix"`

. However, we should also expect the same output for the
second example.

The change for Python 3.9 was to fix this perplexing edge case and to have the
`replace()`

function return the string provided as input instead of an empty
string for all non-zero `maxreplace`

.

Fixing this issue resolves an edge case that could come off as potentially confusing to new Python users in a function that has been with Python for quite a long time.

While there have been third party implementations of topological sort in the Python ecosystem (e.g. topsort and NetworkX), native support has been included as of the release of Python 3.9.

The `graphlib`

module is presently sparse, but will continue to be developed to include other graph algorithms in future iterations. Beyond the scope of this article, the `TopologicalSorter`

also has support for parallelization and can be applied to settings where invoking such an approach would be advantageous.

On top of that, string manipulation is an important feature in most modern programming languages. While the release of Python 3.9 includes a number of newer features and functions, it is good to see a dedicated effort on the core aspects of Python’s feature set.

Hopefully you can make use of these new changes to improve your own Python projects. Check out my course **Data Structures and Algorithms in Python** to continue learning about Python in the real-world.

*Happy learning!*

**Join a community of more than 1.4 million readers. A free, bi-monthly email with a roundup of Educative's top articles and coding tips.**