Beyond P vs. NP: Finding Harder Problems
The P vs. NP problem
The discussion on hard programming problems is mostly centered around the famous P vs. NP problem, where P is the collection of problems that can be solved efficiently (in polynomial time), and NP is the collection of problems whose solutions can be verified efficiently.

The P vs. NP problem asks whether these two collections are the same!
The answer to this problem is unknown as of now. If the answer turns out to be in the affirmative, the two notions (efficient verifiability and efficient search for solutions) would be provably the same. This does not seem very likely, and most researchers believe that it is not the case. There are some exceptions, however, Donald Knuth being the most prominent.
We can easily establish that any problem in NP can be solved in exponential time.
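To see why, consider a brute-force sketch (a hypothetical Python example; `subset_sum` and its input packing are my own illustration, not from the blog): an NP problem like subset-sum has certificates that are quick to check, so trying all 2^n of them decides the problem in exponential time.

```python
from itertools import product

def subset_sum(args: tuple) -> bool:
    """Decide subset-sum by brute force.

    Checking a single certificate (a 0/1 mask choosing a subset)
    is fast; enumerating all 2^n masks is what makes this take
    exponential time.
    """
    nums, target = args  # several logical inputs packed into one tuple
    for mask in product([0, 1], repeat=len(nums)):
        chosen = [x for x, keep in zip(nums, mask) if keep]
        if sum(chosen) == target:
            return True
    return False
```

The same pattern (check every candidate certificate) works for any NP problem, which is where the exponential-time upper bound comes from.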
We’ll prove in this blog that there are even harder problems than the ones in NP.
The Hardest of Them All
Have you ever wondered which of the known problems is the hardest of them all?
We’re not talking about problems that cannot be solved (like the halting problem). We’d like to know which problem, from among the solvable ones (that is, for which some algorithm is known), is the hardest—in the sense that it would require the most time to solve.
We’ll answer this question in this blog. As a sneak peek, our answer will be “There’s no such problem.” That is, given any hard problem, we can always find a harder problem—so there’s no hardest problem! This is where we will need to go beyond the P vs. NP question into the realm of the time hierarchy theorem.
Setting up the Framework
We’ll limit ourselves to the standard framework of computability theory, conveniently adapted for a general computer science audience, which we laid down in an earlier blog about the halting problem.
In summary:
- All of our problems will be decision problems. That is, we’ll be interested in deciding whether a given input satisfies certain conditions or not. For instance, the problem of determining whether a given number is prime, or the problem of determining whether a given graph has a Hamiltonian cycle (a cycle that contains all vertices of the graph).
- All of our programs will take only one input and return a boolean value. We can easily modify every program to receive only one input by using, for instance, some aggregate data type.
Justifications for why these assumptions do not affect generality are given in the above-mentioned blog.
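As an illustration of both conventions (a sketch; the function names are mine, not from the blog), here is a decision problem written as a boolean function of one input, alongside a function whose several logical inputs are packed into a single tuple:

```python
def is_prime(n: int) -> bool:
    """A decision problem: one input, one boolean answer."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def has_divisor_in_range(args: tuple) -> bool:
    """Three logical inputs packed into one aggregate input."""
    n, lo, hi = args
    return any(n % d == 0 for d in range(lo, hi + 1))
```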
We’ll also assume that we have access to a program that can simulate a given program for a specified number of steps. Think of it as a modified interpreter that increments a counter after executing each step. We can build one, or modify an existing interpreter, without too much difficulty.
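Here is one way such a step-counting runner could look in Python (a sketch under the assumption that counting traced line-events is an acceptable stand-in for “steps”; the names are mine):

```python
import sys

class StepLimitExceeded(Exception):
    pass

def run_for_steps(func, arg, max_steps):
    """Run func(arg), aborting after max_steps traced line-events.

    Returns (True, result) if func halts within the budget,
    and (False, None) otherwise.
    """
    steps = 0

    def tracer(frame, event, _):
        nonlocal steps
        if event == "line":
            steps += 1
            if steps > max_steps:
                raise StepLimitExceeded
        return tracer

    sys.settrace(tracer)
    try:
        return True, func(arg)
    except StepLimitExceeded:
        return False, None
    finally:
        sys.settrace(None)

# Two sample programs to run under the budget:
def double(n):
    return 2 * n

def loops_forever(_):
    while True:
        pass
```

Here `run_for_steps(double, 21, 10_000)` halts well within the budget, while `loops_forever` gets cut off once the counter exceeds the limit.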
Time Hierarchy Theorem
We’ll start by introducing the time hierarchy theorem, which is the key to answering the questions we’ve set out to answer. It formalizes and justifies the intuitive notion that, given more time, more complex problems can be solved!
We’ll state and prove a weak version of the time hierarchy theorem, since it’s easier to prove and can serve as a foundation for studying the stronger version for interested learners.
The Statement of the Theorem
In order to avoid weird functions, let’s restrict ourselves to increasing functions f on the natural numbers.

We’ll show that for any such function f, there is a problem X that cannot be solved in f(n) time, where n is the length of the input.
This would immediately imply that no matter how much time is given, there’d still be problems that can’t be solved in that time!
There’s a stronger version of this theorem, of a more applied flavor, that measures the time complexity of the program that solves such a problem. However, we’ll restrict ourselves to this weaker version, as it’s more suitable to the blog format.
Defining the Problem in Terms of the Solution
The approach we’ve taken for coming up with this problem is a little unusual—instead of defining X directly, we’ll define it in terms of a program D that solves it:

X is the set of all inputs on which D would return TRUE.

Don’t be put off by this, as it’s a perfectly legitimate way of defining a problem. For example, we can legitimately define even numbers by first providing a program that tests whether a number is a multiple of two, and then saying that the even numbers are all those numbers on which this program returns TRUE.
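In code (an illustrative sketch; the names are mine), that definition reads:

```python
def is_multiple_of_two(n: int) -> bool:
    """The program: a membership test."""
    return n % 2 == 0

# The *problem* EVEN is, by definition, the set of all inputs on
# which the program returns TRUE. The set is infinite, so we can
# only materialize a finite slice of it:
EVEN_BELOW_10 = {n for n in range(10) if is_multiple_of_two(n)}
```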
Designing D: The High-Level Idea
Recall that we’re dealing with decision problems in computability theory, where problems are sets. We’ll design D so that it differs from any program that runs in f(n) time on at least one input. This would establish that X, the problem solved by D, is different from any problem that can be solved in f(n) time.
To be more precise, let’s assume P_1, P_2, P_3, … are all the programs that halt and return an answer in f(n) time. If X could be solved in f(n) time, then surely one of the P_i from this list would solve it.

We’ll design D in such a way that for any program P_i from this list, there’s an input x_i on which D(x_i) ≠ P_i(x_i).

While designing D, we’ll choose x_i to be the program P_i itself!

As X is defined using D, this would prove that X can’t be the same as any of the problems solved by the P_i. This would immediately imply that the problem X can’t be solved in f(n) time, as the list of P_i covers all problems solvable in f(n) time.
This is diagonalization in action, the same technique we used to show that the halting problem is unsolvable.
The program D
We’ll define D as follows: D takes an input x and returns TRUE or FALSE. Recall that our problem X is the set of all inputs on which D returns TRUE.

D performs the following steps on the input x:
- D verifies whether x is a valid program. If it isn’t, D returns FALSE. This can easily be done by embedding a compiler of whichever language is chosen in D as a subroutine and calling it on x. Let’s refer to the program whose source code is x as P_x.
- D now runs P_x on the input x for f(n) steps (where n is the length of x), using the step-counting simulator described earlier. D returns FALSE if P_x does not produce a valid return value (TRUE or FALSE) in this much time. Do note that D runs the program P_x on itself, as the input x is P_x’s own source code.
- When P_x halts with a valid return value within f(n) steps, D returns the opposite of what P_x has returned. That is, if P_x returns TRUE, D returns FALSE, and if P_x returns FALSE, D returns TRUE.
Note that by differing from P_x on the input x, D has made sure that X is different from the problem solved by P_x.

This completes the construction of D.
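The three steps above can be sketched end to end in Python. Everything here is an illustrative toy, not the blog’s actual construction: “programs” are strings defining a function `p(x)`, `exec` plays the role of the compiler, counting traced line-events stands in for counting steps, and the bound `f` is an arbitrary increasing function chosen for the demo.

```python
import sys

class StepLimit(Exception):
    pass

def run_limited(func, arg, max_steps):
    """Step-counting simulator: run func(arg) for at most
    max_steps traced line-events."""
    steps = 0

    def tracer(frame, event, _):
        nonlocal steps
        if event == "line":
            steps += 1
            if steps > max_steps:
                raise StepLimit
        return tracer

    sys.settrace(tracer)
    try:
        return True, func(arg)
    except Exception:  # covers StepLimit and crashes of func
        return False, None
    finally:
        sys.settrace(None)

def f(n):
    # The time bound we diagonalize against (arbitrary choice).
    return 1000 * (n + 1)

def D(x: str) -> bool:
    # Step 1: reject inputs that are not valid programs.
    try:
        namespace = {}
        exec(x, namespace)        # plays the role of the compiler
        p = namespace["p"]        # the program P_x
    except Exception:
        return False
    # Step 2: run P_x on its own source x for f(len(x)) steps.
    halted, result = run_limited(p, x, f(len(x)))
    if not halted or not isinstance(result, bool):
        return False
    # Step 3: return the opposite of P_x's answer.
    return not result
```

By construction, D disagrees with every fast, halting program on that program’s own source code, which is all the definition of X needs.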
Picture a table whose rows and columns are programs: the entry for (P_i, P_j) is the result of running the program P_i on the input P_j for f(n) steps inside D. The problem X is then the set of all programs that return FALSE on their own source code, as these are exactly the inputs on which D returns TRUE.
Just Defining D Is Enough
As the input to D can be any string, it could be given any program whatsoever as an input. In particular, all programs in the chosen language that run in f(n) time can be given as inputs to D.

Keep in mind that we are not going to run D to do anything; our job is done just by writing D, as this gives us the definition of X.

To reiterate, by ensuring that D differs on at least one input from any program that runs and produces a valid result in f(n) time, we have established that X can’t be solved in f(n) time. This is precisely what we set out to do!
We can use this version of the time hierarchy theorem to show that there are problems that can’t be solved even in exponential time, or doubly exponential time, for example!
Concluding Thoughts
A stronger version of the time hierarchy theorem can be proven if we measure the time complexity of the program D more precisely. We can do so if we make certain assumptions, like the following:
- f(n) is at least n. D needs at least this much time to determine n from the input x, as n is the length of x.
- Each step of the simulation takes some constant time. This is a reasonable assumption, as simulating one step of the program should not depend on the input length.
- Each increment to the counter in D (to check that we have not exceeded f(n) steps) does not take more than O(log f(n)) time. This is also a fair assumption, as we do not require more than log f(n) bits to write down f(n).
The time complexity of D then turns out to be O(f(n) log f(n)) (a simulation of f(n) steps, each requiring O(log f(n)) time for the counter). This is under the assumption that the compiler does not need more than this much time to verify whether the input is a valid program.
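The accounting in the assumptions above can be summarized as follows (a back-of-the-envelope restatement, not an extra assumption):

```latex
\underbrace{f(n)}_{\text{simulated steps}}
\;\times\;
\underbrace{O(\log f(n))}_{\text{per-step overhead}}
\;=\;
O\!\left(f(n)\log f(n)\right)
```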
Once we have this, we can make stronger claims like the following:
There’s a problem that can be solved in O(f(n) log f(n)) time that can’t be solved in f(n) time.
An obvious and immediate consequence is that

P ≠ EXPTIME,

where P is the collection of efficiently solvable problems discussed earlier, and EXPTIME is the collection of all problems that can be solved in exponential time.
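One standard way to derive this separation (a sketch, writing DTIME(g) for the problems solvable in O(g) time) is to apply the stronger theorem with f(n) = 2^n, noting that 2^n log 2^n = n · 2^n ≤ 2^{2n}:

```latex
\mathsf{P}
\;\subseteq\; \mathrm{DTIME}\!\left(2^{n}\right)
\;\subsetneq\; \mathrm{DTIME}\!\left(2^{2n}\right)
\;\subseteq\; \mathsf{EXPTIME}
```

The strict middle inclusion is the time hierarchy theorem; the outer inclusions hold because every polynomial is eventually below 2^n, and 2^{2n} is an exponential bound.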
This won’t earn us a million dollars, but it’s valuable nonetheless: specific, unconditional results like this are scarce in complexity theory. Understanding the nuances of time complexity and the time hierarchy theorem gives you a foundational basis for grappling with the complexities of the P vs. NP problem.
Want to learn more? Feel free to check out the following resources available on Educative:
If you want to get a light introduction to complexity analysis and some basic concepts of complexity theory, check out the Big-O Notation For Coding Interviews and Beyond course.
If you want to learn about the halting problem and why it's undecidable, check out the Unsolvable Problems: Insights into the Halting Problem and Beyond blog.
The recursion theorem plays an important role in computability theory. Feel free to check out the Quines: Self-replicating program blog.
Frequently Asked Questions
Has P vs NP been solved?

No. The core challenge lies in the inability to identify an NP-complete problem that also falls under P, a discovery that would imply rapid and straightforward solutions to all NP problems. For several decades, computer scientists have been diligently working to resolve this question, yet a conclusive answer remains elusive.