The Most Important Unsolved Problem in Computer Science


Here’s a look at the million-dollar math problem at the heart of computation

mathematic formulas on a computer display, blue text on black screen
Credit: alengo/Getty Images

When the Clay Mathematics Institute put individual $1-million prize bounties on seven unsolved mathematical problems, they may have undervalued one entry—by a lot. If mathematicians were to resolve, in the right way, computer science’s “P versus NP” question, the result could be worth worlds more than $1 million—they’d be cracking most online-security systems, revolutionizing science and even mechanistically solving the other six of the so-called Millennium Problems, all of which were chosen in the year 2000. It’s hard to overstate the stakes surrounding the most important unsolved problem in computer science.

P versus NP concerns the apparent asymmetry between finding solutions to problems and verifying solutions to problems. For example, imagine you’re planning a world tour to promote your new book. You pull up Priceline and start testing routes, but each one you try blows your total trip budget. Unfortunately, as the number of cities grows on your worldwide tour, the number of possible routes to check skyrockets exponentially, rapidly making it infeasible even for computers to exhaustively search through every case. But when you complain, your agent writes back with a solution sequence of flights. You can easily verify whether or not their route stays in budget by simply checking that it hits every city and summing the fares to compare against the budget limit. Notice the asymmetry here: finding a solution is hard, but verifying a solution is easy.

The P versus NP question asks whether this asymmetry is real or an illusion. If you can efficiently verify a solution to a problem, does that imply that you can also efficiently find a solution? Perhaps a clever shortcut can circumvent searching through zillions of potential routes. For example, if your agent instead wanted you to find a sequence of flights between two specific remote airports while obeying the budget, you might also throw up your hands at the similarly immense number of possible routes to check, but in fact, this problem contains enough structure that computer scientists have developed a fast procedure (algorithm) for it that bypasses the need for exhaustive search.

You might think this asymmetry is obvious: of course one would sometimes have a harder time finding a solution to a problem than verifying it. But researchers have been surprised before in thinking that that’s the case, only to discover last-minute that the solution is just as easy. So every attempt in which they try to resolve this question for any single scenario only further exposes how monumentally difficult it is to prove one way or another. P versus NP also rears its head everywhere we look in the computational world well beyond the specifics of our travel scenario—so much so that it has come to symbolize a holy grail in our understanding of computation.

In the subfield of theoretical computer science called complexity theory, researchers try to pin down how easily computers can solve various types of problems. P represents the class of problems they can solve efficiently, such as sorting a column of numbers in a spreadsheet or finding the shortest path between two addresses on a map. NP represents the class of problems for which computers can verify solutions efficiently. Our book tour problem, called the Traveling Salesperson Problem by academics, lives in NP because we have an efficient procedure for verifying that our agent’s solution worked.

Notice that NP actually contains P as a subset because solving a problem outright is one way to verify a solution to it. For example, how would you verify that 27 x 89 = 2,403? You would solve the multiplication problem yourself and check that your answer matches the claimed one. We typically depict the relationship between P and NP with a simple Venn diagram:

Venn diagram shows one large circle labeled “NP” encompassing a smaller one labeled “P.” The entire circle is labeled “Problems with solutions that computers can verify easily.” The area inside of P is labeled “Problems with solutions that computers can find easily.” The area in NP but outside of P is labeled “Problems with solutions that computers can verify but not find easily.”
Credit: Amanda Montañez

The region inside of NP but not inside of P contains problems that can’t be solved with any known efficient algorithm. (Theoretical computer scientists use a technical definition for “efficient” that can be debated, but it serves as a useful proxy for the colloquial concept.) But we don’t know if that’s because such algorithms don’t exist or we just haven’t mustered the ingenuity to discover them. Here’s another way to phrase the P versus NP question: Are these classes actually distinct? Or does the Venn diagram collapse into one circle? Do all NP problems admit efficient algorithms? Here are some examples of problems in NP that are not currently known to be in P:

  • Given a social network, is there a group of a specified size in which all of the people in it are friends with one another?
  • Given a varied collection of boxes to be shipped, can all of them be fit into a specified number of trucks?
  • Given a sudoku (generalized to n x n puzzle grids), does it have a solution?
  • Given a map, can the countries be colored with only three colors such that no two neighboring countries are the same color?

Ask yourself how you would verify proposed solutions to some of the problems above and then how you would find a solution. Note that approximating a solution or solving a small instance (most of us can solve a 9 x 9 sudoku) doesn’t suffice. To qualify as solving a problem, an algorithm needs to find an exact solution on all instances, including very large ones.

Each of the problems can be solved via brute-force search (e.g., try every possible coloring of the map and check if any of them work), but the number of cases to try grows exponentially with the size of the problem. This means that if we call the size of the problem n (e.g., the number of countries on the map or the number of boxes to pack into trucks), then the number of cases to check looks something like 2n. The world’s fastest supercomputers have no hope against exponential growth. Even when n equals 300, a tiny input size by modern data standards, 2300 exceeds the number of atoms in the observable universe. After hitting “go” on such an algorithm, your computer would display a spinning pinwheel that would outlive you and your descendants.

Thousands of other problems belong on our list. From cell biology to game theory, the P versus NP question reaches into far corners of science and industry. If P = NP (i.e., our Venn diagram dissolves into a single circle) and we obtain fast algorithms for these seemingly hard problems, then the whole digital economy would become vulnerable to collapse. This is because much of the cryptography that secures such things as your credit card number and passwords works by shrouding private information behind computationally difficult problems that can only become easy to solve if you know the secret key. Online security as we know it rests on unproven mathematical assumptions that crumble if P = NP.

Amazingly, we can even cast math itself as an NP problem because we can program computers to efficiently verify proofs. In fact, legendary mathematician Kurt Gödel first posed the P versus NP problem in a letter to his colleague John von Neumann in 1956, and he expressed (in older terminology) that P = NP “would have consequences of the greatest importance. Namely, it would obviously mean that … the mental work of a mathematician concerning yes-or-no questions could be completely replaced by a machine.”

If you’re a mathematician worried for your job, rest assured that most experts believe that P does not equal NP. Aside from the intuition that sometimes solutions should be harder to find than to verify, thousands of the hardest NP problems that are not known to be in P have sat unsolved across disparate fields, glowing with incentives of fame and fortune, and yet not one person has designed an efficient algorithm for a single one of them.

Of course, gut feeling and a lack of counterexamples don’t constitute a proof. To prove that P is different from NP, you somehow have to rule out all potential algorithms for all of the hardest NP problems, a task that appears out of reach for current mathematical techniques. In fact, the field has coped by proving so-called barrier theorems, which say that entire categories of tempting proof strategies to resolve P versus NP cannot succeed. Not only have we failed to find a proof but we also have no clue what an eventual proof might look like.

Scientists Want to Align Your Internal Clock Because Timing Is Everything


internal clock

In life, timing is everything.

Your body’s internal clock — the circadian rhythm — regulates an enormous variety of processes: when you sleep and wake, when you’re hungry, when you’re most productive. Given its palpable effect on so much of our lives, it’s not surprising that it has an enormous impact on our health as well. Researchers have linked circadian health to the risk of diabetes, cardiovascular disease, and neurodegeneration. It’s also known that the timing of meals and medicines can influence how they’re metabolized.

The ability to measure one’s internal clock is vital to improving health and personalizing medicine. It could be used to predict who is at risk for disease and track recovery from injuries. It can also be used to time the delivery of chemotherapy and blood pressure and other drugs so that they have the optimum effect at lower doses, minimizing the risk of side effects.

However, reading one’s internal clock precisely enough remains a major challenge in sleep and circadian health. The current approach requires taking hourly samples of blood melatonin — the hormone that controls sleep — during day and night, which is expensive and extremely burdensome for the patient. This makes it impossible to incorporate into routine clinical evaluations.

My colleagues and I wanted to obtain precise measurements of internal time without the need for burdensome serial sampling. I’m a computational biologist with a passion for using mathematical and computational algorithms to make sense of complex data. My collaborators, Phyllis Zee and Ravi Allada, are world-renowned experts in sleep medicine and circadian biology. Working together, we designed a simple blood test to read a person’s internal clock.

Listening to the Music of Cells

The circadian rhythm is present in every single cell of your body, guided by the central clock that resides in the suprachiasmatic nucleus region of the brain. Like the secondary clocks in an old factory, these so-called “peripheral” clocks are synchronized to the master clock in your brain, but also tick forward on their owneven in petri dishes!

Your cells keep time through a network of core clock genes that interact in a feedback loop: When one gene turns on, its activity causes another molecule to turn it back down, and this competition results in an ebb and flow of gene activation within a 24-hour cycle. These genes in turn regulate the activity of other genes, which also oscillate over the course of the day. This mechanism of periodic gene activation orchestrates biological processes across cells and tissues, allowing them to take place in synchrony at specific times of day.

The circadian rhythm orchestrates many biological processes, including digestion, immune function, and blood pressure, all of which rise and fall at specific times of the day. Misregulation of the circadian rhythm can have adverse effects on metabolism, cognitive function, and cardiovascular health.
The circadian rhythm orchestrates many biological processes, including digestion, immune function, and blood pressure, all of which rise and fall at specific times of the day. Misregulation of the circadian rhythm can have adverse effects on metabolism, cognitive function, and cardiovascular health.

The discovery of the core clock genes is so fundamental to our understanding of how biological functions are orchestrated that it was recognized by the Nobel Committee last year. Jeffrey C. Hall, Michael Rosbash, and Michael W. Young together won the 2017 Nobel Prize in Physiology or Medicine “for their discoveries of molecular mechanisms controlling the circadian rhythm.” Other researchers have noted that as many as 40 percent of all other genes respond to the circadian rhythm, changing their activity over the course of the day as well.

This gave us an idea: Perhaps we could use the activity levels of a set of genes in the blood to deduce a person’s internal time — the time your body thinks it is, regardless of what the clock on the wall says. Many of us have had the experience of feeling “out of sync” with our environments — of feeling like it’s 5:00 a.m. even though our alarm insists it’s already 7:00 a.m. That can be a result of our activities being out of sync with our internal clock — the clock on the wall isn’t always a good indication of what time it is for you personally. Knowing what a profound impact one’s internal clock can have on biology and health, we were inspired to try to gauge gene activity to measure the precise internal time in an individual’s body. We developed TimeSignature: a sophisticated computational algorithm that could measure a person’s internal clock from gene expression using two simple blood draws.

Designing a Robust Test

To achieve our goals, TimeSignature had to be easy (measuring a minimal number of genes in just a couple blood draws), highly accurate, and — most importantly — robust. That is, it should provide just as accurate a measure of your intrinsic physiological time regardless of whether you’d gotten a good night’s sleep, recently returned from an overseas vacation, or were up all night with a new baby. And it needed to work not just in our labs but in labs across the country and around the world.

A mismatch between our internal time and our daily activities may raise the risk of disease.
A mismatch between our internal time and our daily activities may raise the risk of disease.

To develop the gene signature biomarker, we collected tens of thousands of measurements every two hours from a group of healthy adult volunteers. These measurements indicated how active each gene was in the blood of each person during the course of the day. We also used published data from three other studies that had collected similar measurements. We then developed a new machine learning algorithm, called TimeSignature, that could computationally search through this data to pull out a small set of biomarkers that would reveal the time of day. A set of 41 genes was identified as being the best markers.

Surprisingly, not all the TimeSignature genes are part of the known “core clock” circuit — many of them are genes for other biological functions, such as your immune system, that are driven by the clock to fluctuate over the day. This underscores how important circadian control is — its effect on other biological processes is so strong that we can use those processes to monitor the clock!

Using data from a small subset of the patients from one of the public studies, we trained the TimeSignature machine to predict the time of day based on the activity of those 41 genes. (Data from the other patients was kept separate for testing our method.) Based on the training data, TimeSignature was able to “learn” how different patterns of gene activity correlate with different times of day. Having learned those patterns, TimeSignature can then analyze the activity of these genes in combination to work out the time that your body thinks it is. For example, although it might be 7 a.m. outside, the gene activity in your blood might correspond to the 5 a.m. pattern, indicating that it’s still 5 a.m. in your body.

Many genes peak in activity at different times of day. This set of 41 genes, each shown as a different color, shows a robust wave of circadian expression. By monitoring the level of each gene relative to the others, the TimeSignature algorithm learns to ‘read’ your body’s internal clock.
Many genes peak in activity at different times of day. This set of 41 genes, each shown as a different color, shows a robust wave of circadian expression. By monitoring the level of each gene relative to the others, the TimeSignature algorithm learns to ‘read’ your body’s internal clock. 

We then tested our TimeSignature algorithm by applying it to the remaining data, and demonstrated that it was highly accurate: We were able to deduce a person’s internal time to within 1.5 hours. We also demonstrated our algorithm works on data collected in different labs around the world, suggesting it could be easily adopted. We were also able to demonstrate that our TimeSignature test could detect a person’s intrinsic circadian rhythm with high accuracy, even if they were sleep-deprived or jet-lagged.

Harmonizing Health With TimeSignature

By making circadian rhythms easy to measure, TimeSignature opens up a wide range of possibilities for integrating time into personalized medicine. Although the importance of circadian rhythms to health has been noted, we have really only scratched the surface when it comes to understanding how they work. With TimeSignature, researchers can now easily include highly accurate measures of internal time in their studies, incorporating this vital measurement using just two simple blood draws. TimeSignature enables scientists to investigate how the physiological clock impacts the risk of various diseases, the efficacy of new drugs, the best times to study or exercise, and more.

Of course, there’s still a lot of work to be done. While we know that circadian misalignment is a risk factor for disease, we don’t yet know how much misalignment is bad for you. TimeSignature enables further research to quantify the precise relationships between circadian rhythms and disease. By comparing the TimeSignatures of people with and without disease, we can investigate how a disrupted clock correlates with disease and predict who is at risk.

Down the road, we envision that TimeSignature will make its way into your doctor’s office, where your circadian health could be monitored just as quickly, easily, and accurately as a cholesterol test. Many drugs, for example, have optimal times for dosing, but the best time for you to take your blood pressure medicine or chemotherapy may differ from somebody else.

Previously, there was no clinically feasible way to measure this, but TimeSignature makes it possible for your doctor to do a simple blood test, analyze the activity of 41 genes, and recommend the time that would give you the most effective benefits. We also know that circadian misalignment — when your body’s clock is out of sync with the external time — is a treatable risk factor for cognitive decline; with TimeSignature, we could predict who is at risk, and potentially intervene to align their clocks.