Tutorial

Introduction

The National Olympiad in Informatics — Philippines is an informatics olympiad where contestants solve algorithmic tasks by writing correct and efficient programs.

Let’s try to break that down.

  • National means everyone in the Philippines can join this contest.
  • Informatics is the science of studying efficient methods for storing, processing, and retrieving data.
  • An olympiad is a major contest in a particular game, sport or scientific subject.

Like many scientific olympiads, the contestants (i.e. you!) are given a number of tasks to solve within a limited amount of time. In math olympiads, contestants are often given problems that look not unlike the following:

There are an infinite number of positive integers n such that 1 + 2 + 3 + \ldots + n is divisible by n. Find the 100th one.

These types of problems are all well and good, but a sufficiently determined contestant might elect to start bashing every case; he or she might write down every n starting from 1, check whether each n satisfies the condition by trial and error — and eventually arrive at the correct answer! In this case, the contestant did not even have to think at all, and missed out on the crucial insights that would have make the solution to this problem much more palatable.

The NOI.PH, as an informatics olympiad, is not all too different from these math competitions or olympiads; in that the tasks often require insights, and these insights have to be synthesized to produce a final answer. The crucial distinction is that, in the NOI.PH, your final answer is a program, not a number.

If the same problem were to be given in the NOI.PH, it might be phrased like so:

There are an infinite number of positive integers n such that 1 + 2 + 3 + \ldots + n is divisible by n. Find the \mathbf{K}th one.

Notice the critical difference: we are being asked to find the Kth such number instead of, say, the 100th one. As a contestant, your goal is to submit a program that takes in any integer K, satisfying certain constraints (more on that later), as input, and gives the correct answer for that particular K, as output. Your program might be given K=100, or K=1, or K=39, or some other number; and your program must always give the correct answer, and do so quickly enough.

By requiring contestants to submit a program instead of a number, the contestants solve the general case of the task and thus demonstrate their knowledge of the insights necessary to solve it! While a contestant might simply write a program that tries every possible n from 1 onwards to find the Kth one, this will run noticeably slower than someone who, say, has found a closed-form formula or derived a fast algorithm to solve the same. For large enough K, the solution that simply tries all different sums will eventually take very long, and so it will exceed the time limit and, like a wrong answer, will not be credited full marks. Of course, the beauty of the NOI.PH is that even suboptimal programs can sometimes score partial points — but we’ll cover the specifics of that later.

Your First Task

Now, let’s get our hands dirty, by solving an actual informatics task!

Note that this section assumes knowledge of basic programming, such as arrays and nested loops, and a little bit of C++ syntax. If you have no prior experience with these concepts or simply want a refresher, you are invited to attempt the C++ for Competitive Programming tutorial, which also serves as a softer and more comprehensive version of this guide.

We will be tackling the task Azalea, and we will go through the different parts of the task and attempt some solutions. Along the way, we will also discuss some features of the HackerRank platform in particular, on which the NOI.PH contests are hosted for the time being.

The Problem Statement

All tasks begin with a problem statement. The problem statement situates the task, and usually, although not necessarily, comes accompanied with a story. As a competitive programmer, it is your job to dissect the problem statement and figure out exactly what it is asking. Sometimes, this is easy; the statement is no-nonsense and gets straight to the point (e.g. A Simple Problem!). Other times, the statement can be hilariously obnoxious and riddled with unnecessary background (e.g. Meteor Garden Garden).

The problem statement for Azalea reads as follows:

A small quaint town, known for its annual springtime azalea festival, begins preparations for the new season. The organizers have arranged N azalea shrubs in a horizontal row; the i^\text{th} shrub, from the left, has A_i flowers, and is labelled i.

The festival lasts D days. In the i^\text{th} day, as per tradition, the number of flowers between shrubs labelled L_i to R_i, inclusive, are counted, and the townspeople gather together and prepare that many dishes for the festivities of that day.

After the festivities of each day, the local gardener comes and waters every shrub, and so a single new flower blooms in each one of them.

The festival has not yet begun, but the townspeople want to know how many dishes they will need to prepare for each day, so they can start sourcing for ingredients. Can you help them?

The statement is quite long, and contains quite a number of irrelevant details. Can we whittle down the length and express it in a more concise way?

A good place to start is to try to list down the events in point form, so we can get a clearer picture of what we have to do:

  1. There are N shrubs, with the shrub labelled i having A_i flowers.
  2. There are D days:
    • In the ith day, the number of dishes prepared is equal to the number of flowers between shrubs L_i to R_i, inclusive.
    • In the ith night, a gardener comes and waters all the shrubs, so they each gain one flower.
  3. We need to determine the number of dishes prepared for each of the D days.

Much better!

Remember that N, D, and all the A_i, L_i and R_i are variables and our program must produce the correct result regardless of the values of these numbers!

The Input and The Output

Now that we know what to do, let’s look at the input format and output format, and the sample input and sample output.

The input format reads as follows:

The first line contains a single integer T, the number of test cases to handle.

Each test case begins with two integers N and D on a single line, the number of azalea shrubs and the number of days the azalea festival lasts, respectively.

The next line contains N integers, A_1, A_2, A_3, \dots, A_N, the number of flowers in each shrub.

The next D lines each contain two integers. Specifically, the i^\text{th} line among these contains L_i and R_i, describing the range of shrubs whose flowers will be counted for the i^\text{th} day.

The output format reads as follows:

For each test case, output D integers, separated by spaces, on a single line. The i^\text{th} integer should contain the number of dishes the townspeople must prepare for the i^\text{th} day.

If it looks rather complicated, don’t despair; all it is really telling us is the format our program should receive the input, and the format our program should report the output. We will see in a short while what all of this means! For now, take special care for the output format; everything our program outputs is considered “output”, so if you output some debugging statements like “Please enter T” or print something like: “The answer is: ”, this will be considered as part of your answer, and because it does not satisfy the output format, it will be marked wrong, even if your answer is otherwise correct!

Some tasks, such as this one, contain multiple test cases, which basically means your program should solve the task multiple times, with different inputs. Each test case describes a separate instance of the task; however, your program must correctly process all test cases in a single input within the given time limit.

Let’s take a look at the sample input:

And the sample output:

The sample input gives an instance of some valid input that your program might be given, and the sample output gives a correct response to the given sample input. To see that, let’s recall the input format.

The first line of input contains a single integer T, which is the number of test cases; in this case, we know that T=2, so we have 2 test cases to handle.

Each test case begins with two integers, N and D; in this case, we know that the first test case has N=6 and D=3. The next line contains N=6 integers, which contains the values of A_1, A_2, A_3, \ldots, A_6. Then, the next D=3 lines contain two integers; the first line among these contains L_1, R_1, the second L_2, R_2, and the third L_3, R_3.

The remaining lines describe the second test case; in this case, we know that the second test case has N=10 and D=1. The next line contains N=10 integers, which contains the values of A_1, A_2, A_3, \ldots, A_{10}. Then, the next D=1 line contains two integers, L_1 and R_1.

Consider the first test case. We can, in fact, substitute the values of the variables into our condensed problem statement:

  1. There are 6 shrubs:
    • The 1st shrub has 1 flower.
    • The 2nd shrub has 1 flower.
    • The 3rd shrub has 2 flowers.
    • The 4th shrub has 1 flower.
    • The 5th shrub has 9 flowers.
    • The 6th shrub has 9 flowers.
  2. There are 3 days:
    1. In the 1st day:
      • The number of dishes prepared is equal to the number of flowers between shrubs 3 to 5, inclusive.
      • Afterwards, the gardener comes and waters all the shrubs, so they each gain one flower.
    2. In the 2nd day:
      • The number of dishes prepared is equal to the number of flowers between shrubs 1 to 6, inclusive.
      • Afterwards, the gardener comes and waters all the shrubs, so they each gain one flower.
    3. In the 3rd day:
      • The number of dishes prepared is equal to the number of flowers between shrubs 2 to 2, inclusive.
      • Afterwards, the gardener comes and waters all the shrubs, so they each gain one flower.
  3. We need to determine the number of dishes prepared for each of the 3 days.

The Explanation Section

The explanation of the sample input is usually given in the explanation section, and indeed, this is precisely what we get:

In the first test case, there are 6 shrubs, and the springtime azalea festival lasts for 3 days.

Before the festival begins, the shrubs have 1, 1, 2, 1, 9, and 9 flowers from left to right.

In the 1^\text{st} day, the flowers between shrubs 3 and 5, inclusive, are counted. There are 2 + 1 + 9 = 12 flowers, and so the townspeople prepare 12 dishes for the festivities of the 1^\text{st} day.

After the festivities of the 1^\text{st} day, the local gardener comes and waters each shrub. Now, the shrubs have 2, 2, 3, 2, 10, and 10 flowers from left to right.

In the 2^\text{nd} day, the flowers between shrubs 1 and 6, inclusive, are counted. There are 2 + 2 + 3 + 2 + 10 + 10 = 29 flowers, and so the townspeople prepare 29 dishes for the festivities of the 2^\text{nd} day.

After the festivities of the 2^\text{nd} day, the local gardener comes and waters each shrub. Now, the shrubs have 3, 3, 4, 3, 11, and 11 flowers from left to right.

In the 3^\text{rd} and final day, the flowers between shrubs 2 and 2, inclusive, are counted. There are 3 flowers, and so the townspeople prepare 3 dishes for the festivities of the 3^\text{rd} day.

After the festivities of the 3^\text{nd} day, the local gardener comes and waters each shrub. Now, the shrubs have 4, 4, 5, 4, 12, and 12 flowers from left to right.

The festivites are over, and we report the number of dishes the townspeople must prepare each day:

1. The townspeople must prepare 12 dishes on the 1^\text{st} day.
2. The townspeople must prepare 29 dishes on the 2^\text{nd} day.
3. The townspeople must prepare 3 dishes on the 3^\text{rd} day.

Similar analysis can be performed for the second test case.

The explanation situates the problem in terms of “numbers”, instead of variables. More precisely, it explains the sample input, and how we can derive the sample output if we were given the sample input. Of course, the reasoning given in the explanation is non-general; it applies only to the given sample input, and usually does not give many insights about how the problem can be solved in the general case.

It is good practice to look at the explanation and see that your understanding of the problem is correct; sometimes, you may encounter ambiguous statements, and you can rely on the explanation to see whether you have read the problem correctly, or if you have missed some important detail. In this case, it may not be as important, but in other problems you may find this explanation very much helpful.

The Constraints

Now that that’s clear, we can discuss the constraints:

1 \leq T \leq 5
1 \leq A_i \leq 1000
1 \leq L_i \leq R_i \leq N

Subtask 1 (24 points)
1 \leq N \leq 2000
1 \leq D \leq 2000

Subtask 2 (34 points)
1 \leq N \leq 100000
1 \leq D \leq 100000
The sum of |L_i- R_i| in a test case does not exceed 10^6.

Subtask 3 (42 points)
1 \leq N \leq 100000
1 \leq D \leq 100000

The constraints communicate to us the maximum and minimum possible values the variables in the problem statement can assume. For example, in this case, we have 1 \leq T \leq 5, which means that there will be at least 1 test case and at most 5 test cases. This makes sense, because our program cannot be expected to, say, handle T = 10^9 test cases, and still run within the time limit! Some constraints are straightforward; A_i between 1 and 1000, for example; while others are more cryptic, like “the sum of |L_i-R_i| in a test case does not exceed 10^6“. Sometimes, a cryptic constraint reveals a property that can be exploited to write an efficient enough solution.

Time Limit

Speaking of efficient solutions, note that all programs have a time limit within which they are expected to read all the input, process the input, write all the output, and terminate. It must be able to solve all the test cases in the input within the time limit. The time limit is standard for all NOI.PH problems, and, unless explicitly noted otherwise, are:

  • 2 seconds for C++;
  • 4 seconds for Java;
  • 10 seconds for Python.

The time limit is more relaxed for Java and Python, as these languages are inherently slower; however, generally speaking, if the algorithm itself is sub-optimal, it will fail regardless of which language you code it on! Remember that the problems in the NOI.PH are always algorithmic in nature, and code is simply our way of expressing our algorithm. By its nature, a more efficient algorithm almost always trumps a better implementation of a bad algorithm!

In general, we can perform about 10^9 “cheap” operations, such as addition or bitwise operations, within the time limit, and about 10^8 “expensive” operations, such as multiplication or division, within the same. Knowing exactly how much processing power you can squeeze from your program is an ability that is learned through practice, and you will eventually have a good intuition for it!

The Subtasks

Notice that there are subtasks, each with a varying number of points. While there are constraints that are common to all subtasks, such as 1 \leq A_i \leq 1000 or 1 \leq L_i \leq R_i \leq N, there are constraints that are imposed to some subtasks but not others; for example, subtask 1 imposes that N is only up to 2000, whereas subtasks 2 and 3 impose that N is up to 100000. This allows us to score partial points! For example, we might have a program that is correct and runs within the time limit if N is just 2000, but produces a wrong answer or exceeds the time limit if it is greater than 100000.

The Code Editor

That said, we can now attempt to solve this problem.

HackerRank has a built-in code editor, provided with each problem, where we can write our code. By default, it contains the following:

If you are just starting out, this can be very helpful! You do not need to download a compiler or an IDE just yet; for simpler tasks, you will find that the online code editor will suffice.

HackerRank provides a number of tools you can use to test your code. For example, hitting automatically compiles and runs the code against the sample input as described in the problem statement, and tells you whether your output is correct or not, or whether your code took too long, crashed, or could not even compile (in the last case, HackerRank also helpfully informs you of the error). You can also test against your own input by using the built-in function to Test against custom input. Using this feature, you can give your program any input, and HackerRank essentially serves as a compiler. Note that this option does not check the correctness of your output or even the validity of input; this means you can use the function to test your program against anything, just as you would a normal compiler.

When you are done with your solution, and you think that it is correct, you can simply hit and watch, in real time, the results as your program is tested against the input.

Submission 1

All that said, let’s see what happens when we code the most straightforward solution.

We hit , and… um, wow! It worked on the first try! (Note: this is rare and doesn’t usually happen, especially when starting out. You will normally encounter a number of compilation issues before you can get it to work. Learning how to debug is important!)

Congratulations, you passed the sample test case.

Click the Submit Code button to run your code against all the test cases.

Now, we hit to submit our code.

Submitted a few seconds ago
Score: 24.00
Status: Terminated due to timeout

✓ Test Case #0
✓ Test Case #1
⏱ Test Case #2
⏱ Test Case #3

Alas! Our code successfully solves the first subtask, scoring 24 points, but it fails for the other two, giving a “terminated due to timeout” verdict. “What gives? It worked on the sample test cases!” I hear you exclaim.

Submission 2

Let’s analyze how many operations our code takes. It is clear that the majority of the work is happening in the two nested loops; one looping i from 1 to D, and, within that, two loops; one looping from L_i to R_i and another looping from 1 to N.

We know that, for the second test case, N and D are both up to 100000. Therefore, even if we ignore the loop going from L_i to R_i, we already have 100000 \cdot 100000 = 10^{10} iterations; this is significantly greater than 10^9, and so we can wager that our program will probably not finish in time. This is even ignoring the fact that we have up to T=5 test cases in one input!

We will need to optimize our algorithm, by making some observations.

The bottleneck in our program is the loop from 1 to N. Can we somehow eliminate this loop?

Yes, we can! We need to notice that, because the gardener waters all the plants every day, at the ith day, each shrub simply has i-1 extra flowers than original.

The other insight is that we can count these extra flowers separately; there are R-L+1 shrubs between shrubs L and R, inclusive, and each of them has i-1 extra flowers. So, the total number of extra flowers is simply (R-L+1) \cdot (i-1), and we can simply add this to the total count of flowers!

We now have the following code:

How long does this run? At worst, we could still have L_i=1 and R_i=N for all i; then, we will still have 100000 \cdot 100000 =10^{10} iterations. But wait! Remember that the second subtask has an extra suspicious constraint — the sum of |L_i-R_i| is at most 10^6. Could this perhaps save us here?

A more nuanced count on the number of iterations our loop actually performs can be of help. Note that we loop from L_i to R_i, so we consider only R_i-L_i+1 values in the ith day. So, summing this over all days, the inner loop actually iterates (R_1-L_1+1)+(R_2-L_2+1)+(R_3-L_3+1)+\ldots+(R_D-L_D+1) times! If we note that R_i-L_i is equal to |L_i-R_i| because L_i \leq R_i, and that there are exactly D occurrences of 1 in this sum, then we can rewrite this as: |L_1-R_1|+|L_2-R_2|+|L_3-R_3|+\ldots+|L_D-R_D|+D. But the sum of all the |L_i-R_i| is at most 10^6, and D \leq 10^5 so the above sum is actually only at most 1.1 \cdot 10^6; that means, the inner loop only performs at most 1.1 \cdot 10^6 iterations across all days for each test case!

This analysis should convince us that this program will finish in time. And indeed, submitting it gives…

Submitted a few seconds ago
Score: 24.00
Status: Wrong Answer

✓ Test Case #0
✓ Test Case #1
✗ Test Case #2
⏱ Test Case #3

Submission 3

…wrong answer? What’s up with that? Surely the problem setter must have an incorrect model solution!

In fact, there are a number of very subtle mistakes in our code, which can be summed up in a word: overflow. In our code, we used the int data type, which in C++ can store any integer value between -2^{31} and 2^{31}-1, or slightly more than 2 \cdot 10^9. But what is the maximum possible answer?

It is not immediately obvious, but it can actually exceed this! If we have N=100000 and D=100000, and on the last day we have L_D=1 and L_D=N, then the total number of extra flowers across all shrubs, according to our previous formula, is N \cdot (D-1). That’s 100000 \cdot 99999, which is almost 10^{10}; thus, we get integer overflow! The problem setter must have been very sneaky, and, anticipating this, added this test case in!

To remedy this, we simply have to use a larger data type, such as long long int, which can store values between -2^{64} and 2^{64}-1, which is much larger than the largest possible answer. The details of this are more nuanced, and it is not enough to simply change the types of sum and ans in the above code; we must also change l and r, or perform a typecast when we do the multiplication (r-l+1)*(i-1). You are encouraged to peruse this to find out more details. (You need to register here first before you can access the link.)

Anyway, after fixing these bugs, we now have the following code:

Now, we smash that button and get:

Submitted a few seconds ago
Score: 58.00
Status: Terminated due to timeout

✓ Test Case #0
✓ Test Case #1
✓ Test Case #2
⏱ Test Case #3

Eureka! We’ve got the second subtask.

Submission 4+

The timeout on the third is rather expected; as we mentioned earlier, in the worst case, we could still have L_i=1 and R_i=N for all i, causing our program to require 100000 \cdot 100000=10^{10} iterations. The final subtask has no restrictions on |Ri-Li|, so we can’t perform the reductions earlier to show that this worst case is impossible. Indeed, the worst case is very much possible — and you can expect, as a contestant, that the problem setter has also added this worst case, or something similar, as one of the test cases for the final subtask.

So, are we doomed? Or is there hope for us yet?

Luckily, there is (and of course there is, because the problem setter must have some correct solution), and it involves a very clever trick.

Let’s create an array C of length N, where C_i=A_1+A_2+A_3+\ldots+A_i. We can generate this array very efficiently, using only one loop; we let C_1=A_1, then for all 2 \leq i \leq N, we can do C_i=C_i-1+A_i. Thus we can generate this array with only about N operations.

Now, suppose we want to find the sum of A_{L_i}+A_{L_i+1}+A_{L_i+2}+\ldots+A_{R_i}. Currently, we do that using a loop, which takes R_i-L_i+1 operations; that’s too slow! Notice, however, that the sum is equal to: A_1+A_2+A_3+\ldots+A_{R_i}-(A_1+A_2+A_3+\ldots+A_{L_i}-1), because all the terms A_1, A_2, A_3, \ldots, A_{L_i-1} cancel themselves out!

But then this is equal to C_{R_i}-C_{L_i-1}, using our earlier definition for C. And we already know how to generate C efficiently; not to mention, we only ever have to generate it once, at the beginning, because we’ve already reduced the problem to the case where the values in A never change. So, we can eliminate the nested loop entirely!

Actually coding this is left as an exercise for you; while you code it, take into consideration any possible corner cases or sources of bugs. Rest assured that it will be worth it; there is no joy just the same as seeing this:

Submitted a few seconds ago
Score: 100.00
Status: Accepted

✓ Test Case #0
✓ Test Case #1
✓ Test Case #2
✓ Test Case #3

Except maybe the smell of azalea flowers in spring. But that’s besides the point.

Conclusion

Hopefully, the in-depth analysis of this problem and the corresponding discussions have allowed you to develop a more comprehensive understanding of what the NOI.PH is, what it entails, the contest format and platform, the types of problems you will encounter, as well as the thought processes and the kinds of insights necessary to solve these types of tasks. If you want to learn more, or try your hand at solving some tasks from past NOI.PH contests, make sure to check out:

But you don’t have to be alone in your quest for greatness. There’s even a Discord chat room for Filipino competitive programming enthusiasts. Email [email protected] for more information.

We hope you found this whole tutorial useful, and we wish you the best of luck in the NOI.PH!