A Firsthand Account of the NOI.PH Onsite Experience

Introduction

Is it going to be your first time to go to the final round? Has it been one of your hashtag life goals to officially participate in the onsite round, but you haven’t gotten around to preparing yet? Here is a first-hand account of what happened during the NOI.PH Onsite Round last year, to let you know what you’re in for. This is being posted with permission from CJ Quines, one of last year’s finalists and the writer of this article. The article has been edited accordingly for the website. The full PDF in all its glory is on his website.

The following is a narration of the events that occurred in the NOI 2017 finals. It’s very likely that the order I did the problems is inaccurate, because I’m writing this quite some time after the actual event. Abbreviated versions of the problem statements are in the appendix.

Practice round

Reminders for the participants were projected in front of everyone. They contained instructions on compiling using the terminal, which was pretty helpful. The computers were running Ubuntu, and I didn’t know how to compile in Ubuntu.

This was the first time I’d be using Ubuntu, and I really wasn’t familiar with any Linux distro. But well, that was the purpose of the practice round: to get used to the environment.

Apparently we were allowed to use Zeal, a program that provides offline documentation. This was a big help, considering I didn’t know much of the standard library and the language, even though I should. I could look up how wide an int is, or how to use printf to truncate decimals.

I took a few minutes acclimating to the system. I tried to build a file in C++. I checked which version of Python was on the system, and there was Python 2.7. There was Sublime Text 3, which was good since that was the editor I used. I tried the build system, which worked, so no problems.

The practice round started a bit after that. It would run for two hours, and I spent the first few minutes looking through all the problems; they all looked doable except the last. I read the first problem and was puzzled after finding a simple solution: repeat the string ABC up to n letters. I wondered if there was more to it. It worked, so I moved on to the next problems.

The second problem involved \oplus, bitwise addition. After thirty minutes of false starts and forgetting the size of n, I managed to get a wrong answer. Then I managed to find a simple solution: n \oplus i \oplus i, where i is an integer. At this point I realized that the two previous problems were math problems. Was this a sign?

Problem three was a math problem involving pigeonhole. Thus, for the record, there were three math problems in the practice round. The closed form is well-known in the math community: it is stars-and-bars. The problem was computing a bunch of binomial coefficients quickly. I’ve seen a trick for doing that before, using the magic of DP: store Pascal’s triangle in memory and look up entries as needed.

That was not so straightforward to implement, and I had a few false starts due to wrong indices. I did manage to do so, roughly thirty minutes after I solved the second problem. This left me with around 300 points, eighty minutes into the round.
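
The Pascal’s-triangle trick can be sketched like this (a minimal version; the function name is mine, and any modulus the problem required is omitted):

```python
def pascal_triangle(limit):
    # build rows 0..limit of Pascal's triangle with the DP
    # C(n, k) = C(n-1, k-1) + C(n-1, k); afterwards, every
    # binomial coefficient is a single table lookup
    C = [[1]]
    for n in range(1, limit + 1):
        prev = C[-1]
        C.append([1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1])
    return C
```

Then C = pascal_triangle(20) gives C[n][k] = \binom{n}{k}, so for instance C[5][2] is 10.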

There were forty minutes left and one problem to solve, and the remaining problem was so complicated that even the abbreviated version in this report took two paragraphs. After trying a few cases on paper, I quickly realized that solving it completely would be impossible in the remaining time, and focused on the first subtask. The first subtask was restricted enough that it was only a matter of finding the shortest path from left to right.

I did know that would involve a pathfinding algorithm, and I thought of using a BFS. The trouble was, I did not like working with grids and graphs. I spent the last few minutes of the practice round working on reading the data, which I struggled with because I kept messing up scanf. I gave up with about three minutes left. We counted down the last ten seconds and ended with applause.

Round 1

After the practice round was lunch, and after lunch was the start of the first round. It would run for five hours, from 1:30 PM to 6:30 PM.

The first thing I did was read all the problems and try to summarize what each one was asking for. The first problem involved \oplus once again, which probably explains why one of the practice round problems involved \oplus: it was setup for this problem. The second problem was probably a math problem, counting the number of increasing sequences. I tried a few cases by hand and got nothing.

The third problem was an algorithmic problem, involving ranged sums-of-maximums queries. Of course, after working it out on paper, the maximum part isn’t as important: it’s a ranged query of 2^{n-1}a_n + 2^{n-2}a_{n-1} + \cdots + a_1, once you sort to get a_n > a_{n-1} > \cdots > a_1. Sorting was O(n\lg n), and then q queries made it O(nq\lg n), which isn’t nearly fast enough for the last subtasks.
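
A single query under that formula comes out to something like this (indices are my own; the real statement is only abbreviated in the appendix):

```python
def query(a, l, r):
    # one range query on a[l..r] (0-indexed, inclusive): sort ascending
    # so a_1 <= ... <= a_n, then compute a_1 + 2*a_2 + ... + 2^(n-1)*a_n
    sub = sorted(a[l:r + 1])
    return sum(x << i for i, x in enumerate(sub))
```

Each query sorts its subarray in O(n \lg n), which is where the O(nq \lg n) total comes from.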

The fourth problem involved a binary grid, which made use of a very interesting property. Suppose we have an (m-1) \times (n-1) binary grid. We add a column to the right of the grid, as the sum modulo two of each row, and add a row to the bottom of the grid, as the sum modulo two of each column. Then if we take the sum modulo two of the last column, and take the sum modulo two of the last row, they’re the same. It’s an interesting combinatorics exercise in invariants.
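
The property is quick to check numerically; here is a sketch (the function name is mine):

```python
def corner_parities(grid):
    # added column: the sum of each row mod 2; added row: the sum of
    # each column mod 2. The claim is that the parity of the added
    # column equals the parity of the added row, since both equal
    # the parity of the whole grid.
    col = [sum(row) % 2 for row in grid]
    row = [sum(c) % 2 for c in zip(*grid)]
    return sum(col) % 2, sum(row) % 2
```

For any binary grid, the two returned parities agree.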

Without any intuition for which problems would be easier, I decided to work on the first problem first. Since the inputs become really large, there is probably an O(1) or an O(\lg n) solution, and since it looked like a math problem, I erred on the side of an O(1) solution. After working out a few small cases on paper, I failed to see a pattern. Then I realized: I had a computer! I could use a program to bash small cases.

I wrote a short Python program to bash the first few cases. At first the program was limited to spitting out the answer for certain M and N, but this was too slow. So I wrote a function that tried all N \leq M. But this was also too slow, so I wrote another function that tried all M up to a certain limit, and stored the results in a triangle.

There were a few observations to be made. The N = 1 case is easy, and so is the N = 2 case. The cases where M is a power of two or one more than a power of two had a pattern. Usually, for a certain row, after a certain point, everything is just the same power of two over and over. And this power of two was the smallest one larger than M; let’s call it 2^k. Besides that, though, there didn’t seem to be any pattern I could exploit for an O(1) solution. My biggest lead was figuring out when it became a power of two over and over, which could be a potential optimization.

After writing out a few cases, I made a conjecture that this happens when M + N - 1 \geq 2^k. At this point, I’d spent an hour in the competition, so I decided to fold for the moment by writing a program for what I had. It would do the optimizations for the special cases and the M + N - 1 \geq 2^k case, and otherwise bash.

While writing the program, I spent a lot of time figuring out how to compute 2^k given M. I first tried to do some clever stuff using bit manipulations, but after failing for five minutes to find something suitable, I decided to just hardcode the powers of two, which would also be helpful later on for the case where M is a power of two. At first, I thought of trying a bunch of conditional statements, but then I realized it would be easier to do a binary search. I messed up the binary search a few times, spending another ten minutes on something that should have been routine.

I finished the program, and it got a wrong answer. Not even the first subtask, which was M, N \leq 20. This was roughly ninety minutes into the competition. I had to decide whether to debug or to proceed, and I chose to debug, because I was more likely to score points repairing a solution than making a new one. (This is something I learned from math contests. When running out of time and faced with the decision to review answers or to proceed, it is often better to review answers, because that holds more potential points.)

After racking my brain looking for typos, I still couldn’t spot the error. I decided to do a brief check: I used the triangle I generated from my Python code to hardcode the first twenty rows and submitted that. It got the first subtask, so there wasn’t anything wrong with the judge, and the problem had to be my code. (I have good faith in the problem-setter. But I was debugging and under stress, so I had to make sure.)

If my Python code produced the correct answers for the first subtask, and my program failed to, then there must be a discrepancy between the answers the two produced. This was kind of expected: I never proved any of my observations, I just made guesses based on data. I went back to my program and had it print the first twenty rows to compare with the Python output.

After printing, I noticed something: the conjecture that M + N - 1 \geq 2^k implied the answer was 2^k was wrong, and this was the cause of almost all of the discrepancies. The conjecture that worked was M + N \geq 2^k. There was another discrepancy, however: the case M = N = 2. I could not understand how that was produced, so I decided to just hardcode the answer to that case.

I also noticed I could make an optimization for the bashing. Instead of checking all MN cases, you only need to check whether the numbers between M and 2^k can be produced with a number between 1 and N. In other words, you can already produce all the numbers 1 \leq x \leq M by doing x \oplus 1 \oplus 1 = x, so you only need to check which numbers M+1 \leq x \leq 2^k can be produced.

This can be done with a cool property of \oplus: a \oplus b = c \iff a \oplus c = b. If we were looking for m \oplus n = x, where x is the value we want to check is possible, we only need to compute m \oplus x = n and check whether 1 \leq n \leq N. Although both algorithms are O(MN), the second one is faster by a constant, and the first subtasks were clustered closely enough that optimizations counted. I submitted and it got 37 points, not a lot, but fine.
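
The membership test from this paragraph, as a sketch (the function name is mine):

```python
def producible(x, m, N):
    # is there an n with 1 <= n <= N and m ^ n == x?
    # using a ^ b == c  <=>  a ^ c == b, the only candidate is n = m ^ x,
    # so the scan over all n collapses to one O(1) check
    n = m ^ x
    return 1 <= n <= N
```

This gives the same answer as scanning every n from 1 to N, without the inner loop.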

I moved on to the second problem, and after trying out a few small cases I quickly realized that it was a combinatorics task. I tried to list out the small cases, wasting a few minutes, especially since a lot of my listing was wrong. I managed to produce the first few values.

I failed to see a pattern, so I decided to try a different approach: treat it as a math problem I would see in a contest. Let’s say there are n numbers we have to permute. I made a few initial observations from my failed listings, the most important one being that 1 is pretty restricted. It could only appear in the first or second slot of the permutation. If 1 appeared in a later slot, then it had to be larger than one of the two previous slots, which is impossible.

Let’s call the number of permutations where 1 goes first a_n and where 1 goes second b_n. If 1 was in the first slot, how can we fill in the rest of the slots? The condition for the third slot is already fulfilled: it’s already larger than the first slot, 1. So the second slot can be anything, and the third slot can be anything, provided the fourth slot’s conditions were fulfilled, and so on. This was precisely the same problem for n-1 slots! So a_n = a_{n-1} + b_{n-1}.

Now, if 1 appeared in the second slot, then we can just ignore the first slot. We have an arrangement of n-1 numbers where the smallest goes first, and there are a_{n-1} ways to do this. Then there are n-1 ways to pick the number in the first slot. Thus you had b_n = (n-1)a_{n-1}. I worked out a few cases by hand and it matched my small cases from earlier.

That was great: with both recursions, we could compute any a_n + b_n. I decided to code up this solution, using two arrays, and it got all the subtasks, hooray. After coding this solution I then realized that you could make it one recursion instead: the (b_n) could be folded into the recursion for (a_n), and you could do it with only one array. But the point was that the problem was solved, so I moved on to the next problem.
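
A sketch of the two-array solution, with the recursions and base cases as I reconstruct them (the full statement is only in the appendix, so take the base cases as assumptions):

```python
def count_permutations(n):
    # a[k]: valid permutations of k numbers with 1 in the first slot
    # b[k]: valid permutations of k numbers with 1 in the second slot
    a = [0] * (n + 1)
    b = [0] * (n + 1)
    a[1] = 1  # assumed base case: the single one-number permutation
    for k in range(2, n + 1):
        a[k] = a[k - 1] + b[k - 1]
        b[k] = (k - 1) * a[k - 1]
    return a[n] + b[n]
```

The answer for n is then a_n + b_n; the real solution would also take everything modulo whatever the problem asked for.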

As an aside, my write-up might make it look like solving the second problem was a straightforward process. It wasn’t: it took another hour. At this point, I’d spent the first two hours on the first task and another hour on the second task, putting me three hours into the competition.

With roughly two hours left, I had several choices of things to do, and it was hard to decide which to do next: optimize task 1, try task 3, or try task 4.

The advice I heard was “maximize points in the remaining time”, which was easier said than done. I felt I couldn’t get any more points on task 1 without spending a lot of time, but tasks 3 and 4 seemed likelier to yield at least a few points. I tried task 3 first, because it came before task 4.

I had already worked out on paper that the answer was just the powers of two times the sorted list, and that this was O(nq\lg n). I tried to think of a cleverer solution. Since it involved ranged queries, the first thought was to try a segtree, and figure out how to propagate.

This was not easy, and I failed to do so. The problem was that a segtree’s nodes should build up from smaller cases, but the powers of two times the sorted subarray did not build up easily. I couldn’t think of a good way to solve the problem. The O(nq \lg n) solution was enough to grab about a third of the points, I think, so I tried coding that first.

That was also problematic. I had to repeatedly look up how to use sort, since I wasn’t familiar with it. I needed the powers of two, and computing them each time was slow; then I realized I already had them hardcoded from my solution to task 1, so I just grabbed that. After that, I think I managed to get 53 points, and I had an hour left.

I did not manage to do a lot in the last hour. The tasks I could do included: optimize task 1, optimize task 3, try task 4. I tried task 4 and quickly realized that it was not easy.

I managed to come up with a way to count the minimum number of entries that need to be changed, which was a complicated thing involving counting the number of ones in each row and column. This happened after trying lots and lots of small cases, which took up thirty minutes of my time.

The hard part was computing the number of ways to obtain a good grid. After doing a lot of combinatorial work, I found an O(1) solution that was very, very long, used a lot of factorials, and was, to be brief, hard to compute.

I tried to code a brute force for task 4 instead, but after failing for fifteen minutes to read the input, I decided to give up. I spent the remaining time trying the first task.

As the round was about to close, I used up another ten minutes or so on the first task, looking for patterns in the binary expansion. I did not manage to find a usable pattern. I ended up with 190 points, placing me third for round 1.

Round 2

The numbering will be as in the appendix, so the problems in round 2 are tasks five through eight.

Again, I first read all the problems to get a general sense of the round. It was lucky for me that the problems on the previous day were largely mathematical in nature, in contrast to the algorithmic problems that are usual in competitive programming. (This was not good for other contestants, however. Robin was particularly against this. He makes the case in his blog that this should not happen in programming contests, and it is a thought-provoking read.) The second day, however, seemed to be all algorithmic problems.

The fifth task would’ve been easier if it had been about subsequences instead of subarrays. Indeed, this was the mistake I made on my first reading. The sum of the second smallest terms over all subsequences can be found simply by sorting and multiplying each term by the appropriate factor, O(n \lg n). But this was subarrays, not subsequences.

The sixth task seemed relatively straightforward, and it seemed like it could be done in O(n), the cost of reading the input. The seventh task seemed like a straight MST (minimum spanning tree, a classic algorithmic problem) at first, though I had to read it twice. I found it hard to believe that a classical problem would be given out, so I decided to take a closer look later.

The eighth task was a tiling problem, and it would make a good combinatorics problem. However, the programming version is significantly harder, because you have to read the input and output the tiling correctly, which was quite complicated. I decided I probably wouldn’t solve it.

Once again, I tried the tasks in order. I knew early on that the sixth task was the easiest, so I would leave it for the middle of the competition, when my concentration would be at its lowest. I did the fifth task first, and spent a good amount of time trying to figure out how to do it quickly.

I spent a good deal of time trying out the problem on paper, with lots of small cases. It felt like a DP problem, and if it was a DP problem, then it should be doable by processing from left to right. Adding a new element at the end of a list with n-1 elements gives you n new subarrays, all of them ending with the newly added element. Sorting each one is O(n \lg n). You have n new subarrays, and you do this n times, so that’s O(n^3 \lg n), quite slow.

I somehow had a strong hunch that sweeping from left to right would be best. I tried more and more small cases on paper, looking for a pattern. I briefly thought of using something like divide-and-conquer, a segtree-style solution. Once you have the segtree set up in O(n \lg n), you have O(n^2) queries at O(\lg n) time each, making O(n^2 \lg n). But I was not good with segtrees.

I went back to my previous solution: there was a lot of redundancy in sorting the subarrays, because there was a huge overlap between them. Right, overlap! You only need the second smallest term of each subarray, and each new subarray is the same as a previous one with a new element on the end.

Thus you didn’t need to sort the whole subarray over again. You just need to keep in memory the two smallest elements of the subarray starting from each element. Then you only compare the new term to the two previous smallest elements, which is O(1), because it’s two comparisons worst-case.

Why is it two comparisons worst-case? Say the two previous smallest elements in the subarray were x \leq y, and you’re adding z to the subarray. If z \leq x, then the new smallest elements are (z, x). If z \geq y, then you retain the smallest elements: (x, y). Otherwise, they’re (x, z). Two comparisons, instead of sorting the whole subarray, which cut O(n \lg n) down to O(1).

That brought the time down to O(n^2): n subarrays for each added element, and you add an element n times. I couldn’t reduce it any further, so I typed up my solution. That got me 45 points, after some debugging, and the next subtask seemed too difficult, so I moved on to the next problem.
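
As a sketch, the O(n^2) sweep looks like this, here computing the sum of the second smallest element over all subarrays of length at least two (the exact input and output details are in the appendix):

```python
def sum_second_smallest(a):
    # for each start i, extend the subarray to the right while maintaining
    # its two smallest elements (x <= y); y is always the second smallest
    total = 0
    n = len(a)
    for i in range(n - 1):
        x, y = sorted((a[i], a[i + 1]))
        total += y
        for j in range(i + 2, n):
            z = a[j]
            if z <= x:
                x, y = z, x      # z is the new smallest
            elif z < y:
                y = z            # z is the new second smallest
            # else keep (x, y)
            total += y
    return total
```

Each extension is the two-comparison update from above, so the whole thing is n starts times up to n extensions.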

Even though the next problem was the easiest in the whole competition, I spent nearly an hour and a half trying to solve it. The sixth task contained a reference I got: it was to Breaking Bad, and it was rather witty.

My solution, basically, involved keeping track of the previous coordinate and the current coordinate, as well as the current “mode”. It could be in horizontal mode, vertical mode, or neither. Basically, if it was in horizontal mode and the next vertex did not lie on the same horizontal line, you have to make a turn, and likewise for vertical mode.

After coding my solution and debugging it, it still got a wrong answer. Not even the first subtask. At this point I’d spent nearly forty minutes coding, debugging, and trying to find what was wrong, so I made the mistake of hanging on to my code and creating test case after test case.

I tried to remember some advice I heard about debugging: create tricky test cases with small inputs. After writing several of these, I managed to find the error, but I didn’t know how to fix it. I simply decided that my solution was rubbish and that there was a simpler way to proceed.

Instead of keeping track of modes, you keep track of directions. This was a far simpler idea, and simpler ideas tend to work better. You keep the direction undecided for the first few points, until you hit a horizontal or vertical segment. After that, the direction has to alternate, and you simply increment the count each time it does.

That solution worked, and got all the points. I’d used roughly three hours at this point.
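
My second solution amounted to something like this sketch. The real statement is only abbreviated in the appendix, so treat the setup here (a list of points along an axis-aligned path, counting turns) as an assumption:

```python
def count_turns(points):
    # walk the consecutive segments; each is horizontal ('H') or vertical
    # ('V'). The direction stays undecided until the first real segment,
    # and every alternation after that counts as one turn.
    turns = 0
    prev = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 == x1 and y0 == y1:
            continue  # repeated point, no segment
        d = 'V' if x0 == x1 else 'H'
        if prev is not None and d != prev:
            turns += 1
        prev = d
    return turns
```

For example, walking three sides of a rectangle makes two turns.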

Two hours left, and there were several things I could do: solve task seven, optimize task five, or solve task eight, in order of ease. Task eight was placed last because I’d had bad experiences with two-dimensional input and output. Optimizing task five would be hard, because the next subtask was such a big leap.

That left me with trying task seven, the MST variant. At first, I thought it was a straight MST: to minimize the average, you needed to minimize the sum, right? That was what it seemed like based on the sample case. So I tried to implement an MST algorithm and build from there.

However, I had never implemented an MST algorithm before. I knew Kruskal’s, but I had never tried programming it. The idea is to keep all the edges sorted by weight from lightest to heaviest, and put in the edges one by one. If a new edge would connect two vertices that are already in the same tree, discard it.

Easier said than done, however. I ended up taking a lot of time on the edge routine, because I couldn’t figure out how to use priority_queue. I also got tired of typing edge.second.first and edge.second.second over and over again. After a few iterations, however, I had a seemingly working Kruskal’s.

I did not get any subtasks, however. Then I realized that the way I was printing the floats was inaccurate, and it took me a few minutes of looking up printf before I could figure out what was wrong. Then I still got no points, and that was when I decided to take a break and think about the problem in the break room.

I was pacing around the break room, trying to figure out what was wrong. Why wouldn’t the MST have the smallest average edge weight? What would make a minimal-average spanning graph differ from the MST?

After thinking it through and going through a few examples in my head, I realized what was wrong: you can add edges lighter than the current average to the MST to make the average smaller. It took me a few minutes to make that realization, but once I got it, I went back to the contest room and started working on paper.

Could there be a spanning graph with a smaller average edge weight that didn’t contain the MST? That seemed impossible: either it contains the MST, or it doesn’t have the smallest average, by contradiction. I worked out a few test cases on paper to check.

So, in Kruskal’s, instead of throwing away an edge that would connect two vertices already in the tree, you put it in another queue. Then, once you’ve made the MST, you push edges from that queue for as long as they make the average smaller.
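
As a sketch, the modified Kruskal’s might look like this, with a union-find in place of my vector routines. Whether pushing the leftover edges in ascending order is exactly right for the real task is my assumption; the key observation is that adding an edge of weight w helps precisely when w is below the current average:

```python
def min_average_spanning(n, edges):
    # edges: (weight, u, v) with vertices 0..n-1; returns the minimum
    # average edge weight of a connected spanning subgraph
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, count, leftovers = 0, 0, []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:               # Kruskal's: take tree edges
            parent[ru] = rv
            total, count = total + w, count + 1
        else:
            leftovers.append(w)    # would form a cycle: queue it instead
    # leftovers come out ascending; w lowers the average iff w < total/count
    for w in leftovers:
        if w * count < total:
            total, count = total + w, count + 1
        else:
            break
    return total / count
```

On a plain triangle this just returns the MST average, but when the MST is forced to include a very heavy edge, queued lighter edges get added and pull the average down.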

Once again, easier said than done. The rest of the time was spent working on this problem, trying to fix my routines. I was implementing the graph using vectors, and I wasn’t well-versed with vectors at all.

Eventually there were two minutes left and I was still debugging my program. I couldn’t figure out what was wrong. I tried all the sample cases and the test cases I could think of and still couldn’t spot the error. Eventually I did see it: there was a mistake in one of my subroutines, but there was not enough time.

I ended up with 145 points, which was just fine. That gave me a total of 335 points, making me sixth place, though I would only find that out later that evening. I had to leave early as I had class the next day.

Reflections

I had problems with inefficiency during the contest. Setting aside the fact that I didn’t notice the bitonicity in task 3, which few people did anyway, there was a lot of room for improvement, and much of it could have been addressed relatively quickly.

For example, not knowing how to implement Kruskal’s cost me a hundred points in the second round. I also had to refer to the documentation a lot, figuring out how to do several things only during the contest itself. It would be better to come to a programming competition more prepared, so I could focus on solving the problems.

Poor debugging skills were also something I needed to work on. I spent a lot of time debugging that I could have spent figuring out algorithms and implementing them. This was something I’d noticed before but hadn’t given much attention to.

I did enjoy the competition a lot, and it was a wonderful learning experience to program for five hours straight for two days. It was very exciting, and I made a lot of new friends. I had a really fun time and I want to join again next year.

Acknowledgements

Thanks firstly to the organizers and volunteers who made the NOI possible. It’s a huge effort, and I really appreciate the time they spent working on the NOI without getting anything in return. Thanks to Pointwest for providing the venue, which was really nice. Thanks to VCSMS, as always, for being supportive. And finally, thanks to the other competitors, who were good company throughout the contest.