When your loop is unrolled, there is no branch. The code is one long string of spaghetti. HOWEVER, as the wiki article on unrolling points out, it HIDES this performance hit. Instead of sailing along at fucking warpspeed, then suddenly going "Deerrrrrrrp" for a moment (as either branch prediction fails and a cache miss happens, or when memory must be accessed for some other reason), then sailing along at warpspeed again, it instead stays at Derrrrrrrp-type speeds, because it is constantly hitting the memory bus.
This explanation doesn't feel right. First, how branch prediction actually works, starting with Wikipedia's definition of instruction pipelining:
In computer science, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps (the eponymous "pipeline") performed by different processor units with different parts of instructions processed in parallel.
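To make that concrete, here's the textbook timing diagram for a classic five-stage pipeline (fetch, decode, execute, memory access, write-back; the stage names are the standard textbook ones, not anything from the quote above):

```
cycle:    1    2    3    4    5    6    7
insn 1:   IF   ID   EX   MEM  WB
insn 2:        IF   ID   EX   MEM  WB
insn 3:             IF   ID   EX   MEM  WB
```

Each instruction still takes five cycles start to finish, but once the pipe is full, one instruction completes every cycle.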
Loop unrolling, branch prediction, and cache hits are actually not the same thing at all. Loop unrolling is about avoiding the instruction overhead on each iteration, since you have to process the jump instruction. And in a loop you need to keep track of the loop counter, which means you're really doing multiple extra instructions per iteration: update the counter, test the counter, then jump if the test is true. If the test is false, that doesn't cause a memory page load (which is the laggy part). That's not what's happening.
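A minimal sketch of what unrolling buys you, in C (the function names and the unroll factor of 4 are just illustrative):

```c
#include <stddef.h>

/* Rolled: every element pays for the add PLUS the loop bookkeeping
   (update i, test i < n, jump back to the top). */
long sum_rolled(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: the same bookkeeping is now paid once per FOUR
   elements, so the update/test/jump overhead drops to a quarter. */
long sum_unrolled(const long *a, size_t n) {
    size_t i = 0;
    long s = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)  /* leftovers when n isn't a multiple of 4 */
        s += a[i];
    return s;
}
```

Compilers will often do this for you anyway (GCC, for instance, has -funroll-loops); the snippet is just to show where the saved instructions come from.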
Where that intersects with branch prediction is that each time through the loop, the branch predictor will correctly predict that the loop will continue, except on the last iteration, where it makes the wrong call and has to flush the instruction pipeline. But, categorically, this has nothing to do with memory cache hits.
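To see where that single mispredict lands, here's a toy (the array contents don't matter, only the loop branch does):

```c
#include <stdio.h>

int main(void) {
    int a[1000], sum = 0;
    for (int i = 0; i < 1000; i++)
        a[i] = i;

    /* The compiler turns this loop into a conditional branch back to
       the top.  The predictor sees taken, taken, taken, ... so it is
       right 999 times and wrong exactly once: on the final exit. */
    for (int i = 0; i < 1000; i++)
        sum += a[i];

    printf("%d\n", sum);
    return 0;
}
```

On Linux you can watch the numbers with `perf stat -e branches,branch-misses ./a.out`; the miss count stays tiny no matter how many times the loop runs.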
As for branch prediction, imagine a program as a restaurant with a chef and 19 assistants. If you know what you're making, then you might turn out 10 souffles an hour. But if the chef suddenly says to stop making souffles and make a lobster, then all 20 people's prep work gets thrown out and started from scratch: because it's a pipeline, we throw out the partial prep for the 19 souffles already in flight, and everyone downstream stands idle until the lobster prep works its way to them. We effectively lose about 20 souffles' worth of throughput (the discarded in-flight work plus the time spent refilling the line), because there was a surprise lobster in the mix.
CPUs do branch prediction, which means they store some data about which path was taken last time and try to prep for that same result next time: if last time we made a souffle, we prep for another souffle. So CPUs read ahead in the instruction queue and pre-process the instructions, in such a way that all parts of the CPU are doing something.
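Here's a minimal sketch of the classic textbook scheme, a 2-bit saturating counter (real CPUs keep whole tables of these, indexed by branch address, plus much fancier history tracking, but the souffle logic is the same):

```c
#include <stdbool.h>
#include <stdio.h>

/* State 0-1 predicts "not taken", state 2-3 predicts "taken".
   Two bits mean one surprise doesn't immediately flip the prediction. */
typedef struct { unsigned state; } predictor_t;  /* state in 0..3 */

static bool predict(const predictor_t *p) {
    return p->state >= 2;
}

static void train(predictor_t *p, bool taken) {
    if (taken && p->state < 3) p->state++;
    if (!taken && p->state > 0) p->state--;
}

int main(void) {
    predictor_t p = { .state = 2 };   /* start weakly taken */
    int misses = 0, n = 1000;
    for (int i = 0; i < n; i++) {
        bool taken = (i < n - 1);     /* a loop branch: taken until the end */
        if (predict(&p) != taken)
            misses++;
        train(&p, taken);
    }
    printf("mispredicts: %d of %d\n", misses, n);  /* prints: 1 of 1000 */
    return 0;
}
```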
The absolute worst-case scenario for an "if" statement is therefore one that flip-flops between true and false. For example, if you loop through some numbers and say if it's odd, do this, and if it's even, do that, that's going to royally fuck with the branch prediction (at least with a simple predictor). What you would do is unroll the loop into pairs, then do the odd value and the even value on separate lines, as in the sketch below. This would be like the chef telling the staff ahead of time that they always alternate between souffles and lobsters. But unless there's a flip-flopping "if" statement inside your loop, the loop itself is only going to fail branch prediction on the final iteration, where it leaves the loop, not each time through.
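A sketch of both versions (the +1/-1 work and the function names are made up; the branch pattern is the point):

```c
#include <stddef.h>

/* Flip-flopping branch: taken, not taken, taken, not taken, ...
   A simple predictor that expects "same as last time" is wrong on
   every single iteration of this loop. */
void process_naive(int *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0)
            a[i] += 1;   /* "even" work */
        else
            a[i] -= 1;   /* "odd" work */
    }
}

/* Unrolled into pairs: each iteration handles one even and one odd
   element in straight-line code, so the flip-flopping branch is gone
   entirely.  The only branch left is the well-predicted loop branch. */
void process_unrolled(int *a, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        a[i]     += 1;   /* even slot */
        a[i + 1] -= 1;   /* odd slot  */
    }
    if (i < n)           /* odd-length tail: last index is even */
        a[i] += 1;
}
```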