What I learned about C++ STL vector::erase
In file structures this week we had an assignment to implement LZ77 compression using a custom bit by bit input/output class. This resulted in having to read/write individual bits, using boolean vectors. I would pull individual bits off of the front of the vector, decode them, and then delete them from the vector using erase like this:
someBooleanVector.erase(someBooleanVector.begin(), someBooleanVector.begin() + length);
This worked well for very small text files, but took FOREVER on larger text files (like the hamlet text file that was included with the assignment). After doing a ton of testing, I narrowed it down to the .erase function calls. Then, after googling some information about vectors, I learned that .erase doesn’t simply erase the first x amount of elements. What it does is clear the first x amount of elements, and then shifts all the remaining elements down to the front of the vector, then edits the vector.size() accordingly.
This was costing my program a ton of runtime. A program that was supposed to take less than a second to run, was taking 17 seconds to run. The reason was the shifting.
For example, at capacity, my boolean vector would have 32,768 elements. If I deleted one element from the vector, it would have to perform 32,767 shifts, costing my program a ton of runtime.
I simply fixed this by completely reversing the way I constructed the boolean vector, then using the much more efficient .back() and .pop_back(), which require no shifting, to access and delete records. This sped up my program from 17 seconds to half a second.
I hope reading this helps someone out there who may have runtime issues with vectors.
Have a good weekend.
