04/10/2020
One of the most basic tasks when manipulating images is applying an operation to a subsection of an image. We might want to enhance a certain portion of an image so that it becomes highlighted, or apply a special effect to a specified region. The OpenCV library offers several ways of doing these operations, which I hope to show with simple examples in this article.
To show these different kinds of operations, we'll make a simple program that changes a subsection of an image to its negative counterpart. Wikipedia defines a negative image as such:
In photography, a negative is an image, usually on a strip or sheet of transparent plastic film, in which the lightest areas of the photographed subject appear darkest and the darkest areas appear lightest.
So the program needs to take an image and change its light sections to dark and vice-versa. We need to ask the user to specify two points in the image that define the top left and lower right corners of a rectangular section in which we'll apply our operation. To make our life easier, we'll resize the image to 200 x 200 pixels and check for out of bounds access.
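The out of bounds check itself is not shown in the listings below. A minimal sketch of how it could look, assuming C++17 and a hypothetical clamp_region helper that is not part of the original program:
#include <algorithm>
#include <opencv2/opencv.hpp>
// Hypothetical helper: clamp a user-supplied rectangle to the image bounds,
// so later pixel accesses cannot fall outside the underlying data array.
cv::Rect clamp_region(cv::Point2i top_left, cv::Point2i lower_right, const cv::Mat& image) {
    int x0 = std::clamp(top_left.x, 0, image.cols);
    int y0 = std::clamp(top_left.y, 0, image.rows);
    int x1 = std::clamp(lower_right.x, x0, image.cols);
    int y1 = std::clamp(lower_right.y, y0, image.rows);
    return cv::Rect { x0, y0, x1 - x0, y1 - y0 };
}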
If you think about an image as a 2D array, the first way of manipulating it that comes to mind is to use two for loops and change the pixels one by one. OpenCV offers two member functions to access the pixels in an image: cv::Mat::at and cv::Mat::ptr.
These functions have several overloaded definitions so that the correct reference or pointer type is returned for the different data types that cv::Mat can hold. If you don't specify the type, a plain pointer to unsigned char will be returned, even if this is not the type your Mat contains, so we need to be careful when doing pointer arithmetic.
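As a quick standalone illustration (not part of the program we're building), both member functions take the row (y) first and the column (x) second:
#include <opencv2/opencv.hpp>
int main() {
    cv::Mat gray = cv::Mat::zeros(10, 10, CV_8UC1); // 8-bit, single channel
    // Mat::at<T>(row, col) returns a reference to the pixel
    gray.at<uchar>(2, 3) = 100;
    // Mat::ptr<T>(row, col) returns a pointer to the same pixel
    uchar* value = gray.ptr<uchar>(2, 3);
    *value = 255 - *value; // the pixel now holds 155
}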
With this in mind, let's do our first implementation of the program using one of these member functions.
#include <iostream>
#include <opencv2/opencv.hpp>
#include <cxxopts.hpp>
struct Config {
const std::string file_path; // filepath of the image
const std::pair<int, int> top_left; // top left x & y coordinates
const std::pair<int, int> lower_right; // lower right x & y coordinates
};
Config parse_cli(int argc, char* argv[]) { /* parse cli */ }
/*
* Naive way of doing it.
*/
int main(int argc, char* argv[]) {
auto config = parse_cli(argc, argv);
cv::Mat image = cv::imread(config.file_path, cv::IMREAD_GRAYSCALE);
// Resizing the image to 200x200
cv::Mat small_image;
cv::Size size { 200, 200 };
cv::resize(image, small_image, size);
// Renaming because Point2i has better attribute names than std::pair
cv::Point2i top_left { config.top_left.first, config.top_left.second };
cv::Point2i lower_right { config.lower_right.first, config.lower_right.second };
// Iterate through the region; Mat::ptr takes (row, col), i.e. (y, x)
for (int y = top_left.y; y < lower_right.y; y++) {
for (int x = top_left.x; x < lower_right.x; x++) {
uchar *value = small_image.ptr<uchar>(y, x);
*value = 255 - *value; // Do the 8-bit inversion to obtain the negative
}
}
cv::imshow("negative", small_image);
cv::waitKey();
}
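The body of parse_cli is omitted from the listings. A minimal sketch of what it could look like with cxxopts, assuming the flag names used in the invocation below; the option descriptions and the lack of error handling are my own simplifications, not the original code:
#include <string>
#include <cxxopts.hpp>
// Hypothetical sketch of parse_cli; it reuses the Config struct from the listing above.
Config parse_cli(int argc, char* argv[]) {
    cxxopts::Options options("regions", "Apply the negative to a region of an image");
    options.add_options()
        ("file", "Path to the input image", cxxopts::value<std::string>())
        ("tlx", "Top left x coordinate", cxxopts::value<int>())
        ("tly", "Top left y coordinate", cxxopts::value<int>())
        ("lrx", "Lower right x coordinate", cxxopts::value<int>())
        ("lry", "Lower right y coordinate", cxxopts::value<int>());
    options.parse_positional({"file"});
    auto result = options.parse(argc, argv);
    return Config {
        result["file"].as<std::string>(),
        { result["tlx"].as<int>(), result["tly"].as<int>() },
        { result["lrx"].as<int>(), result["lry"].as<int>() }
    };
}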
With this implementation we can run our binary in the command line:
./regions chip-grayscale.jpg --tlx 50 --tly 55 --lrx 160 --lry 150
And we obtain the desired result:
The interesting part of the code snippet above is the two for loops:
// Iterate through the region; Mat::ptr takes (row, col), i.e. (y, x)
for (int y = top_left.y; y < lower_right.y; y++) {
for (int x = top_left.x; x < lower_right.x; x++) {
uchar *value = small_image.ptr<uchar>(y, x);
*value = 255 - *value; // Do the 8-bit inversion to obtain the negative
}
}
Here we're asking the library to calculate the offset from the start of the underlying data array where our pixel is located, then storing 255 minus the value found at that position. The problem with this implementation is that calculating the offset requires a multiplication to find the row and then an addition to find the column for every single pixel in our region. This answer on stackoverflow explains the problem and gives a way around it.
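The usual workaround is to ask for the row pointer once per row and then walk the columns with plain indexing. A sketch of the inner loops rewritten that way, reusing the variables from the listing above (this version is not one of the three benchmarked later):
for (int y = top_left.y; y < lower_right.y; y++) {
    uchar *row = small_image.ptr<uchar>(y); // one offset computation per row
    for (int x = top_left.x; x < lower_right.x; x++) {
        row[x] = 255 - row[x]; // 8-bit inversion with plain column indexing
    }
}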
Since we're applying the same operation to every pixel inside the region, we can use a broadcast operation on the region of interest, which makes the code shorter and more legible.
OpenCV overloads Mat::operator() so we can extract subregions of an image. The extracted subsection is not a copy of the values, only a copy of the header that points to the same underlying data array. This way, operations on the extracted submatrix are reflected in the original image.
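A quick way to convince yourself of this sharing behaviour is the small standalone snippet below (not part of the program); clone() is what you'd use when an independent copy is actually wanted:
#include <iostream>
#include <opencv2/opencv.hpp>
int main() {
    cv::Mat parent = cv::Mat::zeros(4, 4, CV_8UC1);
    cv::Mat view = parent(cv::Rect { 0, 0, 2, 2 });         // header copy, same data
    view.setTo(cv::Scalar(255));                             // writes into parent too
    std::cout << (int)parent.at<uchar>(0, 0) << std::endl;   // prints 255
    cv::Mat copy = parent(cv::Rect { 0, 0, 2, 2 }).clone();  // independent buffer
    copy.setTo(cv::Scalar(0));                               // parent is untouched
    std::cout << (int)parent.at<uchar>(0, 0) << std::endl;   // still prints 255
}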
/*
* ROI and broadcasting.
*/
int main(int argc, char* argv[]) {
auto config = parse_cli(argc, argv);
cv::Mat image = cv::imread(config.file_path, cv::IMREAD_GRAYSCALE);
// Resizing the image to 200x200
cv::Mat small_image;
cv::Size size { 200, 200 };
cv::resize(image, small_image, size);
// Renaming because Point2i has better attribute names than std::pair
cv::Point2i top_left { config.top_left.first, config.top_left.second };
cv::Point2i lower_right { config.lower_right.first, config.lower_right.second };
// A better way using region of interest
int width = lower_right.x - top_left.x;
int height = lower_right.y - top_left.y;
cv::Rect roi { top_left.x, top_left.y, width, height };
// Obtains a view of the section in which we're going to make the operation;
cv::Mat image_roi = small_image(roi);
// Applies a broadcasting operation
image_roi = 255 - image_roi;
cv::imshow("negative", small_image);
cv::waitKey();
}
If we run our binary with this implementation:
./regions chip-grayscale.jpg --tlx 20 --tly 20 --lrx 100 --lry 60
We get the desired results again:
Since a single-channel grayscale image is stored as 8-bit values, we can also use bitwise operators to invert it. Flipping every bit of a pixel yields the same value as subtracting its intensity from 255: a pixel with value 200 is 11001000 in binary, and flipping the bits gives 00110111, which is 55 = 255 - 200. And so on for every possible intensity.
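A throwaway snippet to check the equivalence (not from the original post):
#include <iostream>
int main() {
    unsigned char v = 200;
    unsigned char by_subtraction = 255 - v;                     // 55
    unsigned char by_bit_flip = static_cast<unsigned char>(~v); // also 55
    std::cout << (int)by_subtraction << " " << (int)by_bit_flip << std::endl;
}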
With this in mind, we can rewrite our operation using cv::bitwise_not.
/*
* ROI and bitwise operation.
*/
int main(int argc, char* argv[]) {
auto config = parse_cli(argc, argv);
cv::Mat image = cv::imread(config.file_path, cv::IMREAD_GRAYSCALE);
// Resizing the image to 200x200
cv::Mat small_image;
cv::Size size { 200, 200 };
cv::resize(image, small_image, size);
// Renaming because Point2i has better attribute names than std::pair
cv::Point2i top_left { config.top_left.first, config.top_left.second };
cv::Point2i lower_right { config.lower_right.first, config.lower_right.second };
// A better way using region of interest
int width = lower_right.x - top_left.x;
int height = lower_right.y - top_left.y;
cv::Rect roi { top_left.x, top_left.y, width, height };
// Obtains a view of the section in which we're going to make the operation;
cv::Mat image_roi = small_image(roi);
// Applies a bitwise_not operation
cv::bitwise_not(image_roi, image_roi);
cv::imshow("negative", small_image);
cv::waitKey();
}
If you run the resulting binary, you'll get the same results as before.
The last remaining question is whether this refactoring is worth it in terms of performance. If the performance gain of using the bitwise operator is not substantial, we might opt for the simpler implementation.
To test the performance of the three scripts, I used the original image, which has a size of 3632 x 5456 pixels, and removed the constraint of resizing to 200 x 200. After that I wrapped the operation in each script with calls to std::chrono and ran the scripts to invert a subregion of 1000 x 1000 pixels.
auto begin = std::chrono::steady_clock::now();
// A better way using region of interest
int width = lower_right.x - top_left.x;
int height = lower_right.y - top_left.y;
cv::Rect roi { top_left.x, top_left.y, width, height };
// Obtains a view of the section in which we're going to make the operation;
cv::Mat image_roi = image(roi);
// Applies the bitwise_not operation
cv::bitwise_not(image_roi, image_roi);
auto end = std::chrono::steady_clock::now();
std::cout << "Time elapsed (roi-bitwise): " <<
std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count()
<< "[us]" << std::endl;
These were the obtained results:
| Script | Time elapsed (g++ -O2) |
| ---- | ---- |
| for-loop | 1308 (+/- 77) [µs] |
| roi-broadcasting | 67 (+/- 44) [µs] |
| roi-bitwise | 45 (+/- 46) [µs] |
I ran the scripts multiple times and the execution times fluctuated around these values. As we can see, the for-loop version is an order of magnitude slower, and we should avoid it if a better option is available.
Another example where region of interest operations come in handy is a quadrant shift of an image, in which the top left quadrant is swapped with the lower right and the top right with the lower left. An example is as follows:
At first sight, it might seem like we need to iterate through half the image, calculate the new position for each pixel and then swap the two values, but this can be achieved in a simpler way by using one region of interest for each quadrant of the image and copying the values between them. The implementation is given below:
int main(int argc, char* argv[]) {
auto file_path = parse_cli(argc, argv);
// Read the image
cv::Mat image = cv::imread(file_path, cv::IMREAD_GRAYSCALE);
cv::Mat squared_image;
cv::Size size { 200, 200 };
cv::resize(image, squared_image, size);
image = squared_image;
cv::imshow("before", image);
// Quadrants (cv::Rect takes x offset, y offset, width, height)
int half_w = image.cols / 2;
int half_h = image.rows / 2;
cv::Rect top_left { 0, 0, half_w, half_h };
cv::Rect top_right { half_w, 0, half_w, half_h };
cv::Rect lower_left { 0, half_h, half_w, half_h };
cv::Rect lower_right { half_w, half_h, half_w, half_h };
// shifting quadrants
cv::Mat temp = cv::Mat::zeros(image.rows / 2, image.cols / 2, CV_8UC1);
// Changing top_left with lower_right
auto top_left_view = image(top_left);
auto lower_right_view = image(lower_right);
top_left_view.copyTo(temp);
lower_right_view.copyTo(top_left_view);
temp.copyTo(lower_right_view);
// Changing top_right with lower_left
auto top_right_view = image(top_right);
auto lower_left_view = image(lower_left);
top_right_view.copyTo(temp);
lower_left_view.copyTo(top_right_view);
temp.copyTo(lower_left_view);
cv::imshow("after shifting", image);
cv::waitKey();
}
After a few data copies, we obtain the desired quadrant shift. This is much more legible than if we were to calculate the new position for each pixel in a for loop.
The problem with this implementation is that it assumes the image width and height are divisible by 2. A better implementation would check the size of the image and handle odd dimensions accordingly.
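One simple way to handle it, an assumption on my part rather than something from the original post, is to restrict the shift to the largest even-sized area and leave the trailing row or column alone; the fragment below reuses the image variable from the listing above:
// Hypothetical handling of odd dimensions: crop to the largest even size first,
// so every quadrant has exactly the same width and height.
cv::Rect even_area { 0, 0, image.cols - image.cols % 2, image.rows - image.rows % 2 };
cv::Mat even_image = image(even_area);
// ...then build the four quadrant Rects from even_image.cols / 2 and even_image.rows / 2
// exactly as before, operating on even_image instead of image.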