Jake Liguori is a junior studying Computer Science and Economics at Clemson University. He will graduate in May 2022 and is interested in a career and further education involving data science, artificial intelligence, and analytics.
ColorNet 2.0: Image Segmentation for Brand Color Correction in Video
Michelle Mayer, Dr. Erica Walker, Jake Liguori, and Hudson Smith, Clemson University
A commonly used technique in image processing is image segmentation: partitioning an image into multiple parts or regions based on the characteristics of its pixels. For example, image segmentation could involve separating foreground from background, or grouping pixels based on the presence of objects or on similar textures, colors, or brightness (What Is Image Segmentation? 3 things you need to know, n.d.). Segmentation can be defined as pixel-level classification: instead of labeling a whole image as a certain object, each individual pixel is classified into a distinct category (Figure 1).
Figure 1. An example segmentation where pixels labeled as vehicles are colored red, buildings are yellow, etc. (Palac, B., 2020).
Applications of image segmentation are widespread. Medical professionals use segmentation for highly accurate labeling of medical scans (Havaei, M. et al., 2017). Each pixel in the image that corresponds to a tumor can be labeled with a different color to help the doctor better identify the tumor’s exact shape. Segmentation is also an important component of autonomous vehicle systems. Numerous cameras and other sensors continually collect data that self-driving cars use to make decisions (Xu, H. et al., 2017). The machine learning model must be highly accurate and have low latency so that it can detect, identify, and respond correctly to objects that appear in the video feed, whether they are road markings, detour signs, or pedestrians.
The ColorNet research team at Clemson University has recently developed a novel application of segmentation: color correcting live video feeds (Mayes, E. et al., 2020). One of the most significant issues with the current approach to color management in sports broadcasting is that color adjustments made to the frame impact all pixels. As a result, adjusting the RGB values to make a team’s jerseys the appropriate brand color specification can negatively impact players’ skin tones or the opposing team’s brand colors (Walker, E.B. et al., 2020). The ideal solution would allow the technician to select certain regions of pixels on the screen for color adjustment.
ColorNet 2.0 is a segmentation model that allows the user to input a target color and segment out the portions of the frame containing pixels similar to that color (Figure 2). There is no specific object to segment; instead, the model segments a chosen part of the color spectrum. During implementation, the model receives an uncorrected image and a target color, then assigns each pixel in the original image a probability of being the brand color. The probabilities are then thresholded, and the network outputs a pixel mask that labels which pixels in the input frame correspond to the target brand color. The technician can then adjust the color of the targeted pixels without impacting any surrounding colors.
Figure 2. Segmentation masks for selecting Clemson orange (middle) and Clemson purple (right).
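The thresholding and masked adjustment steps can be illustrated with a minimal NumPy sketch. It is not the ColorNet 2.0 implementation: the probability values stand in for network output, the 0.5 threshold is an assumed value, and the function names and the simple RGB offset used for the technician's adjustment are hypothetical.

```python
import numpy as np

def threshold_mask(probabilities: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn per-pixel probabilities of shape (H, W) into a boolean mask.

    The 0.5 default is an assumption; the source does not state a threshold.
    """
    return probabilities >= threshold

def apply_correction(frame: np.ndarray, mask: np.ndarray, offset) -> np.ndarray:
    """Add an RGB offset to masked pixels only; all other pixels are unchanged.

    A plain offset is a hypothetical stand-in for the technician's adjustment.
    """
    corrected = frame.astype(np.int16)  # widen so the addition cannot wrap around
    corrected[mask] += np.asarray(offset, dtype=np.int16)
    return np.clip(corrected, 0, 255).astype(np.uint8)

# Toy 2x2 RGB frame: two orange-like pixels, one gray, one purple-like.
frame = np.array([[[240, 100, 30], [200, 180, 160]],
                  [[82, 45, 128], [235, 105, 25]]], dtype=np.uint8)

# Made-up probabilities standing in for the model's per-pixel output.
probabilities = np.array([[0.92, 0.08],
                          [0.03, 0.81]])

mask = threshold_mask(probabilities)
corrected = apply_correction(frame, mask, offset=(5, 3, -9))
```

Only the two pixels whose probability clears the threshold are modified, which is the property that keeps skin tones and other colors in the frame untouched.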
The ColorNet 2.0 model builds upon ColorNet 1.5, which successfully used regression to predict the correct RGB values of pixels and perform automatic color correction on sequences of video frames. ColorNet 1.5 demonstrated the correction of Clemson Orange and Clemson Purple, but the model required more training data to correct each additional color. In addition, the technician did not have the opportunity to manually tweak the corrections with this version. ColorNet 2.0 addresses these limitations by using segmentation: a single model can target any specified team color, and technicians can select and manually adjust brand colors during broadcast. Augmentation during model training synthetically expands the diversity of the training dataset to include all ACC team colors (Figure 3).
Figure 3. Augmentation results from shifting Clemson orange to appear blue.
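One plausible way to produce a color shift like the one in Figure 3 is to rotate pixel hues in HSV space. The sketch below assumes that approach; the source does not describe the augmentation method, and the function name and the orange RGB value are illustrative only.

```python
import colorsys

def shift_hue(rgb, degrees):
    """Rotate the hue of one 8-bit RGB pixel by `degrees`, keeping saturation and value.

    Hue rotation is an assumed augmentation, not the method confirmed by the source.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + degrees / 360.0) % 1.0  # hue is cyclic, so wrap into [0, 1)
    return tuple(round(channel * 255) for channel in colorsys.hsv_to_rgb(h, s, v))

# Illustrative orange, not an official brand specification.
orange = (245, 102, 0)
shifted = shift_hue(orange, 180)  # a half turn of the hue wheel lands in the blues
```

Because only the hue changes, shading and highlights on a jersey would keep their relative brightness, which is why a shifted frame can still look like plausible footage.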
To correct an image for multiple colors, one frame can be passed through the model several times, each time with a different target color. It is therefore no longer necessary to create a separate model for each color, and multiple colors can be corrected to different targets simultaneously. This manuscript provides an overview of the ColorNet 2.0 neural network architecture, model evaluation metrics, and performance.
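The multi-pass idea described above can be sketched as a loop over target colors that reuses one model. In this sketch the trained network is replaced by a hypothetical stand-in that masks pixels by Euclidean RGB distance; the function names, the tolerance, and the target RGB values are all assumptions made for illustration.

```python
import numpy as np

def stand_in_model(frame: np.ndarray, target_rgb, tolerance: float = 60.0) -> np.ndarray:
    """Placeholder for the network: mask pixels within `tolerance` of target_rgb.

    The real model is a trained segmentation network, not a distance test.
    """
    difference = frame.astype(np.float64) - np.asarray(target_rgb, dtype=np.float64)
    return np.linalg.norm(difference, axis=-1) <= tolerance

def masks_for_targets(frame: np.ndarray, targets: dict, model=stand_in_model) -> dict:
    """Pass the same frame through one model once per target color."""
    return {name: model(frame, rgb) for name, rgb in targets.items()}

frame = np.array([[[240, 100, 30], [200, 180, 160]],
                  [[82, 45, 128], [235, 105, 25]]], dtype=np.uint8)

# Illustrative target values, not official brand specifications.
targets = {"orange": (245, 102, 0), "purple": (82, 44, 127)}
masks = masks_for_targets(frame, targets)
```

Each target yields its own mask from the same frame, so each color can then be adjusted independently without training a separate model.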
Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P., & Larochelle, H. (2017). Brain tumor segmentation with Deep Neural Networks. Medical Image Analysis (Vol. 35). Retrieved October 18, 2021 from https://www.sciencedirect.com/science/article/abs/pii/S1361841516300330?via%3Dihub
Mayes, E., Lineberger, J.P., Mayer, M., Sanborn, A., Smith, H.D., & Walker, E.B. (2020). Automated Brand Color Accuracy for Real Time Video. Proceedings of the 2020 NAB Broadcast Engineering and Information Technology Conference. NAB Broadcast Engineering and Information Technology (BEIT) Conference, Las Vegas, NV. Retrieved October 18, 2021, from https://nabpilot.org/beitc-proceedings/2020/automated-brand-color-accuracy-for-real-time-video/
Palac, B. (2020). CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0. Retrieved October 18, 2021 from https://commons.wikimedia.org/wiki/File:Image_segmentation.png
Walker, E.B., Smith, H.D., Mayes, E., Lineberger, J.P., Mayer, M., & Sanborn, A. (2020). Consistent Display of Clemson Brand Colors Using Artificial Intelligence. Technical Association of the Graphic Arts (TAGA) Conference, June 2020, Oklahoma City, OK. Retrieved October 18, 2021
What Is Image Segmentation? 3 things you need to know. (n.d.). Retrieved October 18, 2021 from https://www.mathworks.com/discovery/image-segmentation.html
Xu, H., Gao, Y., Yu, F., & Darrell, T. (2017). End-to-end Learning of Driving Models from Large-scale Video Datasets. Computer Vision Foundation. Retrieved October 18, 2021 from https://openaccess.thecvf.com/content_cvpr_2017/papers/Xu_End-To-End_Learning_of_CVPR_2017_paper.pdf