This blog post explores the feasibility of using Content-Aware Fill in After Effects 2019 to remove on-screen text from videos.

Content-Aware Fill is a new feature in After Effects 2019 that lets users remove unwanted objects from a video. As stated in Adobe’s User Guide, “this feature is temporally aware, so it automatically removes a selected area and analyzes frames over time to synthesize new pixels from other frames.” The feature has great potential for video localization, especially when localizers do not have access to the source files. I therefore wanted to test how viable Content-Aware Fill is for removing on-screen text from videos.

How to remove on-screen text with Content-Aware Fill

Since I could not find a detailed tutorial on Content-Aware Fill online, I will start with the steps I took to mask on-screen text and cover it with Content-Aware Fill.

  1. Import the video into After Effects.
  2. Draw a mask around the text with the Pen Tool.
  3. Right-click the mask in the Timeline panel and choose Track Mask.
  4. The Tracker panel opens and tracks the mask frame by frame automatically (manual adjustments are still needed).
  5. Go to the frame in which the object/mask first appears and add a Position keyframe.
  6. Move the mask out of the frame.
  7. Go to the next frame and add another Position keyframe.
  8. Open the Animation menu and click Toggle Hold Keyframe (so that the mask does not gradually move out of the frame).
  9. Go to the frame in which the object/mask appears for the last time and repeat steps 5-8.
  10. Drag the Work Area bar to cover the part of the video that you want to fill with Content-Aware Fill.
  11. Before generating the fill layer, you can create a reference frame in the Content-Aware Fill panel (a Photoshop file opens, and the reference frame is added to the composition automatically).
  12. Next, adjust Alpha Expansion, Fill Method, and Range in the Content-Aware Fill panel.
  13. Click Generate Fill Layer.
  14. The fill layer is added to the composition automatically.
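
Steps 5 to 8 depend on the difference between a keyframe that interpolates and one that holds its value. The Python sketch below is not After Effects code. It is a toy model, with hypothetical frame numbers and positions, of how a mask position behaves between two keyframes under linear versus hold interpolation.

```python
from bisect import bisect_right


def sample(keyframes, frame, hold):
    """Return the mask's 1-D position at `frame`.

    keyframes: list of (frame_number, position), sorted by frame_number.
    hold: if True, keep the previous keyframe's value until the next
          keyframe is reached; if False, interpolate linearly.
    """
    frame_numbers = [f for f, _ in keyframes]
    i = bisect_right(frame_numbers, frame) - 1
    if i < 0:
        return keyframes[0][1]       # before the first keyframe
    if i >= len(keyframes) - 1:
        return keyframes[-1][1]      # on or after the last keyframe
    (f0, p0), (f1, p1) = keyframes[i], keyframes[i + 1]
    if hold:
        return p0
    t = (frame - f0) / (f1 - f0)
    return p0 + t * (p1 - p0)


# Hypothetical values: the mask is parked off-screen (x = -500) at frame 0
# and sits on the text (x = 100) from frame 10 onward.
keyframes = [(0, -500), (10, 100)]

linear_mid = sample(keyframes, 5, hold=False)  # halfway between the two positions
hold_mid = sample(keyframes, 5, hold=True)     # still parked off-screen
hold_end = sample(keyframes, 10, hold=True)    # jumps onto the text
print(linear_mid, hold_mid, hold_end)
```

With linear interpolation the mask is already sliding toward the text at frame 5, which would drag the fill region across the picture. With hold interpolation it stays put and then jumps, which is the behavior step 8 is after.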

I tried to remove the on-screen text from Nike’s commercial “Fastest Ever” using this approach. Here are the original video and the version without the on-screen text:

Clip from Nike’s ad “Fastest Ever”
Nike’s ad edited with Content-Aware Fill in After Effects

As the videos show, using Content-Aware Fill to remove the on-screen text produced subpar results. Most of the footage generated by Content-Aware Fill is distorted, so the filled areas do not blend seamlessly into the video.

The potential issues with this approach

By examining the AE projects closely, I identified several potential issues:

  • The objects I want to mask move too quickly: In the Content-Aware Fill demo videos I found online, I noticed one thing in common: the objects being masked and replaced tend to move slowly, and the elements surrounding them stay relatively still from frame to frame. In Nike’s advertisement, however, both the text and its surroundings move quickly. This may limit the program’s ability to analyze the frames and synthesize new pixels from other frames.
  • On-screen text expansion distorts the selected areas: The demo videos share another trait: the masked objects keep roughly the same shape, so the editors only need small adjustments to keep the masks over them. In Nike’s ad, by contrast, the text expands on screen, which distorts the newly generated blocks. This is why the masked areas do not blend seamlessly with the surrounding elements.
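
One way to see why motion matters is a toy model of a temporally aware fill. The Python sketch below is not Adobe’s algorithm, which this post does not have access to. It assumes the simplest possible strategy: fill each masked pixel with the median of the same pixel position in the other frames where that pixel is not masked. Frames are one-dimensional rows of brightness values, and every name and number here is hypothetical.

```python
from statistics import median


def make_clip(pattern, scroll, width=5, n_frames=3, text_value=255):
    """Build a tiny clip: a background that scrolls `scroll` pixels per frame,
    with a one-pixel "text" overlay that moves one pixel right per frame."""
    frames, masks, clean = [], [], []
    for t in range(n_frames):
        offset = t * scroll
        background = pattern[offset:offset + width]
        text_x = t + 1
        frame = list(background)
        frame[text_x] = text_value            # burn the text into the frame
        frames.append(frame)
        masks.append([x == text_x for x in range(width)])
        clean.append(list(background))        # ground truth without text
    return frames, masks, clean


def temporal_fill(frames, masks, target):
    """Fill masked pixels of frames[target] from the same positions in other frames."""
    filled = list(frames[target])
    for x, is_masked in enumerate(masks[target]):
        if not is_masked:
            continue
        donors = [frames[t][x] for t in range(len(frames))
                  if t != target and not masks[t][x]]
        if donors:
            filled[x] = median(donors)
    return filled


PATTERN = [10, 80, 30, 90, 20, 70, 40]

static_frames, static_masks, static_clean = make_clip(PATTERN, scroll=0)
moving_frames, moving_masks, moving_clean = make_clip(PATTERN, scroll=1)

static_result = temporal_fill(static_frames, static_masks, target=1)
moving_result = temporal_fill(moving_frames, moving_masks, target=1)

print("static background recovered:", static_result == static_clean[1])
print("moving background recovered:", moving_result == moving_clean[1])
```

In this toy setup the still background is recovered exactly, because the neighboring frames show the true pixel behind the text. Once the background scrolls, the same positions in other frames show different content, so the borrowed pixels no longer match. A real implementation is far more sophisticated, but the sketch suggests why fast-moving footage is a harder case.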

A helpful trick

Although Content-Aware Fill did not do well at removing on-screen text from these videos, I did find a tip that can improve the quality of its output:

  • Create a reference frame before generating the fill layer: I noticed that the new pixels generated by Content-Aware Fill fit into the background better if a reference frame is created beforehand. The improvement is especially noticeable when the reference frame closely resembles the frames that contain the selected object. To illustrate this, I generated two clips with Content-Aware Fill, one with a reference frame and one without. Here are the results of this experiment:

Clip from Mercedes-Benz’s commercial
Mercedes-Benz “Chicken”, edited with Content-Aware Fill in After Effects (created without a reference frame)
Mercedes-Benz “Chicken”, edited with Content-Aware Fill in After Effects (created with a reference frame)

As you can see, the background in the clip made with a reference frame is more consistent and less glitchy, so it is worth creating a reference frame before generating the fill layer in After Effects.

Why I do not recommend Content-Aware Fill for video localization

To sum up, after trying to remove on-screen text from the two videos with Content-Aware Fill, I would not recommend this approach for video localization, for the following reasons:

  • Content-Aware Fill does not cope well with text expansion or a moving background: As mentioned above, the new pixels tend to be distorted if the selected objects expand in the video or the background moves quickly.
  • The approach is very time-consuming: Removing on-screen text with Content-Aware Fill is a lot of work. One has to mask the object properly, create keyframes to move the mask around, track the mask, and create reference frames before generating the fill layer. That effort is not worthwhile if the method only works well on videos with simple backgrounds, because for those videos localizers can simply create PSDs to mask the on-screen text.

In short, Content-Aware Fill in After Effects has great potential for video localization. Given its current limitations and inconsistent results, however, it might not yet be a very useful tool for that purpose.