Communications of the ACM

Research highlights

Technical Perspective: 3D Image Editing Made Easy

Numerous images appear each day on smartphone screens, computer displays, and printed materials. News articles and advertisements attract the attention of viewers by using appealing images. People often take photographs with their smartphones and immediately share them with hundreds of viewers via social media. Moreover, users include images in documents and presentations to communicate messages. These images are typically edited to appear more aesthetically pleasing or achieve other objectives. Thus, there is an increasing need for advanced image editing tools.

Image editing has advanced significantly over the past several years to meet consumer demand. Color adjustment tools now make it simple to sharpen, soften, brighten, and darken images. Furthermore, advanced tools can automatically remove blur and noise. Users can easily cut an object out of an image and paste it onto another background. Moreover, it is easy to deform shapes as needed and to blend two images seamlessly. It is even possible to erase an object from a scene by removing it and filling the resulting hole with automatically synthesized background content.

Despite these advancements, most image editing techniques are two-dimensional (2D). Three-dimensional (3D) image editing is strongly desired but remains difficult, and even the most advanced current tools lack 3D editing capabilities. One reason is that 3D editing requires the computer to infer the 3D structure of the scene, which remains a difficult, ill-posed problem. Inferring a meaningful 3D structure requires abundant knowledge about the physical world, which current systems lack. Another reason is that 3D editing is difficult for users, who must provide 3D control information to a computer through 2D input devices such as a touchpad or mouse. It is tedious and difficult for inexperienced users to specify the 3D shape of an object in a scene and thereby manipulate the object.

Nonetheless, the importance of 3D editing is obvious. We live in a 3D world, and 3D image editing opens countless possibilities. With 3D information available, we can easily view objects from different angles and compose novel scenes by combining objects in three dimensions. Even basic cutting and pasting of an object is not easy with purely 2D editing tools because the viewing angle changes when the object is moved. A 3D editing capability would make the cut-and-paste result much more convincing.

The authors of the following paper present an important step toward achieving 3D editing. Addressing this difficult problem requires both human perception and computational analysis. The authors therefore devised an interaction technique, called 3-sweep, which consists of three simple mouse strokes. With this interaction, the user guides the computer to segment an object in the scene and simultaneously infer the 3D geometry of the object. The system then performs segmentation and 3D reconstruction, inferring the details with image analysis methods. Using the 3D reconstruction results, the user can rotate the object to view it from multiple perspectives. In addition, the user can cut and paste the object into different scenes while preserving 3D consistency. Interested readers are strongly encouraged to watch the authors' impressive demonstration video.
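To make the idea of a swept 3D shape concrete, here is a minimal sketch of one common way to represent such geometry: circular cross-sections of varying radius placed along a straight axis. This is an illustration only, not the authors' reconstruction method; the function name `sweep_rings`, its parameters, and the restriction to a straight axis and circular profiles are all assumptions made for this example.

```python
import math


def _normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def sweep_rings(start, end, radii, segments=16):
    """Hypothetical helper: sweep circles along the straight axis start -> end.

    One ring of `segments` vertices is produced per entry in `radii`;
    rings are spaced evenly along the axis and lie in planes
    perpendicular to it. Assumes start != end.
    """
    d = _normalize(tuple(e - s for s, e in zip(start, end)))
    # Pick any vector not parallel to the axis to build a perpendicular frame.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(d, helper))
    v = _cross(d, u)  # unit length because d and u are orthonormal

    rings = []
    steps = len(radii)
    for i, r in enumerate(radii):
        t = i / (steps - 1) if steps > 1 else 0.0
        center = tuple(s + t * (e - s) for s, e in zip(start, end))
        ring = []
        for k in range(segments):
            a = 2.0 * math.pi * k / segments
            ring.append(tuple(
                center[j] + r * (math.cos(a) * u[j] + math.sin(a) * v[j])
                for j in range(3)
            ))
        rings.append(ring)
    return rings


# Example: a vase-like profile that narrows in the middle.
rings = sweep_rings((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), [1.0, 0.5, 1.0], segments=8)
```

Connecting consecutive rings with quadrilaterals would give a surface mesh that can be rotated or placed in another scene. A real system would additionally have to recover the axis and radii from the photograph, which this sketch does not attempt.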

As with most technologies, this 3D editing technology is not the only one of its kind; previous efforts exist. A notable one is the photo editing tool presented by Zheng et al.2 They similarly combine intuitive user interaction and computational analysis to facilitate 3D manipulation of objects in a photograph. However, their tool is specifically designed for cuboids and only works for objects made of boxes, rectangular plates, and square pillars. The present technology, on the other hand, can handle a significantly larger variety of objects with curved surfaces by introducing a sophisticated gestural interaction.

It should be noted, however, that the present technology currently supports only one class of primitive: generalized cylinders. This representation is highly versatile and covers many human-made objects; nonetheless, it is not sufficient for representing the complicated shapes found in nature. Tools that support a more diverse set of shape primitives are needed to represent such shapes and would enable manipulation of arbitrary objects in photographs. One promising approach is to use a large collection of known 3D geometries.1 Nevertheless, the core concept presented in this paper, combining human perception and computational analysis through clever interaction design, is broad and applicable to the development of future tools. Rapid developments in 3D image editing tools inspired by this work can be foreseen, and it will become increasingly easy for even casual users to edit photographs. Images will no longer always depict unaltered reality. Is this good or bad? The answer may not be self-evident; however, such a future is surely coming.


1. Kholgade, N., Simon, T., Efros, A., and Sheikh, Y. 3D object manipulation in a single photograph using stock 3D models. ACM Trans. Graph. 33, 4 (July 2014), Article 127.

2. Zheng, Y., Chen, X., Cheng, M.M., Zhou, K., Hu, S.M., and Mitra, N.J. Interactive images: Cuboid proxies for smart image manipulation. ACM Trans. Graph. 31, 4 (2012), 99:1–99:11.


Takeo Igarashi is a professor in the Computer Science Department at the University of Tokyo, Japan.


To view the accompanying paper, visit

Copyright held by author.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.

