{"name":"Fractal-mosaics","tagline":"Photomosaics 3.0","body":"# [UNDER CONSTRUCTION]\r\n\r\n[framework link](http://s-ben.github.io/Fractal-Mosaics/log_polar_registration_framework.html)\r\n\r\n# Abstract\r\n\r\nThe Fractal Mosaic algorithm is a rotation-, scale-, and translation-invariant photomosaic algorithm. While traditional photomosaics create a larger image out of smaller square images on a rectangular grid, Fractal Mosaics creates a larger image out of rectangular images that can be placed at arbitrary locations, can be scaled to any size, and can be rotated by any angle. \r\n\r\nThis is accomplished with a combination of image search (i.e. image retrieval) algorithms and an image registration algorithm developed with NASA funding by George Wolberg and Siavash Zokai at the City College of New York ([pdf](http://www.researchgate.net/publication/2830220_Robust_Image_Registration_Using_Log-Polar_Transform/file/d912f51116dedc8a30.pdf)).\r\n\r\n![Alt text](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/eye_mosaic_flickr_800pix.jpg?login=s-ben&token=d045d2a20e53d9009700d6829157669f)\r\nFigure 1: Fractal Mosaic created with graffiti image set (best viewed [large](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/eye_mosaic_flickr_large.jpg?login=s-ben&token=87b09134a5f66d314547e419299a65b7))\r\n\r\nFigure 2 shows three traditional photomosaics created with [AndreaMosaic](http://www.andreaplanet.com/andreamosaic/) using the same image set and target image. \r\n\r\n![Alt text](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/traditional_mosaic_different_resolutions_800pix.png?login=s-ben&token=5dc052c2ee602a1b7bfd08d525e9d00a)\r\nFigure 2: Traditional Photomosaics using large tiles (left), medium-sized tiles (middle), and small tiles (right)\r\n\r\nAs can be seen, using large tiles (left), finer details are lost. Using small tiles (right), we capture target image details, but can’t see individual image detail. 
Using medium-sized tiles (middle), we get a compromise between the two. Fractal Mosaics aims to create mosaics that capture both target image details and individual image details, in an aesthetically interesting way.\r\n\r\nMany people have, of course, created photomosaic variants that accomplish this general goal in different ways. Too many to list here. Some of my favorite photomosaic pioneers that I’m aware of: [Tsevis](http://www.flickr.com/photos/tsevis/sets/72157594536252686/), [Village9991](http://www.flickr.com/photos/village9991/sets/72157603327275992/), and [StellaMe](http://www.flickr.com/photos/stellame/). If I’m unaware of your work, contact me!\r\n\r\nBy allowing for arbitrary placement, rotation, and scaling of the “library images” (images making up the mosaic), there are many more possible matches to evaluate. This allows for larger, often more aesthetically interesting matches than are possible with traditional photomosaics. However, searching such a large solution space on a personal computer presents a challenging problem. This “paper” presents my solution to this problem (two years’ worth of unpaid work...). It explains the algorithm in depth for those wishing to hack the Matlab code. It should also aid those using the code to create mosaics, as understanding how the algorithm works will help tweak parameters that shape mosaic output.\r\n\r\n# Introduction / Problem Statement\r\n\r\nIn a traditional photomosaic, the number of possible matches is large but still easily manageable on a modern PC. The number of possible matches is simply the number of “tiles” (square areas in the target image) times the number of library images. For each tile, a similarity metric is calculated (root mean squared (rms) error, typically) for every library image. The library image with the smallest rms error is placed in the mosaic.\r\n\r\nIn a Fractal Mosaic, the number of possible matches is effectively infinite. 
For example, let’s say we use 5,000 library images, a 3072 x 1024 pixel target image, and do the following fairly rough quantization of the possible translations, rotations, and scalings: \r\n\r\n* 2 degree rotations (180 possible rotations)\r\n* Every fifth pixel is a possible placement location (629,145 possible translations)\r\n* 50 possible scaling values\r\n\r\nThis gives us (180 rotations) x (629,145 translations) x (50 scalings) x (5,000 library images) = 28,311,525,000,000 possible matches to search. Or ~28 trillion matches. Obviously, a brute force search is not feasible (I tried).\r\n\r\nFractal Mosaics tackles this complexity with three main strategies. One strategy is inherent in the “Log-polar” image registration algorithm at the core of Fractal Mosaics. This algorithm uses a coarse-to-fine multi-resolution framework, whereby estimates computed in lower resolution images are used as initial guesses in higher resolution images. An in-depth explanation of how log polar registration is implemented in Fractal Mosaics is given in the “log-polar registration” section. \r\n\r\nThe second main strategy is to use an image search algorithm to filter out library images that are statistically unlikely to produce a match. Details on this can be found in the ‘Image Search (Retrieval) Filtering’ section. \r\n\r\nThe third main strategy for reducing the number of matches evaluated is to look for large matches first, then not look for smaller matches “under” those matches (i.e., don’t look for matches in areas where we’ve already placed an image). This saves considerable computation. However, it will cause the algorithm to miss aesthetically better matches under the placed matches. For this reason, Fractal Mosaics can be configured to also look for alternate matches under placed images. All matches can optionally be output to PNG files that can be loaded into Photoshop for later compositing. 
\r\n\r\nThe following sections will walk you through the main pieces of the Fractal Mosaic algorithm. \r\n\r\n* Image Registration\r\n* Image Search (Retrieval)\r\n* Mosaic Rendering\r\n* Conclusions / Future Directions\r\n\r\n\r\n# Image Registration\r\n\r\nImage registration is generally defined as the lining up of images misaligned due to rotation, scale, and position (translation). The field of image registration is mature, comprising several decades of research. The key insight of Fractal Mosaics is that image registration algorithms can be employed to efficiently search the photomosaic solution space. Instead of evaluating every possible rotation, scaling, and translation for every library image (computationally impossible), we can use image registration to efficiently estimate these parameters. Once these parameters are estimated, they are used to rotate, scale, and translate the library image in question to best match the target image. A similarity metric (rms error in this case) is applied. If the rotated, scaled, and translated library image scores well enough on the similarity metric (i.e., its rms error is low enough), it is placed in the mosaic. \r\n\r\n## Registration Algorithm Choice\r\n\r\nAfter an initial survey of the image registration literature, a frequency domain image registration algorithm was evaluated; specifically, Fourier-Mellin-based image registration, as implemented in this [Matlab code](http://www.mathworks.com/matlabcentral/fileexchange/3000-fourier-mellin-based-image-registration-with-gui). This would have been much more computationally efficient than the spatial domain algorithm eventually chosen, as rotation, scale, and translation are efficiently recovered in one step. However, this technique proved not robust enough for the photomosaic application. This is likely because it relies more heavily on the assumption made by all image registration algorithms: that the two images being matched are of the same object/scene. 
In photomosaics however, we are matching parts of the target image to random library images. The assumption that the library images are rotated, scaled, and translated versions of the target image is weak. \r\n\r\nAn image registration algorithm that operates in the spatial domain was tried; specifically, the “Log-polar registration” algorithm found in the paper “Robust Image Registration Using Log-Polar Transform” ([pdf](http://www.researchgate.net/publication/2830220_Robust_Image_Registration_Using_Log-Polar_Transform/file/d912f51116dedc8a30.pdf)). This algorithm proved very robust. And most importantly, it met the main aesthetic criteria: for any two images presented to the algorithm, no matter how dissimilar, it aligned them the way a human would. To test this claim out for yourself, check out this [video showing the algorithm at work](http://www.youtube.com/watch?v=fw7deJSLVE4) matching library images to a target image. \r\n\r\nThe downside to Log-polar registration is increased computation. Because Log-polar registration only recovers rotation and scale, to recover translation, it must be applied to every location (pixel) in the target image. The location resulting in the lowest rms error is then chosen as the estimate. This creates a lot of extra computation. To mitigate this, the authors of the algorithm use a coarse-to-fine multi-resolution framework, whereby estimates computed in lower resolution images are used as initial guesses in higher resolution images. The following section details the implementation of Log-polar registration in Fractal Mosaics, which deviates somewhat from the original implementation. \r\n\r\n## Log Polar Image Registration\r\n\r\nIn “Robust Image Registration Using Log-Polar Transform”, the authors lay out a “two module” approach. First, a “Log-polar registration” module estimates rotation, scale, and translation. 
Then, a second module uses a nonlinear least squares optimization method to refine this estimation and obtain sub-pixel accuracy. Since Fractal Mosaics does not require sub-pixel accuracy, and the log-polar registration module was found to be accurate enough for this application, just the log-polar registration module is implemented.\r\n\r\n### Polar Coordinates\r\n\r\nTo understand Log-polar registration, we briefly review [polar coordinates](http://en.wikipedia.org/wiki/Polar_coordinate_system). \r\n\r\nWhen transforming from Cartesian to polar coordinates, each pixel location (x,y) is represented as a radius and angle (r,a), where r is the distance from the center of the image (xc,yc), and a is the angle in degrees from the x-axis.\r\n\r\n![equation](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/radius_eq_polar.gif?login=s-ben&token=05e5efcd7cad361572bbf941f86ad1f2)\r\n\r\n![equation](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/angle_equation_polar.gif?login=s-ben&token=ee4778d991bfe42ea68e377ed9c59f5f)\r\n\r\nFigure 3 illustrates the polar transform using an image of Abraham Lincoln. Here we see a crop of the Lincoln image (a), the Lincoln image rotated by 90 deg (c), and the polar transforms of both ((b) and (d)).\r\n\r\n![Alt text](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/Polar_transform_500pix.png?login=s-ben&token=2297c64c8a54efd5b9fe4e7f3a62f311)\r\n\r\nAs can be seen, in the polar space, the 90 degree rotation manifests as a circular shift along the x-axis of the polar image (i.e. the angle axis). If we take the cross-correlation of the polar transformed images ((b) and (d)), the location of the maximum of the cross-correlation gives us the rotation. However, scale cannot be recovered from the cross-correlation.\r\n\r\n### Log-polar Transform\r\n\r\nHere’s where we take advantage of some math magic. 
According to the product rule of logarithms, \r\n\r\n![equation](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/log_general_equation.gif?login=s-ben&token=daccda0e863fce8c925880b950d70545)\r\n\r\nAssuming we scale an image up by a factor of 3, in the polar space, this is equivalent to multiplying the radius values by 3.\r\n\r\n![equation](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/polar_scaling_eq.gif?login=s-ben&token=20de8f9d649563130ea241a3573fd5b6)\r\n\r\nIf we apply the log function to the radius values (i.e. the Log-polar transform),\r\n\r\n![equation](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/log_polar_scale_eq.gif?login=s-ben&token=cc3546e0e40606f5ffbcff3586f42c92)\r\n\r\nScale now manifests as a constant shift of log(3) along the log-radius axis (i.e. y-axis). Rotation still manifests as a circular shift along the angle axis (i.e. x-axis), as it did in the straight polar transform. We can now recover both scale and rotation from the cross-correlation of the log-polar transformed images. Magic!\r\n\r\nFigure 4 shows a crop of the Lincoln image (a), the Lincoln image rotated by 45 deg and scaled up by a factor of 2 (c), and the log-polar transforms of both ((b) and (d)). \r\n\r\n![Alt text](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/LogPolar_transform_500pix.png?login=s-ben&token=e72f28a8e8855f7484d116120e03fe1b)\r\n\r\nAs can be seen, the 45 deg rotation shows up as a circular shift along the x-axis (i.e. angle axis) and the 2X scaling shows up as a shift along the y-axis (i.e. log-radius axis).\r\n\r\n\r\n### Rotation and Scale Estimation\r\n\r\nTo estimate rotation and scale, we take the cross-correlation of the log-polar transformed images ((b) and (d) in Figure 4). 
The location of the maximum of the cross-correlation (dark red peak) corresponds to rotation and scale (equations for calculating rotation and scale from the peak can be found in the code).\r\n\r\n![Alt text](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/Lincoln_cc.png?login=s-ben&token=22024bc7d3e71e2a15e9fb2d16e9e80b)\r\n\r\nFor computational efficiency, the cross-correlation is computed in the Fourier domain using 2D FFTs,\r\n\r\n![equation](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/cc_eq.gif?login=s-ben&token=dac60d00b66a927d84272a207994b202)\r\n\r\nwhere L1 and L2 are the log-polar transformed library and target images, respectively.\r\n\r\n### Recovering Translation\r\n\r\nAs mentioned previously, Log-polar registration does not recover translation. It assumes the two images presented to it line up, except for rotation and scale differences. To recover translation, the algorithm must be applied to every location (pixel). The translation resulting in the best match (i.e. lowest rms error) is then chosen as the translation estimate. This makes log-polar registration fairly computationally expensive compared to other registration techniques.\r\n\r\nTo reduce computation, the authors of log-polar registration implement a coarse-to-fine multi-resolution framework, whereby estimates computed in lower resolution images are used as initial guesses in higher resolution images. This framework was tweaked in several places to apply it to Fractal Mosaics, as outlined below in the ‘Image Registration Algorithm Steps’ section.\r\n\r\n\r\n# Image Search (Retrieval) Filtering\r\n\r\nAn image search (i.e. retrieval) algorithm is used to filter out statistically unlikely matches before the computationally intensive image registration step (i.e., don’t waste cycles trying to match a white part of the target image with an image of the night sky). 
Specifically, Fractal Mosaics performs Content-Based Image Retrieval ([CBIR](http://en.wikipedia.org/wiki/Content-based_image_retrieval)), similar to Google’s [search by image](http://support.google.com/images/answer/1325808?hl=en) feature. The “search image” (the part of the target image we’re looking to match) is compared to every library image using a similarity metric. Then only the most similar library images are evaluated using image registration.\r\n\r\nNumerous similarity metrics were evaluated, including histograms, standard deviation, Hu moments, General Fourier Descriptors (GFD), and Fourier Mellin (FM) shape descriptors. The metrics which performed best were histograms, GFDs, and standard deviation. These metrics were combined to create a custom similarity metric that slightly outperformed any individual similarity metric. This metric is currently only used in black and white mode, as I ran out of time to extend it to color. For color, a relatively simple yet effective similarity metric (histogram distance in the HSV color space) is used. Implementation details for black and white and color retrieval are given in the sections below.\r\n\r\nOnce the similarity metric is calculated for all library images, the images are sorted by similarity. Only the images with the highest similarity scores are sent to the image registration step. The number of library images sent to the image registration step is controlled by the ‘num_lib_test_pix_v’ variable, as explained in the Variables page. \r\n\r\n## Black and White\r\n\r\nFor black and white mosaics, a custom similarity metric was developed. This metric combines three commonly used similarity metrics: histogram distance, General Fourier Descriptor (GFD) distance, and difference in standard deviation. For a given pixel location, the histogram, GFD, and standard deviation of the “search image”, or target tile (piece of the target image we’re matching), are calculated. 
These are then compared to the histogram, GFD, and standard deviation of every library image, which have been precomputed. For the histogram and GFD, Euclidean distance is computed. For standard deviation, the absolute difference is used. \r\n\r\nThe histogram distance, GFD distance, and standard deviation difference are then input into respective Probability Density Functions (PDFs), which were calculated empirically using 430,500 pseudo-random samples of the solution space. The sum of the output of these PDFs is the “Match PDF”. \r\n\r\n![equation](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/Match_PDF.gif?login=s-ben&token=43c084e1129c2e6a5fd517bd3410cfa3)\r\n\r\nAll library images are then sorted by their Match PDF scores, and only the images with the highest probability of being a match are sent to the image registration step.\r\n\r\n## Color\r\n\r\nWhen creating a color mosaic, the similarity metric is the distance between 3D histograms in the HSV color space. The code used to implement this (and implementation details) can be found [here](http://www.mathworks.com/matlabcentral/fileexchange/22030-image-retrieval-query-by-example-demo). Thanks to [Theodoros Giannakopoulos](http://www.mathworks.com/matlabcentral/fileexchange/authors/30223) for sharing!\r\n\r\n# Mosaic Rendering\r\n\r\nThe main Fractal Mosaic script (mosaic_registration....m) renders a version of the mosaic as it goes. This mosaic image is stored in the ‘Render_target’ variable, which is written to ‘fractal_mosaic_eye.jpg’ when the script completes. If the script is interrupted (Ctrl+C), the mosaic “thus far” can be viewed from the Matlab command line:\r\n```\r\n>>imshow(Render_target)\r\n```\r\nAlternately, the mosaic can be rendered after the fact using the ‘fractal_mosaic_render.m’ script. This gives the user the ability to play with different rendering parameters. 
Notably, you can set:\r\n\r\n* Whether the images will be placed on top of the target image or on a black background.\r\n* How much blending to do between images. Blending is currently done with a simple linear gradient. The user sets a variable, ‘perc_sloped’, which sets the percentage of the image that will be blended. Figure 12 shows a blending window that uses the default, 25%.\r\n\r\n![Alt text](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/blending_window.png?login=s-ben&token=a0227807701da25bdbebd0b297376baf)\r\n\r\n* Whether to use color images or not. For color mosaics, this is typically a no-brainer. However, you can render a black and white mosaic using the color versions of the black and white images used for matching. This can often lead to interesting, accidental results, as illustrated in Figure 13.\r\n\r\n![Alt text](https://raw.github.com/s-ben/Fractal-Mosaics/gh-pages/images/graffiti_SF_color_render_ex.png?login=s-ben&token=7dd819d3aeae99b287ad3ae610342b0e)\r\n\r\n# Conclusion / Next Steps\r\n\r\nConsiderable effort has been put into automating the entire mosaic creation process, and making beautiful mosaics. However, Fractal Mosaics is not programmed to make artistic or conceptual choices on individual images. Therefore, in some use cases, its most promising use will be generating input for another process (e.g. a human using Photoshop or other software). \r\n\r\nConsiderable effort has been put into optimizing the code, both on the algorithmic and code level. However, it still takes a considerable amount of time to run. Promising optimization strategies include translation from Matlab to another language such as C/C++, accelerating key functions on the GPU (graphics card), and cloud computing (i.e. parallelizing processing across many computers in the cloud). \r\n\r\nConsiderable effort has been put into creating art with the code. However, Fractal Mosaics constitutes a new tool, one that exposes many new possibilities. 
These possibilities, as with any new art form, will take time to flesh out. It is my hope that by putting this code out there, artists will take up this challenge and create amazing art the likes of which the world has never seen before. Have at it!","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}