---
layout: default
---

# Write Your Own PRISMA

## "Blending Two Images Together"

Posted by Xz Yao on September 27, 2016

### How It Works

PRISMA is built on a technique called convolutional neural networks; the underlying paper is A Neural Algorithm of Artistic Style. This project is based on a Torch implementation of that paper, which its author has open-sourced on GitHub as Neural Style. Once it is installed on our system, a few simple steps are all it takes to reproduce PRISMA-style results.
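As a brief sketch of the idea from the paper (notation follows Gatys et al.; this summary is mine, not part of the Neural Style documentation): the generated image is optimized to minimize a weighted sum of a content loss against the photo and a style loss against the artwork, both computed on CNN feature maps $F^l$:

```latex
\mathcal{L}_{\text{total}} = \alpha\,\mathcal{L}_{\text{content}} + \beta\,\mathcal{L}_{\text{style}},
\qquad
\mathcal{L}_{\text{content}} = \tfrac{1}{2}\sum_{i,j}\bigl(F^l_{ij} - P^l_{ij}\bigr)^2
```

The style loss compares Gram matrices $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$ of the feature maps across several layers, which is why the `-style_layers` option lists multiple `relu*_1` layers while content reconstruction uses a single layer.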

### Hardware Requirements

The convolutional neural network (CNN) behind PRISMA places heavy demands on computing hardware. In research and industrial settings, CUDA-based computation on a fairly high-end GPU is usually needed to finish in a reasonable amount of time. The author of Neural Style provides CUDA support as well, so a good graphics card is a strongly recommended part of the setup.

### Installation

The author of Neural Style provides installation documentation, but problems still come up frequently. The recommended installation procedure is as follows (using Ubuntu as an example):

#### Upgrading GCC

GCC 5 is a required component. My first attempts with gcc 4.8 and gcc 4.9 both failed, which was a particularly painful pitfall; only gcc 5 or later compiles everything correctly.

```shell
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
```


#### Installing Torch and Its Dependencies

```shell
cd ~/
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch
./install.sh
```


#### Installing loadcaffe

loadcaffe loads Caffe networks into Torch and is another frequently used library. It depends on Google's Protocol Buffer library, so install that first:

```shell
sudo apt-get install libprotobuf-dev protobuf-compiler
```

Then install loadcaffe itself:

```shell
luarocks install loadcaffe
```


#### Installing Neural-Style

```shell
cd ~/
git clone https://github.com/jcjohnson/neural-style.git
cd neural-style
```

Then download the pre-trained VGG models:

```shell
sh models/download_models.sh
```


### Usage

```shell
th neural_style.lua -style_image <image.jpg> -content_image <image.jpg>
```


Options:

* `-image_size`: Maximum side length (in pixels) of the generated image. Default is 512.
* `-style_blend_weights`: The weight for blending the style of multiple style images, as a comma-separated list, such as `-style_blend_weights 3,7`. By default all style images are equally weighted.
* `-gpu`: Zero-indexed ID of the GPU to use; for CPU mode set `-gpu` to -1.

Optimization options:

* `-content_weight`: How much to weight the content reconstruction term. Default is 5e0.
* `-style_weight`: How much to weight the style reconstruction term. Default is 1e2.
* `-tv_weight`: Weight of total-variation (TV) regularization; this helps to smooth the image. Default is 1e-3. Set to 0 to disable TV regularization.
* `-num_iterations`: Default is 1000.
* `-init`: Method for generating the generated image; one of `random` or `image`. Default is `random`, which uses a noise initialization as in the paper; `image` initializes with the content image.
* `-optimizer`: The optimization algorithm to use; either `lbfgs` or `adam`; default is `lbfgs`. L-BFGS tends to give better results, but uses more memory. Switching to ADAM will reduce memory usage; when using ADAM you will probably need to play with other parameters to get good results, especially the style weight, content weight, and learning rate; you may also want to normalize gradients when using ADAM.
* `-learning_rate`: Learning rate to use with the ADAM optimizer. Default is 1e1.
* `-normalize_gradients`: If this flag is present, style and content gradients from each layer will be L1 normalized. Idea from andersbll/neural_artistic_style.

Output options:

* `-output_image`: Name of the output image. Default is out.png.
* `-print_iter`: Print progress every `print_iter` iterations. Set to 0 to disable printing.
* `-save_iter`: Save the image every `save_iter` iterations. Set to 0 to disable saving intermediate results.

Layer options:

* `-content_layers`: Comma-separated list of layer names to use for content reconstruction. Default is `relu4_2`.
* `-style_layers`: Comma-separated list of layer names to use for style reconstruction. Default is `relu1_1,relu2_1,relu3_1,relu4_1,relu5_1`.

Other options:

* `-style_scale`: Scale at which to extract features from the style image. Default is 1.0.
* `-original_colors`: If you set this to 1, then the output image will keep the colors of the content image.
* `-proto_file`: Path to the deploy.txt file for the VGG Caffe model.
* `-model_file`: Path to the .caffemodel file for the VGG Caffe model. Default is the original VGG-19 model; you can also try the normalized VGG-19 model used in the paper.
* `-pooling`: The type of pooling layers to use; one of `max` or `avg`. Default is `max`. The VGG-19 model uses max pooling layers, but the paper mentions that replacing these layers with average pooling layers can improve the results. I haven't been able to get good results using average pooling, but the option is here.
* `-backend`: `nn`, `cudnn`, or `clnn`. Default is `nn`. `cudnn` requires cudnn.torch and may reduce memory usage. `clnn` requires cltorch and clnn.
* `-cudnn_autotune`: When using the cuDNN backend, pass this flag to use the built-in cuDNN autotuner to select the best convolution algorithms for your architecture. This will make the first iteration a bit slower and can take a bit more memory, but may significantly speed up the cuDNN backend.
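The options above can be combined freely. As an illustrative example (the image file names here are placeholders, and `-gpu 0 -backend cudnn` assumes a CUDA GPU with cuDNN installed):

```shell
# Hypothetical run: transfer the style of starry_night.jpg onto photo.jpg,
# saving an intermediate result every 50 of the 200 iterations.
th neural_style.lua \
  -style_image starry_night.jpg \
  -content_image photo.jpg \
  -output_image stylized.png \
  -image_size 512 \
  -num_iterations 200 \
  -save_iter 50 \
  -gpu 0 -backend cudnn -cudnn_autotune
```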

Content Image:

Style Image:

After 50 iterations:

After 100 iterations:

After 200 iterations: