Using Machine Learning to Convert Your Image to Vaporwave or Other Artistic Styles

TL;DR: This article walks through the mechanism of a popular machine learning algorithm called neural style transfer (NST), which can convert any image of your choice to your favorite artistic style. The algorithm is a direct application of the famed convolutional neural network and dexterously frames the problem as optimizing for two loss terms. With its succinct formulation, the algorithm offers a straightforward way to come up with your own implementation of a fun image converter (think of DeepArt or Prisma). Any topic in deep learning is vast, and this article only briefly walks through the NST algorithm. While its sequels will deal with implementation quirks and some other interesting applications of the algorithm, for now let us get some intuition behind the algorithm and have fun playing with it.

Problem Setup

Our goal is clear: make an image S adopt the style of another image T. At this point, this goal might sound a little too high-level, and you may have legitimate questions, such as how we represent an image in a neural network and how we quantify style; these will be duly answered in the following sections.

Numerical Representation

Simply put, an image is represented as a tensor, which can be thought of as a generalization of a matrix. For example, a colored image of size 512*512 will be represented as a tensor (a 3-D matrix) of size 512*512*3; the number 3 comes from the fact that any color can be encoded as a tuple of R-G-B values, each ranging from 0 to 255. This matrix will be used as the input to the algorithm later.

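As a quick sketch of this representation (using NumPy, with a hypothetical all-black image standing in for a real photo):

```python
import numpy as np

# A 512*512 color image as a 512*512*3 tensor: height, width, and the
# three R-G-B channels, each value ranging from 0 to 255.
image = np.zeros((512, 512, 3), dtype=np.uint8)  # an all-black placeholder image
image[0, 0] = (255, 0, 0)                        # set the top-left pixel to pure red

print(image.shape)  # (512, 512, 3)
```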
Convolutional Neural Network Basics

Since the algorithm builds on the convolutional neural network (CNN) architecture, it is helpful to clarify some points about it beforehand.

Two of the most important building blocks in CNN that pertain to our task are the convolutional layer and the pooling layer. We will look at the inner workings of the convolutional layer first.

So how do we go from the input layer to the first convolutional layer? Let us look at the following illustration:

[Figure: a 3*3*3 filter applied to the top-left sub-matrix of the input]

In the illustration above, the sub-matrix of size 3*3*3 at the top-left corner (only two dimensions are shown here for easy illustration) will go through a filter of the same size, which transforms the sub-matrix by applying the convolution operation to it; the result becomes the activation of the neuron at the top-left corner of the convolutional layer.

But what is a filter? For now, you can understand it as a way to identify certain features that describe an image: right angles, curvatures, textures, etc. It does so by convolving itself with the input sub-matrix. For a fuller treatment, please follow the pointers on this Wikipedia page.

Sliding the filter by one pixel to the right, we have the following:

[Figure: the filter shifted one pixel to the right]

The filter will be applied to every 3*3*3 sub-matrix and we will have our first complete convolutional layer.

You can check that if we apply a filter of size 3*3*3 to an input matrix of size 6*6*3 as above, the resulting convolutional layer will be of size 4*4*1. Generally, however, the number of filters is more than 1, which means we might want to apply several different filters in order to convert our input matrix into the first convolutional layer. Imagine that we stack the resulting matrices of size 4*4*1 from 4 different filters on top of each other; we will end up with a convolutional layer of size 4*4*4, which in turn becomes input to the next layer, be it another convolutional layer or a pooling layer. Note that in computer vision jargon, each filter output of size 4*4*1 can be called a feature map; therefore here we have 4 feature maps.

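A minimal NumPy sketch of the arithmetic above, with random values standing in for a real image and real learned filters; sliding four 3*3*3 filters over a 6*6*3 input yields a 4*4*4 layer (note that what CNNs compute is technically cross-correlation, conventionally called convolution):

```python
import numpy as np

def apply_filter(inp, filt):
    """Slide one filter over the input with stride 1 and no padding."""
    h, w, _ = inp.shape
    fh, fw, _ = filt.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the sub-matrix and the filter, then sum
            out[i, j] = np.sum(inp[i:i + fh, j:j + fw, :] * filt)
    return out

inp = np.random.rand(6, 6, 3)                          # the 6*6*3 input
filters = [np.random.rand(3, 3, 3) for _ in range(4)]  # 4 filters of size 3*3*3
layer = np.stack([apply_filter(inp, f) for f in filters], axis=-1)

print(layer.shape)  # (4, 4, 4): four 4*4 feature maps stacked together
```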
The mechanism of the pooling layer can be understood as dimensionality reduction. In the illustration below, the effect of the pooling layer is to reduce any sub-matrix of size 2*2 to 1*1 in the next layer. Popular methods to do this down-sampling include taking the max or the average.

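A minimal NumPy sketch of 2*2 max pooling on a single feature map:

```python
import numpy as np

def max_pool_2x2(x):
    """Reduce each non-overlapping 2*2 block to its maximum."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [1., 1., 4., 4.]])

print(max_pool_2x2(x))
# [[4. 8.]
#  [9. 4.]]
```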
[Figure: 2*2 pooling reduces each block to a single value in the next layer]

Generally speaking, a CNN architecture will alternate convolutional layers and pooling layers. An example of a typical architecture is provided below:

[Figure: the VGG19 architecture adopted here, commonly used for neural style transfer]

The Neural Style Transfer (NST) Algorithm

With the fundamentals cleared, let us get down into the details of the algorithm.

NST makes use of the VGG19 neural network illustrated above, excluding the three fully connected layers at the right end; the network has been pre-trained to perform object recognition on the ImageNet dataset. VGG19 ships with popular deep learning frameworks such as PyTorch and TensorFlow, so you don't need to implement it yourself.

Let us review what we have so far. We have the pre-trained VGG19 CNN at our disposal, one image matrix S to be converted to the style of the image matrix T, as well as the intermediate image matrix S’ at each intermediate layer of the network (the initial value of S’ can be set to white noise, or simply to S).

The next step is to formulate the whole problem into optimization tasks. The NST breaks it into the minimization of the sum of two loss functions, the content loss and the style loss. Let us dive in now.

[Photo by Jonathan Formento on Unsplash]

Intuitively, the content loss quantifies the distance between the intermediate image we have at a certain layer and the content image. So at each layer l, we denote the current state of S’ as x, and the original image as p, and further we have the feature maps of x, denoted as F, and those of p, denoted P. The content loss at layer l is thus simply:

[Equation: at layer l, we sum over the squared errors between F and P, looping over i for all feature maps, and j for all cells in a given feature map]

And to get the total content loss, simply sum over the terms for all layers.

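In code, the per-layer content loss is just a sum of squared differences between the two sets of feature maps (a NumPy sketch with tiny made-up feature maps, where rows index feature maps i and columns index cells j; the original paper adds a factor of 1/2, omitted here to match the caption above):

```python
import numpy as np

def content_loss(F, P):
    """Sum of squared errors between feature maps F (of x) and P (of p)."""
    return np.sum((F - P) ** 2)

# Two feature maps with two cells each (hypothetical values)
F = np.array([[1.0, 2.0],
              [3.0, 4.0]])
P = np.array([[1.0, 0.0],
              [3.0, 2.0]])

print(content_loss(F, P))  # (2-0)^2 + (4-2)^2 = 8.0
```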
Let us now look at the style loss. Here the style can be loosely defined as the correlation between different feature maps. For our intermediate image x, let us define G:

[Equation: G denotes the correlation between two feature maps i and j at layer l; the summation runs over all feature map cells]

Similarly, we can define a matrix A for the style image T. Then our style loss at layer l can be defined as:

[Equation: the style loss at layer l; N and M represent the number of feature maps and the size of any given feature map at layer l]

Summing across all layers, we have the total style loss:

[Equation: the total style loss, a weighted sum of the per-layer style losses]

In practice, the weight terms w above can be set equal for all layers: 1/(number of layers), or you can make a more refined decision following the original paper.

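A NumPy sketch of the Gram matrix G and the per-layer style loss, with random values standing in for real feature maps (F is arranged as N feature maps by M cells; the 1/(4N²M²) scaling follows the original paper):

```python
import numpy as np

def gram(F):
    """G[i, j] = correlation (dot product) of feature maps i and j."""
    return F @ F.T

def style_loss_layer(F, A):
    """Style loss at one layer: F are feature maps of x, A = gram(style)."""
    N, M = F.shape
    return np.sum((gram(F) - A) ** 2) / (4 * N**2 * M**2)

F_x = np.random.rand(4, 16)  # 4 feature maps, 16 cells each (made up)
F_t = np.random.rand(4, 16)  # feature maps of the style image

print(gram(F_x).shape)                   # (4, 4)
print(style_loss_layer(F_t, gram(F_t)))  # 0.0 when the styles match exactly
```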
Ultimately, the total loss function is the weighted sum of the content loss and the style loss: a*L(content) + b*L(style), to be minimized with respect to the intermediate image S' using your favorite optimizer. The final state of S' will be your result image.

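A toy PyTorch sketch of that optimization loop, with placeholder tensors standing in for the real VGG19 feature maps (in a real implementation, the content and style terms come from the network's intermediate layers, as in the PyTorch tutorial listed under Further Reading):

```python
import torch

# Placeholder targets standing in for VGG19 features of the content and
# style images; a and b weight the two loss terms.
target_content = torch.rand(1, 3, 8, 8)
target_gram = torch.rand(3, 3)
a, b = 1.0, 1e-4

x = target_content.clone().requires_grad_(True)  # init S' to the content image
optimizer = torch.optim.LBFGS([x])               # the optimizer used in the paper

def closure():
    optimizer.zero_grad()
    content = ((x - target_content) ** 2).sum()
    flat = x.view(3, -1)             # 3 "feature maps", one per channel
    style = ((flat @ flat.t() - target_gram) ** 2).sum()
    loss = a * content + b * style   # total loss: a*L(content) + b*L(style)
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)
```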
Results

PyTorch has provided some sample code for neural style transfer that is very easy to follow and experiment with. Please refer to the Further Reading section below for more information on the implementation.

Please note that you will need to make your input image and your style image have the same size before feeding them to the PyTorch implementation.

Now for the fun part: as a fan of Giorgio de Chirico, I have made the following experiments, trying to force his artworks into a vaporwave-like style:

[Figure: Left: Giorgio de Chirico, The Song of Love; Right: a vaporwave stock image]

Using the left as the content input image and the right as the style image, we have the following:

[Figure: the resulting stylized image]

Similarly, another of de Chirico's masterpieces has a different chemistry with the modern illustration style:

[Figure: Giorgio de Chirico, Southern Freeway]

This gave the following result:

[Figure: the resulting stylized image]

I encourage you to explore neural style transfer further by following the pointers below, and to give new life to your photos or images by converting them to some unexpected style.

Further Reading

[1] Image Style Transfer Using Convolutional Neural Networks

[2] https://paperswithcode.com/paper/a-neural-algorithm-of-artistic-style

[3] Neural Transfer Using PyTorch

Translated from: https://towardsdatascience.com/using-machine-learning-to-convert-your-image-to-vaporwave-or-other-artistic-styles-df6fb9aa60e0
