Pure TensorFlow: recode your Keras

Kirill Bondarenko
8 min read · Jun 12, 2019


How to understand TensorFlow's low-level API?

Introduction

Hello everyone!

As you may have noticed, I'm a Marvel fan, but this article is not about that. Let's talk about neural networks, mathematics and TensorFlow.

Who is this article for?

I've written this article for beginners in TensorFlow who have a strong mathematical background in neural networks and/or experience with high-level APIs like Keras, PyTorch, etc. If you are totally new to the field, don't worry: you may first read my article about getting started with neural networks and then explore this one.

What is the goal of this article?

In this article we will explore TensorFlow's low-level API to create a simple neural network.

Alright, let's begin!

Simple problem to solve

Sure, we could take one of the most popular datasets here, such as MNIST or Iris. But for better understanding we will take the silliest possible task, with a dataset of matching difficulty.

What movie to watch today?

We will take 5 movies as an example and describe each of them with only three true/false features (1/0). The features are just movie genres: drama, comedy, action. And we will rate each movie with a float from 0 to 1, where 0 means totally dislike and 1 means totally like.

Just for example: we will train it for a user who likes action and comedy the most, and is so-so about drama.

  1. Pokémon Detective Pikachu: (0: drama, 1: comedy, 1: action) = 1.0
  2. Deadpool 2: (1: drama, 1: comedy, 1: action) = 0.5
  3. The Devil Wears Prada: (1: drama, 1: comedy, 0: action) = 0.0
  4. American Pie: (0: drama, 1: comedy, 0: action) = 0.0
  5. John Wick: (1: drama, 0: comedy, 1: action) = 0.7

Let's start writing the code. I will use Python 3.6.

import numpy as np
import tensorflow as tf  # we will need it later; this article uses the TF 1.x API

movie_genres = np.array([
    [0, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1]],
    dtype='float32')

movie_ratings = np.array([
    [1.0],
    [0.5],
    [0.0],
    [0.0],
    [0.7]],
    dtype='float32')
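As a quick sanity check (optional, but good practice), we can confirm the shapes before building anything:

print(movie_genres.shape)   # (5, 3) - 5 movies, 3 genre features
print(movie_ratings.shape)  # (5, 1) - one rating per movie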

Now we have a problem to solve and the data to build a model for the solution.

Let's put the preparation part aside and start exploring TensorFlow.

TensorFlow

What is TensorFlow?

TensorFlow is a powerful framework that helps engineers build solutions with the help of deep learning.

What does tensor mean? In simple words:

A tensor is a geometric object, or a data structure, in the form of an n-dimensional array, where n ranges from 0 to infinity.

For example: a 0-dimensional tensor is a scalar (a = 1), a 1-dimensional tensor is a vector, a 2-dimensional one is a matrix, a 3-dimensional one is a cube (or similar), and 4 or more dimensions give hyperspace objects. Don't be afraid of tensors.
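In numpy terms, a quick illustration of these ranks:

scalar = np.array(1.0)          # 0-dimensional tensor
vector = np.array([1.0, 2.0])   # 1-dimensional tensor
matrix = np.zeros((2, 3))       # 2-dimensional tensor
cube = np.zeros((2, 3, 4))      # 3-dimensional tensor
print(scalar.ndim, vector.ndim, matrix.ndim, cube.ndim)  # 0 1 2 3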

Why do tensors get so much attention in this framework?

Let’s revise what artificial neural network is.

OUTPUT = ACTIVATION ( INPUT x WEIGHT )

If we have more inputs (say, N), then each neuron of a dense layer will hold N weights (in the case where we construct only one dense layer).

And we may regulate the output shape: many-to-many, many-to-one, etc.

This way we have an input tensor and a weights tensor: we compute their dot product and apply an activation function to the resulting tensor.

In tensorflow we have 3 types of tensors:

  • tf.Variable(initial_value, ...): a mutable tensor; used to store weights.
  • tf.constant(value, ...): an immutable tensor; used to store training data or other constants.
  • tf.placeholder(dtype, shape, ...): a tensor with no value of its own; it is fed with data at run time (via feed_dict) and is used for training data.
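A minimal sketch of all three in the TF 1.x API (the names here are arbitrary):

import tensorflow as tf

w = tf.Variable(tf.zeros([3, 1]), name='example_weights')   # mutable, can be trained
c = tf.constant([1.0, 2.0, 3.0], name='example_constant')   # fixed value
x = tf.placeholder(tf.float32, shape=[None, 3], name='example_input')  # fed at run time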

The next important step is to understand how TensorFlow works.

Tensorflow works with graphs and tensors.

What does graph mean?

A graph is an abstract mathematical object consisting of vertices (points, nodes) and the edges (lines, links) connecting them.

We may look at a graph as an algorithm where the nodes are functions and the edges are the directions of the data flow.

Is it a neural network or a graph? 2 in 1.
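A tiny example of this "define first, run later" idea: building nodes computes nothing; the graph is only executed inside a session (more on sessions below).

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a + b                  # just adds a node to the graph, nothing is computed yet

with tf.Session() as sess:
    print(sess.run(c))     # 5.0 - only here the graph actually runs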

Let’s return to our task and consider what we should do.

It's very important to know your data: shapes, types, etc. And plan your actions!

We have a 2D input array with shape (5, 3) and dtype=float32, and our targets with shape (5, 1) and dtype=float32. I used floats because our weights will be floats too.

So we need to perform an operation in a graph node on the input data and the weights. What shape will the weights have? According to the rules of the dot product (and common sense), the weights tensor will have shape (3, 1): a (5, 3) input multiplied by a (3, 1) weight matrix gives a (5, 1) output, exactly the shape of our targets.

We may work with the whole dataset or with batches. In our tiny case we will feed all the data at once for simplicity.

In pseudo code it will be:

input_tensor = array(5, 3)
weights = array(3, 1)

output = activation_function(dot_product(input_tensor, weights))

Now, let's write a TensorFlow session step by step. We will just rewrite this pseudo code with the right syntax.

We already prepared our data in numpy format.

TensorFlow works with sessions. First of all we need to define a session:

with tf.Session() as sess:
    # code to run

or we can do it in this way:

sess = tf.Session()
# code to run
sess.close() # close your session or it will lead to errors

Using the "with ... as ..." construction, we can be sure the session will be closed after usage. Otherwise we must close it manually with .close(). One more important thing: if you do not define a graph beforehand and pass it via the session's graph argument, the session will use the default graph.
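For completeness, passing an explicit graph looks like this (a small sketch using only the standard TF 1.x API):

g = tf.Graph()
with g.as_default():
    x = tf.constant(1.0)           # this node lives in graph g
with tf.Session(graph=g) as sess:  # the session runs graph g, not the default one
    print(sess.run(x))             # 1.0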

Next we define our tensors. For the data we just declare their dtypes; the weights we fill directly:

# DATA: inputs as X and targets as y
X = tf.placeholder(dtype=tf.float32, name='INPUT')
y = tf.placeholder(dtype=tf.float32, name='TARGET')
# TRAINING PARAMETERS: weights as W
W = tf.Variable(tf.random_normal([3, 1], stddev=0.3), name='WEIGHTS')

We made three tensors. X and y are just declared with the proper dtypes and names. The W tensor we filled with random values (similar to np.random.randn in numpy) with a standard deviation of 0.3.

Next we need to write the forward pass: an activation applied to the input values multiplied by the weights. Our target values range from 0 to 1, so we will use the sigmoid activation.

# FORWARD PASS: one layer perceptron with sigmoid activation
prediction = tf.sigmoid(tf.matmul(X, W), name='OUTPUT')

Note: tf.matmul() is the analog of the numpy.dot() function. And a pleasant bonus: TensorFlow already provides a ready-made sigmoid function (and many others).
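To make the analogy concrete, here is the same forward pass in plain numpy (a sketch; w here is a hypothetical random weight matrix, not our trained W):

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w = np.random.randn(3, 1).astype('float32')  # hypothetical weights
out = sigmoid(movie_genres.dot(w))           # same idea as tf.sigmoid(tf.matmul(X, W))
print(out.shape)                             # (5, 1)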

The most interesting part is training.

Obviously, we will use the backpropagation approach to train our model.

In my earlier article I explained how to do it with numpy. Here we will do the same with TensorFlow.

First of all we need to define a loss (cost) function to evaluate the results; during the training epochs we will minimize its value.

# LOSS: MSE in tensorflow way
loss = tf.losses.mean_squared_error(y, prediction)

or manually, like this (quite good for deeper understanding):

loss = tf.reduce_mean(tf.pow(prediction - y, 2))
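In numpy terms this manual loss is just the mean of the squared errors; reusing the hypothetical out from the numpy sketch above:

mse = np.mean(np.power(out - movie_ratings, 2))  # mean squared error by hand
print(mse)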

And we need to define an optimizer and apply minimize() to the loss:

# TRAINING PART: the optimizer (Adam here; SGD would also work) minimizes the loss value
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
# THE END of preparation works

or we may do it manually, in raw backpropagation terms, for better understanding of course:

# MANUAL GRADIENT DESCENT:
# tf.gradients returns a list with one gradient tensor per entry in xs
derivative = tf.gradients(xs=W, ys=loss)
derivative = tf.reshape(derivative, shape=(3, 1))  # unwrap the list into a (3, 1) tensor
# tf.gradients applies the chain rule for us, including the sigmoid
# derivative f(x) * (1 - f(x)), so we don't write it by hand
learning_rate = tf.constant(0.01)
# one gradient descent step: W <- W - learning_rate * dLoss/dW
update_weights = W.assign(W - tf.multiply(learning_rate, derivative), name='UPD_W')
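Note: if you instead prefer the built-in optimizer defined above, the training step in the loop below would simply run the optimizer op instead of update_weights (a sketch mirroring that loop):

_, loss_value = sess.run([optimizer, loss],
                         feed_dict={X: movie_genres, y: movie_ratings})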

Great, all preparations are finished. Let's run it.

We will train for 5000 iterations (epochs). But before we hit run, we need to initialize all the variables, as TensorFlow requires:

# INITIALIZATION of all variables (here: the weights W)
init = tf.global_variables_initializer()
sess.run(init)

Note: to run a node in a graph you of course need to do it inside a session, using sess.run().

But how do we fill the tensors? Remember, at the beginning we declared two placeholder tensors and didn't put any values there? It's time to do it, via the feed_dict argument of the .run() method. feed_dict takes a dictionary where the keys are the tensors to fill and the values are the values (sorry for the tautology).

Be careful and do not run update_weights without the initialization. It may seem fine because the loss computation is wired into it, but it will actually give you an error: we can't update a variable (the weights) that has not been initialized.

In our case we only need to fill the X and y tensors with the genres and ratings, and begin the training process:

import matplotlib.pyplot as plt

# 5000 training epochs (iterations)
losses = []
for i in range(5000):
    # each run applies one weight update and returns the current loss
    _, loss_value = sess.run([update_weights, loss],
                             feed_dict={X: movie_genres, y: movie_ratings})
    print('Epoch ', i + 1, '; loss ', loss_value)
    losses.append(loss_value)

print('Prediction after training: \n',
      sess.run(prediction, feed_dict={X: movie_genres, y: movie_ratings}))
plt.plot(losses)
plt.show()

Losses plot:

...
Epoch 5000 ; loss 0.035657022
Prediction after training:
[[0.7227939 ] -> ground truth 1
[0.514002 ] -> ground truth 0.5
[0.13912034] -> ground truth 0.0
[0.28490177] -> ground truth 0.0
[0.7263731 ]] -> ground truth 0.7

Well, the results are pretty good. Sure, it's a silly example, but to make this model usable in the real world you only need to extend the dataset (up to, say, 100 thousand examples) and implement batches, updating the weights after each batch, as sketched below.
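A minimal sketch of such a batched loop, assuming larger hypothetical arrays X_train and y_train:

batch_size = 32                                  # hypothetical batch size
for epoch in range(50):                          # hypothetical number of epochs
    for start in range(0, len(X_train), batch_size):
        end = start + batch_size
        sess.run(update_weights,
                 feed_dict={X: X_train[start:end], y: y_train[start:end]})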

But the greatest thing is that you have just written your first TensorFlow neural network!

Bonus: a similar solution written with Keras

# the Keras version: note that it stacks two Dense layers, not one
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=3, activation='sigmoid'))
model.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(loss='mse', optimizer='sgd')
h = model.fit(x=movie_genres, y=movie_ratings, verbose=1, epochs=5000)

print(model.predict(movie_genres))
plt.plot(h.history['loss'])  # Keras records the loss history for us
plt.show()

Losses plot:

...
Epoch 5000 ; loss 0.0770
Prediction after training
[[0.5616709 ]
[0.47505498]
[0.27552533]
[0.3333011 ]
[0.6289481 ]]

Well, we see the Keras model did worse here than the manually written TensorFlow code. To be fair, this version is not an exact copy of our manual model: it has an extra hidden layer and uses plain SGD, so the comparison is rough.
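A closer single-layer equivalent, mirroring our manual model, would look like this (a sketch):

model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, activation='sigmoid', input_shape=(3,))
])
model.compile(loss='mse', optimizer='sgd')
model.fit(x=movie_genres, y=movie_ratings, epochs=5000, verbose=0)
print(model.predict(movie_genres))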

Conclusion

After reading this article you should see that TensorFlow is not as hard and scary as everyone says. Just make some effort and spend time understanding it, and you will gain great knowledge.

This article showed just a silly example of TensorFlow code, without classes, model serialization, etc. If you want to know more, work harder. I'll try to explain more difficult aspects of using TensorFlow in my next articles.

Thank you very much for reading!

With best wishes and good luck, Bondarenko K., machine learning engineer.
