## How to retrieve intermediary state in TensorFlow RNN - tensorflow

I am running an RNN on a long signal, processing the data in fixed-size, overlapping batches of size context_size. At every batch, the position moves by hop_length.
When feeding a new batch, I wish to preserve the state from the previous batch. Given that I move through the data in overlapping batches, I need to retrieve the state ahead of the initial state by hop_length. However, the code below only lets me keep the last state of the previous batch, final_state, which is the state ahead of init_state by context_size.
Is it possible in TF to keep the intermediary state? The documentation shows that the return value consists only of the final state.
Here is the relevant code:
```python
import tensorflow as tf

y = tf.placeholder(tf.float32, [BATCH_SIZE, context_size], name="y_placeholder")
x = tf.placeholder(tf.float32, [BATCH_SIZE, context_size, FEATURE_SIZE], name="x_placeholder")
state_placeholder = tf.placeholder(tf.float32, [None, state_size])
# Split the time axis into a list of context_size tensors of shape [BATCH_SIZE, FEATURE_SIZE]
rnn_inputs = [tf.squeeze(i, axis=[1]) for i in tf.split(x, context_size, axis=1)]
# One LSTMCell object per layer
cell_list = [tf.contrib.rnn.LSTMCell(state_size) for _ in range(num_layers)]
cell = tf.contrib.rnn.MultiRNNCell(cell_list, state_is_tuple=True)
init_state = cell.zero_state(BATCH_SIZE, tf.float32)
rnn_outputs, final_state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=init_state)
```
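I carry the final state of the previous batch over as the initial state of the next batch as follows:

```python
last_state = None  # final RNN state of the previous batch
for batch_x, labels in batches:  # `batches`: hypothetical iterable of (inputs, labels) windows
    feed_dict = {x: batch_x, y: labels}
    if last_state is not None:
        # Feed the previous state into the initial-state tensors (a nested tuple),
        # not into final_state, which would override the output instead
        feed_dict[init_state] = last_state
    # Fetch into new names so the pred/final_state tensors are not overwritten
    pred_val, _, batch_loss, last_state = sess.run(
        [pred, optimizer, loss, final_state], feed_dict=feed_dict)
```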

In TensorFlow, the only things that persist after a `sess.run` call returns are variables. You should create a variable for the state, then use `tf.assign` to assign the result from your RNN cell to that variable. You can then use that variable in the same way as any other tensor.
If you need to initialize the variable to something other than zeros, you can call `sess.run` once with a placeholder and `tf.assign` specifically to set up the variable.
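Whichever way the state is carried between `sess.run` calls, for overlapping windows the value to carry is the state reached after `hop_length` steps, not `final_state`. For an LSTM the per-step entries of `rnn_outputs` only contain `h`, not the cell state `c`, so the outputs list alone is not enough. One possible approach (an assumption, not part of the answer above) is to unroll the first `hop_length` inputs in their own `static_rnn` call, take that call's final state as the intermediate state, and pass it as `initial_state` to a second call over the remaining inputs, making sure both calls share the same weights. The framework-free sketch below only checks the window arithmetic: `rnn_step` is a hypothetical scalar cell standing in for the LSTM, and carrying `states[hop_length - 1]` into the next window reproduces one continuous pass over the signal.

```python
import math


def rnn_step(h, x, w=0.5, u=1.0):
    """Hypothetical scalar cell standing in for the LSTM: h' = tanh(w*h + u*x)."""
    return math.tanh(w * h + u * x)


def run_window(window, h0):
    """Unroll the cell over one window and return the state after every step."""
    states = []
    h = h0
    for x in window:
        h = rnn_step(h, x)
        states.append(h)
    return states


def sliding_states(signal, context_size, hop_length):
    """Process overlapping windows, carrying the intermediate state forward."""
    h0 = 0.0  # zero initial state for the first window
    per_window = []
    for start in range(0, len(signal) - context_size + 1, hop_length):
        states = run_window(signal[start:start + context_size], h0)
        per_window.append(states)
        # The next window starts hop_length samples later, so its initial state
        # is the state after the first hop_length steps of this window.
        h0 = states[hop_length - 1]
    return per_window


signal = [math.sin(0.3 * t) for t in range(20)]
full = run_window(signal, 0.0)  # one continuous pass over the whole signal
windows = sliding_states(signal, context_size=8, hop_length=3)

# Every window's states match the corresponding slice of the continuous pass.
for k, states in enumerate(windows):
    assert all(math.isclose(a, b) for a, b in zip(states, full[3 * k:3 * k + 8]))
```

Carrying `states[-1]` (the analogue of `final_state`) instead would start each window `context_size - hop_length` steps too far ahead, and the slices would no longer match.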

## Related

### How to train different LSTM on the same tensorflow session?

I would like to train two different LSTMs to make them interact in a dialogue context (i.e. one RNN generates a sequence, which is used as context for the second RNN, which answers, etc.). However, I do not know how to train them separately in TensorFlow (I think I have not fully understood the logic behind TF graphs). When I execute my code, I get the following error:

```
Variable rnn/basic_lstm_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
```

The error happens when I create my second RNN. Do you know how to fix this? My code is the following:

```python
import tensorflow as tf

# User LSTM
no_units = 100
_seq_user = tf.placeholder(tf.float32, [batch_size, max_length_user, user_inputShapeLen], name='seq')
_seq_length_user = tf.placeholder(tf.int32, [batch_size], name='seq_length')

cell = tf.contrib.rnn.BasicLSTMCell(no_units)

output_user, hidden_states_user = tf.nn.dynamic_rnn(
    cell, _seq_user, dtype=tf.float32, sequence_length=_seq_length_user)

out2_user = tf.reshape(output_user, shape=[-1, no_units])
out2_user = tf.layers.dense(out2_user, user_outputShapeLen)
out_final_user = tf.reshape(out2_user, shape=[-1, max_length_user, user_outputShapeLen])

y_user_ = tf.placeholder(tf.float32, [None, max_length_user, user_outputShapeLen])
softmax_user = tf.nn.softmax(out_final_user, dim=-1)
loss_user = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_user, labels=y_user_))

optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_user)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(epoch):
    print('Epoch: ', i)
    batch_X, batch_Y, batch_sizes = lstm.batching(user_train_X, user_train_Y, sizes_user_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_user: data_, _seq_length_user: size_, y_user_: target_})

# System LSTM
no_units_system = 100
_seq_system = tf.placeholder(tf.float32, [batch_size, max_length_system, system_inputShapeLen], name='seq_')
_seq_length_system = tf.placeholder(tf.int32, [batch_size], name='seq_length_')

cell_system = tf.contrib.rnn.BasicLSTMCell(no_units_system)

output_system, hidden_states_system = tf.nn.dynamic_rnn(
    cell_system, _seq_system, dtype=tf.float32, sequence_length=_seq_length_system)

out2_system = tf.reshape(output_system, shape=[-1, no_units_system])
out2_system = tf.layers.dense(out2_system, system_outputShapeLen)
out_final_system = tf.reshape(out2_system, shape=[-1, max_length_system, system_outputShapeLen])

y_system_ = tf.placeholder(tf.float32, [None, max_length_system, system_outputShapeLen])
softmax_system = tf.nn.softmax(out_final_system, dim=-1)
loss_system = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_system, labels=y_system_))

optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_system)

for i in range(epoch):
    print('Epoch: ', i)
    batch_X, batch_Y, batch_sizes = lstm.batching(system_train_X, system_train_Y, sizes_system_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_system: data_, _seq_length_system: size_, y_system_: target_})
```

Regarding the variable scope error, try setting a different variable scope for each network:

```python
with tf.variable_scope('User_LSTM'):
    ...  # your user LSTM graph
with tf.variable_scope('System_LSTM'):
    ...  # your system LSTM graph
```

Also, avoid using the same names for different Python objects (e.g. `optimizer`). The second assignment overrides the first, which will confuse you when you use TensorBoard. By the way, I would recommend training the model end-to-end rather than training the two networks separately: feed the output tensor of the first LSTM into the second LSTM and use a single optimizer and loss function.

In short, to solve the problem (`Variable rnn/basic_lstm_cell/weights already exists`), you need two separate variable scopes (as mentioned by @J-min). TensorFlow organizes variables by their names, so managing the two sets of variables in two scopes lets TensorFlow distinguish them from each other.

By "train them separately", I suppose you want to define two distinct loss functions and optimize the two LSTM networks with two optimizers, each corresponding to one of those loss functions. In that case, you need to get the lists of the two sets of variables and pass each list to its optimizer, like this:

```python
# Scope names as in the previous answer
opt1 = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op1 = opt1.minimize(loss1, var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='User_LSTM'))
opt2 = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op2 = opt2.minimize(loss2, var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='System_LSTM'))
```
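To see why separate scopes and per-scope variable lists work, here is a toy, framework-free model of name-keyed variable storage. `ToyGraph`, its `variable_scope`, `get_variable` and `collection` methods, and `build_lstm` are hypothetical stand-ins that only mimic the naming behaviour described in the answers; they are not the TensorFlow API.

```python
from contextlib import contextmanager


class ToyGraph:
    """Hypothetical stand-in for a graph that keys variables by full name.
    Creating an existing name again fails, like get_variable without reuse."""

    def __init__(self):
        self.variables = {}
        self._scope_stack = []

    @contextmanager
    def variable_scope(self, name):
        # Every variable created inside the scope gets the scope name as a prefix.
        self._scope_stack.append(name)
        try:
            yield
        finally:
            self._scope_stack.pop()

    def get_variable(self, name, value=0.0):
        full_name = "/".join(self._scope_stack + [name])
        if full_name in self.variables:
            raise ValueError(f"Variable {full_name} already exists, disallowed.")
        self.variables[full_name] = value
        return full_name

    def collection(self, scope):
        # Like filtering the trainable variables by scope: keep names under `scope/`.
        return sorted(n for n in self.variables if n.startswith(scope + "/"))


def build_lstm(graph):
    # Both networks create variables with the same relative names.
    graph.get_variable("rnn/basic_lstm_cell/weights")
    graph.get_variable("rnn/basic_lstm_cell/biases")


g = ToyGraph()
with g.variable_scope("User_LSTM"):
    build_lstm(g)
with g.variable_scope("System_LSTM"):
    build_lstm(g)  # no clash: the full names differ by their scope prefix

user_vars = g.collection("User_LSTM")      # would go to opt1's var_list
system_vars = g.collection("System_LSTM")  # would go to opt2's var_list
```

Calling `build_lstm` twice on a graph without entering a scope raises the same kind of "already exists" error as in the question.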

### Generate text with a trained character level LSTM model

I trained a model to generate sentences as follows: as a training example I feed two sequences, `x`, which is a sequence of characters, and `y`, which is the same sequence shifted by one. The model is based on an LSTM and is built with TensorFlow. My question is: since the model takes input sequences of a fixed size (50 in my case), how can I make predictions giving it only a single character as a seed? I've seen examples that, after training, generate sentences by simply feeding a single character. Here is my code:

```python
import numpy as np
import tensorflow as tf

with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [batch_size, truncated_backprop], name='x')
    y = tf.placeholder(tf.int32, [batch_size, truncated_backprop], name='y')

with tf.name_scope('weights'):
    W = tf.Variable(np.random.rand(n_hidden, num_classes), dtype=tf.float32)
    b = tf.Variable(np.random.rand(1, num_classes), dtype=tf.float32)

inputs_series = tf.split(x, truncated_backprop, 1)
labels_series = tf.unstack(y, axis=1)

with tf.name_scope('LSTM'):
    def make_cell():
        # Build a separate cell object for each layer
        cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, state_is_tuple=True)
        return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
    cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(n_layers)])
    states_series, current_state = tf.contrib.rnn.static_rnn(cell, inputs_series, dtype=tf.float32)

logits_series = [tf.matmul(state, W) + b for state in states_series]
prediction_series = [tf.nn.softmax(logits) for logits in logits_series]

losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
          for logits, labels in zip(logits_series, labels_series)]
total_loss = tf.reduce_mean(losses)

train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)
```

I suggest you use `dynamic_rnn` instead of `static_rnn`. Rather than unrolling a fixed number of steps into the graph, `dynamic_rnn` runs the time loop with a `tf.while_loop` at execution time, which allows inputs of any length. Your input placeholder would be:

```python
x = tf.placeholder(tf.float32, [batch_size, None, features], name='x')
```

Next, you'll need a way to feed your own initial state into the network. You can do that by passing the `initial_state` parameter to `dynamic_rnn`, like:

```python
initialstate = cell.zero_state(batch_size, tf.float32)
outputs, current_state = tf.nn.dynamic_rnn(cell, x, initial_state=initialstate)
```

With that, to generate text from a single character you can feed the graph one character at a time (so `batch_size` is 1 here), passing in the previous character and state each time, like:

```python
prompt = 's'  # beginning character, whatever
inp = one_hot(prompt)  # preprocessing, as you probably want to feed one-hot vectors
state = None
while True:  # stop once enough characters have been generated
    if state is None:
        feed = {x: [[inp]]}
    else:
        feed = {x: [[inp]], initialstate: state}
    out, state = sess.run([outputs, current_state], feed_dict=feed)
    inp = process(out)  # extract the predicted character from out and one-hot it
```
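The feeding pattern can be sketched without TensorFlow. `toy_step` is a hypothetical stand-in for one `sess.run([outputs, current_state], feed_dict=feed)` call on a single one-character input; it picks the "next character" with a made-up rule that depends on the state, to show why the state returned by each call has to be fed back into the next one.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_step(char, state):
    """Hypothetical stand-in for one sess.run call on a 1-character input:
    returns the predicted next character and the new state. The made-up
    rule depends on the state, so carrying the state changes the output."""
    next_char = ALPHABET[(ALPHABET.index(char) + state + 1) % len(ALPHABET)]
    return next_char, state + 1


def generate(seed, length, zero_state=0):
    """Feed one character at a time, passing the previous state back in."""
    text = seed
    char, state = seed, zero_state  # the first call uses the zero initial state
    for _ in range(length):
        char, state = toy_step(char, state)  # prediction + carried-over state
        text += char
    return text, state


print(generate("s", 5))  # ('stvych', 5)
```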

### How can I pass the previous state of a tuple-based tf.nn.MultiRNNCell to the next sess.run() call in TensorFlow?

I am using a stack of RNNs built with `tf.nn.MultiRNNCell` and I want to pass the `final_state` to the next graph invocation. Since tuples are not supported in the feed dictionary, is stacking the cell states and slicing the input to yield a tuple at the beginning of the graph the only way to accomplish that, or is there some functionality in TensorFlow that allows doing it?

Suppose you have 3 RNNCells in your MultiRNNCell and each is an LSTMCell with an LSTMStateTuple state. You must replicate this structure with placeholders:

```python
lstm0_c = tf.placeholder(...)
lstm0_h = tf.placeholder(...)
lstm1_c = tf.placeholder(...)
lstm1_h = tf.placeholder(...)
lstm2_c = tf.placeholder(...)
lstm2_h = tf.placeholder(...)

initial_state = (
    tf.nn.rnn_cell.LSTMStateTuple(lstm0_c, lstm0_h),
    tf.nn.rnn_cell.LSTMStateTuple(lstm1_c, lstm1_h),
    tf.nn.rnn_cell.LSTMStateTuple(lstm2_c, lstm2_h))
...
sess.run(..., feed_dict={
    lstm0_c: final_state[0].c,
    lstm0_h: final_state[0].h,
    lstm1_c: final_state[1].c,
    lstm1_h: final_state[1].h,
    # ... and so on for the remaining layers
})
```

If you have N stacked LSTM layers, you can create the placeholders and the feed_dict programmatically with for loops.
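The feed_dict half of that loop can be sketched as below. Strings stand in for the placeholder tensors so the sketch runs without TensorFlow, and `LSTMStateTuple` is redefined locally as a namedtuple with fields `c` and `h` (the TensorFlow class is also a namedtuple with those fields). `build_feed_dict` is a hypothetical helper name; in a real graph the placeholders would be created in a similar loop, one `tf.placeholder` per field.

```python
from collections import namedtuple

# Local stand-in for tf.nn.rnn_cell.LSTMStateTuple (a namedtuple with fields c and h).
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])


def build_feed_dict(state_placeholders, final_state):
    """Map every layer's c/h placeholder to the matching fetched value."""
    if len(state_placeholders) != len(final_state):
        raise ValueError("placeholder and state layer counts differ")
    feed = {}
    for ph, value in zip(state_placeholders, final_state):
        feed[ph.c] = value.c
        feed[ph.h] = value.h
    return feed


num_layers = 3
# Strings stand in for placeholder tensors (e.g. lstm0_c, lstm0_h, ...).
placeholders = tuple(
    LSTMStateTuple(f"lstm{i}_c", f"lstm{i}_h") for i in range(num_layers))
# Made-up values standing in for the fetched final_state.
fetched = tuple(
    LSTMStateTuple(c=[0.1 * i], h=[0.2 * i]) for i in range(num_layers))
feed = build_feed_dict(placeholders, fetched)
```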

I would try to store the whole state in a tensor with the following shape:

```python
init_state = np.zeros((num_layers, 2, batch_size, state_size))
```

Then feed it and unpack it in your graph:

```python
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unstack(state_placeholder, axis=0)  # tf.unpack was renamed tf.unstack in TF 1.0
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
```
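The round trip between the nested state tuple and the single `[num_layers, 2, ...]` array can be sketched in plain Python. `pack_state` and `unpack_state` are hypothetical helper names, `LSTMStateTuple` is a local namedtuple stand-in for the TensorFlow class, and nested lists play the role of the NumPy array and of the unstacked tensor.

```python
from collections import namedtuple

# Local stand-in for tf.nn.rnn_cell.LSTMStateTuple, also a namedtuple of (c, h).
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])


def pack_state(state):
    """Tuple of per-layer LSTMStateTuple -> nested list indexed [layer][0=c, 1=h]."""
    return [[layer.c, layer.h] for layer in state]


def unpack_state(packed):
    """Inverse of pack_state, mirroring the unstack along axis 0 in the graph."""
    return tuple(LSTMStateTuple(layer[0], layer[1]) for layer in packed)


# Two layers, batch of one, state_size 2 (illustrative values).
state = (
    LSTMStateTuple(c=[[0.1, 0.2]], h=[[0.3, 0.4]]),
    LSTMStateTuple(c=[[0.5, 0.6]], h=[[0.7, 0.8]]),
)
packed = pack_state(state)       # what you would feed to state_placeholder
restored = unpack_state(packed)  # what the graph rebuilds as rnn_tuple_state
```

Index 0 on the second axis is always `c` and index 1 is always `h`, matching `LSTMStateTuple(l[idx][0], l[idx][1])` in the graph code.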

### how to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:

```python
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
```

However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around for a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:

- 1 input layer
- 1 output layer
- 2 hidden LSTM layers (with 512 neurons in each)
- time step (sequence length): 10

Could anyone guide me to build this using TensorFlow (from defining weights, building the input shape, training, predicting, use of optimizer or cost function, etc.)? Any help would be much appreciated. Thank you so much in advance!

Here is how I do it in a translation model with GRU cells; you can just replace the GRU with an LSTM. It is really easy: use `tf.nn.rnn_cell.MultiRNNCell` with a list of the cells it should wrap. In the code below I am unrolling it manually, but you can pass it to `tf.nn.dynamic_rnn` or `tf.nn.rnn` (renamed `tf.contrib.rnn.static_rnn` in TF 1.0) as well.

```python
import tensorflow as tf
from tensorflow.contrib import rnn

y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    # MultiRNNCell's state_size is a tuple, so build the zero state with zero_state()
    state = rnn_cell.zero_state(BATCH_SIZE, tf.float32)
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):  # iterates from the last time step to the first
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()  # share the cell weights across time steps
    y = tf.stack(output, 1)  # tf.pack was renamed tf.stack in TF 1.0
```
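What the stacked cell does at each step can be sketched framework-free: each layer's output becomes the next layer's input, and the state is a tuple with one entry per layer. `make_cell`, `multi_cell` and `unroll` are hypothetical helpers with made-up scalar weights, not TensorFlow functions; the sketch iterates forward in time, whereas the loop above walks the time steps in reverse.

```python
import math


def make_cell(w, u):
    """Toy scalar cell: maps (input, state) to (output, new_state) like an RNNCell call."""
    def cell(x, h):
        h_new = math.tanh(w * h + u * x)
        return h_new, h_new  # output and new state coincide, as in a GRU
    return cell


def multi_cell(cells):
    """Stack cells the way MultiRNNCell does: each layer's output feeds the next
    layer, and the state is a tuple with one entry per layer."""
    def cell(x, states):
        new_states = []
        inp = x
        for layer, h in zip(cells, states):
            inp, h_new = layer(inp, h)
            new_states.append(h_new)
        return inp, tuple(new_states)
    return cell


def unroll(cell, inputs, zero_state):
    """Manually unroll the (stacked) cell over time, feeding the state forward."""
    state = zero_state
    outputs = []
    for x in inputs:
        out, state = cell(x, state)
        outputs.append(out)
    return outputs, state


stacked = multi_cell([make_cell(0.5, 1.0), make_cell(0.3, 2.0)])
outputs, final_state = unroll(stacked, [1.0, 0.0, -1.0], zero_state=(0.0, 0.0))
```

The stacked cell's output at each step is the top layer's output, so `outputs[-1]` equals the last entry of `final_state` here.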

First you need some placeholders to hold your training data (one batch):

```python
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
```

An LSTM needs a state, which consists of two components, the hidden state and the cell state (very good guide here: https://arxiv.org/pdf/1506.00019.pdf). For every layer in the LSTM you have one cell state and one hidden state. The problem is that TensorFlow stores this in an `LSTMStateTuple`, which you cannot send into a placeholder. So you need to store it in a tensor and then unpack it into a tuple:

```python
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unstack(state_placeholder, axis=0)  # tf.unpack was renamed tf.unstack in TF 1.0
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
```

Then you can use the built-in TensorFlow API to create the stacked LSTM layers:

```python
# One LSTMCell object per layer, instead of repeating a single object with [cell] * num_layers
cells = [tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
```

From here you continue with the outputs to calculate logits and then a loss with respect to `y_output`. Then you run each batch with `sess.run`, using truncated backpropagation (good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html):

```python
init_state = np.zeros((num_layers, 2, batch_size, state_size))
current_state = init_state
# For each batch (other fetches and feeds elided, as in the original):
current_state = sess.run(state, feed_dict={x_input: batch_in, state_placeholder: current_state})
current_state = np.array(current_state)
```

You will have to convert the state to a NumPy array before feeding it again. Perhaps it is better to use a library like TFLearn or Keras instead?

### How to reset the state of a GRU in tensorflow after every epoch

I am using the TensorFlow GRU cell to implement an RNN, applied to videos that are at most 5 minutes long. Since the next state is fed automatically into the GRU, how can I manually reset the state of the RNN after each epoch? In other words, I want the initial state at the beginning of training to always be 0. Here is a snippet of my code:

```python
with tf.variable_scope('GRU'):
    latent_var = tf.reshape(latent_var, shape=[batch_size, time_steps, latent_dim])
    cell = tf.nn.rnn_cell.GRUCell(cell_size)
    H, C = tf.nn.dynamic_rnn(cell, latent_var, dtype=tf.float32)
    H = tf.reshape(H, [batch_size, cell_size])
    # ....
```

Any help is much appreciated!

Use the `initial_state` argument of `tf.nn.dynamic_rnn`:

> initial_state: (optional) An initial state for the RNN. If `cell.state_size` is an integer, this must be a Tensor of appropriate type and shape `[batch_size, cell.state_size]`. If `cell.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell.state_size`.

An adapted example from the documentation:

```python
# create a GRUCell
cell = tf.nn.rnn_cell.GRUCell(cell_size)

# defining initial state
initial_state = cell.zero_state(batch_size, dtype=tf.float32)

# 'outputs' is a tensor of shape [batch_size, max_time, cell_size]
# 'state' is a tensor of shape [batch_size, cell_size]
outputs, state = tf.nn.dynamic_rnn(cell, input_data,
                                   initial_state=initial_state,
                                   dtype=tf.float32)
```

Also note that even though `initial_state` is not a placeholder, you can still feed a value to it. So if you wish to preserve the state within an epoch but start with zeros at the beginning of each epoch, you can do it like this:

```python
# Compute the zero state array of the right shape once
zero_state = sess.run(initial_state)

for epoch in range(num_epochs):  # num_epochs: your number of epochs
    # Start each epoch with the zero state and update it batch by batch
    cur_state = zero_state
    for batch in get_batches():
        cur_state, _ = sess.run([state, ...],  # '...': your other fetches, e.g. the train op
                                feed_dict={initial_state: cur_state})  # plus your other feeds
```
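The control flow of "carry the state within an epoch, reset it at each epoch start" can be checked in plain Python. `run_epochs` is a hypothetical helper, and the update `decay * state + sum(batch)` is a made-up stand-in for the `sess.run` call that returns the new RNN state.

```python
def run_epochs(epochs, zero_state=0.0, decay=0.5):
    """Carry the state across batches within an epoch and reset it per epoch.

    `epochs` is a list of epochs, each a list of batches (lists of numbers).
    The update `decay * state + sum(batch)` is a made-up stand-in for one
    sess.run call that returns the new RNN state."""
    end_states = []
    for batches in epochs:
        cur_state = zero_state  # reset at the start of every epoch
        for batch in batches:
            cur_state = decay * cur_state + sum(batch)  # state flows to the next batch
        end_states.append(cur_state)
    return end_states


print(run_epochs([[[1.0], [1.0]], [[1.0], [1.0]]]))  # [1.5, 1.5]
```

Because of the reset, identical epochs end in identical states; without it, the second epoch would start from 1.5 and end at 1.875.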