TensorFlow Debugging

TensorFlow has its own debugger, tfdbg (the TensorFlow Debugger), which lets you observe the internal workings and state of running graphs. These are difficult to debug with general-purpose debuggers such as pdb in Python.

This tutorial covers the command-line interface (CLI) of tfdbg. A TensorFlow debugging plugin is also available.

You will learn how to use the tfdbg CLI to debug the appearance of nans and infs, which are among the most common bugs in TensorFlow. The following example uses the low-level API:

python -m tensorflow.python.debug.examples.debug_mnist

The command above trains a neural network for MNIST digit image recognition. The accuracy increases slightly and then stops improving after a few steps.

This stalled accuracy may be caused by infs and nans. Let's use tfdbg to debug the issue and find out exactly where the problem began.

Wrapping TensorFlow Sessions With tfdbg

To use tfdbg, add the following lines of code, which wrap the Session object in a debugger wrapper.

from tensorflow.python import debug as tf_debug

sess = tf_debug.LocalCLIDebugWrapperSession(sess)

This wrapper offers some added features, which include:

It brings up a CLI before and after each Session.run() call, so you can control the execution and inspect the internal state of the graph.

It lets you register filters for tensor values to assist diagnosis. The example already has a filter called tfdbg.has_inf_or_nan, which determines whether any intermediate tensors (tensors that are neither inputs nor outputs) contain nan or inf values.

You can also write your own custom filters to suit your needs; see the API documentation for details.
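The wrapping idea described above can be pictured with a small pure-Python sketch. This is not how tfdbg is implemented: FakeSession, DebugWrapper, and the events list are hypothetical names invented for illustration, and the interactive CLI is replaced by a simple event log. Only the has_inf_or_nan idea comes from the text.

```python
import math


class FakeSession:
    """Hypothetical stand-in for a session: run() returns intermediate values."""

    def run(self, step):
        # Pretend that step 2 produces a bad value in an intermediate tensor.
        loss = [float("nan")] if step == 2 else [0.1]
        return {"hidden": [0.5, 1.0], "loss": loss}


def has_inf_or_nan(values):
    """Return True if any value is nan or +/-inf."""
    return any(math.isnan(v) or math.isinf(v) for v in values)


class DebugWrapper:
    """Wraps a session-like object and hooks in before and after every run()."""

    def __init__(self, session):
        self._session = session
        self._filters = {}
        self.events = []  # stands in for the interactive run-start/run-end screens

    def add_tensor_filter(self, name, predicate):
        self._filters[name] = predicate

    def run(self, step):
        self.events.append(("run-start", step))
        tensors = self._session.run(step)
        # Record which filters fired on which intermediate tensors.
        hits = sorted(
            (filter_name, tensor_name)
            for filter_name, predicate in self._filters.items()
            for tensor_name, values in tensors.items()
            if predicate(values)
        )
        self.events.append(("run-end", step, hits))
        return tensors


sess = DebugWrapper(FakeSession())
sess.add_tensor_filter("has_inf_or_nan", has_inf_or_nan)
sess.run(1)
sess.run(2)
```

After the two calls, sess.events holds a run-start and a run-end entry for each step, and the run-end entry for step 2 records that the filter fired on the "loss" tensor.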

Debugging TensorFlow Model Training with tfdbg

Now train the model again, this time with the --debug flag:

python -m tensorflow.python.debug.examples.debug_mnist --debug

The fetched data will be displayed on the screen. This screen is the run-start interface.

Next, enter run (or its shorthand r) at the prompt:

tfdbg> run

This makes tfdbg run until the end of the next Session.run() call, which calculates the accuracy on the test dataset. The debugger then displays the dumped tensors in the run-end interface.

After you have executed run, you can also list the tensors with the lt command.

Frequently-Used TensorFlow Debugging Commands

Try commands such as lt, pt, ni, and ps at the tfdbg> prompt (they refer to the code at tensorflow/python/debug/examples/debug_mnist.py).

Note that each time you enter a command, a new screen output appears. This is analogous to web pages in a browser. You can navigate between these screens by clicking the <-- and --> text arrows near the top-left corner of the CLI.

Features of tfdbg CLI

In addition to the commands listed above, the tfdbg CLI provides the following features:

To navigate through previous tfdbg commands, type a few characters followed by the Up or Down arrow key. tfdbg will show you the history of commands that start with those characters.

To navigate through the history of screen outputs, do either of the following:

Use the prev and next commands.

Click the underlined <-- and --> links near the top-left corner of the screen.

Tab completion of commands and some command arguments.

To redirect the screen output to a file instead of the screen, end the command with bash-style redirection. For example, the following command redirects the output of the pt command to the file /tmp/xent_value_slices.txt:

tfdbg> pt cross_entropy/Log:0[:, 0:10] > /tmp/xent_value_slices.txt
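The [:, 0:10] part of the command above is array-slicing notation: every row, and the first ten columns. The following plain-Python sketch performs the same kind of selection and writes it to a file rather than to the screen. The 3 x 12 matrix and the temporary file are made-up stand-ins, and the text that tfdbg actually writes will look different.

```python
import os
import tempfile

# Hypothetical 3 x 12 "tensor" represented as a list of rows.
tensor = [[row * 100 + col for col in range(12)] for row in range(3)]

# Equivalent of [:, 0:10]: every row, columns 0 through 9.
sliced = [row[0:10] for row in tensor]

# Write the sliced values to a file instead of printing them to the screen.
fd, out_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out_file:
    for row in sliced:
        out_file.write(" ".join(str(value) for value in row) + "\n")

# Read the file back to confirm what was written, then clean up.
with open(out_path) as in_file:
    lines = in_file.read().splitlines()
os.remove(out_path)
```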

Finding nans and infs

In this first Session.run() call, there happen to be no problematic numerical values. You can move on to the next run by using the command run or its shorthand r.

TIP: If you enter run or r repeatedly, you will move through the Session.run() calls sequentially.

You can also use the -t flag to move ahead a number of Session.run() calls at a time, for example:

tfdbg> run -t 10

Instead of entering run repeatedly and manually searching for nans and infs in the run-end UI after every Session.run() call (for example, by using the pt command), you can use the following command to let the debugger repeatedly execute Session.run() calls without stopping at the run-start or run-end prompt, until the first nan or inf value shows up in the graph. This is analogous to conditional breakpoints in some procedural-language debuggers:

tfdbg> run -f has_inf_or_nan

NOTE: The preceding command works because a tensor filter called has_inf_or_nan has been registered for you when the wrapped session is created. This filter detects nans and infs (as explained previously). If you have registered any other filters, you can use "run -f" to have tfdbg run until any tensor triggers that filter (that is, causes the filter to return True). For example:

def my_filter_callable(datum, tensor):
    # A filter that detects zero-valued scalars.
    return len(tensor.shape) == 0 and tensor == 0.0

sess.add_tensor_filter('my_filter', my_filter_callable)

Then, at the tfdbg run-start prompt, run until your filter is triggered:

tfdbg> run -f my_filter

See the API documentation for more information on the expected signature and return value of the predicate Callable used with add_tensor_filter().
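The "run until a filter fires" behavior of run -f can be pictured as a loop over successive runs. The sketch below is a simplified stand-in, not tfdbg's implementation: run_until_filter and the runs data are hypothetical, tensors are plain lists of floats, and the tensor name is passed where tfdbg would pass a datum object.

```python
import math


def has_inf_or_nan(datum, tensor):
    """Filter with the (datum, tensor) shape used above; tensor is a flat list here."""
    return any(math.isnan(v) or math.isinf(v) for v in tensor)


def run_until_filter(run_outputs, tensor_filter):
    """Step through successive runs and stop at the first one that trips the filter.

    run_outputs: one dict per run, mapping tensor name -> list of floats.
    Returns (1-based run number, names of offending tensors), or (None, []).
    """
    for run_number, tensors in enumerate(run_outputs, start=1):
        hits = [
            name
            for name, values in tensors.items()
            if tensor_filter(name, values)  # name stands in for the datum
        ]
        if hits:
            return run_number, hits
    return None, []


# Made-up data: the fourth run is the first to contain a non-finite value.
runs = [
    {"Softmax:0": [0.2, 0.8], "cross_entropy/Log:0": [-1.6, -0.2]},
    {"Softmax:0": [0.1, 0.9], "cross_entropy/Log:0": [-2.3, -0.1]},
    {"Softmax:0": [0.01, 0.99], "cross_entropy/Log:0": [-4.6, -0.01]},
    {"Softmax:0": [0.0, 1.0], "cross_entropy/Log:0": [float("-inf"), 0.0]},
]
first_bad_run, bad_tensors = run_until_filter(runs, has_inf_or_nan)
```

Any other predicate with the same two-argument shape could be passed in place of has_inf_or_nan, which mirrors how a custom filter name is given to run -f.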

As the first line of the screen display indicates, the has_inf_or_nan filter is first triggered during the fourth Session.run() call: an Adam optimizer forward-backward training pass on the graph. In this run, 36 (out of the total 95) intermediate tensors contain nan or inf values. These tensors are listed in chronological order, with their timestamps displayed on the left. At the top of the list, you can see the tensor in which the bad numerical values first surfaced: cross_entropy/Log:0.

To view the value of the tensor, click the underlined tensor name cross_entropy/Log:0 or enter the equivalent command:

tfdbg> pt cross_entropy/Log:0

Scroll down a little and you will notice some scattered inf values. If the instances of inf and nan are hard to spot by eye, you can use the following command to perform a regex search and highlight the output:

tfdbg> /inf

Or, alternatively:

tfdbg> /(inf|nan)
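The same pattern works in any regex engine. As a rough illustration, the sketch below applies (inf|nan) to a few made-up lines that stand in for printed tensor output; the real tfdbg screen text will differ.

```python
import re

# Made-up lines standing in for a printed tensor.
screen_lines = [
    "array([[-2.30, -0.10, -1.20],",
    "       [-inf, -0.05, -3.00],",
    "       [-0.69,  nan, -0.69]])",
]

pattern = re.compile(r"(inf|nan)")

# Collect (line index, matched text) pairs for every occurrence.
matches = [
    (index, match.group(0))
    for index, line in enumerate(screen_lines)
    for match in pattern.finditer(line)
]
```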

You can also use the -s or --numeric_summary flag to get a quick summary of the types of numeric values in the tensor:

tfdbg> pt -s cross_entropy/Log:0

From the summary, you can see that several of the 1000 elements of the cross_entropy/Log:0 tensor are -infs (negative infinities).
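A numeric summary of this kind is essentially a tally of values by category. The following sketch shows one way to compute such a tally in plain Python. The function name, the category labels, and the sample values are assumptions made for illustration and are not taken from tfdbg.

```python
import math


def numeric_summary(values):
    """Count values by category: nan, -inf, negative, zero, positive, +inf."""
    counts = {"nan": 0, "-inf": 0, "-": 0, "0": 0, "+": 0, "+inf": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif math.isinf(v):
            counts["-inf" if v < 0 else "+inf"] += 1
        elif v < 0:
            counts["-"] += 1
        elif v == 0:
            counts["0"] += 1
        else:
            counts["+"] += 1
    counts["total"] = len(values)
    return counts


# Made-up values resembling log outputs, including two negative infinities.
summary = numeric_summary([-0.5, float("-inf"), 0.0, -2.3, float("-inf"), -0.1])
```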

Why did these infinities appear? To debug further, display more information about the node cross_entropy/Log by clicking the underlined node_info menu item at the top or by entering the equivalent node_info (ni) command:

tfdbg> ni cross_entropy/Log

You can see that this node has the op type Log and that its input is the node Softmax. Run the following command to take a closer look at the input tensor:

tfdbg> pt Softmax:0

Examine the values in the input tensor, searching for zeros:

tfdbg> /0\.000

Indeed, there are zeros. It is now clear that the bad numerical values originate from the node cross_entropy/Log taking logs of zeros. To find the culprit line in the Python source code, use the -t flag of the ni command to show the traceback of the node's construction:

tfdbg> ni -t cross_entropy/Log

If you click "node_info" at the top of the screen, tfdbg automatically shows the traceback of the node's construction.

From the traceback, you can see that the op is constructed at the following line in debug_mnist.py:

diff = y_ * tf.log(y)
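To see why this line produces bad values, the sketch below reproduces the arithmetic in plain Python. The helper array_style_log is hypothetical: Python's math.log(0.0) raises ValueError, so the helper returns -inf for zero to mimic the behavior described above. The sample vectors are made up, and the clipping at the end is one common remedy shown as an assumption, not something taken from the example script.

```python
import math


def array_style_log(x):
    """Log that returns -inf for 0 instead of raising (hypothetical helper)."""
    return float("-inf") if x == 0.0 else math.log(x)


y_true = [0.0, 1.0, 0.0]  # one-hot label (made up)
y_pred = [0.0, 0.5, 0.5]  # softmax output containing an exact zero (made up)

# Element-wise y_ * log(y): 0 * -inf is nan under IEEE 754 arithmetic.
diff = [t * array_style_log(p) for t, p in zip(y_true, y_pred)]

# Illustrative remedy (an assumption): clip the input away from zero first.
epsilon = 1e-10
clipped = [t * array_style_log(max(p, epsilon)) for t, p in zip(y_true, y_pred)]
```

The first element of diff is nan because the log of zero is -inf and multiplying -inf by zero is undefined, while every element of clipped stays finite.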

tfdbg has a feature that makes it easy to trace Tensors and ops back to lines in Python source files. It can annotate the lines of a Python file with the ops or Tensors created by them. To use this feature, simply click the underlined line numbers in the stack trace output of the ni -t <op_name> command, or use the ps (or print_source) command, for example: ps /path/to/source.py.
