This article describes how to debug performance issues with the help of Valgrind, Callgrind and KCachegrind.


Callgrind uses runtime instrumentation via the Valgrind framework for its cache simulation and call-graph generation. This way, even shared libraries and dynamically opened plugins can be profiled. The data files generated by Callgrind can be loaded into KCachegrind for browsing the performance results. The package also includes a command line tool that produces ASCII reports from the data files without KCachegrind, but the browsing tools should be enough to debug a performance problem, so the command line tool and its instructions are out of scope of this page.



Installation


Install valgrind and KCachegrind

Requirements:

  • Callgrind: part of Valgrind (supports Linux on x86, amd64, arm7, ...)

  • KCachegrind

    • Libraries and development files for KDE 4.4 or higher

    • Commands 'dot' (GraphViz) for call graph, and 'objdump' (BinUtils) for assembler view (these are runtime requirements, not needed for compilation)

  • QCachegrind (included in KCachegrind sources)

    • Qt5, or Qt 4.x with x >= 4

    • 'dot' binary for call graph and 'objdump' binary for annotated machine code

Valgrind and KCachegrind are already packaged for the most important Linux distributions. On Ubuntu 14.04 (Trusty Tahr) or later, installing them is as easy as running a single command in a terminal. You also need to install graphviz in order to view the call graph in KCachegrind. You can use apt-get:


sudo apt-get install valgrind kcachegrind graphviz

or aptitude:

 sudo aptitude install valgrind kcachegrind graphviz
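After installation, it can be worth confirming that the tools are actually on the PATH. The following is a minimal sketch: the helper name check_tool is hypothetical, and it relies only on the standard `command -v` shell builtin.

```shell
#!/bin/sh
# check_tool NAME: hypothetical helper, returns 0 if NAME is found on the PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the tools used on this page are available.
for tool in valgrind kcachegrind dot objdump; do
    if check_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

A "missing" line for dot or objdump only affects the call graph and the machine code view; the profile itself can still be recorded and opened.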



How to run


Start the netconfd-pro server under Callgrind as follows. The CLI parameters can vary:

valgrind --tool=callgrind [callgrind options] your-program [program options]

E.g.:

valgrind --tool=callgrind netconfd-pro module=ietf-interfaces module=iana-if-type log-level=info no-config access-control=off

 


For more information on how to use Valgrind with Callgrind, refer to http://valgrind.org/docs/manual/cl-manual.html
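As one example of what can go in the [callgrind options] placeholder, --callgrind-out-file sets the name of the data file, and %p in that name is expanded by Valgrind to the process ID. The sketch below only assembles and prints the command line so that it can be reviewed before it is run. The function name callgrind_cmd and the directory /tmp/profiles are hypothetical, and the directory is assumed to exist already.

```shell
#!/bin/sh
# callgrind_cmd OUTDIR PROGRAM [ARGS...]: hypothetical helper that prints
# the valgrind command line instead of executing it.
callgrind_cmd() {
    outdir=$1
    shift
    # %p is replaced by Valgrind with the PID of the profiled process.
    echo valgrind --tool=callgrind \
        "--callgrind-out-file=$outdir/callgrind.out.%p" "$@"
}

# Example with an assumed output directory; program options are passed through.
callgrind_cmd /tmp/profiles netconfd-pro log-level=info
```

Once the printed line looks right, it can be pasted into the terminal as is.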


Execute the operation that you want to test, then shut down the server. After the cleanup is done, you should see something similar to:

==7729==
==7729== Events : Ir
==7729== Collected : 175808352
==7729==
==7729== I   refs:      175,808,352

The result is stored in a file named callgrind.out.XXX, where XXX is the process identifier.

valgrind --tool=callgrind netconfd-pro module=ietf-interfaces module=iana-if-type log-level=info no-config access-control=off

ls
callgrind.out.7729
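After several runs, a directory tends to collect more than one callgrind.out.* file. As a rough sketch, the newest one can be picked by modification time. The helper name latest_profile is hypothetical, and file names are assumed not to contain newlines.

```shell
#!/bin/sh
# latest_profile [DIR]: hypothetical helper that prints the most recently
# modified callgrind.out.* file in DIR (default: current directory).
latest_profile() {
    dir=${1:-.}
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/callgrind.out.* 2>/dev/null | head -n 1
}

profile=$(latest_profile)
if [ -n "$profile" ]; then
    echo "newest profile: $profile"
    # To browse it: kcachegrind "$profile"
else
    echo "no callgrind.out.* file found"
fi
```

The kcachegrind call is left as a comment so that the snippet can be run on a machine without a display.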

You can read this file using a text editor, but it is cryptic and not very useful in that form. This is where KCachegrind comes in. You can launch KCachegrind from the command line or from the program menu, if your system installed an entry there. Then open the profile file.


The first view presents a list of all the profiled functions. You can see the inclusive cost and the self cost of each function, as well as its location.


Once you click on a function, the other views are filled in with information. The view in the upper right part of the window gives some information about the selected function.


The view has several tabs presenting different information:

  • Types: The types of events that have been recorded. In our case this is not very interesting, since only instruction fetches (Ir) were recorded

  • Callers: The direct callers of the function

  • All Callers: All the callers, that is, the direct callers and the callers of the callers

  • Callee Map: A treemap of the callees, in which the area of each rectangle reflects the cost of the function, so it acts as a kind of call graph weighted by cost

  • Source code: The source code of the function, if the application has been compiled with debug symbols

And finally, you have another view with data about the selected function.

Again, several tabs:

  • Callees: The direct callees of the function

  • Call Graph: The call graph from the function to the end

  • All Callees: All the callees and the callees of the callees

  • Caller Map: The same kind of map for the callers, again not very interesting in our case

  • Machine Code: The machine code of the function, if the application has been profiled with the --dump-instr=yes option
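The Source code and Machine Code tabs depend on how the program was built and profiled. As a rough sketch, the following checks whether a binary carries a .debug_info section (using objdump, listed in the requirements) and prints a Callgrind command line with --dump-instr=yes. The helper name has_debug_info and the default path ./your-program are hypothetical, and a binary whose debug data lives in a separate file will be reported as having none.

```shell
#!/bin/sh
# has_debug_info FILE: hypothetical helper, returns 0 if objdump lists a
# .debug_info section in FILE (built with -g and not stripped).
has_debug_info() {
    objdump -h "$1" 2>/dev/null | grep -q '\.debug_info'
}

binary=${1:-./your-program}   # placeholder path, adjust as needed
if has_debug_info "$binary"; then
    echo "$binary: debug info present"
else
    echo "$binary: no debug info found, rebuild with -g to see source"
fi
# --dump-instr=yes is what fills the Machine Code tab.
echo "valgrind --tool=callgrind --dump-instr=yes $binary"
```

The check is only a hint: KCachegrind also needs to be able to find the source files themselves to display them.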

You also have several display options and filter features to find exactly what you want and display it the way you want.

The information provided by KCachegrind can be very useful for finding which functions take too much time or are called too often.