Maatkit (http://www.maatkit.org/) is a great command-line toolsuite, initially for MySQL.
Note that as of Jan 10,2012, Maatkit has now been renamed Percona Toolkit (http://www.percona.com/software/percona-toolkit/)
One of the tools of the toolsuite, pt-query-digest, allows to process and profile queries that run through the server, either by processing logs (general log, slow query log, etc...), or by processing wire protocol gathered by tcpdump.
A nice writeup of this tool is available here : http://www.mysqlperformanceblog.com/2009/07/01/gathering-queries-from-a-server-with-maatkit-and-tcpd... and is a recommmended read before proceeding.
P6Spy (sourceforge.net/projects/p6spy) is an open source driver that works as a 'proxy' driver, which allows to log every query that passes through it, while delegating query execution to 'real' drivers, the ones that would actually be used to communicate with the DB. Even though it is quite old, it does the job.
Some time ago, i wrote a pt-query-digest parser for p6spy (http://code.google.com/p/maatkit/issues/detail?id=1269).
It hasn't yet been merged in HEAD, but this is currently item #3 on the project roadmap (as of July '11 : http://code.google.com/p/maatkit/wiki/ProjectRoadmap). Feel free to vote for it if you think this is useful.
The patch is hosted on github gist and available here : add-p6spy-parser-3.0.13.patch .
Therefore to be able to parse and digest a p6spy query log using pt-query-digest after having downloaded and uncompressed the toolkit in e.g. /opt/percona/toolkit/3.0.13' , you'll need to do e.g. :
cd /opt/percona/toolkit/3.0.13
wget https://gist.githubusercontent.com/romainguinot/349ff60b8fe6d92173529df2785cf3ab/raw/43ea408a2ca9e2d...
patch -p0 < add-p6spy-parser-3.0.13.patch
The SHA1 digest for this patch is : 2f4661bd5f9f8769058a094cbb4e8b6a5c599228.
Use the SHA1 digest (e.g. with sha1sum or openssl sha1) to verify the integrity the patch before applying it.
As of this writing/update, the patch applies against the latest published version of Percona Toolkit : 3.0.13.
Note : if you want to work against the toolkit full source (with the perl unit tests), the patch to apply against the source download from this version's download page is available here : add-p6spy-parser-source-3.0.13.patch. The corresponding release tag if necessary is on GitHub (3.0.13 release revision).
The SHA1 digest for this patch is : b4c9473f8be2a50837f0b9a833ec00a1ccdc4b7a). This patch is not required to simply use the patched tool.
To gather a query log, 3 steps are required :
module.log=com.p6spy.engine.logging.P6LogFactory
driverlist=com.mysql.jdbc.Driver
autoflush=true
excludecategories=
appender=com.p6spy.engine.spy.appender.FileLogger
logMessageFormat=com.p6spy.engine.spy.appender.CustomLineFormat
customLogMessageFormat=%(currentTime)|%(executionTime)|%(connectionId)|%(category)|%(sqlSingleLine)|%(effectiveSqlSingleLine)
logfile=/tmp/spy.log
deregisterdrivers=true
The custom log format is in fact the "old" log format, so that it can be backwards compatible. At some point, p6spy interverted the connection ID and category. This format ensures both old and new logs can be parsed without changing the parser itself.
You only need to change the 'realdriver' property if you are not using MySQL (replace by your actual driver).
#db.driver=com.mysql.jdbc.Driver
#db.url=jdbc:mysql://${db.host}:${db.port}/${db.name}?useUnicode=yes&characterEncoding=UTF-8
db.driver=com.p6spy.engine.spy.P6SpyDriver
db.url=jdbc:p6spy:mysql://${db.host}:${db.port}/${db.name}?useUnicode=yes&characterEncoding=UTF-8
Upon startup, you should see a /tmp/spy.log being created, filling up with queries and associated response time among other things. This is the log that will be parsed (as in the examples below) with pt-query-digest. Note that for better accuracy and richer information, you can use percona toolkit directly against MySQL, but this gives you somewhat DB independence, network time to the driver, and some nice statistics if for some reason you don't have access to the actual DB.
As is the case for the other parsers, there are 2 output 'modes' that can be used :
This is a summary output for a limited number of query types (sorted by impact). Example :
$ pt-query-digest --type=p6spy --report-format=profile /tmp/spy.log
Output :
# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ============= ===== ====== ===== ==========
# 1 0x631DB93E5642CADF 0.7780 21.8% 2575 0.0003 0.00 INSERT alf_node_aspects
# 2 0x74EBEB13627F93B7 0.6970 19.6% 2570 0.0003 0.00 INSERT alf_node_properties
# 3 0x7F11B32B1B5A760F 0.5150 14.4% 1294 0.0004 0.00 INSERT alf_node
# 4 0x893211C17A511841 0.5050 14.2% 1279 0.0004 0.00 INSERT alf_content_url
# 5 0x1D8D7FD7E0184ED6 0.3480 9.8% 1293 0.0003 0.00 INSERT alf_child_assoc
# 6 0xF75307A715416C4B 0.3390 9.5% 1279 0.0003 0.00 INSERT alf_content_data
# 7 0xA5B6696AC7DF25C4 0.0930 2.6% 500 0.0002 0.00 SELECT alf_applied_patch
# 8 0x3BCD98DE024A0893 0.0610 1.7% 66 0.0009 0.00 SELECT alf_applied_patch
# 9 0x3AFCFEBEE97A8217 0.0400 1.1% 71 0.0006 0.00 SELECT alf_node alf_store alf_transaction
# 10 0x2720FC146858DD93 0.0360 1.0% 40 0.0009 0.00 SELECT alf_child_assoc alf_node alf_store alf_node alf_store
# MISC 0xMISC 0.1530 4.3% 454 0.0003 0.0 <51 ITEMS>
By default the top 10 queries in terms of impact (# of times executed * avg response time) are shown, but this may be changed with the '--limit' parameter. For each aggregated query type (queries that differ only by the values), the column fields are as follows :
Note : on a local fast/dev laptop many of the queries will report sub 1ms and therefore 0 in the p6spy log file. p6spy has apparently added an option to report the timings using the JVM's nanotime but last i tested it didn't work. you would have to adjust the parser itself to adjust the current conversion from milliseconds to seconds. I doubt this would be the most accurate though. The tool is better suited to profile queries in a production / pre-production scenario where the queries would go over the network and therefore usually report over 1ms when you factor in the network's round trip time.
This a detailed output for each query type. This is the default output if none is specified.
$ pt-query-digest --type=p6spy /tmp/spy.log
Below is shown the detailed output for the query #1 :
# Query 1: 0 QPS, 0x concurrency, ID 0x631DB93E5642CADF at byte 13096045 _
# Scores: V/M = 0.00
# Time range: 1295101185998 to 1299596773258
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 22 2575
# Exec time 21 778ms 0 28ms 302us 972us 1ms 0
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms ################################################################
# 10ms #
# 100ms
# 1s
# 10s+
# Tables
# SHOW TABLE STATUS LIKE 'alf_node_aspects'\G
# SHOW CREATE TABLE `alf_node_aspects`\G
# insert into alf_node_aspects (node_id, qname_id) values (162299, 31)\G
In this mode, only the top 5 queries are detailed by default. The other 4 are not printed here.
More statistical details are available for this query in this mode (min,max, 95%, stddev, median), including a sample query of that type for reference. Can be used to be copy/pasted to get the query plan for example.
Even though more information is available when using the wire protocol (for example concurrency, rows examined, query plan, query plan warnings, ...), this data allows to have a good overview of this particular query's performance. See percona documentation for more details about the histogram.
Note : Sparklines are no longer available as of the 2.2 release.
Note that it is not necessary to use pt-query-digest on the actual server where the log has been collected. The log can be transferred to another machine for processing, for example if the machine doesn't have Perl installed. pt-query-digest is written in Perl, and has fairly minimal dependencies, only some core packages that ought to be installed in any reasonably new version of Perl.
Percona documentation for the tool itself is available here.