DTrace Caller Builtin
This post answers a question I occasionally have when I am DTracing code. How do I trace a function but only if it is called from a specific other function? This post shows how to do it when tracing the kernel. Doing the same thing for user level code will be covered in a future blog post.
I am working on a blog post about ZFS transactions. As part of learning more about transactions, I wanted to see arguments passed to txg_list_add()
. This function basically adds objects to a transaction group. The function can be found here.
Here it is:
/** Add an entry to the list (unless it's already on the list).* Returns B_TRUE if it was actually added.*/boolean_ttxg_list_add(txg_list_t *tl, void *p, uint64_t txg){ int t = txg & TXG_MASK; txg_node_t *tn = (txg_node_t *)((char *)p + tl->tl_offset); boolean_t add; mutex_enter(&tl->tl_lock); add = (tn->tn_member[t] == 0); if (add) { tn->tn_member[t] = 1; tn->tn_next[t] = tl->tl_head[t]; tl->tl_head[t] = tn; } mutex_exit(&tl->tl_lock); return (add);}
So, the first argument is the list, the second argument is the object being added to the list, and the third argument is the transaction group id. Depending on the list, the type of the second argument varies (hence the declaration to be void
). A problem I ran into was how to tell what the second argument is when I am DTrace-ing the function. First, let's take a look at where the function gets called from during a creat(2) system call.
Here's a D
script to do that:
#!/usr/sbin/dtrace -qsBEGIN{ txg = 0;}zfs_create:entry /* creat(2) calls zfs_create() for zfs file creation */{ self->in=1;}fbt::txg_list_add:entry/self->in && txg == 0/{ txg = arg2; printf("%s: called from %a\n", probefunc, caller);}txg_list_add:entry/arg2 == txg/{ printf("%s: called from %a\n", probefunc, caller);}zfs_create:return/self->in/{ self->in = 0;}
(Note: the first txg_list_add will get printed twice). Why not use self->txg = arg2
? Because this function is called from other threads for the same transaction group id. I want to capture all of them from the creat(2)
call onward. Note that itis possible that some objects are added to the same transaction group prior to the creat(2)
call, but I should catch all added after the create.
Here's the output from the script:
# chmod +x txg.d# ./txg.d
in another window, I run touch foo
.
txg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`zilog_dirty+0x53txg_list_add: called from zfs`zilog_dirty+0x53txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dir_dirty+0x2etxg_list_add: called from zfs`dsl_dataset_dirty+0x41txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_dirty+0x92txg_list_add: called from zfs`vdev_dirty+0x68txg_list_add: called from zfs`vdev_sync+0x91txg_list_add: called from zfs`vdev_sync+0x91txg_list_add: called from zfs`vdev_sync+0x91txg_list_add: called from zfs`vdev_sync+0xdd(^c)#
So, for a given transaction group on a creat(2)
system call, txg_list_add
is called from zfs`dsl_dir_dirty+0x2e
, zfs`dsl_dataset_dirty+0x41
, zfs`zilog_dirty+0x53
, zfs`vdev_dirty+0x92
, zfs`vdev_dirty+0x68
, zfs`vdev_sync_0x91
, and zfs`vdev_sync+0xdd
.
Note that the +hex_value
is the byte offset in the function of where txg_list_add()
will return when it is finished. To see that, here is the disassembled zfs`dsl_dir_dirty()
function:
# mdb -kLoading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs lofs mpt idm sd crypto random cpc logindmux ptm sppp nfs ]> dsl_dir_dirty::disdsl_dir_dirty: pushq %rbpdsl_dir_dirty+1: movq %rsp,%rbpdsl_dir_dirty+4: subq $0x20,%rspdsl_dir_dirty+8: movq %rbx,-0x18(%rbp)dsl_dir_dirty+0xc: movq %rsi,-0x10(%rbp)dsl_dir_dirty+0x10: movq %rdi,%rbxdsl_dir_dirty+0x13: movq %rdi,-0x8(%rbp)dsl_dir_dirty+0x17: movq 0x18(%rdi),%rdidsl_dir_dirty+0x1b: movq 0x38(%rsi),%rdxdsl_dir_dirty+0x1f: movq %rbx,%rsidsl_dir_dirty+0x22: addq $0x258,%rdidsl_dir_dirty+0x29: call +0x32062 dsl_dir_dirty+0x2e: testl %eax,%eaxdsl_dir_dirty+0x30: je +0xc dsl_dir_dirty+0x32: movq 0x10(%rbx),%rdidsl_dir_dirty+0x36: movq %rbx,%rsidsl_dir_dirty+0x39: call -0x167fe dsl_dir_dirty+0x3e: movq -0x18(%rbp),%rbxdsl_dir_dirty+0x42: leave dsl_dir_dirty+0x43: ret >
You can see that the caller of txg_list_add()
is dsl_dir_dirty+0x2e
, which is the location of the testl %eax,%eax
immediately after the call to txg_list_add()
, which is at dsl_dir_dirty_0x29
. Let's take a look at dsl_dir_dirty()
(from here).
voiddsl_dir_dirty(dsl_dir_t *dd, dmu_tx_t *tx){ dsl_pool_t *dp = dd->dd_pool; ASSERT(dd->dd_phys); if (txg_list_add(&dp->dp_dirty_dirs, dd, tx->tx_txg)) { /* up the hold count until we can be written out */ dmu_buf_add_ref(dd->dd_dbuf, dd); }}
In this case, the code is adding a dsl_dir_t
to the list headed by dp_dirty_dirs
, which is a member of the dsl_pool_t
structure. In other places, txg_list_add()
is called with different lists and objects. I want to print the object (the dsl_dir_t
) that is passed into the function. I first tried:
# dtrace -n 'txg_list_add:entry/caller == zfs`dsl_dir_dirty+0x2e/{print(*(dsl_dir_t *)arg1);}'dtrace: invalid probe specifier txg_list_add:entry/caller == zfs`dsl_dir_dirty+0x2e/{print(*(dsl_dir_t *)arg1);}: in predicate: operands have incompatible types: "function pointer" + "int"#
Then I tried:
# dtrace -n 'txg_list_add:entry/caller == (uintptr_t)zfs`dsl_dir_dirty+0x2e/{print(*(dsl_dir_t *)arg1);}'dtrace: description 'txg_list_add:entry' matched 1 probe
But this probe never fired. I then tried printing the value of (uintptr_t)zfs`dsl_dir_dirty+0x2e
.
# dtrace -n 'txg_list_add:entry{printf("caller = %x, dsl_dir_dirty+0x2e = %x\n", caller, (uintptr_t)zfs`dsl_dir_dirty+0x2e);}'dtrace: description 'txg_list_add:entry' matched 1 probeCPU ID FUNCTION:NAME 0 48419 txg_list_add:entry caller = fffffffff7cd3dae, dsl_dir_dirty+0x2e = 20ec8348e5894883 0 48419 txg_list_add:entry caller = fffffffff7cd3dae, dsl_dir_dirty+0x2e = 20ec8348e5894883...
Of course, caller
might not have been dsl_dir_dirty+0x2e
, but none of the callers matched (which explains why the preceding DTrace one-liner did not give any output). Why didn't they match? And what is 20ec8348e5894883
?
After thinking about this for a short time, I realized that zfs`dsl_dir_dirty+0x2e
is being interpreted as an address, and DTrace was going to that address and picking up the contents. In this case, the contents are code.
# mdb -k> dsl_dir_dirty,8/Bdsl_dir_dirty:dsl_dir_dirty: 55 48 89 e5 48 83 ec 20>
The bytes are swapped (Intel is little endian). And when you add 0x2e plus 0x55, you get 0x83. So, instead of getting the contents, let's try the address.
# dtrace -n 'txg_list_add:entry/caller == (uintptr_t)&zfs`dsl_dir_dirty+0x2e/{print(*(dsl_dir_t *)arg1);}'dtrace: description 'txg_list_add:entry' matched 1 probeCPU ID FUNCTION:NAME 0 48419 txg_list_add:entry dsl_dir_t { uint64_t dd_object = 0x4a dsl_dir_phys_t *dd_phys = 0xffffff014a61d400 dmu_buf_t *dd_dbuf = 0xffffff014a61f650 dsl_pool_t *dd_pool = 0xffffff014a262340 txg_node_t dd_dirty_link = { struct txg_node *[4] tn_next = [ 0, 0, 0, 0 ] uint8_t [4] tn_member = [ 0x1, 0, 0, 0 ] } dsl_dir_t *dd_parent = 0xffffff014a296e00 kmutex_t dd_lock = { void *[1] _opaque = [ 0 ] } list_t dd_prop_cbs = { size_t list_size = 0x30 size_t list_offset = 0 struct list_node list_head = { struct list_node *list_next = 0xffffff014ac59310 struct list_node *list_prev = 0xffffff014926a968 } } timestruc_t dd_snap_cmtime = { time_t tv_sec = 0x51f67b0d long tv_nsec = 0x1f8701b4 } uint64_t dd_origin_txg = 0 uint64_t [4] dd_tempreserved = [ 0, 0, 0, 0 ] int64_t [4] dd_space_towrite = [ 0x606000, 0, 0, 0 ] char [256] dd_myname = [ "var" ]}
This works. What if I wanted to DTrace a function, but only if it was called from another specific function?
For instance, txg_list_add()
is called from more than one place in zfs`vdev_dirty()
(actually, if you look at the source code (see here), you'll see that vdev_dirty()
has three different calls to txg_list_add()
).
# dtrace -n 'txg_list_add:entry \/caller >= (uintptr_t)&zfs`vdev_dirty && caller < (uintptr_t)&zfs`vdev_dirty+sizeof(zfs`vdev_dirty)/ \{printf("txg_list_add called from %a\n", caller);}'dtrace: description 'txg_list_add:entry' matched 1 probeCPU ID FUNCTION:NAME 0 48419 txg_list_add:entry txg_list_add called from zfs`vdev_dirty+0x92 0 48419 txg_list_add:entry txg_list_add called from zfs`vdev_dirty+0x68...
Figuring out which two of the three calls to txg_list_add()
are being made from vdev_dirty()
will have to be the topic of another blog. Suffice it to say, that the two shown by DTrace may not correspond to the first and second calls in the C
source code (and, in fact, may be the third and second or some other combination). Careful study of the assembler code is required (or compile the code using the -S
option to get annotated assembler output).
I wanted to post more about ZFS transactions, but I am still interpreting the results of tracing all of the calls to txg_list_add()
. I have a script which shows the objects added to the transaction group list, but I am (so far), underwhelmed with what they tell me about the transactions. Consider this a work in progress. One thing I can see is that most of the calls from a specific caller are passing in almost identical objects each time. Since I wasn't sure what I expected, I need to do more digging.
Post written by Mr. Max Bruning