Wednesday, January 28, 2009

Drizzle: A Pretty Cool Project

Drizzle is a pretty cool project whose progress I've started following in the last few weeks. I'm trying to contribute in a tiny way if I can by confirming bug reports. If I had more time, I'd like to try resolving some bugs. Hopefully, I'll find some spare time to do that in the future.

I think it's definitely a project worth keeping an eye on, though. Check it out if you have the time.

Saturday, January 24, 2009

A Subtle Bug

At university, I work in a research group where we are developing an application in C++ that runs on both Linux and Windows. Since I do most of my development on Linux, I rarely test our application on Windows (other people in the group who run Windows test on that platform). Recently, one of my colleagues was encountering a problem while running our application on Windows that I was not encountering when running it on Linux.

I was able to track the issue down to a single piece of code and produce a simple test case that reproduced the problem. Essentially, the problem was due to a piece of code like the following:
#include <iostream>
#include <map>

using namespace std;

int main()
{
    map<char,int> mymap;
    map<char,int>::iterator iter;

    mymap['a'] = 10;
    mymap['b'] = 20;
    mymap['c'] = 30;
    mymap['d'] = 40;
    mymap['e'] = 50;
    mymap['f'] = 60;

    for (iter = mymap.begin(); iter != mymap.end(); iter++) {
        cout << "erasing " << iter->first << endl;
        mymap.erase(iter);
    }
}

Compiled on Linux with gcc 4.2.3, the above code produces the following output (which is what was intended):

$ ./stuff
erasing a
erasing b
erasing c
erasing d
erasing e
erasing f
$

Compiling and running the same code on Windows causes an issue. The following output is produced (and execution halts):

$ ./stuff.exe
erasing a
erasing ^

With the simple test case above, the actual issue may be apparent. However, it was not so apparent in the source code for our application. The issue is due to the way elements are erased in the for loop. The documentation for the STL map contains the following paragraph:

Map has the important property that inserting a new element into a map does not invalidate iterators that point to existing elements. Erasing an element from a map also does not invalidate any iterators, except, of course, for iterators that actually point to the element that is being erased.

Thus, as soon as the element is erased, the current iterator is invalidated, and on the next trip through the loop, iter++ is evaluated on the (now) invalid iterator. Using an invalidated iterator is undefined behavior, so the result could wind up pointing to an invalid area, or it could appear to work, as it did on Linux.

We think (but don't know for sure) that we are seeing different behavior on the two platforms due to different implementations of the STL, or perhaps because of different implementations of the underlying memory management calls such as free().

Our method for getting around this issue was to move the calculation of the next iterator (iter++) into the erase() call. The post-increment advances iter to the next element and passes the old position to erase(), so the next iterator is calculated while the current one is still valid. Thus, the test case ends up looking as follows:
#include <iostream>
#include <map>

using namespace std;

int main()
{
    map<char,int> mymap;
    map<char,int>::iterator iter;

    mymap['a'] = 10;
    mymap['b'] = 20;
    mymap['c'] = 30;
    mymap['d'] = 40;
    mymap['e'] = 50;
    mymap['f'] = 60;

    for (iter = mymap.begin(); iter != mymap.end(); ) {
        cout << "erasing " << iter->first << endl;
        mymap.erase(iter++);
    }
}

The above code runs correctly on both Linux and Windows. This was a subtle bug that was perhaps not as apparent as it should have been to me. My excuse is that I didn't write this piece of code so it took a little while longer for me to debug it.
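For reference, here is a minimal sketch of the same idea applied to the more common case of erasing only some elements while iterating. The helper names erase_above and erase_all are hypothetical and not from our application. The second function assumes a C++11 or later compiler, where map::erase returns an iterator to the element after the erased one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Erase every entry whose value exceeds `limit`; returns the number erased.
// Hypothetical helper for illustration only.
std::size_t erase_above(std::map<char, int>& m, int limit)
{
    std::size_t erased = 0;
    for (std::map<char, int>::iterator it = m.begin(); it != m.end(); ) {
        if (it->second > limit) {
            // it++ advances `it` first and hands the old position to erase(),
            // so `it` itself never refers to an erased element.
            m.erase(it++);
            ++erased;
        } else {
            ++it;  // nothing erased, so a plain increment is safe
        }
    }
    return erased;
}

// Erase everything using the iterator returned by erase() (C++11 and later).
void erase_all(std::map<char, int>& m)
{
    for (auto it = m.begin(); it != m.end(); ) {
        it = m.erase(it);  // erase() returns the iterator following the erased element
    }
    assert(m.empty());  // sanity check: every element was visited and erased
}
```

Either form keeps the loop from ever incrementing an invalidated iterator. The post-increment form also works with pre-C++11 compilers such as the gcc 4.2.3 used above.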

Monday, January 5, 2009

What is Direct Data Placement

I'm currently studying Oracle's white paper on Exadata and came across the following paragraph:

"Further, Oracle's interconnect protocol uses direct data placement (DMA - direct memory access) to ensure very low CPU overhead by directly moving data from the wire to database buffers with no extra data copies being made."

This got me wondering what direct data placement is. First off, the interconnect protocol which Oracle uses in Exadata is Reliable Datagram Sockets (RDSv3). The iDB (intelligent database protocol) that a database server and Exadata Storage Server software use to communicate is built on RDSv3.

I found some information on direct data placement in a number of RFCs: RFC 4296, RFC 4297, and RFC 5041. Of the three, I found RFC 5041 (Direct Data Placement over Reliable Transports) to be the most relevant, although they are all worth a quick look. RFC 5041 sums up direct data placement quite nicely:

"Direct Data Placement Protocol (DDP) enables an Upper Layer Protocol (ULP) to send data to a Data Sink without requiring the Data Sink to Place the data in an intermediate buffer - thus, when the data arrives at the Data Sink, the network interface can place the data directly into the ULP's buffer."

The paragraph from Oracle's white paper makes much more sense to me now after briefly reading through the RFC. Since each InfiniBand link in Exadata provides 16 Gb/s of bandwidth, there would be a large amount of overhead if data had to be placed in an intermediate buffer. Thus, the use of direct data placement makes perfect sense, since it reduces the CPU overhead associated with copying data through intermediate buffers.

Also, I believe that in the paragraph quoted from Oracle's white paper, the abbreviation should be RDMA (Remote Direct Memory Access) rather than DMA.
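To make the idea from RFC 5041 a little more concrete, here is a toy simulation of its tagged buffer model, in which the upper layer advertises a buffer under a steering tag (STag) and each incoming segment carries that tag plus an offset. Everything in this sketch (ToyNic, Segment, register_buffer, place) is a hypothetical name made up for illustration. It is not Oracle's or any real NIC's API, and the memcpy merely stands in for the hardware writing into memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// One incoming segment: it names its destination buffer (stag) and the
// offset within that buffer, so no intermediate reassembly buffer is needed.
struct Segment {
    std::uint32_t stag;
    std::size_t offset;
    std::vector<char> payload;
};

// Toy stand-in for a network interface (hypothetical, for illustration only).
class ToyNic {
public:
    // The upper layer advertises one of its own buffers under a steering tag.
    void register_buffer(std::uint32_t stag, char* base, std::size_t length) {
        assert(base != nullptr);
        buffers_[stag] = Region{base, length};
    }

    // Place the payload straight into the registered buffer.
    // Returns false if the tag is unknown or the data would not fit.
    bool place(const Segment& seg) {
        auto it = buffers_.find(seg.stag);
        if (it == buffers_.end()) return false;
        const Region& r = it->second;
        if (seg.offset > r.length || seg.payload.size() > r.length - seg.offset) {
            return false;
        }
        if (!seg.payload.empty()) {
            // The only copy: from the "wire" directly into the upper layer's buffer.
            std::memcpy(r.base + seg.offset, seg.payload.data(), seg.payload.size());
        }
        return true;
    }

private:
    struct Region {
        char* base;
        std::size_t length;
    };
    std::map<std::uint32_t, Region> buffers_;
};
```

Because each segment says where its data belongs, segments can be placed in whatever order they arrive, and the receiving application never has to copy the data out of a staging area.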