FreeTDS will support MARS in ODBC because Frediano wants to do it and it's a compatibility feature.
We will not add MARS support to DB-Library or CT-Library. This page explains why, and invites your comments if you find the reasoning flawed.
DB-Library and CT-Library need not support MARS because neither vendor's implementation does, nor will. No application written to these libraries expects MARS support, and it's quite possible introducing MARS would break valid assumptions in existing programs.
MARS re-implements within TDS something very similar to TCP. If you read the MARS specification and are familiar with TCP/IP, you will experience deja vu: sliding windows, window negotiation, syn/ack/data/fin/ack. In effect, TDS becomes a psuedo-tunnel inside TCP.
If all those things already exist in the TCP stack, though, why repeat them in the TDS layer? If TCP/IP already has sliding windows, what benefit is there to another header?
None. There really is no benefit. There seems to be a benefit, but it is illusory. Microsoft created MARS at least in part because that illusion is widely shared. Microsoft, after all, is in the business of providing software to paying market, and the customer is always right. FreeTDS has no customers and is free to say, plainly, that the MARS customer is wrong.
The TDS protocol has long required that the client read all results returned by a query, or explicitly cancel the query, before issuing another request to the server. In particular, TDS does not support this style of row-at-a-time processing:
To implement the above, a TDS client uses two connections: one for selecting, the other for updating.
Application programmers want none of it. Why do I have to open another connection? they demand to know. Actually, they don't ask; they requested (thousands of times, we can be sure) that the “server” permit the connection to be used in exactly that way.
MARS stands for Multiple Active Result Sets. With MARS turned on, a connection supports the above row-at-a-time pattern. Feature implemented? Check. Job done.
It would seem the application programmer wins. Having many active statements for each connection means having only one connection. Being able to interact with a TDS server in just the way they interact with, say, an Oracle server surely has merits. But very little is gained, and most of that gain is due to the design of ODBC.
In ODBC, one creates a connection, uses that connection to create a statement, and executes the statement. The connection isn't good for anything but generating statement handles. In traditional TDS, it's pretty much pointless; until you've got a statement handle, you can't do anything, and once you have a statement handle, you can't have another (until you deallocate it).
What use is a connection if you can use it for only one statement at a time? ODBC was designed to be “open” i.e., used with any SQL DBMS, not just Microsoft and Sybase servers. With non-TDS servers, where more than one statement can be used at one time, the connection/statement separation makes more sense.
DB-Library, by contrast, has no connection/statement separation. There is no statement handle. The SQL is passed to the connection; results are read from the connection. That of course reflects the underlying reality of the TDS protocol.1
DB-Library simplifies making connections by collecting client parameters in a
LOGIN structure. The
dbconnect() function takes two parameters: the server name and the
LOGIN. In ODBC, by contrast, the
SQLConnect() function takes seven parameters, whereas allocating a statement handle requires only three. No wonder ODBC programmers aren't eager to connect more than once!
No. Microsoft might have chosen to support multiple statements for a connection handle by letting a connection handle stand for more than one connection. A connection property could have been the number of simultaneous real connections to pre-allocate. Additional connections could be made as the client requested additional statements.
As a driver provider, FreeTDS could in fact choose to implement that approach. It would, however, be very difficult to achieve perfect compatibility. What happens in FreeTDS is up to the developers, and the guy writing the ODBC driver prefers to implement MARS to stay as close as possible to what Microsoft did.
There is no benefit to multiplexing a connection, nothing two statements on one connection can do that two connections can't. It's that simple.
The only benefit accrues to the ODBC programmer, and it is very small: the reuse of a single connection for multiple statements. The application is not simplified. The complexity of managing multiple statements is identical to that of managing multiple connections. The disadvantages in lost efficiency — on the network and on the server — far outweigh these meager gains.
MARS converts TDS from an application protocol — a tabular data stream — into a tunnelling protocol. What's wrong with that?
If you think tunnels are easy to do, you should talk to people who implement VPNs. If you think sliding windows and window negotiation is easy to do, read the history of the development of TCP. Understand slow start, the Nagle algorithm, and silly window syndrome. Easy it is not. It seems easy, and the basics are easy. But it took years to get right.
But that's not your problem, you say? That's the FreeTDS guys' problem? Well, yes and no. Much depends on Microsoft's host-side implementation, for one thing. More important, effort that goes into supporting MARS is effort that can't be expended elsewhere. FreeTDS needs all the development resources it can get. By adding complexity with no benefit, MARS robs FreeTDS of developer time that could be better spent elsewhere.
The programming paradigm that MARS promotes interferes with the inherent efficiency TDS.
The TDS protocol is very efficient. It sends the bare minimum of metadata, and avoids unnecessary client-server interaction. In particular, when sending data to the client (or uploading via bcp) the sender needs no acknowledgement from the receiver. TCP handles all flow-control. The TDS connection is strictly half-duplex. Each side knows its own state and its peer's state.
The great advantage of TDS's minimalism is seen at the TCP level when sending rows of data. It can also be seen at the userspace-kernel boundary, where the number of context switches is minimized by allowing continuous uninterrupted reading or writing. The design encourages the TCP to fill its windows, maximizing its bandwidth-delay product.
Row-at-a-time interaction doesn't merely undermine that efficiency: it absolutely destroys it.
Does that matter? Is your server fast enough? How much slower are you willing to make your applications? If speed is no concern, efficiency is no concern.
OK, you say, but how much? That depends on your data, your server, and your network. Ping your server. Every interaction with your server takes roughly as long as your ping plus the time to transmit the data. If the data never leave the office, we may be talking a millisecond. If the server is 10,000 miles away, 300 ms wouldn't be unusual. If your data don't fit in a TCP packet (1500 bytes, minus overhead for IP, TCP, TDS, and MARS), TCP's slow-start strategy will entail at least 2X that delay while the server waits for the initial packet's ACK.
The server isn't free, either. Resources devoted to holding pending rows are not available to answer other queries. Try estimating the memory needed to support each such interaction, and multiply by the number of users. As the lock count increases, more resources — CPU time, memory, I/O — are devoted to maintaining and searching them. Resource contention is anathema to a server because all machines wait at the same speed.
On an unloaded server of suitable size, the network delay will dominate. As the load increases, the server delay will play a greater role.
Server resources are devoted to locks and buffers while query results are pending. MARS does nothing to mitigate that; in fact, it encourages the waste of server resources by making no requirement that they be freed (by closing the statement handle).
The ODBC programmer can allocate a statement, find the row he wants, and never bother to fetch further rows (possibly unaware they're pending). The dialog closes, the application proceeds merrily along. Only the server suffers, until someone — perhaps someone else, perhaps the same person — tries to update the same table. Error? Not right away. First, everything hangs, waiting for the server. The server is waiting for the pending results to be fetched, to free the locks preventing the update. That won't happen until the connection is closed. Eventually the update times out and fails, but the message gives no indication who is holding the locks or why. Enter the DBA, and let the fur fly.
Of course, nowadays there seldom is an “ODBC programmer”; likely as not the application uses a framework, perhaps an ORM. The programmer has little knowledge of how the framework is implemented, and the framework little knowledge of how MARS works or its ramifications. That is why so many people are tempted to give up, deciding that SQL DBMSs are irremediably difficult to use and unredeemably slow. For want of a nail.
As stated at the outset, FreeTDS will implement MARS to keep our ODBC driver as compatible as possible with Microsoft's. It won't be added to the other libraries unless and until MARS becomes somehow mandatory from the servers's point of view, or someone both makes a convincing case of its advantages and does the work.
Even after it's implemented, though, the programmer interested in efficiency and simplicity will eschew MARS. He will maximize the go-fast feature of TDS by minimizing client-server interactions, and by avoiding row-at-a-time updates. His users will thank him.
Should you wish to contribute your insights or to correct anything you see here, you have two options, both over the efficient but decidedly flawed SMTP protocol: post to the FreeTDS mailing list, or email the author.
1. This is completely reasonable. Think about other kinds of handles through which data flow, such as TCP sockets or file handles. Internet protocols e.g. SMTP, HTTP, and FTP also move one file at a time. (FTP is unpopular nowadays, but its two-channel design is much simpler than the single-connection design of HTTP.)