Jan 22

This post is all about data flow in the “New I/O” framework. The data flow mechanism in NIO is fundamentally different from the “old” IO. There are no decorators around the data, and the entire framework revolves around getting one thing done, and done really well: transferring data from an external source into the application or vice versa. In the last post I described the buffers and put some emphasis on a couple of them. In this one, I’ll discuss the channels, which are the equivalents of the streams in the old IO, and the selector idea, which is taken from low-level languages such as C and brought into Java to boost performance.

Data flow basics – Channels

Channels are the way data flows into and out of buffers. The channels provided with the java.nio package work only with the byte buffer implementation; however, as mentioned before, a byte buffer can produce a view of any other primitive type. That allows any primitive to be sent or received using channels. For example, a byte buffer could be created and exposed as an integer buffer view. Integers could be written into that view and then sent through a channel using the original byte buffer.
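A minimal sketch of the integer-view idea, assuming an in-memory channel built with Channels.newChannel in place of a real socket or file. The class and method names (IntViewDemo, writeInts) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class IntViewDemo {
    /** Writes the ints through an IntBuffer view and returns the bytes the channel received. */
    static byte[] writeInts(int... values) throws IOException {
        ByteBuffer bytes = ByteBuffer.allocate(values.length * Integer.BYTES);
        // The view shares the byte buffer's storage but keeps its own position and limit.
        bytes.asIntBuffer().put(values);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            // The byte buffer's own position is still 0 and its limit is its capacity,
            // so it can be written as it is, without a flip.
            while (bytes.hasRemaining()) {
                channel.write(bytes);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeInts(1, 2, 3).length + " bytes sent"); // 12 bytes sent
    }
}
```

Because the view has its own position, writing into it does not move the byte buffer's position, which is why no flip is needed before the channel write in this sketch.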

To support IO operations, there are a few interfaces defined in the channels package: the byte reading and byte writing interfaces (ReadableByteChannel and WritableByteChannel) define simple IO operations performed on a single buffer. It’s important to note the semantics of these interfaces, though: the reading interface reads bytes from the channel into a buffer, and the writing interface writes bytes from a buffer into the channel.
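To make the direction of each call concrete, here is a small sketch of a copy loop between a ReadableByteChannel and a WritableByteChannel. The ChannelCopy class and the in-memory channels are assumptions for the example, not something from the NIO package itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

final class ChannelCopy {
    /** Copies everything from source to target through one small buffer; returns the byte count. */
    static long copy(ReadableByteChannel source, WritableByteChannel target) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        long total = 0;
        // read() moves bytes from the channel into the buffer; -1 signals end of stream.
        while (source.read(buffer) != -1) {
            buffer.flip(); // switch the buffer from being filled to being drained
            while (buffer.hasRemaining()) {
                // write() moves bytes from the buffer into the channel.
                total += target.write(buffer);
            }
            buffer.clear(); // make room for the next read
        }
        return total;
    }

    /** Convenience wrapper that runs the copy over in-memory channels. */
    static byte[] copyBytes(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(Channels.newChannel(new ByteArrayInputStream(input)), Channels.newChannel(out));
        return out.toByteArray();
    }
}
```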

In addition, the byte scattering and byte gathering interfaces (ScatteringByteChannel and GatheringByteChannel) define IO operations performed over a sequence of buffers. These interfaces are interesting, as they allow reading data from a single channel into multiple buffers (when one buffer reaches its limit, reading continues into the next) or the opposite: writing data from multiple buffers to a single channel, stringing them together.
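A sketch of both operations using a FileChannel, which implements both interfaces. It assumes a temporary file and ASCII strings. ScatterGatherDemo and roundTrip are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ScatterGatherDemo {
    /** Gathers header and body into one file, then scatters the file back into two buffers. */
    static String[] roundTrip(String header, String body) throws IOException {
        Path file = Files.createTempFile("scatter-gather", ".bin");
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer[] out = {
                ByteBuffer.wrap(header.getBytes(StandardCharsets.US_ASCII)),
                ByteBuffer.wrap(body.getBytes(StandardCharsets.US_ASCII))
            };
            long expected = out[0].remaining() + out[1].remaining();

            // Gathering write: the buffers are drained, in order, into the channel.
            long written = 0;
            while (written < expected) {
                written += channel.write(out);
            }

            channel.position(0);
            ByteBuffer[] in = {
                ByteBuffer.allocate(out[0].capacity()),
                ByteBuffer.allocate(out[1].capacity())
            };
            // Scattering read: the first buffer is filled to its limit before the second.
            long read = 0;
            while (read < expected) {
                long n = channel.read(in);
                if (n < 0) {
                    break; // unexpected end of file
                }
                read += n;
            }
            return new String[] {
                new String(in[0].array(), StandardCharsets.US_ASCII),
                new String(in[1].array(), StandardCharsets.US_ASCII)
            };
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A fixed-size header followed by a variable body is the typical case for this: each part lands in its own buffer without any manual slicing.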

This post covers three of the channel implementations the channels package provides, which correspond to the types of entities Unix-based systems can perform IO operations on: the socket channel, the pipe channel and the file channel. They differ in the way they’re opened and the added functionality each provides:

The socket channel represents a network socket. It is created using the static open method, but it is unusable until it is connected to a remote end-point. In addition to the normal features you’d get from the Java Socket class, the channel can be placed in a non-blocking mode, making the connect method asynchronous and allowing other operations to be performed while the connection is being established. For example, suppose the application needs to connect to a list of end-points and send them data:


ByteBuffer data = …; // assuming the buffer had been flipped already
Set<SocketAddress> targets = …;
Set<SocketChannel> channels = new HashSet<SocketChannel>();

// Connect to all end-points asynchronously.
for (SocketAddress target : targets) {
  try {
    SocketChannel channel = SocketChannel.open();
    // configureBlocking returns SelectableChannel, so it cannot be chained
    // into the SocketChannel assignment above.
    channel.configureBlocking(false);
    channel.connect(target);
    channels.add(channel);
  } catch (IOException e) {
    // Log the error, continue with the other end-points
  }
}

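// Finish each connection and write the data to it.
for (SocketChannel channel : channels) {
  try {
    // In non-blocking mode finishConnect() returns false until the connection
    // is established; a selector would avoid this busy-wait.
    while (!channel.finishConnect()) {
      Thread.yield();
    }
    data.rewind();
    // A non-blocking write may send only part of the buffer.
    while (data.hasRemaining()) {
      channel.write(data);
    }
  } catch (IOException e) {
    // Log the error, continue with the other channels
  }
}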

// Close the channels.
for (SocketChannel channel : channels) {
  try {
    if (channel.isOpen())
      channel.close();
  } catch (IOException e) {
    // Log the error, continue with the other channels
  }
}

The file channel has some important advantages over the old stream-based file IO. First, a region of the file can be mapped into memory to increase efficiency: the requested region is exposed as a mapped byte buffer, changed in memory and then flushed back to disk. The mapping can be made in other modes as well, such as read-only, which allows no changes to the buffer, or copy-on-write (called private mode), in which changes to the buffer are not written to the file and are not visible to other mappings of the same file.
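The mapping modes can be sketched as follows. This is an illustration only: MappedFileDemo and putFirstByte are made-up names, and the file is assumed to exist and be non-empty.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileDemo {
    /** Sets the first byte of a non-empty file through a mapping in the given mode. */
    static void putFirstByte(Path file, MapMode mode, byte value) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer region = channel.map(mode, 0, channel.size());
            region.put(0, value); // change the mapped region in memory
            if (mode == MapMode.READ_WRITE) {
                region.force(); // ask the OS to write the change back to the file
            }
            // With MapMode.PRIVATE the change stays in a private copy and never
            // reaches the file. With MapMode.READ_ONLY the put() above would throw
            // ReadOnlyBufferException.
        }
    }
}
```

Calling putFirstByte with MapMode.PRIVATE leaves the file on disk untouched, while MapMode.READ_WRITE changes it.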

A different type of improvement in IO performance is provided by the transfer methods (transferTo and transferFrom), which can use operating-system-level facilities to write data from the file channel to a different channel, or to read data from a different channel into the file channel, without an intermediate buffer in the application.
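As a rough sketch, a file-to-file copy with transferTo could look like this. TransferDemo is a hypothetical name, and the target could just as well be a socket channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferDemo {
    /** Copies source to target with transferTo; returns the number of bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }
}
```

Note that no ByteBuffer appears anywhere in the copy: the application never holds the data itself.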

The last neat feature the file channel offers is the ability to lock regions of a file. This comes with a word of caution, however: since the feature is implemented using low-level locks, and since not all operating systems implement it the same way (if at all), it is recommended to check that the lock is valid before using the channel, and not to rely on the lock to block IO operations by other programs.
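A cautious way to use region locks might look like the sketch below, which checks the lock before writing instead of assuming it was granted. LockDemo and writeHeaderLocked are illustrative names only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class LockDemo {
    /** Writes header at offset 0 while holding an exclusive lock on just that region. */
    static boolean writeHeaderLocked(Path file, byte[] header) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             // tryLock returns null instead of blocking when another program
             // already holds an overlapping lock.
             FileLock lock = channel.tryLock(0, header.length, false)) {
            if (lock == null || !lock.isValid()) {
                return false; // check the lock rather than assume it was granted
            }
            ByteBuffer buffer = ByteBuffer.wrap(header);
            long position = 0;
            while (buffer.hasRemaining()) {
                position += channel.write(buffer, position);
            }
            return true;
        } // the lock is released first, then the channel is closed
    }
}
```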

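Lastly, the pipe channel is in fact two different channels, one for each end of the pipe: a sink channel that is written to and a source channel that is read from. As always, the pipe is a way to transfer data in an abstract way within your application, typically from one thread to another.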
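The two ends of a pipe can be sketched like this. For brevity the example writes and reads on the same thread, which only works for a short message that fits in the pipe's internal buffer. PipeDemo and sendThroughPipe are made-up names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

final class PipeDemo {
    /** Sends a short message into the sink end and reads it back from the source end. */
    static String sendThroughPipe(String message) throws IOException {
        Pipe pipe = Pipe.open();
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            ByteBuffer out = ByteBuffer.wrap(payload);
            while (out.hasRemaining()) {
                sink.write(out); // the writable end of the pipe
            }
            ByteBuffer in = ByteBuffer.allocate(payload.length);
            while (in.hasRemaining()) {
                if (source.read(in) == -1) { // the readable end of the pipe
                    break;
                }
            }
            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        }
    }
}
```

In a real application the sink would normally be handed to one thread and the source to another.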

It is worth noting that the channels provided implement the interruptible channel interface, which means that they can be closed asynchronously (a thread blocked in an IO operation on the channel is released with an exception when another thread closes it), and that if a thread blocked on a channel is interrupted, the channel is closed automatically.
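The interruption behaviour can be observed with a small experiment: a thread blocks reading from a pipe that nobody writes to, and is then interrupted. This is only a sketch, and InterruptDemo is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

final class InterruptDemo {
    /** Returns true if interrupting a blocked reader closed the channel it was reading. */
    static boolean interruptClosesChannel() throws IOException, InterruptedException {
        Pipe pipe = Pipe.open();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean closedByInterrupt = new AtomicBoolean(false);

        Thread reader = new Thread(() -> {
            started.countDown();
            try {
                // Nothing is ever written to the pipe, so this read blocks.
                pipe.source().read(ByteBuffer.allocate(16));
            } catch (ClosedByInterruptException e) {
                closedByInterrupt.set(true);
            } catch (IOException e) {
                // Any other failure leaves the flag false.
            }
        });
        reader.start();
        started.await();
        // Whether the interrupt lands just before or during the read, the channel
        // is closed and the reader receives ClosedByInterruptException.
        reader.interrupt();
        reader.join();

        boolean sourceClosed = !pipe.source().isOpen();
        pipe.sink().close();
        return closedByInterrupt.get() && sourceClosed;
    }
}
```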

From C to Java – Selectors in New I/O

Many operating systems, including Windows, Linux, Unix and Mac OS X, provide a selection mechanism on file descriptors. File descriptors could be many things, including actual files, inter-process pipes or network communication sockets. The selection process is meant to simplify applications when dealing with a large number of file descriptors, such as server applications or client applications accessing multiple data channels.

Imagine an application that requests data from multiple sockets, such as an instant messenger. It probably opens a socket for each person the user is speaking with, and listens on it for the messages the user’s friends are sending. Using a multi-threaded approach, a thread would be started for each socket and the read method would be called on multiple InputStreams, blocking each thread until data is available. Using selectors, multiple SocketChannels would be opened but only a single thread would be blocked, since the selector waits behind the scenes on all relevant sockets and returns from the block when a socket has new data available for reading.

The way to use selection in NIO is to work with channels extending the SelectableChannel abstract class. This provides the register method, which accepts three things: the Selector that will block the thread; the operations that should wake the selector up, such as “can write to channel” or “channel has readable data”; and an attachment, which is a user-defined object. Registration returns a SelectionKey, and when the selector is put to the task of waiting for a channel to become available for IO, a set of SelectionKey references is made available to the application to determine which channel is ready and for which operation.
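Registration and selection can be tried out without any networking by using the read end of a pipe, which is also a selectable channel. The following is a sketch under that assumption. SelectorDemo and selectOnce are made-up names, and the message is assumed to be shorter than 64 bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class SelectorDemo {
    /** Registers a pipe's read end, makes it readable, and returns "attachment:data". */
    static String selectOnce(String label, String message) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            source.configureBlocking(false); // required before register()
            // Selector, interest operations, and a user-defined attachment.
            source.register(selector, SelectionKey.OP_READ, label);

            // Writing to the sink makes the source readable.
            sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

            StringBuilder result = new StringBuilder();
            selector.select(); // blocks until at least one registered channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove(); // the selector does not clear handled keys itself
                if (key.isReadable()) {
                    ByteBuffer buffer = ByteBuffer.allocate(64);
                    ((Pipe.SourceChannel) key.channel()).read(buffer);
                    buffer.flip();
                    result.append(key.attachment()).append(':')
                          .append(StandardCharsets.UTF_8.decode(buffer));
                }
            }
            return result.toString();
        }
    }
}
```

The attachment is how the application recognises which registration became ready; here it is just a label string.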

 

[Figure: selection vs. multithreading]

The important thing to remember is that the new selection mechanism will not just create threads behind the scenes – it will actually use the highly optimized selection mechanism provided by the underlying operating system.

I think that a nice trick that can be used here is to define a visitor which accepts a SelectionKey. This visitor could be responsible for writing to or reading from a channel when the channel is ready for such an operation. Then, a selection will take place, and when the selection finishes all the application has to do is iterate over the selected keys, grab the attached visitor from each one and pass the key to it. The following example shows this in action, with a selector waiting on several server sockets and a visitor sending data from a buffer to each client that connects:


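ByteBuffer data = …;
Set<ServerSocketChannel> servers = …;
Selector selector = Selector.open();
for (ServerSocketChannel server : servers) {
  // A channel must be in non-blocking mode before it can be registered.
  server.configureBlocking(false);
  server.register(selector, SelectionKey.OP_ACCEPT, new SocketWritingVisitor(data));
}

for (;;) { // doing this forever
  selector.select();
  // The selector never clears the selected-key set itself, so remove each key
  // once it is handled; otherwise it is reported again on the next pass.
  Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
  while (keys.hasNext()) {
    SelectionKey key = keys.next();
    keys.remove();
    ((ChannelVisitor) key.attachment()).visit(key);
  }
}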

Here SocketWritingVisitor implements ChannelVisitor; its visit(SelectionKey) method takes the ServerSocketChannel from the key, calls the accept method, starts a new thread (or not) and sends the contained buffer.
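One possible shape for these two types is sketched below. ChannelVisitor and SocketWritingVisitor are names used in this post and are not part of NIO, so everything here is an assumption about how they might be written. The VisitorDemo helper, also hypothetical, runs a single accept over the loopback interface to show the visitor at work.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical visitor interface: handles one ready selection key. */
interface ChannelVisitor {
    void visit(SelectionKey key) throws IOException;
}

/** Hypothetical visitor that accepts a connection and sends it a fixed buffer. */
final class SocketWritingVisitor implements ChannelVisitor {
    private final ByteBuffer data;

    SocketWritingVisitor(ByteBuffer data) {
        this.data = data;
    }

    @Override
    public void visit(SelectionKey key) throws IOException {
        ServerSocketChannel server = (ServerSocketChannel) key.channel();
        // accept() returns null on a non-blocking server if nothing is pending.
        try (SocketChannel client = server.accept()) {
            if (client == null) {
                return;
            }
            // duplicate() shares the content but has its own position, so the
            // same data can be sent to every client.
            ByteBuffer copy = data.duplicate();
            while (copy.hasRemaining()) {
                client.write(copy); // accepted channels start in blocking mode
            }
        }
    }
}

final class VisitorDemo {
    /** Accepts one loopback connection through the visitor; returns what the client read. */
    static String serveOnce(String message) throws IOException {
        ByteBuffer data = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT, new SocketWritingVisitor(data));

            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                selector.select();
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    ((ChannelVisitor) key.attachment()).visit(key);
                }
                ByteBuffer received = ByteBuffer.allocate(data.capacity());
                while (received.hasRemaining()) {
                    if (client.read(received) == -1) {
                        break;
                    }
                }
                received.flip();
                return StandardCharsets.UTF_8.decode(received).toString();
            }
        }
    }
}
```

This version writes on the selecting thread for simplicity. Handing the accepted channel to another thread, as the text suggests, keeps a slow client from stalling the selection loop.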

Conclusions

This concludes the two posts on the new IO framework. We can see that there are many things that boost performance here: first, the buffers are a lightweight, efficient way to contain IO data. They can boost performance further when allocated as direct buffers, whose contents may live outside the garbage-collected heap and so stay out of the garbage collector’s way, and because the Java VM itself treats buffers specially and can compile operations on them to even more efficient machine code.

Second, the channels use low-level routines to achieve high-performance data flow from and to data sources. In addition, some channels offer added functionality over their old IO counterparts (such as socket channels allowing non-blocking connections) to improve performance even further.

Third, the selection mechanism allows fewer threads to be used when dealing with multiple sources or targets of information, reducing the amount of resources consumed by the application, especially in network-based applications, on both the server and the client sides.

I would like to emphasize again: NIO doesn’t replace the streaming IO completely. There are many things it cannot do, such as wrapping a data source in an abstraction whose functionality can be extended, which the streaming IO does with its decorators.

As always, questions and comments are more than welcome!


5 Responses to “NIO – Data flow made resource-efficient”

  1. Dale Says:

    Nice description. I learned quite a bit!

  2. Web 2.0 Announcer Says:

    NIO – Data flow made resource-efficient


  3. boo Says:

    keep writing, very interesting!

  4. roScripts - Webmaster resources and websites Says:

    Chaotic Java » Blog Archive » NIO - Data flow made resource-efficient


  5. Simple solution to resource collection Says:

    [...] I say resources, just think of input/output streams, readers/writers, channels, JDBC, JMS.. the list could go on and on. I almost expect to see code such as: void [...]
