Dec 22

Isn’t it about time we got to know the different stream classes a bit better? We use them in almost every project, but what do they mean? How can we use them more efficiently? And what is the difference between a Reader and an InputStream anyway? I hope to share the answers I’ve come across while using, implementing and reviewing other implementations of these extremely useful classes.

Streams’ design

There are two families of streams, each split into input and output classes: the InputStream / OutputStream classes provide streams of the byte primitive and are therefore meant for raw data, while the Reader / Writer classes provide streams of the char primitive and are therefore meant for textual data, allowing different encodings.
The difference between the two primitives is crucial for some usages, so pick the stream type according to the kind of data you will be processing. In some cases a third-party library hands you a byte stream even though your implementation needs a char stream. For these cases, Java provides two bridge implementations: InputStreamReader and OutputStreamWriter.
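To make the bridge classes concrete, here is a minimal sketch of going from bytes to chars and back. The class and method names (BridgeExample, readText, writeText) are made up for illustration, UTF-8 is just an assumed encoding, and StandardCharsets needs Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class BridgeExample {
    // Decodes everything a byte stream delivers as UTF-8 text.
    static String readText(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[1024];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            text.append(chunk, 0, n);
        }
        return text.toString();
    }

    // Encodes text as UTF-8 and pushes the bytes into a byte stream.
    static void writeText(String text, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush(); // the writer buffers chars internally, so flush them through
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeText("h\u00e9llo", sink);
        // Five chars became six bytes: the accented letter needs two bytes in UTF-8.
        System.out.println(sink.size());
        System.out.println(readText(new ByteArrayInputStream(sink.toByteArray())));
    }
}
```

Passing the charset explicitly is a deliberate choice here: without it, both bridge classes fall back to the platform default encoding, which can differ between machines.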

The streams in Java are based on the Decorator design pattern. That means that while a concrete stream implementation retrieves the data, other stream implementations wrap it to provide additional functionality.

[Image: Decorator Pattern]

For example, a GZIPInputStream wraps a FileInputStream to extract the real contents of a file compressed using GZIP:

[Image: Stream Example]
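In code, the GZIP-over-file wrapping might look like the sketch below. GzipFileExample and its two methods are hypothetical names, and the try-with-resources syntax assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
class GzipFileExample {
    // Reads a GZIP-compressed file and returns its uncompressed contents.
    static byte[] readGzipFile(File file) throws IOException {
        try (FileInputStream physical = new FileInputStream(file);      // delivers the compressed bytes
             GZIPInputStream gzip = new GZIPInputStream(physical)) {    // decorator: inflates them on the fly
            ByteArrayOutputStream contents = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                contents.write(chunk, 0, n);
            }
            return contents.toByteArray();
        }
    }

    // The mirror image: compresses data on its way into a file.
    static void writeGzipFile(File file, byte[] data) throws IOException {
        try (FileOutputStream physical = new FileOutputStream(file);
             GZIPOutputStream gzip = new GZIPOutputStream(physical)) {
            gzip.write(data);
        }
    }
}
```

The reading loop only ever talks to the outer stream; it has no idea a file is involved, which is the whole point of the decorator.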

Types of streams

There are three main types of stream implementations: physical streams, manipulation streams and data inspection streams.

  • Manipulation streams decorate other streams and act as a filter on the content coming from the original stream - in our example, the GZIPInputStream reads data from an underlying stream, changing the bytes as they pass using the DEFLATE algorithm (Lempel-Ziv coding combined with Huffman coding) used by GZIP.
  • Inspection streams decorate other streams as well, but do not change the data passing through them. Instead, they provide additional functionality for use by other parts of your application. For example, the message digest streams (e.g., DigestInputStream) build a message digest as bytes run through them, ultimately allowing the application to both read the data and check the digest without caching the entire message in memory.
    An interesting kind of inspection stream is the UI helper, such as ProgressMonitorInputStream, which not only gives a graphical representation of how the reading of data is progressing, but also pops up a dialog box when reading takes too long, allowing the user to cancel the reading process altogether.
  • Physical streams are the concrete implementations of the stream classes. These streams usually move data using low-level system calls such as read(2) and write(2), and usually represent files, network sockets and system pipes.
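The inspection case from the list above can be sketched in a few lines. This is only an illustration: DigestExample and copyAndDigest are made-up names, and the caller is assumed to supply the MessageDigest (for instance SHA-256).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

// Hypothetical helper class, for illustration only.
class DigestExample {
    // Copies source to sink chunk by chunk and returns the digest of everything copied.
    static byte[] copyAndDigest(InputStream source, OutputStream sink, MessageDigest digest)
            throws IOException {
        // The decorator updates the digest with every byte read; the data itself is untouched.
        DigestInputStream inspected = new DigestInputStream(source, digest);
        byte[] chunk = new byte[4096];
        int n;
        while ((n = inspected.read(chunk)) != -1) {
            sink.write(chunk, 0, n);
        }
        // Only one chunk was ever held in memory, yet the digest covers the whole message.
        return digest.digest();
    }
}
```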

There are many manipulation and inspection streams available, and the beauty of the decorator design pattern shines through them: instead of having a GZIPFileInputStream or a ChecksumSocketOutputStream and every other member of the Cartesian product of physical streams and functions, we wrap one stream with another decorator stream to enhance its functionality.
Moreover, by having the decorating class implement the same interface as the decorated class, we can chain streams to create a data pipeline that efficiently applies a different function to each byte coming out of the physical stream, with minimal need for instantiating data containers. For example, suppose you needed to transfer data over any physical stream, but wanted to compress it first, then encrypt it, and then add a message digest to it? There’s nothing to it, using streams:

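void sendData(byte[] data, OutputStream target) throws IOException {
  Cipher cipher = …;
  MessageDigest digest = …;

  // Closing the chain is what makes GZIP write its trailer and the cipher
  // its final block, so shield target from that close(): we still need it.
  OutputStream shield = new FilterOutputStream(target) {
    @Override public void close() throws IOException { flush(); }
  };
  // Written data flows from the outermost wrapper inwards:
  // digest (of the raw data) -> compress -> encrypt -> target
  CipherOutputStream cos = new CipherOutputStream(shield, cipher);
  GZIPOutputStream gos = new GZIPOutputStream(cos);
  DigestOutputStream dos = new DigestOutputStream(gos, digest);
  dos.write(data);
  dos.close();

  // writing the digest directly to the target stream
  target.write(digest.digest());
}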

Note that the order of streams in a stream pipeline is usually important. Most of the time, your pipeline wrapping should look like this:

[Image: Stream Layers]

Some notes to take care of:

  • It’s easy to see that a physical stream will always be first (innermost) in a pipeline.
  • Since they do not change the actual data, two adjacent data inspection streams can swap places.
  • That said, manipulation streams do change the data, and so should be placed carefully where the type of data they expect (e.g., encrypted or compressed data) would actually be available to them.
  • One last note: placing a data inspection stream before or after a manipulation stream might give different results, since the data passing through the inspection stream is different in each case.
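The last note above is easy to demonstrate. The sketch below uses CheckedOutputStream with a CRC32 as the inspection stream and GZIPOutputStream as the manipulation stream; the class and method names are hypothetical and only there for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
class InspectionOrderExample {
    // Inspection wraps the manipulation stream: it sees the data as the caller wrote it.
    static long checksumOfPlainData(byte[] data) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        CheckedOutputStream check =
                new CheckedOutputStream(new GZIPOutputStream(target), new CRC32());
        check.write(data);
        check.close();
        return check.getChecksum().getValue();
    }

    // Manipulation wraps the inspection stream: it sees the already compressed bytes.
    static long checksumOfCompressedData(byte[] data) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        CheckedOutputStream check = new CheckedOutputStream(target, new CRC32());
        GZIPOutputStream gzip = new GZIPOutputStream(check);
        gzip.write(data);
        gzip.close(); // writes the GZIP trailer, which also passes through the checksum
        return check.getChecksum().getValue();
    }
}
```

Both methods push the same input into the same two decorators, but the checksums describe different byte sequences, so which one you want depends on what the receiving side will verify.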

java.nio implications

With New I/O (NIO), Java introduced a more efficient way to deal with I/O operations, much like the way they are handled in C. Since this topic deserves a whole post of its own, I will not go into detail here, but just give one simple example of what it makes possible: writing several buffers to a channel (NIO’s counterpart to a stream) in a single call using GatheringByteChannel, or filling several buffers in a single read using ScatteringByteChannel.
To interoperate with the “old I/O”, Java offers a static utility class called Channels, with methods that convert channels to the different kinds of streams, and byte streams to channels. The disappointing part is that the most basic class NIO operates on, the Buffer, has no bridge implementations available. Luckily, it is simple to implement these using Buffer’s interface, and indeed someone has for ByteBuffer. (And before someone claims that you can call the array() method on ByteBuffer to get the internal byte[], allow me to remind you that buffers can be file-mapped, in which case array() throws an exception.)
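As a rough sketch of what scatter/gather I/O looks like: FileChannel implements both GatheringByteChannel and ScatteringByteChannel, so one channel can write several buffers in one call and later read into several buffers in one call. The class name, the temp file and the header/body split are assumptions for illustration, and the java.nio.file calls need Java 7 or later.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, for illustration only.
class ScatterGatherExample {
    // Writes header and body with gathering writes, then reads them back with scattering reads.
    static String[] roundTrip(String header, String body) throws IOException {
        byte[] h = header.getBytes(StandardCharsets.UTF_8);
        byte[] b = body.getBytes(StandardCharsets.UTF_8);
        Path file = Files.createTempFile("scatter-gather", ".bin");
        try (FileChannel channel =
                     FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Gathering write: the buffers are drained in array order.
            ByteBuffer[] out = { ByteBuffer.wrap(h), ByteBuffer.wrap(b) };
            while (out[0].hasRemaining() || out[1].hasRemaining()) {
                channel.write(out);
            }

            // Scattering read: the first buffer is filled before the second one.
            channel.position(0);
            ByteBuffer[] in = { ByteBuffer.allocate(h.length), ByteBuffer.allocate(b.length) };
            while (in[0].hasRemaining() || in[1].hasRemaining()) {
                if (channel.read(in) < 0) {
                    break; // end of file reached earlier than expected
                }
            }
            return new String[] {
                new String(in[0].array(), StandardCharsets.UTF_8),
                new String(in[1].array(), StandardCharsets.UTF_8)
            };
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```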
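A ByteBuffer-to-InputStream bridge could look roughly like this. It is my own minimal sketch, not the implementation referred to above, and ByteBufferInputStream is a made-up name. It only uses get(), so it also works for direct and file-mapped buffers where array() is unavailable.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical bridge class, for illustration only.
class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buffer;

    ByteBufferInputStream(ByteBuffer buffer) {
        // duplicate() shares the content but keeps its own position,
        // so reading from the stream does not disturb the caller's buffer.
        this.buffer = buffer.duplicate();
    }

    @Override
    public int read() {
        // Mask to 0..255, as the InputStream contract requires.
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dest, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!buffer.hasRemaining()) {
            return -1; // end of stream
        }
        int n = Math.min(len, buffer.remaining());
        buffer.get(dest, off, n); // bulk copy, no backing array needed
        return n;
    }

    @Override
    public int available() {
        return buffer.remaining();
    }
}
```

Once the buffer looks like an InputStream, it can sit at the bottom of any of the decorator pipelines described earlier.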

I hope this answers a few questions about streams. If there is something more specific you’d like to know or interests you, let me know! Also, this is the first time I incorporated images in a blog post, so let me know how that works for you. Thanks!

3 Responses to “Know your streams”

  1. Manoj Says:

    Even if I forget your words I wont forget your images :)

    kudos to the wonderful job you did here with the help of imagery. The whole IO package was a nightmare for me till I learned the decorator pattern.

  2. Avah Says:

    Thanks Manoj! :)

  3. Chaotic Java » Blog Archive » NIO - efficient IO’s granular bits Says:

    […] increases your application’s throughput. It’s important to note that the “old IO”, the java.io package, is not bad to use and some features of it are not covered by NIO at all: however, in many […]
