12 August 2009

Using extension methods to compress and decompress strings

After some justified comments by Jarno Peschier, I decided to revisit my blog post about putting compressed objects on the Azure message queue. I came up with the following set of extension methods, which let you compress a string into a byte array with gzip and decompress such a byte array back into a regular string:
using System.IO;
using System.IO.Compression;
using System.Text;

namespace LocalJoost.Utilities.Compression
{
 public static class StringZipExtensions
 {
  /// <summary>
  /// Compresses the specified string to a gzip byte array using the
  /// specified encoding.
  /// </summary>
  /// <param name="stringToCompress">The string to compress.</param>
  /// <param name="encoding">The encoding used to convert the string to bytes.</param>
  /// <returns>Byte array with the compressed string</returns>
  public static byte[] Compress(
   this string stringToCompress,
   Encoding encoding)
  {
   var stringAsBytes = encoding.GetBytes(stringToCompress);
   using (var memoryStream = new MemoryStream())
   {
    using (var zipStream = new GZipStream(memoryStream,
     CompressionMode.Compress))
    {
     zipStream.Write(stringAsBytes, 0, stringAsBytes.Length);
     // Closing the GZipStream flushes the remaining compressed data and
     // the gzip footer; ToArray still works on the closed MemoryStream.
     zipStream.Close();
     return (memoryStream.ToArray());
    }
   }
  }

  /// <summary>
  /// Compresses the specified string to a gzip byte array using the
  /// default UTF-8 encoding.
  /// </summary>
  /// <param name="stringToCompress">The string to compress.</param>
  /// <returns>Byte array with the compressed string</returns>
  public static byte[] Compress( this string stringToCompress )
  {
   return Compress(stringToCompress, new UTF8Encoding());
  }

  /// <summary>
  /// Decompresses a gzip byte array to a string using the specified
  /// encoding.
  /// </summary>
  /// <param name="compressedString">The gzip-compressed bytes of a string.</param>
  /// <param name="encoding">The encoding used to convert the decompressed bytes back to a string.</param>
  /// <returns>Decompressed string</returns>
  public static string DecompressToString(
   this byte[] compressedString, 
   Encoding encoding)
  {
   const int bufferSize = 1024;
   using (var memoryStream = new MemoryStream(compressedString))
   {
    using (var zipStream = new GZipStream(memoryStream,
     CompressionMode.Decompress))
    {
     // Memory stream for storing the decompressed bytes
     using (var outStream = new MemoryStream())
     {
      var buffer = new byte[bufferSize];
      var totalBytes = 0;
      int readBytes;
      while ((readBytes = zipStream.Read(buffer,0, bufferSize)) > 0)
      {
       outStream.Write(buffer, 0, readBytes);
       totalBytes += readBytes;
      }
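       // GetBuffer returns the whole internal buffer, which may be larger
       // than the data written, so only the first totalBytes are decoded.
       return encoding.GetString(
        outStream.GetBuffer(), 0, totalBytes);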
     }
    }
   }
  }

  /// <summary>
  /// Decompresses a gzip byte array to a string using the default
  /// UTF-8 encoding.
  /// </summary>
  /// <param name="compressedString">The gzip-compressed bytes of a string.</param>
  /// <returns>Decompressed string</returns>
  public static string DecompressToString(this byte[] compressedString )
  {
   return DecompressToString(compressedString, new UTF8Encoding());
  }
 }
}
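On .NET Framework 4 or later the same two methods could be written more compactly: Stream.CopyTo replaces the manual read loop, and MemoryStream.ToArray replaces the GetBuffer/totalBytes bookkeeping. This is only a sketch and not part of the downloadable code; the class name StringGZipNet4Extensions and the method names ToGZipBytes and FromGZipBytes are hypothetical, chosen so they do not clash with Compress and DecompressToString.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

namespace LocalJoost.Utilities.Compression
{
  // Hypothetical .NET 4+ variant, not part of the original download.
  // Names are illustrative and avoid clashing with Compress/DecompressToString.
  public static class StringGZipNet4Extensions
  {
    public static byte[] ToGZipBytes(this string value, Encoding encoding)
    {
      var bytes = encoding.GetBytes(value);
      using (var memoryStream = new MemoryStream())
      {
        // Disposing the GZipStream writes the final compressed block
        // and the gzip footer into memoryStream (and closes it).
        using (var zipStream = new GZipStream(memoryStream, CompressionMode.Compress))
        {
          zipStream.Write(bytes, 0, bytes.Length);
        }
        // ToArray is documented to work on a closed MemoryStream.
        return memoryStream.ToArray();
      }
    }

    public static string FromGZipBytes(this byte[] compressed, Encoding encoding)
    {
      using (var memoryStream = new MemoryStream(compressed))
      using (var zipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
      using (var outStream = new MemoryStream())
      {
        // Stream.CopyTo (added in .NET 4) replaces the manual buffer loop.
        zipStream.CopyTo(outStream);
        // ToArray returns exactly the bytes written, so no length tracking is needed.
        return encoding.GetString(outStream.ToArray());
      }
    }
  }
}
```

Usage mirrors the original methods: "some text".ToGZipBytes(Encoding.UTF8).FromGZipBytes(Encoding.UTF8) returns "some text".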
You can now quite simply do something like
using LocalJoost.Utilities.Compression;
using NUnit.Framework;

namespace LocalJoost.Utilities.Test
{
  [TestFixture]
  public class TestCompress
  {
    [Test]
    public void TestZipUnzip()
    {
      const string test = "The quick brown fox jumps over the lazy dog";
      var compressed = test.Compress();
      var uncompressed = compressed.DecompressToString();
      Assert.AreEqual(test, uncompressed);
    }
  }
}
and that will work, of course ;-) Both Compress and DecompressToString use UTF-8 as the default encoding, as Jarno suggested via Live Messenger, but both methods also have an overload that lets you supply an encoding yourself. This works with any encoding that can represent the characters in your string, provided you use the same encoding to compress and to decompress. So this can even be used to compress strings containing Klingon messages; I hope this will earn me a fond majQa' from Jarno ;-) Code downloadable here.
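To see why both sides must agree on the encoding, here is a small sketch that gzips a string with one encoding and decompresses it with another. The class EncodingRoundTripDemo and its RoundTrip method are hypothetical; they inline the same GZipStream steps as Compress and DecompressToString so the example runs on its own, and they assume .NET 4 or later for Stream.CopyTo.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical demo class: inlines the gzip round trip of
// Compress/DecompressToString so it can run without the original library.
public static class EncodingRoundTripDemo
{
  public static string RoundTrip(string text, Encoding compressWith, Encoding decompressWith)
  {
    // Compress: string -> bytes (compressWith) -> gzip bytes.
    byte[] compressed;
    var raw = compressWith.GetBytes(text);
    using (var buffer = new MemoryStream())
    {
      using (var zip = new GZipStream(buffer, CompressionMode.Compress))
      {
        zip.Write(raw, 0, raw.Length);
      }
      compressed = buffer.ToArray();
    }

    // Decompress: gzip bytes -> bytes -> string (decompressWith).
    using (var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
    using (var output = new MemoryStream())
    {
      input.CopyTo(output);
      return decompressWith.GetString(output.ToArray());
    }
  }
}
```

With matching encodings RoundTrip returns its input unchanged. Encoding.ASCII turns "café" into "caf?", because ASCII cannot represent 'é' and substitutes '?' before compression even starts; and compressing "majQa' jay'" with Encoding.Unicode (UTF-16) but decompressing with UTF-8 yields a different string. The gzip step itself is lossless; any damage comes from the string/byte conversion.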

4 comments:

peSHIr said...

majQa' jay', indeed. ;-)

John Claes said...

This is very nice.
Do you know a way to do this using WP8?

Joost van Schaik said...

@John, I do indeed. You will need Phone7.Fx.IO.Compression from http://phone7.codeplex.com/, which you can compile for Windows Phone 8. Then change the namespace System.IO.Compression to Phone7.Fx.IO.Compression, add a using System.Xml.Serialization; directive, and add a reference to System.Xml.Serialization.

You can also wait for the next release of wp7nl for WP8, which will include these methods shortly ;-)

Joost van Schaik said...

@John, v3.1.0 of wp7nl is out now on NuGet, both for Windows Phone 7 and 8, with StringZipExtensions included as per your request. Have fun!