24 July 2011

MVVMLight based language selection for Windows Phone 7

Updated 25-09-2011 with a new implementation of LanguageSettingsViewModel to make it compatible with SilverlightSerializer. It appeared that on every second tombstoning this viewmodel lost its CurrentLanguage property.
Updated 01-02-2011 with a bug fix in SetLanguageFromCurrentLocale.

With the advent of the ‘Mango’ release of Windows Phone 7, with 19 more countries getting a Marketplace, supporting more than your own native language or English becomes all the more important. I can tell from personal experience that supporting German, for instance, gives a tremendous boost to your downloads. Tremendous as in: I get almost as many downloads from Germany as from the whole USA. Apparently Windows Phone 7 is doing pretty well in Germany. So take note of this ‘Geheimtipp’ ;-)

I implemented the following solution, based on MVVMLight 4. Upon first startup it automatically tries to select a language based on the current UI locale. If that fails, it tries to select one from the same language group: for instance, if the phone is set to en-GB and I only support en-US, it selects the first supported language that starts with “en”. And of course the user can also select another language manually.

It works like this. First, I define a very simple class that describes the base properties of a language, i.e. the locale (for instance en-US) and the way you want this language described:

using System;
using GalaSoft.MvvmLight;

namespace Wp7nl.Globalization
{
  /// <summary>
  /// Supported languages
  /// </summary>
  public class Language : ViewModelBase, IEquatable<Language>
  {
    private string locale;
    public string Locale
    {
      get { return locale; }
      set
      {
        if (locale != value)
        {
          locale = value;
          RaisePropertyChanged(() => Locale);
        }
      }
    }

    private string description;
    public string Description
    {
      get { return description; }
      set
      {
        if (description != value)
        {
          description = value;
          RaisePropertyChanged(() => Description);
        }
      }
    }

    public override string ToString()
    {
      return Description;
    }

    // Two languages are equal when their locales are equal
    public bool Equals(Language other)
    {
      return other != null && string.Equals(other.Locale, Locale);
    }

    public override bool Equals(object obj)
    {
      return Equals(obj as Language);
    }

    public override int GetHashCode()
    {
      return (Locale ?? string.Empty).GetHashCode();
    }
  }
}

It may look like a lot, but it’s only two properties and an implementation of Equals. I implemented it as a full view model, which is actually a bit overkill, but I tend to do this since I have a couple of code snippets that make this pretty easy anyway. The second class actually does all the work:

using System.Collections.ObjectModel;
using System.Globalization;
using System.Linq;
using System.Threading;
using GalaSoft.MvvmLight;

namespace Wp7nl.Globalization
{
  /// <summary>
  /// A ViewModel class to handle language settings. 
  /// Override this class and add languages in the constructor
  /// </summary>
  public class LanguageSettingsViewModel : ViewModelBase
  {
    public LanguageSettingsViewModel()
    {
      AddLanguages(new Language { Description = "English", Locale = "en-US" });
    }

    private readonly ObservableCollection<Language> supportedLanguages =
       new ObservableCollection<Language>();

    /// <summary>
    /// Gets the supported languages.
    /// </summary>
    public ObservableCollection<Language> SupportedLanguages
    {
      get { return supportedLanguages; }
    }

    /// <summary>
    /// Determine current language
    /// </summary>
    /// <returns>The best matching supported language</returns>
    private Language GetDefaultLanguage()
    {
      var uiCulture = Thread.CurrentThread.CurrentUICulture;

      // Try to select from current UI thread on full name
      var language = SupportedLanguages.FirstOrDefault(
        p => p.Locale == uiCulture.Name);
      if (language == null)
      {
        // Try to select from current UI thread on 2 letter ISO code
        language = SupportedLanguages.FirstOrDefault(
          p => p.Locale.StartsWith(uiCulture.TwoLetterISOLanguageName));
      }
      if (language == null)
      {
        // Still no language: take the first one that starts with English
        language = SupportedLanguages.First(p => p.Locale.StartsWith("en"));
      }

      return language;
    }

    private Language currentLanguage;
    /// <summary>
    /// Gets or sets the current language.
    /// </summary>
    /// <value>
    /// The current language.
    /// </value>
    public Language CurrentLanguage
    {
      get
      {
        return currentLanguage;
      }
      set
      {
        if (currentLanguage != value)
        {
          currentLanguage = value;
          RaisePropertyChanged(() => CurrentLanguage);
        }
      }
    }

    /// <summary>
    /// Sets the language from current locale.
    /// </summary>
    public void SetLanguageFromCurrentLocale()
    {
      if (CurrentLanguage == null)
      {
        CurrentLanguage = GetDefaultLanguage();
      }
      Thread.CurrentThread.CurrentUICulture = new CultureInfo(CurrentLanguage.Locale);
      Thread.CurrentThread.CurrentCulture = Thread.CurrentThread.CurrentUICulture;
    }

    /// <summary>
    /// Adds the languages.
    /// </summary>
    /// <param name="languages">The languages.</param>
    public void AddLanguages(params Language[] languages)
    {
      if (languages != null && languages.Length > 0)
      {
        foreach (var l in languages)
        {
          // Contains uses Language.Equals, so duplicate locales are skipped
          if (!supportedLanguages.Contains(l))
          {
            supportedLanguages.Add(l);
          }
        }
      }
    }
  }
}
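GetDefaultLanguage tries three things in order: the full culture name, then the two-letter language code, then the first English language. That order can be tried outside the phone with the minimal sketch below, which works on plain locale strings instead of Language objects.

LocaleFallback and Pick are hypothetical names. Splitting the culture name on '-' is an assumed stand-in for CultureInfo.TwoLetterISOLanguageName. Unlike the class above, the sketch falls back to the first supported language when no English one exists, rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring the fallback order of GetDefaultLanguage,
// working on plain locale strings such as "en-US".
public static class LocaleFallback
{
    public static string Pick(IList<string> supported, string uiCultureName)
    {
        // 1. Exact match on the full culture name, e.g. "de-DE"
        var match = supported.FirstOrDefault(l => l == uiCultureName);

        // 2. Same language, other region, e.g. "en-GB" -> "en-US".
        //    Splitting on '-' stands in for CultureInfo.TwoLetterISOLanguageName.
        if (match == null)
        {
            var language = uiCultureName.Split('-')[0];
            match = supported.FirstOrDefault(
                l => l.StartsWith(language, StringComparison.OrdinalIgnoreCase));
        }

        // 3. First English language; if there is none, the first language at all
        return match
            ?? supported.FirstOrDefault(l => l.StartsWith("en", StringComparison.Ordinal))
            ?? supported.First();
    }
}
```

With "en-US", "nl-NL" and "de-DE" supported, results are as follows:

* A phone set to "de-AT" gets "de-DE".
* A phone set to "fr-FR" gets "en-US".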

Out of the box this thing only supports English-USA. To add more languages, you subclass this model and add your own languages:

using Wp7nl.Globalization;

namespace YourApp.ViewModels
{
  public class MyLanguagesViewModel : LanguageSettingsViewModel
  {
    // The constructor must carry the class name
    public MyLanguagesViewModel()
    {
      AddLanguages(new Language { Description = "Nederlands", Locale = "nl-NL" });
      AddLanguages(new Language { Description = "Deutsch", Locale = "de-DE" });
    }
  }
}

Directly after you have created MyLanguagesViewModel, or retrieved it from tombstoning using my extension methods based upon SilverlightSerializer, you simply call SetLanguageFromCurrentLocale. That call does one of two things:

* it restores the selection the user last made in the application itself, or
* it tries to find the language that best fits the locale the user has selected on his phone.

To let the user select a language you can, for instance, bind the model’s SupportedLanguages property to the ItemsSource property of a ListPicker, and CurrentLanguage to its SelectedItem property with a TwoWay binding.

In my as yet unreleased Mango version of Map Mania, this looks as shown on the right:

This leaves, of course, still two things to do:

  1. Defining resource files with the actual text
  2. Implementing a class that makes these resource files bindable

This procedure is shown in the Windows Phone 7 Globalization Sample provided on MSDN, so I won’t repeat it here.

23 July 2011

Speed and distance calculation extension methods for Windows Phone 7

Upgrading stuff from Windows Phone 7 to “Mango” is sometimes like cleaning out your desk drawer before going on a holiday: you stumble upon stuff you forgot was even there. For the ‘game engine’ I made for Catch’em Birds, I created a few very handy extension methods for calculating speed and distance, which I would like to share with the rest of the community:

using System;
using System.Windows;                // Duration (and Point on the phone)
#if !WINDOWS_PHONE
// Full .NET 4.0, e.g. for unit tests: use System.Drawing.Point;
// the alias avoids a clash with System.Windows.Point from WindowsBase
using Point = System.Drawing.Point;
#endif

namespace Wp7nl.Utilities
{
  public static class PointExtensions
  {
    /// <summary>
    /// Distances from point 1 to point 2
    /// </summary>
    /// <param name="p1">The first point.</param>
    /// <param name="p2">The second point.</param>
    /// <returns>The straight-line distance between the points</returns>
    public static double DistanceFrom(this Point p1, Point p2)
    {
      var dX = p2.X - p1.X;
      var dY = p2.Y - p1.Y;
      return Math.Sqrt(dX * dX + dY * dY);
    }

    /// <summary>
    /// Calculates the speed in pixels per second
    /// </summary>
    /// <param name="p1">The first point.</param>
    /// <param name="p2">The second point.</param>
    /// <param name="duration">The duration</param>
    /// <returns>Speed in pixels per second</returns>
    public static double CalculateSpeed(this Point p1, Point p2, Duration duration)
    {
      return p1.DistanceFrom(p2) / duration.TimeSpan.TotalSeconds;
    }

    /// <summary>
    /// Calculates the duration given a distance and a speed.
    /// </summary>
    /// <param name="p1">The first point.</param>
    /// <param name="p2">The second point.</param>
    /// <param name="speed">The speed in pixels per second.</param>
    /// <returns>Time it takes to get from p1 to p2</returns>
    public static Duration CalculateDuration(this Point p1, Point p2, double speed)
    {
      return new Duration(TimeSpan.FromSeconds(p1.DistanceFrom(p2) / speed));
    }
  }
}

I will discuss these in a not quite logical order:

  • The first method is of course good ole' Pythagoras caught in an extension method, basically there as a helper method for the other two.
  • The third method is the one I used most: given that my object needs to move from p1 to p2 with a given speed in pixels per second, what’s the Duration I need to apply to my Storyboard?
  • The second one is the inverse: given the fact that an object was moving from p1 to p2 in duration “duration”, calculate its speed in pixels per second.
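The sketch below lets you check the arithmetic of the three methods without a phone or WPF. It is a minimal, console-runnable stand-in, not the wp7nl code. PointD and MotionMath are hypothetical names, PointD replaces System.Windows.Point, and a plain TimeSpan replaces System.Windows.Duration.

```csharp
using System;

// Hypothetical stand-in for System.Windows.Point
public readonly struct PointD
{
    public PointD(double x, double y) { X = x; Y = y; }
    public double X { get; }
    public double Y { get; }
}

// Same formulas as PointExtensions, with TimeSpan instead of Duration
public static class MotionMath
{
    // Pythagoras with plain multiplications rather than Math.Pow
    public static double DistanceFrom(this PointD p1, PointD p2)
    {
        var dX = p2.X - p1.X;
        var dY = p2.Y - p1.Y;
        return Math.Sqrt(dX * dX + dY * dY);
    }

    // pixels per second = distance / elapsed seconds
    public static double CalculateSpeed(this PointD p1, PointD p2, TimeSpan duration)
    {
        return p1.DistanceFrom(p2) / duration.TotalSeconds;
    }

    // seconds = distance / (pixels per second)
    public static TimeSpan CalculateDuration(this PointD p1, PointD p2, double speed)
    {
        return TimeSpan.FromSeconds(p1.DistanceFrom(p2) / speed);
    }
}
```

A 3-4-5 triangle scaled by 100 keeps the numbers easy to follow:

* From (0,0) to (300,400) is 500 pixels.
* At 250 pixels per second, that distance takes 2 seconds.
* Conversely, covering it in 2 seconds means a speed of 250 pixels per second.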

As you can see from the #if on top, it also works with System.Drawing.Point under full .NET 4.0. Maybe these methods are useful there as well; I only put that there to facilitate some unit tests.

All these methods are part of the wp7nl CodePlex library, or at least they will be at the upcoming Mango release.

Update: thanks to Rene Schulte for pointing out to me that my original DistanceFrom method using Math.Pow is very slow compared to simple multiplications. Rene is a Silverlight MVP from Dresden, Germany and has some very nice Point extension methods on his blog “Kodierer” as well - in English, don’t worry ;-)

20 July 2011

Concatenating a long set of objects in a string with a separator

It still amazes me when I see people concatenating a long set of repetitive strings with a StringBuilder, adding a separator to each string and then removing the last separator, or writing some complex if-statement that compares a count or index to prevent a trailing separator.

This is a very simple trick I use a lot when I need, for instance, to change a list of points into its Well Known Text (WKT) representation. So let’s assume we have a variable points of type IList<System.Windows.Point> that I want to convert to a list of coordinates that fits in a WKT string. That is: X and Y separated by a space, and coordinates separated by a comma. I can do that with this simple statement, which would have been a one-liner if it had not been for lack of space ;-) :

var pointString = 
  string.Join(", ",
    points.Select(p => 
      string.Format(CultureInfo.InvariantCulture, "{0} {1}", p.X, p.Y)));

If I had the points X=10, Y=15, X=20, Y=25 and X=30, Y=40, this would nicely translate to

10 15, 20 25, 30 40

That's all. No loops, no truncating trailing commas: just a Select and a Join.
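To try the Select + Join trick outside Silverlight, the sketch below uses (X, Y) value tuples instead of System.Windows.Point. WktSketch and its methods are hypothetical names. The second method wraps the coordinate list in a LINESTRING to show a complete WKT geometry, which goes slightly beyond the statement above.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helpers; (X, Y) tuples stand in for System.Windows.Point
public static class WktSketch
{
    // "X Y" pairs joined by ", ": no loop, no trailing separator to remove
    public static string ToCoordinateList(IEnumerable<(double X, double Y)> points)
    {
        return string.Join(", ",
            points.Select(p =>
                string.Format(CultureInfo.InvariantCulture, "{0} {1}", p.X, p.Y)));
    }

    // Wrapping the list yields a complete WKT LINESTRING geometry
    public static string ToLineString(IEnumerable<(double X, double Y)> points)
    {
        return "LINESTRING (" + ToCoordinateList(points) + ")";
    }
}
```

The InvariantCulture argument matters. On a machine with, for instance, a Dutch or German locale, 10.5 would otherwise be formatted as "10,5", which clashes with the comma used as the coordinate separator.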

16 July 2011

Using Kinect and MVVMLight 4 for some basic Google Maps manipulation

Being a new tech junkie, I of course wanted to try the waters when the Kinect beta SDK was released on June 16, 2011. I thought it best to avoid domestic trouble and let the Kinect my wife bought back in December sit nicely connected to the XBox360 downstairs, and ordered a second Kinect – which just happened to be on sale for only €99.

The global idea

I wanted to create a simple application that allows you to both pan and zoom on a standard Google Maps web site, using hand gestures and some speech commands. I wanted to show visual feedback projected over the browser, like moving hands and stuff like that. Because I wanted to re-use some of my hard-won Windows Phone 7 knowledge, I wanted to use MVVM as well. So enter MVVMLight and while I was at it, version 4 too. The SDK is not usable from Silverlight, so the choice fell logically on Windows Presentation Foundation (WPF) to host a browser and see how things went from there. The result is below. It’s pretty crude and the user experience leaves much to be desired, but it’s a nice start. I will explain what I’ve done and why, and as usual include the whole sample application.

Basic operation

When the application opens up, it shows Google Maps in a full screen browser, and pretty much nothing else. To enable a demo I am planning to give, the Kinect is not immediately initialized. Forgive this old Trekkie - I could not resist. For years I’ve envied Kirk and Picard for being able to command their computer by talking to it. To get the application to start tracking your hands, you have to say “Kinect engage”. :-)

If Kinect understands your command, it will show the command on the left. If you move your hands, two yellow semi-transparent hand symbols should appear. If you move your hands from left to right they follow you; moving them forward makes them smaller, and moving them backward makes them bigger.

If you say “Kinect track”, you can do the following:

If you hold your left hand some 20 cm closer to your body than your right hand, the map zooms in on the location of your right hand (a green “plus” symbol appears on the right hand symbol).

If you hold your left hand some 20 cm further from your body than your right hand, the map zooms out on the location of your right hand (a red “minus” symbol appears on the right hand symbol).

If you stretch out both arms and then move your hands, you can pan the map (a blue circle with a cross appears on both hand symbols).

Below is a partial screenshot showing Kinect zooming in on Ontario.


Miscellaneous voice commands:

  • “Kinect stop tracking” will keep following your hands but disable zoom and pan until you say “Kinect track” again
  • “Kinect video on” will show what the Kinect video camera is seeing on the top left of the screen
  • “Kinect video off” will hide the video again (you might want to do this, as it’s heavy on performance)
  • “Kinect shutdown” will exit the application


The application in action. Since my computer choked on running both Kinect and a screen recorder I actually did this the old fashioned way using a video camera recording the screen ;)

Setting the stage

To get this sample to work, you need quite a lot of stuff, both hardware and software. First, of course, you will need a Kinect sensor. Be aware that you cannot use the Kinect that comes with an Xbox 360 S. That one only has the proprietary orange Kinect connector, which may look like a USB connector but most certainly is not. You will need a retail Kinect for Xbox 360 sensor, which includes special USB/power cabling that connects to your PC’s ‘ordinary’ USB port.

A word to the wise: if you have a desktop PC, connect Kinect to a USB port at the back and never, ever use a USB extension cable, unless you are entertained by very frequent BSODs. I learned this the hard way; make sure you don’t.

Then you need quite some stuff to download and install:

I would also recommend very much installing the Kinect project templates for Visual Studio 2010 by the brilliant Dennis Delimarsky. I started out by using his Kinect Skeleton template application.

MVVMLight 4

MVVMLight 4 does not have a binary release yet, as far as I understand, so I pulled the sources from CodePlex, compiled the whole stuff and took the following parts:

  • GalaSoft.MvvmLight.WPF4.dll
  • GalaSoft.MvvmLight.Extras.WPF4.dll
  • Microsoft.Practices.ServiceLocation.dll
  • System.Windows.Interactivity.dll

After building you will find the first file in GalaSoft.MvvmLight\GalaSoft.MvvmLight (NET4)\bin\Release, the others in GalaSoft.MvvmLight\GalaSoft.MvvmLight.Extras (NET4)\bin\Release

Coding4Fun Kinect Toolkit

I used only a tiny bit of it, and still have to investigate what it can do, but you can find it here on CodePlex.

Setting up the project

A good programmer is a lazy programmer, so I just selected Dennis Delimarsky’s KinectSkeletonApplication template (you will find it under Visual C#\Windows\Kinect). That creates a nice, basic WPF-based Kinect skeleton tracking application that works out of the box: it projects the video image Kinect sees, and tracks your hands with two red circles. This actually got me off the ground in no time at all, so kudos and thanks to you, Dennis!

After peeking at Dennis’ code, I cleaned up MainWindow.xaml and MainWindow.xaml.cs. Then I added the following references:

  • Coding4Fun.Kinect.Wpf.dll
  • GalaSoft.MvvmLight.WPF4.dll
  • GalaSoft.MvvmLight.Extras.WPF4.dll
  • Microsoft.Practices.ServiceLocation.dll
  • System.Windows.Interactivity.dll
  • Microsoft.Speech.dll (on my computer it’s sitting in C:\Program Files\Microsoft Speech Platform SDK\Assembly)

As usual, I first put assemblies like these in a solution folder called “Binaries”, to ensure they become part of the solution itself. The net result is displayed on the right:

Main viewmodel

MainViewModel is very simple and is basically nothing more than a locator:

using GalaSoft.MvvmLight;

namespace MapController.ViewModel
{
  public class MainViewModel : ViewModelBase
  {
    private static PoseViewModel _poseViewModelInstance;
    public static PoseViewModel PoseViewModel
    {
      get { return _poseViewModelInstance ?? 
        (_poseViewModelInstance = new PoseViewModel()); }
    }
  }
}

This is a pretty standard pattern that makes sure that, no matter how many MainViewModels you create by XAML instantiation, there will be one and only one PoseViewModel.
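The minimal sketch below shows why the lazy `??` pattern yields a single instance, using hypothetical SharedModel and Locator classes in place of PoseViewModel and MainViewModel. The property in MainViewModel is static. The sketch instead exposes an instance property over a static field, so it can show several locators (as XAML instantiation would create them) all handing out the same object.

```csharp
// Hypothetical stand-in for PoseViewModel
public class SharedModel
{
    // Counts constructions so the "only one instance" claim can be checked
    public static int InstancesCreated;

    public SharedModel()
    {
        InstancesCreated++;
    }
}

// Hypothetical stand-in for MainViewModel
public class Locator
{
    // Static backing field: shared by every Locator instance
    private static SharedModel _instance;

    // Created on first access, then reused (not thread-safe, like the original)
    public SharedModel Model
    {
        get { return _instance ?? (_instance = new SharedModel()); }
    }
}
```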

The Pose viewmodel

PoseViewModel is currently a kind of God class and probably should be broken up should this become a real application one day. This is where most of the Kinect stuff lives. And it’s remarkably small for all the things it actually does. It starts out like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows;
using Coding4Fun.Kinect.Wpf;
using GalaSoft.MvvmLight;
using GalaSoft.MvvmLight.Messaging;
using MapController.Messages;
using Microsoft.Research.Kinect.Nui;
using Vector = Microsoft.Research.Kinect.Nui.Vector;

namespace MapController.ViewModel
{
  public class PoseViewModel : ViewModelBase, IDisposable
  {
    private Runtime _runtime;
    private const int Samples = 2;
    private const string DefaultleftHand = "Resources/lefthand.png";
    private const string DefaultrightHand = "Resources/righthand.png";

    private readonly List<SkeletonData> _skeletons;

    public PoseViewModel()
    {
      if (!IsInDesignMode)
      {
        _leftHandImage = DefaultleftHand;
        _rightHandImage = DefaultrightHand;
        _skeletons = new List<SkeletonData>();
        _runtime = new Runtime();
        _runtime.SkeletonFrameReady += RuntimeSkeletonFrameReady;
        _runtime.VideoFrameReady += RuntimeVideoFrameReady;
        _runtime.Initialize(
          RuntimeOptions.UseSkeletalTracking | RuntimeOptions.UseColor);
        _runtime.VideoStream.Open(ImageStreamType.Video,
          2, ImageResolution.Resolution640x480, ImageType.Color);
        SpeechController.Initialize();
        Messenger.Default.Register<CommandMessage>(this, ProcessSpeechCommand);
      }
    }

    void RuntimeVideoFrameReady(object sender, ImageFrameReadyEventArgs e)
    {
      if (ShowVideo)
      {
        // Hand the frame to whoever displays it (DisplayVideoBehavior)
        Messenger.Default.Send(new VideoFrameMessage {Image = e.ImageFrame.Image});
      }
    }

Important to note is the Vector alias on top: there is also a System.Windows.Vector, and you sure don’t want to use that one. The constructor basically initializes the Kinect “Runtime”, instructs it to use skeleton tracking and color video, and defines callbacks for both. The SpeechController (I will get to that later) is initialized as well. Note the _skeletons list: I use it to take the average of more than one (actually two) skeleton frames, to make the hand tracking a bit more stable.

Note also the callback “RuntimeVideoFrameReady”: it just shuttles off a frame in a message. A behavior, the DisplayVideoBehavior, takes care of displaying it. I tried to do this with data binding; it worked, but the performance was less than desirable, to put it mildly.
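The sketch below illustrates the smoothing idea of averaging a couple of skeleton frames, using plain vectors. It is minimal and assumes a sliding window over the last N samples; the post does not show how the _skeletons list is emptied. Vec3 and SampleAverager are hypothetical stand-ins for the Kinect Vector type and the _skeletons list. A window size of 2 matches the Samples constant.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Microsoft.Research.Kinect.Nui.Vector
public readonly struct Vec3
{
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
    public float X { get; }
    public float Y { get; }
    public float Z { get; }
}

// Hypothetical sliding-window averager over the last 'size' samples
public class SampleAverager
{
    private readonly int _size;
    private readonly Queue<Vec3> _samples = new Queue<Vec3>();

    public SampleAverager(int size)
    {
        _size = size;
    }

    // Adds a sample; returns true (and the average) once the window is full
    public bool TryAdd(Vec3 sample, out Vec3 average)
    {
        _samples.Enqueue(sample);
        if (_samples.Count > _size)
        {
            _samples.Dequeue(); // keep only the most recent samples
        }

        average = default(Vec3);
        if (_samples.Count < _size)
        {
            return false;
        }

        average = new Vec3(
            _samples.Average(s => s.X),
            _samples.Average(s => s.Y),
            _samples.Average(s => s.Z));
        return true;
    }
}
```

A sliding window updates on every frame after the first N. Clearing the list after each average would instead update only once every N frames.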

The ViewModel has 12 properties, which all follow the new MVVMLight 4 syntax:

private Vector _lefthandPosition;
public Vector LeftHandPosition
{
  get { return _lefthandPosition; }
  set
  {
    if (!_lefthandPosition.Equals(value))
    {
      _lefthandPosition = value;
      RaisePropertyChanged(() => LeftHandPosition);
    }
  }
}

To prevent this blog post from challenging Tolstoy’s “War and Peace” for length, I will only list the rest by name and type:

  • Vector RightHandPosition
  • double LeftHandScale
  • double RightHandScale
  • Visibility HandVisibility
  • string LeftHandImage
  • string RightHandImage
  • bool IsInitialized
  • bool IsTracking
  • bool ShowVideo
  • string LastCommand

The exception is IsPanning, since it does something more: it changes the images for the hands, if necessary:

private bool _isPanning;
public bool IsPanning
{
  get { return _isPanning; }
  set
  {
    if (_isPanning != value)
    {
      _isPanning = value;
      LeftHandImage = _isPanning ? "Resources/leftpan.png" : DefaultleftHand;
      RightHandImage = _isPanning ? "Resources/rightpan.png" : DefaultrightHand;
      RaisePropertyChanged(() => IsPanning);
    }
  }
}

The actual skeleton and pose processing code is surprisingly simple, then:

void RuntimeSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
  if (IsInitialized)
  {
    var skeletonSet = e.SkeletonFrame;

    var data = (from s in skeletonSet.Skeletons
                where s.TrackingState == SkeletonTrackingState.Tracked
                select s).FirstOrDefault();
    if (data == null)
    {
      // Nobody is being tracked in this frame
      return;
    }

    // Lock on the shared list itself, since that is the state being protected
    lock (_skeletons)
    {
      _skeletons.Add(data);
      if (_skeletons.Count < Samples)
      {
        return;
      }

      // Average the collected samples per joint and scale them to the screen
      LeftHandPosition = ScaleJoint(_skeletons.Average(JointID.HandLeft)).Position;
      RightHandPosition = ScaleJoint(_skeletons.Average(JointID.HandRight)).Position;
      var spinePosition = ScaleJoint(_skeletons.Average(JointID.Spine)).Position;

      // Start collecting a new set of samples
      _skeletons.Clear();

      LeftHandScale = RelativePositionToScale(spinePosition, LeftHandPosition);
      RightHandScale = RelativePositionToScale(spinePosition, RightHandPosition);
      HandVisibility = Visibility.Visible;

      if (IsTracking)
      {
        // Both arms stretched: initiate pan
        IsPanning = ((spinePosition.Z - LeftHandPosition.Z) > 0.5 && 
                    (spinePosition.Z - RightHandPosition.Z) > 0.5);

        if (!IsPanning)
        {
          // Send a zoom in/out message depending on the hands' relative positions 
          var handDist = (LeftHandPosition.Z - RightHandPosition.Z);
          if (handDist > 0.2)
          {
            // left hand pulled back: zoom in
            Messenger.Default.Send(new ZoomMessage 
             { X = RightHandPosition.X, Y = RightHandPosition.Y, Zoom = 1 });
            RightHandImage = "Resources/rightzoomin.png";
          }
          else if (handDist < -0.2)
          {
            // left hand pushed forward: zoom out
            Messenger.Default.Send(new ZoomMessage 
              { X = RightHandPosition.X, Y = RightHandPosition.Y, Zoom = -1 });
            RightHandImage = "Resources/rightzoomout.png";
          }
          else
          {
            RightHandImage = DefaultrightHand;
          }
        }
      }
      else
      {
        IsPanning = false;
      }
    }
  }
}

This is really the core of the pose recognition performed by this application. The method works in these steps:

1. It takes a skeleton and puts it in a list.
2. Once the desired number of samples has been collected, it calculates the average position of each joint and scales it to the screen.
3. It checks whether both hands are 50 cm or more in front of the main body (‘Spine’); this is assumed to be a panning pose.
4. If that pose is not recognized, it tries to detect whether one hand is more than 20 cm in front of the other, which should initiate a zoom action. The zoom is done by sending a ZoomMessage, and the right hand’s image is changed accordingly.

The ZoomMessage is so simple that I will omit its source here.

This uses a few other methods:

/// <summary>
/// Convert a relative position to a scale
/// </summary>
/// <param name="spinePos">Position of the spine</param>
/// <param name="handPos">Position of the hand</param>
/// <returns>Scale factor for the hand image</returns>
private double RelativePositionToScale( Vector spinePos, Vector handPos )
{
  // 30 cm in front of the chest is the 'zero position' (scale 1)
  return 1 - ((spinePos.Z - handPos.Z - 0.3) * 1.8);
}

/// <summary>
/// Scale a joint to the screen
/// </summary>
/// <param name="joint">The joint to scale</param>
/// <returns>The joint with its X and Y scaled to screen coordinates</returns>
private Joint ScaleJoint(Joint joint)
{
  return joint.ScaleTo(
    (int)(SystemParameters.PrimaryScreenWidth),
    (int)(SystemParameters.PrimaryScreenHeight), 0.5f, 0.5f);
}

Both speak pretty much for themselves. The Joint.ScaleTo extension method comes from Coding4Fun. The ‘Average’ extension methods are my own, and are defined in a separate class:

using System.Collections.Generic;
using System.Linq;
using Microsoft.Research.Kinect.Nui;
using Vector = Microsoft.Research.Kinect.Nui.Vector;

namespace MapController
{
  public static class KinectExtensions
  {
    /// <summary>
    /// Calculates the average of any number of Vectors
    /// </summary>
    public static Vector Average(this IEnumerable<Vector> vectors)
    {
      return new Vector
      {
        X = vectors.Select(p => p.X).Average(),
        Y = vectors.Select(p => p.Y).Average(),
        Z = vectors.Select(p => p.Z).Average(),
        W = vectors.Select(p => p.W).Average()
      };
    }

    /// <summary>
    /// Calculates the average of a specific Joint in a number of skeletons
    /// </summary>
    public static Joint Average( this IEnumerable<SkeletonData> data, 
      JointID joint)
    {
      return new Joint
      {
        Position = data.Select(skeleton => 
          skeleton.Joints[joint].Position).Average()
      };
    }
  }
}
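The sketch below restates the pose rules from RuntimeSkeletonFrameReady as a pure function, so the thresholds can be checked without a sensor. It also reproduces the RelativePositionToScale formula. Pose and PoseRules are hypothetical names, and Z values are assumed to be distances from the sensor in meters. Pan is checked first because the original only looks for a zoom gesture when the user is not panning.

```csharp
// Hypothetical classification of the gestures described in the post
public enum Pose { None, Pan, ZoomIn, ZoomOut }

public static class PoseRules
{
    public const double PanReach = 0.5;  // both hands 50 cm in front of the spine
    public const double ZoomDelta = 0.2; // one hand 20 cm in front of the other

    // Z values: distance from the sensor, in meters
    public static Pose Classify(double spineZ, double leftZ, double rightZ)
    {
        if (spineZ - leftZ > PanReach && spineZ - rightZ > PanReach)
        {
            return Pose.Pan;
        }

        var handDist = leftZ - rightZ;
        if (handDist > ZoomDelta) return Pose.ZoomIn;   // left hand pulled back
        if (handDist < -ZoomDelta) return Pose.ZoomOut; // left hand pushed forward
        return Pose.None;
    }

    // Same formula as RelativePositionToScale: 1.0 at 30 cm in front of the spine
    public static double HandScale(double spineZ, double handZ)
    {
        return 1 - ((spineZ - handZ - 0.3) * 1.8);
    }
}
```

Some example values, with the spine at 2.0 m:

* A hand at 1.7 m (30 cm out) gets scale 1.
* A hand at 1.2 m (80 cm out) shrinks to about 0.1.
* Hands at 1.9 m (left) and 1.6 m (right) count as a zoom in.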

Now let's have a look at the GUI.

Transparent overlay window

WPF poses some unique challenges, one of those being the fact that no WPF objects can be drawn on top of Win32 objects, such as a WebBrowser control. To circumvent this, I created a second child window: full screen, like the first window, but transparent. So I added KinectOverlay.xaml. This window is launched from MainWindow.xaml.cs as a child window like this:

namespace MapController
{
  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
      Loaded += MainWindowLoaded;
    }

    void MainWindowLoaded(object sender, RoutedEventArgs e)
    {
      // Open the transparent overlay on top of this window
      var w = new KinectOverlay {Owner = this};
      w.Show();
    }
  }
}

MainWindow.xaml itself does not contain much, actually: just a browser and some behaviors.

<Window xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" 
    WindowStyle="None" WindowState="Maximized" 
    x:Class="MapController.MainWindow" Height="355" Width="611" 
    ShowInTaskbar="True" AllowsTransparency="False"
    DataContext="{Binding PoseViewModel, Source={StaticResource MainViewModel}}">
        RightHandPosition="{Binding RightHandPosition, Mode=TwoWay}" 
        LeftHandPosition="{Binding LeftHandPosition, Mode=TwoWay}"
        IsPanning="{Binding IsPanning, Mode=TwoWay}">
   <Grid x:Name="Grid">
     <WebBrowser x:Name="Browser" Source="http://maps.google.com" />

As you can see, the actual zoom and pan actions are performed by two behaviors, named “Zoombehavior” and “PanBehavior”. Windows Phone 7 developers, please take note: WPF allows you to bind directly to behavior dependency properties. This will be possible in the next Windows Phone 7 release as well, since that runs Silverlight 4-ish.

Here we come down to the dirty parts of the solution. The actual map manipulation is performed by simulating mouse positions and button-downs for dragging, and mouse wheel rotations for zooming in and out. If you are interested in the actual innards of these things, please refer to the sample solution. Below is just a short description:

  • The Zoombehavior just waits for a ZoomMessage – and then simulates a zoom in or out by moving the mouse cursor to the position of the right hand, and then simulating a scroll wheel rotation by one position. It only accepts one zoom per 2 seconds to prevent zooming in or out at an uncontrollable rate.
  • The PanBehavior has three dependency properties, bound to both hand positions and to IsPanning. If the model says it’s panning, it calculates the point between the left and right hand, and uses that as the dragging origin.
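The two decisions described in the list above can be sketched without Win32 calls:

* ZoomBehavior accepts at most one zoom per 2 seconds.
* PanBehavior drags from the point between both hands.

ZoomThrottle and PanMath are hypothetical names, not the classes in the sample solution. The current time is passed in explicitly (an assumption of this sketch) so the logic can be tested deterministically.

```csharp
using System;

// Hypothetical throttle: lets a zoom through only if the last one is at least 'interval' ago
public class ZoomThrottle
{
    private readonly TimeSpan _interval;
    private DateTime? _lastZoom;

    public ZoomThrottle(TimeSpan interval)
    {
        _interval = interval;
    }

    // True if a zoom requested at 'now' may run; false while still inside the interval
    public bool TryZoom(DateTime now)
    {
        if (_lastZoom.HasValue && now - _lastZoom.Value < _interval)
        {
            return false;
        }
        _lastZoom = now;
        return true;
    }
}

public static class PanMath
{
    // Point halfway between the hands, used as the drag origin
    public static (double X, double Y) Midpoint(double leftX, double leftY,
                                                double rightX, double rightY)
    {
        return ((leftX + rightX) / 2, (leftY + rightY) / 2);
    }
}
```

In a behavior you would pass DateTime.Now. Injecting the time keeps the throttle free of timers and easy to unit test.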

Both use a NativeWrapper static class that makes some Win32 API calls, courtesy of pinvoke.net.


The overlay, KinectOverlay.xaml, is also pretty simple. It consists of a Grid with a standard behavior (a FluidMoveBehavior) and a Canvas. The Canvas holds three images and a text block: left hand, right hand, video output, and a placeholder showing the last spoken command. All of them are bound to the PoseViewModel. Don’t you just love the power of XAML? ;-)

<Window xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:i="http://schemas.microsoft.com/expression/2010/interactivity"
    xmlns:ei="http://schemas.microsoft.com/expression/2010/interactions"
    mc:Ignorable="d" x:Class="MapController.KinectOverlay"
    Title="KinectOverlay" Height="300" Width="300" WindowStyle="None" 
    WindowState="Maximized" Background="Transparent" AllowsTransparency="True">
  <Grid>
    <i:Interaction.Behaviors>
      <ei:FluidMoveBehavior AppliesTo="Children" Duration="0:0:0.5">
        <ei:FluidMoveBehavior.EaseY>
          <SineEase EasingMode="EaseInOut"/>
        </ei:FluidMoveBehavior.EaseY>
        <ei:FluidMoveBehavior.EaseX>
          <SineEase EasingMode="EaseInOut"/>
        </ei:FluidMoveBehavior.EaseX>
      </ei:FluidMoveBehavior>
    </i:Interaction.Behaviors>
    <Canvas Background="Transparent" 
      DataContext="{Binding PoseViewModel, Source={StaticResource MainViewModel}}">

      <!-- Left hand -->
      <Image Source="{Binding LeftHandImage}" x:Name="leftHand" Stretch="Fill"
         Canvas.Left="{Binding LeftHandPosition.X, Mode=TwoWay}"  
         Canvas.Top="{Binding LeftHandPosition.Y, Mode=TwoWay}"
         Visibility="{Binding HandVisibility}" Opacity="0.75"
         Height="118" Width="80" RenderTransformOrigin="0.5,0.5">
        <Image.RenderTransform>
          <TransformGroup>
            <ScaleTransform ScaleX="{Binding LeftHandScale}" 
              ScaleY="{Binding LeftHandScale}"/>
            <TranslateTransform X="-40" Y="-59"/>
          </TransformGroup>
        </Image.RenderTransform>
      </Image>

      <!-- Right hand -->
      <Image x:Name="righthand" Source="{Binding RightHandImage}" Stretch="Fill"
         Canvas.Left="{Binding RightHandPosition.X, Mode=TwoWay}"  
         Canvas.Top="{Binding RightHandPosition.Y, Mode=TwoWay}"
         Visibility="{Binding HandVisibility}" Opacity="0.75"
         Height="118" Width="80" RenderTransformOrigin="0.5,0.5">
        <Image.RenderTransform>
          <TransformGroup>
            <ScaleTransform ScaleX="{Binding RightHandScale}" 
              ScaleY="{Binding RightHandScale}"/>
            <TranslateTransform X="-40" Y="-59"/>
          </TransformGroup>
        </Image.RenderTransform>
      </Image>

      <!-- Video -->
      <Image Canvas.Left="0" Canvas.Top="100" Width ="360" 
       Visibility="{Binding ShowVideo, Converter={StaticResource booleanToVisibilityConverter}}"/>
      <!-- Shows last speech command -->
      <TextBlock Canvas.Left="10" Canvas.Top="500" Text="{Binding LastCommand}"
        FontSize="36" Foreground="#FF001900"></TextBlock>
    </Canvas>
  </Grid>
</Window>

Note that both MainWindow.xaml and KinectOverlay.xaml are full-screen with WindowStyle="None", but only KinectOverlay.xaml has the Background="Transparent" and AllowsTransparency="True" attributes. This is what actually creates the ‘transparent overlay’.

Speech recognition

If you think, after reading this, that skeleton tracking with Kinect is almost embarrassingly simple, wait till you see speech recognition. I barely scratched the surface, I think. Still, apart from a lot of initialization brouhaha, getting Kinect to recognize basic phrases comes down to three steps:

1. Feed a couple of strings to a “Choices” object.
2. Feed that to a “GrammarBuilder”.
3. Load the resulting grammar into the “SpeechRecognitionEngine” object.

Add a handler to its SpeechRecognized event and out come your recognized strings. No calibration, no initialization, no hours of ‘training’, nothing: it just works.
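Once the engine hands back a recognized phrase, the rest is a table lookup from text to command. The minimal sketch below covers only that step and leaves the recognizer out. The VoiceCommand members and phrases are taken from the post's command table. CommandTable and TryParse are hypothetical names, and the case-insensitive comparison is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Members taken from the post's phrase table
public enum VoiceCommand { Engage, StopTracking, Track, Shutdown, VideoOff, VideoOn }

public static class CommandTable
{
    // Case-insensitive lookup is an assumption of this sketch
    private static readonly Dictionary<string, VoiceCommand> Commands =
        new Dictionary<string, VoiceCommand>(StringComparer.OrdinalIgnoreCase)
        {
            {"kinect engage", VoiceCommand.Engage},
            {"kinect stop tracking", VoiceCommand.StopTracking},
            {"kinect track", VoiceCommand.Track},
            {"kinect shutdown", VoiceCommand.Shutdown},
            {"kinect video off", VoiceCommand.VideoOff},
            {"kinect video on", VoiceCommand.VideoOn}
        };

    // The phrases a recognizer grammar would be built from
    public static IEnumerable<string> Phrases
    {
        get { return Commands.Keys; }
    }

    // Maps recognized text to a command; false for anything outside the table
    public static bool TryParse(string recognizedText, out VoiceCommand command)
    {
        return Commands.TryGetValue(recognizedText.Trim(), out command);
    }
}
```

Keeping the phrases and their commands in a single table means the grammar and the dispatch cannot drift apart.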

Moving back to the last part of the PoseViewModel: at the very end of it are a property for the speech controller, backed by a static field, and a simple switch recognizing the speech commands.

#region Speech commands
private static SpeechController _speechControllerInstance;
public SpeechController SpeechController
{
  get { return _speechControllerInstance ?? 
    (_speechControllerInstance = new SpeechController()); }
}

private void ProcessSpeechCommand(CommandMessage command)
{
  switch (command.Command)
  {
    case VoiceCommand.Shutdown:
        Application.Current.Shutdown(); break;
    //etc etc rest of the commands omitted
    case VoiceCommand.VideoOff:
        ShowVideo = false; break;
  }

  LastCommand = command.Command.ToString();
}
#endregion

The method ProcessSpeechCommand is called whenever the model receives a CommandMessage – that is issued by the SpeechController. As you can see, the spoken commands are contained in a simple enumeration “VoiceCommand”. Being too lazy to make a proper factory for all the commands, I include the factory methods in CommandMessage as well:

using System.Collections.Generic;
using Microsoft.Speech.Recognition;

namespace MapController.Messages
{
  /// <summary>
  /// Command the application is supposed to understand
  /// </summary>
  public class CommandMessage
  {
    public VoiceCommand Command { get; set; }

    /// <summary>
    /// Factory methods
    /// </summary>
    private static Dictionary<string, VoiceCommand> _commands;
    public static IDictionary<string, VoiceCommand> Commands
    {
      get
      {
        if (_commands == null)
        {
          _commands = new Dictionary<string, VoiceCommand>
          {
            {"kinect engage", VoiceCommand.Engage},
            {"kinect stop tracking", VoiceCommand.StopTracking},
            {"kinect track", VoiceCommand.Track},
            {"kinect shutdown", VoiceCommand.Shutdown},
            {"kinect video off", VoiceCommand.VideoOff},
            {"kinect video on", VoiceCommand.VideoOn}
          };
        }
        return _commands;
      }
    }

    public static Choices Choices
    {
      get
      {
        // Every phrase in the table becomes one recognizable choice
        var choices = new Choices();
        foreach (var speechcommand in Commands.Keys)
        {
          choices.Add(speechcommand);
        }
        return choices;
      }
    }
  }
}

The static Commands property is a simple translation table from the actual text to VoiceCommand enumeration values. The voice commands themselves (as you can see, they are just strings) are fed into a Choices object. Now the only thing that’s missing is the actual speech recognition ‘engine’:
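The recognition handler below first calls `ContainsKey` and then indexes the same dictionary, so every recognized phrase is looked up twice. A small sketch of the same translation table using `Dictionary.TryGetValue`, which answers "is this a command, and which one?" in a single lookup; `VoiceCommandLookup` and `TryTranslate` are hypothetical names, and the enum simply copies the values from the table above.

```csharp
using System.Collections.Generic;

// Copy of the command enumeration values used in the post
public enum VoiceCommand { Engage, StopTracking, Track, Shutdown, VideoOff, VideoOn }

public static class VoiceCommandLookup
{
    // Same phrase-to-command table as CommandMessage.Commands
    private static readonly Dictionary<string, VoiceCommand> Commands =
        new Dictionary<string, VoiceCommand>
        {
            {"kinect engage", VoiceCommand.Engage},
            {"kinect stop tracking", VoiceCommand.StopTracking},
            {"kinect track", VoiceCommand.Track},
            {"kinect shutdown", VoiceCommand.Shutdown},
            {"kinect video off", VoiceCommand.VideoOff},
            {"kinect video on", VoiceCommand.VideoOn}
        };

    // Returns false for text that is not a known command,
    // so the caller can ignore it without a second lookup
    public static bool TryTranslate(string spokenText, out VoiceCommand command)
    {
        return Commands.TryGetValue(spokenText, out command);
    }
}
```

In the handler, the translated value could then be captured in a local before the message is dispatched, instead of indexing the dictionary again inside the lambda.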

using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading;
using System.Windows;
using System.Windows.Threading;
using GalaSoft.MvvmLight.Messaging;
using MapController.Messages;
using Microsoft.Research.Kinect.Audio;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

namespace MapController.ViewModel
{
  public class SpeechController : IDisposable
  {
    private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
    private SpeechRecognitionEngine _engine;
    private KinectAudioSource _audioSource;
    private Stream _audioStream;
    private Thread _audioThread;

    public void Initialize()
    {
      // Audio recognition needs to happen on a separate thread
      _audioThread = new Thread(InitSpeechRecognition);
      _audioThread.Start();
    }

    private void InitSpeechRecognition()
    {
      // All kinds of initialization brouhaha directly taken from sample
      _audioSource = new KinectAudioSource
                    {
                      FeatureMode = true,
                      AutomaticGainControl = false,
                      SystemMode = SystemMode.OptibeamArrayOnly
                    };
      var ri = SpeechRecognitionEngine.InstalledRecognizers().
          Where(r => r.Id == RecognizerId).FirstOrDefault();
      if (ri == null)
      {
        // The Kinect speech recognizer is not installed
        return;
      }
      _engine = new SpeechRecognitionEngine(ri.Id);
      var gb = new GrammarBuilder { Culture = new CultureInfo("en-US") };

      // Building my command list
      gb.Append(CommandMessage.Choices);

      // More initialization brouhaha directly taken from sample
      var g = new Grammar(gb);
      _engine.LoadGrammar(g);
      _engine.SpeechRecognized += SreSpeechRecognized;

      _audioStream = _audioSource.Start();
      _engine.SetInputToAudioStream(_audioStream,
                                    new SpeechAudioFormatInfo(
                                    EncodingFormat.Pcm, 16000, 16, 1,
                                    32000, 2, null));

      // Keep recognizing phrases until the engine is stopped
      _engine.RecognizeAsync(RecognizeMode.Multiple);
    }

    private void SreSpeechRecognized(object sender,
      SpeechRecognizedEventArgs e)
    {
      // Convert spoken text into a command
      if (CommandMessage.Commands.ContainsKey(e.Result.Text))
      {
        // Back to the UI thread before sending the message
        Application.Current.Dispatcher.BeginInvoke(
          new Action(() =>
             Messenger.Default.Send(
               new CommandMessage {
               Command = CommandMessage.Commands[e.Result.Text] })));
      }
    }

    public void Dispose()
    {
      // Stop recognition and release the Kinect audio resources
      if (_engine != null)
      {
        _engine.RecognizeAsyncStop();
        _engine.Dispose();
      }
      if (_audioStream != null) _audioStream.Dispose();
      if (_audioSource != null) _audioSource.Dispose();
    }
  }
}

This is mostly converted from the basic Speech sample. Only a few lines, mainly those dealing with my own commands, are actually (mostly) mine. Most interesting to note is that, for some reason, speech recognition needs to be done on a separate thread. Don’t ask me why: it just needs to. In the middle you can see I feed my Choices-built-from-commands to the ‘SpeechRecognitionEngine’ via the ‘GrammarBuilder’ as described earlier, and hook up SreSpeechRecognized, which fires when a speech command is recognized. That handler sends the message via the Messenger on the UI thread (remember, we are on a separate thread here, so nothing that is data bound can be accessed directly), and the message ends up in the model’s ProcessSpeechCommand, which acts on it as described above.
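In WPF, posting work to the UI thread is done through the Dispatcher (for example `Application.Current.Dispatcher.BeginInvoke`). To show the idea outside WPF, here is a console sketch that swaps the Dispatcher for a hand-rolled queue: the "recognition" thread only posts work, and the owning ("UI") thread runs it. `UiQueue` and `MarshallingDemo` are hypothetical, and this is not how the Dispatcher is implemented internally.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical, minimal stand-in for a UI thread's message queue
public class UiQueue
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();

    // Like Dispatcher.BeginInvoke: enqueue and return immediately
    public void BeginInvoke(Action action)
    {
        _work.Add(action);
    }

    // Signals that no more work will be posted
    public void Complete()
    {
        _work.CompleteAdding();
    }

    // Runs queued actions on the calling thread until Complete() has been
    // called and the queue is drained
    public void Run()
    {
        foreach (var action in _work.GetConsumingEnumerable())
        {
            action();
        }
    }
}

public static class MarshallingDemo
{
    // Returns the id of the thread on which the posted "command" was handled
    public static int HandleOnOwnerThread(out int ownerThreadId)
    {
        var queue = new UiQueue();
        int handledOn = -1;
        ownerThreadId = Thread.CurrentThread.ManagedThreadId;

        // Simulated recognition thread: it never touches the shared state itself
        var recognizer = new Thread(() =>
        {
            queue.BeginInvoke(() => handledOn = Thread.CurrentThread.ManagedThreadId);
            queue.Complete();
        });
        recognizer.Start();

        queue.Run();        // pumps the posted work on this (owner) thread
        recognizer.Join();
        return handledOn;
    }
}
```

The point is the same as in the post: the background thread decides *what* should happen, but the state change itself runs on the thread that owns that state.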

Caveat: this speech recognition does not work with an ordinary microphone. The audio is taken from the KinectAudioSource (the Kinect's microphone array) and fed to the Kinect-specific recognizer, and speech recognition samples simply do not start up without a Kinect connected to your computer.

Conclusion and lessons learned

As said before, the solution is pretty crude, and so is the user experience. For instance, the application could start by measuring body dimensions; as it is, a person with arms shorter than 50 cm would have trouble getting the application to pan ;-). But for a first try, with no prior experience, I think it’s a nice way of getting off the ground with Kinect development. I had tremendous fun experimenting with it, although I got a bit distracted from my main passion, i.e. Windows Phone 7. I hope to show this very soon at Vicrea and who knows, maybe this will turn into actual work ;-)
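The body-measurement idea could start as simply as summing the lengths of the arm segments and deriving the pan threshold from that, instead of using a fixed distance. A hypothetical sketch: `Joint3`, `ArmCalibration` and the 0.6 factor are assumptions, not values from the application; the joint positions would come from the skeleton data, in meters.

```csharp
using System;

// Hypothetical 3D joint position in meters (skeleton space)
public struct Joint3
{
    public readonly double X, Y, Z;

    public Joint3(double x, double y, double z) { X = x; Y = y; Z = z; }

    // Straight-line (Euclidean) distance between two joints
    public double DistanceTo(Joint3 other)
    {
        double dx = X - other.X, dy = Y - other.Y, dz = Z - other.Z;
        return Math.Sqrt(dx * dx + dy * dy + dz * dz);
    }
}

public static class ArmCalibration
{
    // Assumed fraction of the arm's reach that counts as a "pan" pose
    public const double PanFraction = 0.6;

    // Arm length as the sum of shoulder-elbow, elbow-wrist and wrist-hand segments
    public static double ArmLength(Joint3 shoulder, Joint3 elbow, Joint3 wrist, Joint3 hand)
    {
        return shoulder.DistanceTo(elbow) + elbow.DistanceTo(wrist) + wrist.DistanceTo(hand);
    }

    // Per-user threshold: how far the hand must be from the shoulder to pan
    public static double PanThreshold(double armLength)
    {
        return armLength * PanFraction;
    }
}
```

Measuring the segments rather than the straight shoulder-to-hand distance keeps the result independent of how far the arm happens to be bent during calibration.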

Apart from actual knowledge of the API and the concepts of controlling things with Kinect, I have learned the following lessons from this application:

  1. A good technical implementation of controlling applications with gestures needs an API that is supplied by the controlled application, or some kind of wrapper. I now use a pretty crude trick: simulating mouse actions. For a real gesture-controlled application, something more is needed.
  2. When it comes to gesture control, you are basically on your own, without guidelines on ‘how to do things’. For the past 20 years we’ve been using mouse and keyboard to control our computers. This led to a well-defined ‘language’ of concepts that has firmly taken root in our consciousness: things like clicking left and right buttons, dragging, using the mouse wheel for zooming in or out, using cursor keys, menu structures (like about/help and tools/properties). Heck, things like CTRL-ALT-DELETE have even become a figure of speech. But when it comes to gestures, there are no rules, written or unwritten, that describe the ‘logical’ way of zooming and panning a map. Apart from creating the application itself, I had to ‘invent’ the actual poses or gestures. So the logic of zooming out by pulling a hand toward me is mine, but not necessarily yours. A very odd experience. But a fun one.
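Lesson 1 in code terms: the gesture logic talks to an interface that the controlled application (or a wrapper around it) implements, instead of emitting mouse events directly. A hypothetical sketch: `IMapNavigation`, `GestureInterpreter`, `RecordingNavigation` and the zoom factors 0.5 and 2.0 are illustrative names and values only; a mouse-simulating adapter could implement the same interface as a fallback.

```csharp
// Hypothetical API the controlled application (or a wrapper) would expose
public interface IMapNavigation
{
    void Pan(double dx, double dy);

    // factor < 1 zooms out, factor > 1 zooms in
    void Zoom(double factor);
}

// Test double that accumulates the calls it receives
public class RecordingNavigation : IMapNavigation
{
    public double TotalDx, TotalDy;
    public double ZoomFactor = 1.0;

    public void Pan(double dx, double dy) { TotalDx += dx; TotalDy += dy; }
    public void Zoom(double factor) { ZoomFactor *= factor; }
}

// Translates hand movement into navigation calls; it knows nothing about mice
public class GestureInterpreter
{
    private readonly IMapNavigation _map;

    public GestureInterpreter(IMapNavigation map) { _map = map; }

    // Hand displacement since the previous frame, in meters (assumed input)
    public void OnHandMoved(double dx, double dy)
    {
        _map.Pan(dx, dy);
    }

    // towardBody > 0: the hand was pulled toward the user, which zooms out
    // (the convention from the post); towardBody < 0 zooms in
    public void OnHandDepthChanged(double towardBody)
    {
        if (towardBody == 0) return;
        _map.Zoom(towardBody > 0 ? 0.5 : 2.0);
    }
}
```

Because the gesture vocabulary lives in one class, swapping ‘my’ logic for someone else's (lesson 2) would not touch the map control at all.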

Sample solution can be obtained here.