Thursday 10 November 2011

Implicit Conversions in Scala

Following on from the previous post on operator overloading I'm going to be looking at Implicit Conversions, and how we can combine them to with operator overloading to do some really neat things, including one way of creating a multi-parameter conversion.

So what's an "Implicit Conversion" when it's at home?

So lets start with some basic Scala syntax, if you've spent any time with Scala you've probably noticed it allows you to do things like:

   (1 to 4).foreach(println) // print out 1 2 3 4 

Ever wondered how it does this? Lets make things more explicit, you could rewrite the above code as:

    val a : Int = 1
    val b : Int = 4
    val myRange : Range = a to b
    myRange.foreach(println)

Scala is creating a Range object directly from two Ints, and a method called to.

So what's going on here? Is this just a sprinkling of syntactic sugar to make writing loops easier? Is to just a keyword in like def or val?

The answers to all this is no, there's nothing special going on here. to is simply a method defined in the RichInt class, which takes a parameter and returns a Range object (specifically a subclass of Range called Inclusive). You could rewrite it as the following if you really wanted to:

   val myRange : Range = a.to(b)

Hang on though, RichInt may have a "to" method but Int certainly doesn't, in your example you're even explicitly casting your numbers to Ints

Which brings me nicely on to the subject of this post, Implicit Conversions. This is how Scala does this. Implicit Conversions are a set of methods that Scala tries to apply when it encounters an object of the wrong type being used. In the case of the to example there's a method defined and included by default that will convert Ints into RichInts.

So when Scala sees 1 to 4 it first runs the implicit conversion on the 1 converting it from an Int primitive into a RichInt. It can then call the to method on the new RichInt object, passing in the second Int (4) as the parameter.

Hmm, think I understand, how's about another example?

Certainly. Lets try to improve our Complex number class we created in the previous post.

Using operator overloading we were able to support adding two complex numbers together using the + operator. eg.

class Complex(val real : Double, val imag : Double) {
  
  def +(that: Complex) = 
            new Complex(this.real + that.real, this.imag + that.imag)
  
  def -(that: Complex) = 
            new Complex(this.real - that.real, this.imag - that.imag)

  override def toString = real + " + " + imag + "i"
  
}

object Complex {
  def main(args : Array[String]) : Unit = {
       var a = new Complex(4.0,5.0)
       var b = new Complex(2.0,3.0)
       println(a)  // 4.0 + 5.0i
       println(a + b)  // 6.0 + 8.0i
       println(a - b)  // 2.0 + 2.0i
  }
}

But what if we want to support adding a normal number to a complex number, how would we do that? We could certainly overload our "+" method to take a Double argument, ie something like...

    def +(n: Double) = new Complex(this.real + n, this.imag)

Which would allow us to do...

    val sum = myComplexNumber + 8.5

...but it'll break if we try...

    val sum = 8.5 + myComplexNumber

To get around this we could use an Implicit Conversion. Here's how we create one.

  object ComplexImplicits {
    implicit def Double2Complex(value : Double) = 
                                     new Complex(value,0.0) 
 }

Simple! Although you do need to be careful to import the ComplexImplicits methods before they can be used. You need to make sure you add the following to the top of your file (even if your Implicits object is in the same file)...

   import ComplexImplicits._

And that's the problem solved, you can now write val sum = 8.5 + myComplexNumber and it'll do what you expect!

Nice. Is there anything else I can do with them?

One other thing I've found them good for is creating easy ways of instantiating objects. Wouldn't it be nice if there were a simpler way of creating one of our complex numbers other than with new Complex(3.0,5.0). Sure you could get rid of the new by making it a case class, or implementing an apply method. But we can do better, how's about just (3.0,5.0)

Awesome, but I'd need some sort of multi parameter implicit conversion, and I don't really see how that's possible!?

The thing is, ordinarily (3.0,5.0) would create a Tuple. So we can just use that tuple as the parameter for our implicit conversion and convert it into a Complex. how we might go about doing this...

  implicit def Tuple2Complex(value : Tuple2[Double,Double]) = 
                               new Complex(value._1,value._2);

And there we have it, a simple way to instantiate our Complex objects, for reference here's what the entire Complex code looks like now.

import ComplexImplicits._

object ComplexImplicits {
  implicit def Double2Complex(value : Double) = new Complex(value,0.0) 

  implicit def Tuple2Complex(value : Tuple2[Double,Double]) = new Complex(value._1,value._2);

}

class Complex(val real : Double, val imag : Double) {
  
  def +(that: Complex) : Complex = (this.real + that.real, this.imag + that.imag)
  
  def -(that: Complex) : Complex = (this.real - that.real, this.imag + that.imag)
      
  def unary_~ = Math.sqrt(real * real + imag * imag)
         
  override def toString = real + " + " + imag + "i"
  
}

object Complex {
  
  val i = new Complex(0,1);
  
  def main(args : Array[String]) : Unit = {
       var a : Complex = (4.0,5.0)
       var b : Complex = (2.0,3.0)
       println(a)  // 4.0 + 5.0i
       println(a + b)  // 6.0 + 8.0i
       println(a - b)  // 2.0 + 8.0i
       println(~b)  // 3.60555
      
       var c = 4 + b
       println(c)  // 6.0 + 3.0i
       var d = (1.0,1.0) + c  
       println(d)  // 7.0 + 4.0i
       
  }

}

"Operator Overloading" in Scala

So, I've been teaching myself Scala recently, and it's a very interesting language.

One of the nice things I like about it, is it's support for creating DSLs, domain specific languages. A domain specific language - or at least my understanding of it - is a language that is written specifically for one problem domain. One example would be SQL, great for querying relational databases, useless for creating first person shooters.

Of course Scala itself is not a DSL, it's a general purpose language. However it does offer several features that allow you to simulate a DSL, in particular operator overloading, and implicit conversions. In this post I'm going to focus on the first of these...

Operator Overloading

So what's operator overloading?

Well operators are typically things such as +, -, and !. You know those things you use to do arithmetic on numbers, or occasionally for manipulating Strings. Well, operator overloading - just like method overloading - allows you to redefine their behaviour for a particular type, and give them meaning for your own custom classes.

Hang on a minute! I'm sure someone once told me operator overloading was evil?

Indeed, this is quite a controversial topic. It's considered far too open for abuse by some, and was so maligned in C++ that the creators of Java deliberately disallowed it (excepting "+" for String concatenation).

I'm of a slightly different opinion, used responsibly it can be very useful. For example lots of different objects support a concept of addition, so why not just use an addition operator?

Lets say you were developing a complex number class, and you want to support addition. Wouldn't it be nicer to write...

Complex result = complex1 + complex2;

...rather than...

Complex result = complex1.add(complex2);

The first example is much more natural don't you think?

So Scala allows you to overload operators then?

Well, not really. In fact, technically not at all.

So all this is just a tease? This is the most stupid blog post I've ever read. Scala's rubbish. I'm going back to Algol 68.

Wait a second, I've not finished. You see Scala doesn't support operator overloading, because it doesn't have operators!

Scala doesn't have operators? You've gone mad, I write stuff like "sum = 2 + 3" all the time, and what about all those funny list operations? "::", and ":/". They look like operators to me!

Well they're not. The thing is, Scala has a rather relaxed attitude to what you can name a method.

When you write...

sum = 2 + 3,

...you're actually calling a method called + on a RichInt type with a value of 2. You could even rewrite it as...

sum = 2.+(3)

...if you really really wanted to.

Aha, I got it. So how do you go about overloading an operator then?

Simple, it's exactly the same as writing a normal method. Here's an example.

class Complex(val real : Double, val imag : Double) {
  
  def +(that: Complex) = 
            new Complex(this.real + that.real, this.imag + that.imag)
  
  def -(that: Complex) = 
            new Complex(this.real - that.real, this.imag - that.imag)

  override def toString = real + " + " + imag + "i"
  
}

object Complex {
  def main(args : Array[String]) : Unit = {
       var a = new Complex(4.0,5.0)
       var b = new Complex(2.0,3.0)
       println(a)  // 4.0 + 5.0i
       println(a + b)  // 6.0 + 8.0i
       println(a - b)  // 2.0 + 2.0i
  }
}
Ok that's nice, what if I wanted a "not" operator though, ie something like a "!"

That's a unary prefix operator, and yes scala can support these, although in a more limited fashion than an infix operator like "+"

Only four operators can be supported in this fashion, +, -, !, and ~. You simply need to call your methods unary_! or unary_~, etc. Here's how you might add a "~" to calculate the magnitude of a Complex number to our complex number class

class Complex(val real : Double, val imag : Double) {
    // ...
    def unary_~ = Math.sqrt(real * real + imag * imag)
}

object Complex {
  def main(args : Array[String]) : Unit = {
     var b = new Complex(2.0,3.0)
     prinln(~b) //  3.60555
   }
}

So that's all pretty simple, but please use responsibly. Don't create methods called "+" unless your class really does something that could be interpreted as addition. And never ever redefine the binary shift left operator "<<" as some sort of substitute for println. It's not clever and you'll make the Scala gods angry.

Hope you found that useful. Next up I'll cover implicit conversions. Another nice feature of Scala that really allows you to write your code in a more natural way

Tuesday 25 October 2011

Collection creation and Immutability with Google Guava

So, thought I'd take a look at some of the collection creation patterns Guava offers, and also some of the Immutable collection types it offers.

If you've not seen my previous posts, you may want to start here:

Guava part 1 - MultiMaps
Guava part 2 - BiMaps
Guava part 3 - MultiSets

create methods

All of Guava's collection implementations contain one or more static create methods, these do what you'd expect, and generally offer a slightly more concise way of instantiating the collection classes.

Here are two different ways of creating a ArrayListMultimap

    Multimap<String,String> multimap1 = new ArrayListMultimap<String,String>();
    Multimap<String,String> multimap2 = ArrayListMultimap.create();

Ok, so there's not a huge amount in it - 12 characters in this example - but the way I see it, you're removing some redundancy, do we really need to reproduce the Generic type information twice?

That's nice, it's a shame Sun didn't think to add create methods to their Collection types when the created Java 5!

Again, Guava rides to your rescue here, and provides some utility classes for dealing with the standard collection types. com.google.common.collect.Lists, com.google.common.collect.Sets, and com.google.common.collect.Maps

These each provide several methods of the format newCollectionType(), here's some examples

    List<String> myList1 = new ArrayList<String>(); //old way
    List<String> myList2 = Lists.newArrayList();    //guava way

    Set<String> mySet1 = new HashSet<String>(); //old way
    Set<String> mySet2 = Sets.newHashSet();     //guava way

Since the "new" methods are static, you can cut out even more characters by using a static import ie...

    import static com.google.common.collect.Lists.newArrayList;
    import static com.google.common.collect.Sets.newHashSet;

    //elsewhere in code
    List<String> myList2 = newArrayList();
    Set<String> mySet2 = newHashSet();

Keypress wise you're not really saving a great deal using the guava methods, so I guess this is a matter of taste as much as anything else. Personally I think the Guava way reads much better, although I think I'd go without the static imports.

Good so far, but hardly earth shattering, what next?

Immutable collections

These are essentially collection objects that you can't change once they've been created, and are useful for all sorts of reasons. Guava provides Immutable implementations for most of the regular Collection interfaces: ImmutableList, ImmutableSet, and ImmutableMap , and also immutable implementations of some of the Guava collection interfaces ( ImmutableMultimap etc.)

My main use for them is creating static constants. For example lets say you need to hardcode a Set of Strings for some purpose. One way to do this might be, eg.

    private static final Set<String> farmAnimals =
                new HashSet<String>(Arrays.asList("Cow","Pig","Sheep"));

Doesn't look great does it, and it suffers from one major problem. Any code that can access this Set can also change it, which could lead to all sorts of unexpected problems.

Couldn't we just use Collection.unmodifiableSet(Set s) to solve this?

Well, in this particular example I guess we could write...

    private static final Set<String> farmAnimals =
         Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList("Cow","Pig","Sheep")));

...but that's starting to look a bit unwieldy, and the unmodifiable methods have one other problem. They only return an unmodifiable view of the collection, if you have a reference to the original collection, you can still alter it!

Whilst this may not be a problem in the last example, I still think a much better way of doing this is to use an ImmutableSet

    private static final Set<String> farmAnimals = 
                       ImmutableSet.of("Cow","Pig","Sheep");

That's much nicer isn't it! And there's several other ways we can create them, here's some examples:

    // use copyOf()...
    public void doStuffWithList(List<Object> unsafeList) {
       List<Object> safeList = ImmutableList.copyOf(unsafeList);
    }
    // use a builder...
    public Map<String,Integer> makeImmutableMap() {
        ImmutableMap.Builder<String,Integer> mapBuilder = 
                         new ImmutableMap.Builder<String,Integer>();
        Entry<String,Integer> entry = null;
        while((entry = getEntry()) != null) {
            mapBuilder.put(entry.getKey(), entry.getValue());
        }
        return builder.build();
    }
So, any other advantages of using Immutable collections?

Well there's several. They can simplify logic considerably, especially in multi-threaded environments. If threads only have read access to an object, them you don't need to worry about complicated thread synchronization logic

They are also slightly more efficient to use once they've been created. If a collection knows beforehand what it needs to store, and there's never going to be any changes, you can make various time and space savings. For example, most implementations of ArrayLists or HashMaps, will leave some unused space for new objects, so they don't have to constantly resize themselves. If you know there's never going to be any new objects, there's no need for this.

Finally you could also use them as hash keys. If the contents of a collection can't change, then neither will it's hashcode!

Any disadvantages?

There is of course one big disadvantage of Immutable objects, which is pretty obvious. You can't change them! If you need to alter the collection, you'll first need to take a copy of it. In certain situations - ie where concurrency is a concern - you may in fact want to take this approach. However this is going to be impractical where collections contain many many objects and you'll probably want a good old fashioned mutable collection (complete with synchronization code if required).

The only other thing to be aware of is, just because your collection is immutable, it doesn't mean the objects contained in them automatically are. If you can get a reference to an object in an immutable collection, then there's nothing to stop you changing any mutable state on that object! As a consequence it's best practice to make sure anything you keep in an immutable collection is immutable itself!

Thursday 22 September 2011

Multisets

Guava part 1 - MultiMaps
Guava part 2 - BiMaps

Continuing this tour of Guava we get to the Multiset. I probably don't use this as much as Multimaps or Bimaps, but it certainly does have it's uses.

So what's a Multiset then?

Well as you might be able to guess it's a set that can hold multiple instances of the same object.

Isn't that just a List?

In Java there are two basic differences between Lists and Sets. Lists can hold duplicates of the same object, and Lists are always ordered. Sets can't hold duplicates, and there's no guarantee of order by the Set interface. (Some implementations - LinkedHashSet, SortedSet etc. - do of course provide a guaranteed order!)

So a Multiset occupies a sort of grey area between a List and a Set. Duplicates allowed, but no guaranteed order.

This collection is also sometimes called a Bag, in fact this is what Apache Commons Collections calls it's Mutlisets.

So what would I use one for?

The great thing about Multisets is they keep track of the counts of each particular object in the set. So you can use them for counting stuff.

Have you ever written code like the following:
Map<MyClass,Integer> objectCounts = new HashMap<MyClass,Integer>();

public void incrementCount(MyClass obj) {
    Integer count = objectCounts.get(obj);
    if (count == null) {
        objectCounts.put(obj,0);
    } else {
        objectCounts.put(obj,count++);
    }
}

public int getCount(MyClass obj) {
    Integer count = objectCounts.get(obj);
    if (count == null) {
        return 0;
    } else {
        return count;
    }
}
Bit unwieldy? Lets see how we might use a Multiset instead:
Multiset<MyClass> myMultiset = HashMultiset.create();

MyClass myObject = new MyClass();

myMultiset.add(myObject);
myMultiset.add(myObject);  // add it a second time.

System.out.println(myMultiset.count(myObject)); // 2

myMultiset.remove(myObject);
System.out.println(myMultiset.count(myObject)); // 1

As you can see that's much simpler! It's even possible to add/remove more than one object at at time
Multiset<MyClass> myMultiset = HashMultiset.create();

MyClass myObject = new MyClass();
myMultiset.add(myObject,5); // Add 5 copies of myObject

System.out.println(myMultiset.count(myObject)); // 5

myMultiset.remove(myObject,2); // remove 2 copies

System.out.println(myMultiset.count(myObject)); // 3
Pretty useful eh? As usual there's several implementations available depending on your requirements, and I recommend taking a look at the API: http://docs.guava-libraries.googlecode.com/git-history/v9.0/javadoc/com/google/common/collect/Multiset.html

Thursday 8 September 2011

BiMaps

Guava part 1 - MultiMaps

Next up on my tour of Guava, is the BiMap, another useful collection type. It's pretty simple really, a BiMap is simply a two way map.

Inverting a Map

A normal java map is a set of keys and values, and you can look up values by key, very useful, eg lets say I wanted to create a (very rudimentary) British English to American English dictionary:
Map<String,String> britishToAmerican = Maps.newHashMap();
britishToAmerican.put("aubergine","egglant");
britishToAmerican.put("courgette","zucchini");
britishToAmerican.put("jam","jelly");
But what if you want an American to British dictionary? Well you could write some code to invert the map:
    // Generic method to reverse map.
    public %lt;S,T> Map<T,S> getInverseMap(Map<S,T> map) {
  Map<T,S> inverseMap = new HashMap<T,S>();
  for(Entry<S,T> entry: map.entrySet()) {
   inverseMap.put(entry.getValue(), entry.getKey());
  }
  return inverseMap;
 }
It'll do the job, but there's several complications you might need to think about.
  • How do we handle duplicate values in the original map? At the moment they'll be silently overwritten in the reverse map.
  • What if we want to put a new entry in the reversed map? We'd also have to update the original map! This could get annoying.

BiMaps

Well, guess what? This is the sort of situation a BiMap is designed for! And here's how you might use it.

BiMap<String,String> britishToAmerican = HashBiMap.create();

// Initialise and use just like a normal map
britishToAmerican.put("aubergine","egglant");
britishToAmerican.put("courgette","zucchini");
britishToAmerican.put("jam","jelly");

System.out.println(britishToAmerican.get("aubergine")); // eggplant

BiMap<String,String> americanToBritish = britishToAmerican.inverse();

System.out.println(americanToBritish.get("eggplant")); // aubergine
System.out.println(americanToBritish.get("zucchini")); // courgette
Pretty simple really, but there's a few things to notice.

Enforcing uniqueness

Firstly the BiMap enforces uniqueness of it's values, and will give you an illegal argument exception if you try to insert a duplicate value, ie
britishToAmerican.put("pudding","dessert");
britishToAmerican.put("sweet","dessert"); // IllegalArgumentException.
If you need to add a values that has already been added there's a forcePut method that will overwrite the entry with the duplicate value.
britishToAmerican.put("pudding","dessert");
britishToAmerican.forcePut("sweet","dessert");  // Overwrites the previous entry
System.out.println(britishToAmerican.get("sweet")); // dessert
System.out.println(britishToAmerican.get("pudding")); // null

The inverse method

The other crucial thing to understand is the inverse method, this returns the inverse BiMap, ie the a map with the keys and values switched round.

Now this inverse map, isn't just a new map, such as my earlier reverseMap method might have created. It's actually a view of the of the original map. This means that any subsequent changes to the inverse method will affect the original map!
americanToBritish.put("potato chips","crisps");
System.out.println(britishToAmerican.containsKey("crisps")); // true
System.out.println(britishToAmerican.get("crisps")); // potato chips

So that's the BiMap, like I said pretty simple. As usual there are several implementations available, and as ever I recommend taking a look at the full API documentation: http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/collect/BiMap.html

Next up, Multisets!

Guava part 3 - Multisets

Thursday 1 September 2011

Multimaps - Google Guava

Guava?

This is the first in a series of posts where I'll be attempting to explain and explore Google's awesome Guava java library.

I first came across Guava whilst searching for generic versions of Apache Commons Collections - I needed a Bimap and was fed up with having to pepper my code with casts - however what I found was much much better.

Not only does it contain various implementations of more complex (but useful) collection types - Multimaps, Multisets, Bimaps - which I'll discuss in detail, but also facilities to support a more functional style of programming with immutable collections, and function and predicate objects. This has both completely changed the way I write java, and at the same time made me increasingly frustrated with Java's sometimes clunky syntax, something I intend to explore in further posts.

Anyway enough with the introduction, and on with the good stuff. The first thing I'd like to take a look at is the Multimap, which is probably the single Guava feature I've made the most use of.

Mutlimaps

So, how often have you needed a data structure like the following?

Map<String,List<MyClass>> myClassListMap test2
                              = new HashMap<String,List<MyClass>>()


If you're anything like me, fairly frequently. And don't you find yourself writing the same boilerplate code over and over again?

To put a key/value pair into this map, you need to first check if a list already exists for your key, and if it doesn't create it. You'll end up writing something along the lines of the following:

void putMyObject(String key, Object value) {
    List<Object> myClassList = myClassListMap.get(key);
    if(myClassList == null) {
        myClassList = new ArrayList<object>();
        myClassListMap.put(key,myClassList);
    }
    myClassList.add(value);
}


Bit of a pain, and what if you need methods to check a value exists, or remove a value, or even iterate over the entire data structure. That can be quite a lot of code.

Never fear Guava is here!

Just like the standard java collections, Guava defines several interfaces and matching implementations. Usually you want to code to an interface, and only worry about the implementation when you create it. In this case we're interested in Multimaps.

So using a multimap, we could replace the data structure declaration with the following:

Multimap<String,Object> myMultimap = ArrayListMultimap.create();


There's a few things to note here. The generic type declaration should look very familiar, this is exactly how you would declare a normal Map.

You may have been expecting to see new ArrayListMultimap<String,Object>() on the right-hand side of the equals. Well, all Guava collection implementations offer a create method, which is usually more concise and has the advantage that you do not have to duplicate the generic type information.

Guava in fact adds similar functionality to the standard Java collections. For example, if you examine com.google.common.collect.Lists, you'll see static newArrayList(), and newLinkedList() methods, so you can take advantage of this conciseness even with the standard Java collections. (I'll aim to cover this in more detail in a future post).

So we've declared and instantiated a multimap, how do we go about using them? Easy just like a normal map!

public class MutliMapTest {
    public static void main(String... args) {
  Multimap<String, String> myMultimap = ArrayListMultimap.create();
  
  // Adding some key/value
  myMultimap.put("Fruits", "Bannana");
  myMultimap.put("Fruits", "Apple");
  myMultimap.put("Fruits", "Pear");
  myMultimap.put("Vegetables", "Carrot");
  
  // Getting the size
  int size = myMultimap.size();
  System.out.println(size);  // 4
  
  // Getting values
  Collection<string> fruits = myMultimap.get("Fruits");
  System.out.println(fruits); // [Bannana, Apple, Pear]
  
  Collection<string> vegetables = myMultimap.get("Vegetables");
  System.out.println(vegetables); // [Carrot]
  
  // Iterating over entire Mutlimap
  for(String value : myMultimap.values()) {
   System.out.println(value);
  }
  
  // Removing a single value
  myMultimap.remove("Fruits","Pear");
  System.out.println(myMultimap.get("Fruits")); // [Bannana, Pear]
  
  // Remove all values for a key
  myMultimap.removeAll("Fruits");
  System.out.println(myMultimap.get("Fruits")); // [] (Empty Collection!)
 }
}


One thing you may be wondering, is why does the get method return a Collection and not a List, that would be much more useful. Indeed it would. The problem is there are several different implementations available, some use Lists - ArrayListMultimap, LinkedListMultimap etc. - and some use Sets - HashMultimap, TreeMultimap among others.

To handle this - if you need to work directly with the Lists, or Sets in the map - there are several subinterfaces defined. ListMultimap, SetMultimap, and SortedSetMultimap. These all do what you'd expect, and their methods that return collections, will return one of the approprite type.

ie

ListMutlimap<String,String> myMutlimap = ArrayListMultimap.create();

List<string> myValues = myMutlimap.get("myKey");  // Returns a List, not a Collection.



That's basically all there is to them. I recommend looking at the API: http://docs.guava-libraries.googlecode.com/git-history/release09/javadoc/com/google/common/collect/Multimap.html, where you can find the various implementations, you should be able to find one that suits your needs.

So, that's all for now. In my next post, I'll be introducing the BiMap

Guava part 1 - MultiMaps
Guava part 2 - BiMaps
Guava part 3 - Multisets