Emulating ExpectedException with the command pattern

The code for this tutorial can be found here.

jUnit4 offers the pretty nifty class ExpectedException. By using the @Rule annotation, this class allows us to write short unit tests that expect a certain exception type to be thrown, and exactly that exception type! For example, suppose you have a class Container which takes an int argument in its constructor, providing the original size of the container. If the user supplies a negative integer, we want to throw an instance of IllegalArgumentException.

Here is how ExpectedException can help us:

@Rule
public ExpectedException thrown= ExpectedException.none();

@Test
public void ensureBadArgumentThrows(){
    thrown.expect(IllegalArgumentException.class);
    new Container(-1);   // Should throw IllegalArgumentException
}

I recently had to work in an environment where jUnit, and in particular Hamcrest, had a largely broken installation. This means that I had to write my own testing routines and eschew the built-in jUnit methods. This included situations such as this.

A reasonable solution is based on a trycatch block such as this:

Throwable t = null;
try {
    methodThatShouldThrowAnIllegalArgumentException();
} catch(Throwable tThrown){
   t = tThrown;
} 
if(t == null || t.getClass() != IllegalArgumentException.class)
     System.out.println("Method did not throw expected exception!");

So let’s try to package this into a method of our own. We clearly need references to both the actual subclass of Throwable that we want, as well as the method to execute in order for the particular Throwable that we want to be thrown. This had me stumped at first, since I had never worked around with method references in Java.

No matter; we can do better. We will leverage the command pattern. Within our Main class, we can define an interface called Thrower, with a simple, void, arg-free method called throwIt():

private interface Thrower {
    void throwIt();
}

Now suppose that we only have two types of exceptions that we expect client code to throw, namely an ArrayIndexOutOfBoundsException, and a NullPointerException. For both exceptions, we will create a small class which will extend Thrower and override throwIt() appropriately:

private static class ThrowsArrayIndexOutOfBoundsException implements  Thrower {
    @Override
    public void throwIt() {
        Object[] array = new Object[2];
        Object temp = array[3]; // ArrayIndexOutOfBoundsException thrown.
    }
}

private static class ThrowsNullPointerException implements  Thrower {
    @Override
    public void throwIt() {
        Object o = null;
        o.hashCode();   // NullPointerException thrown.
    }
}

It’s almost like we are using Thrower as a functional interface! We can now package the code snippet shared above in a method, which we will call expectThrowable():

private static boolean expectThrowable(Class<?> expectedClass, Thrower methodThatThrows) {
    Throwable exc = null;
    try {
        methodThatThrows.throwIt();
    } catch(Throwable thrown){
        exc = thrown;
    }
    return (exc != null && exc.getClass() == expectedClass);
}

Naturally, we could also make this method void and have it throw some other exception if our conditions are not met, but for simplicity let’s leave it as a boolean method for this example.

Some example calls to this method:

public static void main(String[] args) {
    System.out.println(expectThrowable(ArrayIndexOutOfBoundsException.class, new ThrowsArrayIndexOutOfBoundsException())); // true
    System.out.println(expectThrowable(AssertionError.class, new ThrowsArrayIndexOutOfBoundsException())); // false
    System.out.println(expectThrowable(NullPointerException.class, new ThrowsNullPointerException())); // true
    System.out.println(expectThrowable(AssertionError.class, new ThrowsNullPointerException())); // false
    System.out.println(expectThrowable(Throwable.class, new ThrowsNullPointerException())); // false, too generic
    System.out.println(expectThrowable(RuntimeException.class, new ThrowsNullPointerException())); // false, also too generic
}

So there you have it, one way to emulate ExpectedException with your own code. The way I see it, the bottlenecks here are the JVM’s set-up of the trycatch block and the upcastings in the main() method (late binding is slow, but it’s all we have in Java). Neither are avoidable even in the implementation of ExpectedException.

#java #junit #exceptions

A simple Rational API in Scala

This post is meant for Java programmers who are curious about what Scala has to offer them. It is often assumed that there is little reason to learn Scala ever since Java 8 included lambda expressions and streams. I find that assumption to be false, and with this example I hope to explain some of the reasons why.

I have been following the online course Functional Programming in Scala, offered by EPFL. In typical Jason fashion, instead of actually sticking to my deadlines and providing deliverables on time, I took one of the examples and extended it to my liking over a period of weeks. So let’s see how we would build an API for a rational number in Scala. As a reminder, a rational number is any number that can be written in the form a/b, where a and b are integer numbers, b non-zero. The goal of this exercise is to (a) Play around with Scala, showcasing that even the most basic features of the language can provide great benefits on readability and extensibility of code, and (b) Investigate whether our new library can offer time and accuracy improvements during computations that involve many rational numbers, when compared to the alternative that the compiler will do by default: generate the Double value that approximates a / b as good as it can within 64 bits, and work with those approximations. The entirety of the code is available under GPL here (src/main/scala/week2/rational). Let’s begin.

Initialization and basic methods

In Scala, we are allowed to execute statements inside a class’ body. Those statements make up the default constructor for the class. For our Rational type, we will do a neat trick: in order to make the various computations faster, we will make sure we reduce the fraction as much as possible during construction. For example, if the user requests of us to create the fraction 10/15, our implementation will reduce that to 2/3, as follows:

class Rational(x:BigInt, y:BigInt) {

  import Rational._

  require(y != 0, "Cannot create a Rational with denominator 0")

  // Simplify representation by dividing both numer and denom by gcd.
  @tailrec
  private def gcd(a:BigInt, b:BigInt) : BigInt = if(b == 0) a else 
                                             gcd(b, a % b)
  private val gcd : BigInt= gcd(x, y)
  private val numer = x / gcd // Make immutable
  private val denom = y / gcd

require will throw an IllegalArgumentException if the caller requests a rational with a denom of 0. The tail-recursive method gcd(a, b) will calculate the greatest common divisor of x and y, and then the constructor will consequently divide x and y and store the resulting values in numer and denom respectively. We use BigInt instead of Int for our inner representation so that we can deal with fractions of really large integers.

Of course, we can define other constructors if we would like, as in Java. For example, since all integers can be expressed as rationals (just give a denom of 1), we can do this:

def this(x:BigInt) = this(x, 1)

Furthermore, and this will be important for our experiments, we want to force the program to output the “pure” representation of our Rational instances instead of dividing them all the time, so we override toString as follows:

override def toString : String = numer + "/" + denom 

Such that, instead of println(20/100) printing 0.2, it will now print 1/5. Feel free to browse the code for overridings of equals and hashCode.

Operator overloading

Among the most discussed drawbacks of Java is lack of operator overloading, something that C++ programmers in particular may find limiting. In Scala, pretty much any alphanumeric string can be a method name, as long as it begins with a non-numeric character. Without doubt, for our rational number API, it would be very convenient for us if we could completely transparently overload the meaning of the classic binary arithmetic operators: +, -, *, /, etc. In our example, we overload all of those, as well as unary –, the caret operator (^) to denote exponentiation, as well as binary comparison operators (>, <, <=, etc). Here are some examples.

def + (that:Rational):Rational  = {
  require(that != null, "Rational + Rational: Provided null 
                                                       argument.")
  new Rational(this.numer * that.denom + that.numer * this.denom, 
                                      this.denom * that.denom)
}

def + (that:BigInt): Rational = this + new Rational(that, 1)

def * (that:Rational):Rational = {
    require(that != null, "Rational * Rational: Provided null 
                                                         argument")
    new Rational(this.numer * that.numer, this.denom * that.denom)
 }

def * (that:BigInt) = new Rational(numer * that, denom)

We use BigInt instances for the numerator and denominator representation since the stress tests we show later on can produce very large integers. In the example above, given two Rational instances a and b we define what the behavior of + should be in the expression a + b. We also define the behavior of a + i, where i is a BigInt instance. To re-use the existing implementation for a + b, a + i is translated to a + i/1, which is already defined.

For unary operators, Scala requires that we prefix the name of the operator with the keyword unary, to disambiguate them from binary operators:

def unary_- : Rational = new Rational (-numer, denom)

Binary comparison operators:

def > (that: Rational) :Boolean = numer * that.denom > that.numer * denom

def >= (that:Rational) :Boolean = (this > that) || (this == that)

def > (that:BigInt) : Boolean = this > new Rational(that, 1)

def >= (that:BigInt) : Boolean = {
  val r = new Rational(that, 1)
  (this > r) || (this == r)   // Scala calls "equals" when "==" is        
}                            //invoked

def < (that:Rational) : Boolean = !(this >= that)

def <= (that:Rational) : Boolean = !(this > that)

def <(that:BigInt): Boolean = {
  !(this >= that)
}

def <= (that:BigInt):Boolean = {
  !(this > that)
}

Re-using existing overloadings as much as we can. It’s a huge syntactic relief to be able to implement functionality via universally recognized operators instead of having to implement ugly methods like plus, minus, multiply, subtract,… And as we will see later, it makes it possible to alter the semantics of an entire computation chain just by changing one line of code!

“No” to bloated pattern-matching operators!

Notice in the code above that we overload the overloaded operators themselves by writing a different method for every type of argument. This seems a bit tedious at first, and one might think: isn’t it more of a “functional” approach to take an argument of type Any and then pattern match against it? Like our overriding of the equals method does:

private def canEqual(x:Any):Boolean = x.isInstanceOf[Rational]

private def sameRational(that:Rational ): Boolean = 
               (numer == that.numer) && (denom == that.denom) ||
               (numer == -that.numer) && (denom == -that.denom)

override def equals(that:Any):Boolean = {
  that match {
    case that:Rational => that.canEqual(this)  && sameRational(that)
    case that:BigInt => sameRational(new Rational(that, 1)) 
    case _ => false 
  }
}

For example, we could do something like this:

def + (that:Any):Rational  = {     
      require(that != null, "+(Rational, Any): Provided null 
                               argument.")     
      that match {         
          case that:Rational => new Rational(this.numer * that.denom 
                                         + that.numer *  this.denom, 
                                           this.denom * that.denom)         
          case that:Int | BigInt => new Rational(this.numer + that * 
                                    this.denom, this.denom) 
          case that:Double => /* ...*/
           case _ => throw new UnsupportedOperationException
                   ("+(Rational, Any): Unsupported operand.")      
      } 
 }

Based on the discussion here, I was able to ascertain that this would be a terrible idea, because it would change a compile-time error to a runtime error. For instance, with our current implementation which only overloads + for BigInt and Rational arguments, the code snippet:

val y = new Rational(10, 11)
y + 2.4

does not compile:

Demonstration of compile-time error when attempting to use an operator on unknown types. String is also an expected type because it’s possible to concatenate the output of Rational::toString with the string literal “2.4” to produce the concatenated String”10/112.4″.

On the other hand, if we implement + in the way above, the code would be prone to a runtime error, which is of course not preferred. So no matter how Java-esque and tedious it might seem, the multi-method way seems like the better solution.

Lazy evaluation of expressions that throw

The keyword lazy val in Scala does exactly what you think: it defines an expression head = body where body is evaluated only at the time of call of head. This can sometimes lead to interesting behavior. In the companion object to Rational, we can define some constants:

object Rational {

  val ONE = new Rational (1, 1)

  val ZERO = new Rational (0, 1)

  val MINUSONEOVERONE = new Rational(-1, 1)

  val ONEOVERMINUSONE = new Rational(1, -1)

  lazy val ZEROOVERZERO = new Rational(0, 0)
}

Note that the last constant, representing the undefined form 0/0, is a lazy val. Its RHS is a constructor call with both arguments set to zero. However, as a reminder, whenever a denominator of zero is provided to the Rational constructor, we make sure to throw an instance of IllegalArgumentException:

require(y != 0, "Cannot create a Rational with denominator 0")

Since the constant is lazy, its RHS, which leads to the exception throwing, will not be evaluated until after the companion object has been brought to life! This means that the class Rational itself has the capacity to internally define how it wants exceptional instantiations to behave! In this case, by “exceptional” instantiation we are referring to the undefined form 0/0 which has been encoded as a constant in the companion object, but we could imagine all sorts of classes with exceptional instantiations that might need to be documented! Especially for applications which monitor global state, where variables are mutated irrespective of what’s going on in the current thread’s call stack, having the capacity to define special cases of a type can be powerful!

Since all the statements within the body of the companion object have to be evaluated (unless lazy!) before the companion object can be brought to existence, stripping away the lazy keyword from the assignment’s LHS would not allow for object construction when requesting the constant ZEROOVERZERO, or any other constant for that matter. Feel free to pull the code, take away the keyword lazy from the line that defines ZEROOVERZERO and see that either one of those statements will throw:

object LazyEval extends App {

import Rational._
 // ONEOVERMINUSONE // This won't work if `ZEROOVERZERO` *ISN'T* lazy
 // ZEROOVERZERO    // Or this
}

It turns out that it is not possible to emulate this behavior in Java, that is, have an expression that is part of a type (so, having to be evaluated at construction time!) be evaluated lazily. The following type:

public class Uninstantiable {

    public static int badField = willThrow();

    public static int willThrow()  {
        throw new RuntimeException("willThrow()");
    }

}

Despite the fact that we only define a static field and method, is indeed uninstantiable:

public class UninstantiableRunner {

    public static void main(String[] args) {
        // Cannot instantiate;
        new Uninstantiable();
    }
}
NNNNNOOOOOOOOOOOOOOOO

Now, care must be paid to ensure that nobody thinks we are making the wrong claim. It is not the claim that lazy behavior cannot be implemented in Java. It absolutely can, and it’s a well-known and widely – used pattern. Here’s a simple example:

public class JavaLazyVal {

    private long val;

    // Pay once.
    public long getVal() {
        if (val == 1) {
            System.out.println("Performing an expensive 
                                computation...");
            val = (long) Math.pow(3, 10);   // Assumed expensive. Have 
                                           // to ensure that value 
                                          // CAN'T be equal to the 
                                          // initializer value!
        } else {
            System.out.println("No longer performing expensive 
                                  computation...");
        }
        return val;
    }

    public JavaLazyVal() {
        val = 1;
    }
}

A simple runner program:

public class JavaLazyValRunner {

    public static void main(String[] args) {
        JavaLazyVal jlv = new JavaLazyVal();
        System.out.println("val = " + jlv.getVal());
        System.out.println("val = " + jlv.getVal());
    }
}

And its output:

Lazy behavior of getVal() exemplified.

So it is absolutely possible to have lazy behavior in Java and this is a very known pattern for Java programmers. It’s not the claim that this can’t be done. The claim is three-fold. First, in Scala you can do this much, much more concisely:

object LazyEval extends App {
  lazy val x = {  // Entire scopes can be rvalues of Scala expressions     
    println("Evaluating x")
    scala.math.pow(3, 10)
  }
  println(x) ; println(x)
}

/* Output:
 *
 * Evaluating x
 * 59049.0
 * 59049.0
*/

Second, the Java code that we had to – necessarily – introduce is open to one possible source of logical error: that the expensive computation also returns the initializer value! In some applications, this can be hard to ensure!

Third, with the mechanism of lazy val, you can store expressions which would otherwise prohibit the construction of a class instance! And that can have power.

Tail-recursive repeated squaring

This isn’t in any way Scala or Functional Programming – specific, just something cool that can be written rather consisely in this language / paradigm. We overload the exponentiation operator (^) to perform repeated squaring with a tail-recursive method:

def ^ (n:Int):Rational = {

    // Inner two-arg function pow/2. Not tailrec: 
    // the inner method pow/4 is tailrec.
    def pow(base:BigInt, exp:Int):BigInt = {
      require(exp >= 0 && base >=0, "We need positive integer 
                      arguments for this method.")
      // Power computation with tail-recursive repeated squaring.
      @tailrec
      def pow(currExp:Int, maxExp:Int, currTerm:BigInt, 
                          prodAccum:BigInt): BigInt = {
        assert(currExp <= maxExp, "The current exponent should never 
                  surpass the original one.")
        if(currExp <= maxExp / 2)
             // Next iteration on current term.
             pow(2*currExp, maxExp, currTerm * currTerm, prodAccum)    
        else if (currExp == maxExp) 
               // Bottomed out.
             currTerm  * prodAccum  
        else  
           // Compute residual product (rest of terms) 
           // using the same method.
            pow(1, maxExp - currExp, base, currTerm * prodAccum)  
      }

      if (base == 0 && exp == 0) throw new IllegalArgumentException
                                       ("0^0 is an undefined form.")
      else if(base == 0 && exp > 0) 0
      else if(base > 0 && exp == 0) 1
      else if(exp == 1) base
      else if(base == 1) 1
      else pow(1, exp, base, 1)   // Call to tailrec pow/4
    }

    if(n == 1) this
    else if(n == 0) ONE

    // Calls to (non-tailrec) pow/2
    else if(n < 0) new Rational(pow(denom, -n), pow(numer, -n))
    else new Rational(pow(numer, n), pow(denom, n))
  }

}

The innermost 4-arg method pow is of interest here. The snippet (3 / 2)^19 ends up translated to the calls pow(1, 19, 3, 1) for the numerator 3^19and pow(1, 19, 2, 1) for the denominator 2^19 in the line else new Rational(pow(numer, n), pow(denom, n)). Observe that 3^19 = 3^16 * 3^2 * 3^1. Through the machinery of repeated squaring, the first if condition in pow/4 iterates (literally) over all the possible values of the current exponent that wouldn’t “surpass” the maximum exponent if squared. Through 4=log_2(16) iterations, it will calculate the value of 3^16 and then, using the recursive call pow(1, maxExp - currExp, base, currTerm * prodAccum), it can take the intermediate computation into consideration through the product currTerm*prodAccum.

Stress testing

So, all this is cool and all, but how do we fare in practice? We want to make two measurements:

(1) How well does our Rational type perform in terms of accuracy and time? To measure this, we run two map-reduce rasks on a large chain of rational numbers under both the Double and Rational representations, and compare results. We generally expect that we will be winning in accuracy but losing in time, since we have to spend significant time on calls to the Rational constructor.

(2) How well does our exponentiation method scale when compared to scala.math.pow, in terms of speed?

To measure time we used the following method presented here:

def time[R](block: => R, msg:String): R = {
    val t0 = System.nanoTime()
    val result = block    // call-by-name
    val t1 = System.nanoTime()
    println("Elapsed time for " + msg + " was " +  (t1 - t0) + "ns")
    result
  }

And use the following script for the first experiment:

println("======= EXPERIMENT 1: ACCURACY AND EFFICIENCY OF STANDARD  
                     MAP-REDUCE ========== ");

final val rng = new Random(47)
final val MAX_ITER = 1000 // Vary 
final val MAX_INT = 100   // Vary 
val intTupleSeq : Seq[(Int, Int)]= for (_ <- 1 to MAX_ITER) yield  
                                  (rng.nextInt(MAX_INT) + 1, 
                                        rng.nextInt(MAX_INT) + 1)
val quotientSeq : Seq[Double] = intTupleSeq map { case (a, b) => a / 
                                                 b.toDouble }
val rationalSeq : Seq[Rational] = intTupleSeq map { case (a, b) => 
                                       new Rational(a, b) }
// Sums first....
val quotientSum = time( quotientSeq.sum, "quotient sum")
val rationalSum = time(rationalSeq.reduce((x, y) => x+y), "Rational 
                                                     sum")
println("Value of quotient sum:" + quotientSum)
val evaluatedRationalSum = rationalSum.eval
println("Value of Rational sum:" + evaluatedRationalSum)
println("Error: " + Math.abs(quotientSum - evaluatedRationalSum))

// Products second...
val quotientProd = time( quotientSeq.product, "quotient product")
val rationalProd = time(rationalSeq.reduce((x, y) => x*y), "Rational 
                                            product")
println("Value of quotient product:" + quotientProd)
val evaluatedRationalProd = rationalProd.eval
println("Value of Rational product:" + evaluatedRationalProd)
println("Error: " + Math.abs(quotientProd - evaluatedRationalProd))

Keep in mind that the bigger the error, the better the case for our Rational type, since up to the point of the call to eval it has admitted zero representational degradation.

(i) Map-Reduce

We run two simple Map-Reduce tasks: sum and product over a list of 1000 randomly distributed strictly positive Ints. We vary the largest possible Int and generate time and error metrics for each.

Sum times:

We see that generating the chain of sums for the Rational type takes about seven-fold more time than that of generating the simple primitive Int sum. This is to be expected, since there are costs associated with claiming memory from the heap and performing several assignments. We should also remember that the Rational constructor performs the GCD reduction aggressively during construction, and the runtime of that algorithm is affected by the magnitude of the bigger of the two integers for which it is called.

What about accuracy?

Remember: the greater the error, the better for our Rational type. It seems as if we have to go all the way to the 12th decimal digit, in a 64-bit machine, to find any appreciable difference.

When it comes to the product task:

Now this is somewhat surprising, since it seems that the Rational type’s difference in execution speed is even greater than the case of sum. Once again, here is the source for +(Rational, Rational) and *(Rational, Rational).

def + (that:Rational):Rational  = {
    require(that != null, "Rational + Rational: Provided null 
                                        argument.")
    new Rational(this.numer * that.denom + that.numer * this.denom, 
                   this.denom * that.denom)
}

def * (that:Rational):Rational = {
    require(that != null, "Rational * Rational: Provided null 
                                        argument")
    new Rational(this.numer * that.numer, this.denom * that.denom)
}

This code describes the formulae for addition of a / b + c / d and (a/b) * (c/d). We see that in the constructor call of + we pay 4 additions / multiplications, whereas in the constructor call of multiplication * we pay 2 mults. So it’s definitely not the number of operations, but the cost of those operations, particularly when BigInt instances are involved. And, in a product formulation, BigInts can get really big.

What about product error?

Smaller benefits than the sum (note the different scale, from 10^-12 to 10^-16), to the point of not even being appreciable for small values! No idea why this happens. In all reality, I should have averaged all of those graphs over a number of calls to generate higher fidelity points.

One thing to consider as well is that in order to even produce error graphs, we need a way to evaluate the Rational instance analytically:

def  eval: Double = numer.doubleValue() /  denom.doubleValue()

(ii) Repeated squaring for exponentiation calculation

Just for fun, I elected to run an experiment to see how our overloading compares to scala.math.pow. Since a^n can be computed in log_2n steps using repeated squaring, we vary the exponent n and keep the base at 3/2 (1.5 in the Double representation). The computations are of course accurate (Error = 0). We are exclusively interested in time.

Results:

  println("=========== EXPERIMENT 2: EFFICIENCY OF 
                   EXPONENTIATION ========= ")
  final val EXPONENT = 500
  var quotientPower: Double = 0.0
  var rationalPower : Rational= _
  time(quotientPower = scala.math.pow(1.5, EXPONENT), "Double raised 
                             to the power of " + EXPONENT)
  time(rationalPower = new Rational(3, 2) ^ EXPONENT, "Rational raised 
                            to the power of " + EXPONENT)
  println("Value of power from Doubles: " + quotientPower + ".")
  println("Value of power from Rationals: " + rationalPower.eval + 
                                                         ".")
  println("Error: " + Math.abs(quotientPower - rationalPower.eval))

Results:

It is laughable to even assume that a custom implementation would in any way compare to scala.math.pow, but what is interesting is the fact that both implementations appear to scale very well as the exponent is increased. This is to be expected because of the power of the logarithm, which “tempers” even large values of n to very tractable values of magnitude ceil(log_2n).

Conclusion

In this post, we saw several advantages of the Scala implementation of a rather simplistic numerical type over the relevant Java implementation. The big advantages have to do not with the runtime, but with the source. In Scala, we are able to overload operators naturally, like with C++, albeit with more intuitive syntax and the usual perks of not having to deal with raw pointers or memory cleanup (if ever required in the overloading of some operator…). We can also use the power of built-in lazy vals in order to do cool things such as allow for types to define their own exceptional instances and deal with them in any way they please (even dynamically). We also compared two different ways of overloading methods in Scala, and deduced that it would be far better to actually emulate the Java-like way of several methods with the same name, instead of pattern – matching on an argument of type Any, as is the more “natural” way in Scala. Finally, we saw a simple application of the language’s Syntax and its emphasis on tail recursion and inner methods with our repeated squaring method.

In my view, the big advantage of this entire experiment is the elegance and modularity with which we can switch from Double to Rational and vice versa. An example can be found in the object Runner:

object Runner extends App {

  private def fraction(a:Int, b:Int) = {
    require(b!= 0, "Cannot create a fraction with denominator zero.")
     new Rational(a, b)
     // a.toDouble / b.toDouble
  }

  val a = fraction(5, 6)
  val b = fraction(2, 3)
  val c = fraction(1, 2)
  val d = fraction(10, 11)
  val e = fraction(25, 15) // Rational will reduce to 5/3
  val f = fraction(23, 13)
  val g = fraction(230, 17)
  println(a * b + c*d - e * (f + g))
}

The output of the above code as given is: -535765/21879, whereas, if we uncomment the final line of fraction(Int, Int), we have -24.487636546460074, the evaluated fraction. And the only thing we need change is the inclusion or exclusion of the character string //.

All that said, I would definitely not implement numerical software on top of the JVM. C / C++ / Native Assembly is the only way to go if we want to go that low-level. Furthermore, it is not clear to me, a person who is not professionally involved with commercial – grade numerical software, how a 12th decimal digit difference in accuracy can in any way be significant. Perhaps NASA and SpaceX care about such small differences, to make sure that autonomous vehicles land on Mars and not Venus. Perhaps not. Perhaps the advantage of being able to move to smaller bit widths (Arduino, IoT devices) would ratify the use of a “pure” numerical type such as our Rational. The matter of symbolic computation is huge, interesting, important, and with this blog post we are not even making a ripple on the surface.

Breaking @tailrec

To follow through this example, you will need to know some basic (really basic) Scala. You will also need to know what call-by-name and call-by-value means in the context of functional programming. Those are concepts very loosely related to “pass-by-value” and “pass-by-reference” in imperative programming; you should not confuse them. tl;dr at the end.

I have recently been learning Scala and functional programming in general. Recursive exercises are a great warm-up to make the imperative programmer start thinking in a more recursive manner, so one of the first few examples I wrote was factorial. I realized I needed to understand quickly how the language treats tail recursion, and factorial is a great exercise for that.

The following intuitive one-liner is, unfortunately, not tail recursive, but at least the stack grows linearly with n:

def factorialNaive(n:Int) = if(n == 0) 1 else n * factorialNaive(n-1)

So a tail-recursive implementation is quite appropriate here, and the following pair of functions will get the job done in exactly such a manner:

def factorialTailRec(n: Int) : Int = {
    @tailrec
    def factorialTailRec(n: Int, f:Int): Int = {
      if (n == 0) f
      else factorialTailRec(n - 1, n * f)
    }
    factorialTailRec(n, 1)
  }

Now at this point it’s important to remind ourselves that the @tailrec annotation, which can be used after we import scala.annotation.
tailrec, is not required, but is a very good idea to include since it will warn us at compile-time if the recursive call is not in tail position. If we were to use that annotation for factorialNaive, the code would not compile:

Now here is where an evil idea creeped into my mind. I noticed that I was passing the second parameter of factorialTailRec by value. I thought to myself: “This means that every “stack frame” (more like iteration scope at this point) is burdened with one multiplication… so all the way down the call chain we have n-1 multiplications. The alternative, of passing the parameter by name, would delay any multiplication up to the last stack frame, where we bottom out with a term of 1. So I would expect similar performance.”

Turns out, the above is only partly true, and the most interesting fact is not included in it! You see, while it is the case that the product computation is delayed until the very end, its terms are embedded within every one of the stack frames by the anonymous function that the called-by-name parameter builds! The evaluation of every multiplication i * (i+1) with 0 <= i < n, even when i+1 can be evaluated within the current stack frame, requires popping another stack frame, a frame that was previously pushed by the otherwise tail-recursive call that made a call-by-name for its 2nd argument!

To test this, we can use the following runner program. Note that I don’t care at all about the actual factorial values, so I let the result overflow and be as non-sensical as it likes. I’m not even assigning it anywhere. I’m interested exclusively in how the code affects the use of the JVM’s stack. For the hyper-exponential factorial function, even the Long data type is not sufficient and one had best use an efficient BigInteger library or Stirling’s approximation if they care about computing the values of large factorials.

package factorial

import scala.annotation.tailrec

object Factorial extends App {

  val ITERS = 100000      // Let's push things

  // Naive Factorial
  def factorialNaive(n:Int) : Int =  if(n == 0) 1 else n * factorialNaive(n-1)

  try {
    for (i <- 1 to ITERS) factorialNaive(i)
    println("Naive factorial worked up to " + ITERS + "!.")
  } catch{
    case  se:StackOverflowError => println("Naive factorial threw an instance of StackOverflowError.")
    case e: Exception => println("Naive factorial threw an instance of " + e.getClass + " with message: " + e.getMessage + ".")
  }

  // Tail-recursive factorial
  def factorialTailRec(n: Int) : Int = {
    @tailrec
    def factorialTailRec(n: Int, f:Int): Int = {
      if (n == 0) f
      else factorialTailRec(n - 1, n * f)
    }
    factorialTailRec(n, 1)
  }

  try {
    for(i <-1 to ITERS) factorialTailRec(i)
    println("Tail recursive factorial worked up to " + ITERS + "!.")
  } catch{
      case  se:StackOverflowError => println("Tail recursive factorial threw an instance of StackOverflowError.")
      case e: Exception => println("Tail recursive factorial threw an instance of " + e.getClass + " with message: " + e.getMessage + ".")
  }

  println("Exiting...")

}

Notice that in factorialTailRec, the accumulator argument is passed by value. Running this program for the given parameter of ITERS=100,000 yields the output:

Naive factorial threw an instance of StackOverflowError.
Tail recursive factorial worked up to 100000!
Exiting...

Nothing surprising so far. But what if I were to pass the second argument by name instead, by changing the method declaration to:

def factorialTailRec(n: Int, f: => Int): Int = ...

Then, our output is:

Naive factorial threw an instance of StackOverflowError.
Tail recursive factorial threw an instance of StackOverflowError.
Exiting...

Everybody has my express permission to print the above output and paste it on their office doors, perhaps appropriately captioned. Not to mention that @tailrec did not complain at all! The file compiled just fine. I’m not sure whether it’s possible to have @tailrec disallow call-by-name parameters. It’s one of those things that sound easy to do, but are probably very difficult in practice.

Now let’s play a little game. Suppose that for whatever reason you cannot change the signature of factorialTailRec and you are stuck with an idiotic call-by-name that has doomed your method to be tail-recursively constructing a linear-size stack for a function that it itself builds… ūüė¶ Is there anything we can do to achieve call-by-value?

Yup. This:

def factorialTailRec(n: Int) : Int = {
    @tailrec
    def factorialTailRec(n: Int, f: => Int): Int = {
      if (n == 0) {
        val fEvaluated = f
        fEvaluated
      }
      else {
        val fEvaluated = f
        factorialTailRec(n - 1, n * fEvaluated)
      }
    }
    factorialTailRec(n, 1)
  }

Which is a cheap general template to emulate call-by-value from a called-by-name parameter if you need it. Right now with my limited Scala experience, I can’t envision a scenario where this would be preferable to a simple call-by-value declared at the method signature level, but hey, it works:

Naive factorial threw an instance of StackOverflowError.
Tail recursive factorial worked up to 100000!
Exiting...

Finally, to once again underscore the difference between val and def, changing the two lines val fEvaluated = f into def fEvaluated = f does nothing to evaluate the value of the parameter, since the defs themselves are calls by name (aliases)! In fact, I’d be willing to bet that it doubles the constant factor in front of the O(n) space occupied by the stack.

Naive factorial threw an instance of StackOverflowError.
Tail recursive factorial threw an instance of StackOverflowError.
Exiting...

tl;dr I built a tail-recursive method in Scala that blows up the JVM’s stack.

A better number sequence for teaching Cantor’s diagonalization.

I’ve been teaching CMSC250: Discrete Structures in CS UMD for a while now. Essentially a more CS-friendly version of Discete Mathematics, with more logic, set theory, structural induction, countability. Countability, a rather esoteric subject, is challenging for our students. Naturally, a student that doesn’t get countability because they don’t get bijections cannot be helped until they understand bijections, but when it¬† comes to the Cantorian diagonalizing proof that the reals are uncountable (to be precise, that the interval [0, 1]) is uncountable), one can lose many more students.

I conjecture that one possible reason for this is that the sequence of numbers that is used to create  the 2D matrix is usually a bit arbitrary in both texts and slides. For example, in my own slides, I have this following sequence of numbers:

Screenshot 2018-02-28 12.50.38

With such a sequence of numbers, I have observed that even medium-to-strong students oftentimes have trouble. So let’s try to improve our example. Here’s a nice trick: Write down only the diagonal portion of the listing of reals:

r_1 = 0. \mathbf{2}xxxxxx\dots

r_2 = 0.x\mathbf{4}xxxx\dots

r_3 = 0.xx\mathbf{2}xxx\dots

r_4 = 0.xxx\mathbf{9}xx\dots

r_5 = 0.xxxx\mathbf{0}x\dots

r_6 = 0.xxxxx\mathbf{8}\dots

\ \vdots \qquad \vdots \qquad \vdots \qquad \vdots

Since the diagonal values are the only ones that you need to construct the number of interest, go ahead and construct it. I call it r' (r “prime”) only because I would like to use r later.

r' = 0.353019\dots

And now we will go replace all of those x‘s with the same-index digits of the number itself.¬†For example, we will replace the 2nd, 3rd, 4th, 5th and 6th digit of r_1 with 5, 3, 0, 1 and 9 respectively. We will also replace the 1st, 3rd, 4th, 5th and 6th digits of r_2 with the relevant digits of r', and so on and so forth:

r_1 = 0. \mathbf{2}53019\dots

r_2 = 0.3\mathbf{4}3019\dots

r_3 = 0.35\mathbf{2}019\dots

r_4 = 0.353\mathbf{9}19\dots

r_5 = 0.3530\mathbf{0}9\dots

r_6 = 0.35301\mathbf{8}\dots

\ \vdots \qquad \vdots \qquad \vdots \qquad \vdots

 

The benefit of carefully constructing the number sequence in this way is that now the student can¬†see¬†that those numbers, no matter how¬†hard they try¬†to look like our own constructed number r', they fail¬†in at least one decimal digit. In fact, with this visualization of the first 6 decimal digits for every real r_j, there is a difference of¬†exactly¬†one decimal digit. We have essentially pushed the argument to its limits: even if we had a real r_k whose 1st, 2nd, 3rd, …. (k-1)th, (k+1)th, (k+2)th, …. digits are the same, r_{k_k} is¬†guaranteed, by¬†construction of r', to be different from a_k.

The final piece of the argument can perhaps be shown as follows: The statement “[0, 1] is countable”, can be re-worded as: “For every real r in [0, 1], there is some positive integer j such that r = r_j.

The way I like to proceed from that point onward is the following: Since there is some positive integer j such that every single r \in [0, 1] can be written as r_j, then this is also the case for our constructed real number r' since, clearly, by the way that I have constructed r', it is a number between 0 and 1!

Maybe this j is equal to 1. But this can’t be, since we notice that the numbers differ in the first decimal digit, again by construction. And this is where the construction comes to help:¬†all the other (visible) digits are the same. However, it suffices for there to be a difference in a single digit in order for us to say that, for a given¬†j,\ r \neq r_j.

Maybe j = 2. The first digit is ok, it is 3 in both r_2 and r', But the second digit gives us a difference, despite the fact that all other digits are the same!

Maybe j = 3, and so on and so forth.

So the point is that this enumeration actually provides a Discrete Mathematics student with more intuition about why the proof works.

 

 

 

 

A deep take on countability, cardinality and ordering

I’ve been teaching CMSC250, Discrete Mathematics, over the past year in CS UMD. Last semester, I typed a more philosophical than mathematical post on Countability, Cardinality and Ordering, which I’m repeating here for the community’s sake.


After our ordinality lecture last Tuesday, I had a student¬†come to me and tell me that they were¬†not sure how to¬†think¬†about ordinality: they were¬†understanding the relationship between cardinality and size, since it is somewhat intuitive even for infinite sets (at least to them!), but ordinality still appeared esoteric. That’s 100% natural, and in this post I will¬†I’ll try to stray away from math and try to explain how I think about countability, cardinality and ordinality¬†intuitively.¬†This post has exactly zero things to do with the final, so if you want to limit your interactions with¬†this website to the exam-specific, you may stop reading now.

Before we begin, I would like to remind you of a definition that we had presented much earlier in the semester, I believe during an online quiz: A set S is dense if between any two elements of it, one can find another element. Note something interesting: only ordered sets can be qualified as dense or not! Technically, we had not presented the notion of an ordered set when we discussed dense sets, but it is intuitive enough that people can understand it.

Countability

We say that any enumerable set is countable. Enumerable, mathematically, means that we can find a bijection from the non-zero naturals to the set. Intuitively, it means¬†“you start from somewhere, and by sequentially making one step, no matter how long it takes, you are guaranteed to reach every single element of the set in finite time”. Whether this finite time will happen in one’s lifetime, in one’s last name’s lifetime, or before the heat death of the universe, is inconsequential to both the math and the intuition. Clearly, this is trivial to do for either the non-zero naturals or the full set of naturals: you start from either 1 or 0, and then you make one step “forward”.

However, we also saw in class that this is possible to also generalize for the¬†full set of integers:¬†we start from 0 and then start hopping around and about zero, making bigger hops every time. Those hops are our steps “forward”.

Those results are probably quite intuitive to you by now, and I feel that the reason for this might that both LaTeX: \mathbb{N} and LaTeX: \mathbb{Z} are non-dense sets.There are no naturals or integers between LaTeX: n and LaTeX: n+1 (LaTeX: n \in \mathbb{N}  or LaTeX: n \in \mathbb{Z} ).

Let’s stray away from¬†LaTeX: \mathbb{Q} ¬†for now and fast-forward to¬†LaTeX: \mathbb{R} . We have already shown the mathematical reason,¬†Cantor’s diagonalization,¬†for which the set of reals is uncountable. But what’s the intuition? Well, to each their own, but here’s how I used to think about it as a student:¬†Suppose that I start from zero¬†just to make things easier with respect to my intuitive understanding of the real number line¬†(I could’ve just as well started with¬†LaTeX: -e^8).¬†

Then, how do I decide to make my step forward? Which is my second number? Is it 0.1? Is it -0.05? But, no matter which I pick as my second number,¬†am I not leaving infinitely many choices in between, rendering it necessary that I recursively look into this infinite interval? Note that I have not qualified “infinite” with “countably infinite” or “uncountably infinite” yet. This was my personal intuition as a Discrete Math student about 11 years ago about why¬†LaTeX: \mathbb{R} ¬†is uncountable:¬†Even if you assume that you can start from 0, there is no valid ordering for you to reach the second element in the sequence of reals! Therefore, such a sequence cannot possibly exist!

But hold on a minute; is it not the case that this argument can be repeated for LaTeX: \mathbb{Q} ? Sure it can, in the sense that between, say, LaTeX: 0 and LaTeX: \frac{1}{2}, there are still infinitely many rationals. It is only after we formalize the math behind it all that we can say that this is a countable infinity and not an uncountable one, as is the case of the reals. But still, we have to convince ourselves: why in the world is it that the fact that every one of these infinite numbers can be expressed as a ratio of integers make that infinity smaller than that of the reals?

Here’s another intuitive reason why we will be able to scan every single one of these numbers in finite time: everybody open the slide where we prove to you that¬†LaTeX: \mathbb{Q}^{>0} ¬†is countable using the snaking pattern. Make the crucial observation that¬†every one of the diagonals scans fractions where the sum of the denominator and the numerator is static!¬†The first diagonal scans the single fraction (LaTeX: \frac{1}{1}) where the sum is 2. The second one scans the fractions whose denominator and numerator sum is 3 (LaTeX: \frac{1}{2},\ \frac{2}{1}). In effect, the¬†LaTeX: i^{th}¬†diagonal scans the following fractions:

LaTeX: \{ \frac{a}{b} \mid (a,b \in \mathbb{N}^{\geq 1}) \land (a + b=i+1)\}

For those of you that know what equivalence classes are, we can then define LaTeX: \mathbb{Q}^{>0}  as follows:

LaTeX: \mathbb{Q}^{>0} = \bigcup_{i \in \mathbb{N}}\{ \frac{a}{b} \mid (a,b \in \mathbb{N^{\geq 1}}) \land (a + b=i+1)\}

Let’s see this in action…

LaTeX: \mathbb{Q}^{>0} = \{ \color{red}{\underbrace{ \frac{1}{1}}_{i=1}}, \color{blue}{ \underbrace{\frac{1}{2}, \frac{2}{1}}_{i=2}} , \color{brown}{\underbrace{\frac{1}{3}, \frac{2}{2}, \frac{3}{1}}_{i=3}}, \dots \}

Note that essentially, with this definition, we have defined a bijection from¬†LaTeX: \mathbb{N^{\geq 1}} \times \mathbb{N^{\geq 1}}¬†to¬†LaTeX: \mathbb{Q}. We know that¬†LaTeX: \mathbb{N}^{\geq 1} \times  \mathbb{N}^{\geq 1}¬†is countable, so we now know that¬†LaTeX: \mathbb{Q}^{> 0} ¬†is also countable! ūüôā

Let’s constrain ourselves now to the original challenge that we (I?) are faced with: we have selected 0 as our first element in the enumeration of both¬†LaTeX: \mathbb{Q} ¬†and¬†LaTeX: \mathbb{R} ¬†(the latter is assumed to exist), and no matter which our second element is (say it’s¬†LaTeX: \frac{1}{2}), we have infinitely many elements in both sets between 0 and¬†LaTeX: \frac{1}{2}.¬†But now we know that those infinites are different: in the case of¬†LaTeX: \mathbb{Q} . we know for a fact that we will reach all of those fractions whose decimal values are in¬†LaTeX: (0, 0.5). In the case of¬†LaTeX: \mathbb{R} ,¬†there is no such enumeration: any enumeration we define will still leave an… uncountably infinite gap between any two elements in “sequence”.

Remember how in our lecture on Algebraic and Transcendental numbers, we gave only three examples of numbers in¬†LaTeX: TN, yet the fact that¬†LaTeX: TN¬†is uncountable when¬†LaTeX: ALG¬†is countable guarantees that there are¬†“many more”¬†Transcendental numbers than Algebraic?¬†Same thing applies here with the rationals and irrationals: given any interval of real numbers¬†LaTeX: (r_1, r_2), there are many more irrationals than rationals inside that interval... If you define a system of whole numbers (integers), there are many more quantities that you will¬†not¬†be able to express as a ratio of integers. That’s why back in the day (300 B.C) when Euclid proved¬†that¬†LaTeX: \sqrt 2¬†is not expressible as such a ratio¬†LaTeX: \frac{a}{b}¬†(or, more accurately, that¬†LaTeX: 2¬†cannot be expressed as the square¬†LaTeX: \frac{a^2}{b^2}) his result was so unintuitive; those Hellenistic people did not have rulers. They did not have centimeters or other accepted forms of measurement. The only thing they had were shoestrings, or planks of wood which they put in line and “saw” that they were the same length, and then they measured everything else as the¬†ratio¬†of such “whole” lengths.

 Cardinality

Recall something that we said when we were discussing the factorial function and its combinatorial interpretations when applied on positive integers. Bill’s explanation of why¬†LaTeX: 0!=1¬†was purely algebraic: If it were¬†LaTeX: 0, then, given the recursive definition¬†LaTeX: n!\:=\:n\:\cdot\left(n-1\right)!¬†for¬†LaTeX: n\:\ge1, every¬†LaTeX: n!¬†would be¬†LaTeX: 0,¬†rendering it a pretty useless operation. My explanation was combinatorial: we know that if we have a row of, say,¬†LaTeX: n¬†marbles, there are¬†LaTeX: n!¬†different ways to permute them, or¬†LaTeX: n!¬†different orderings of those marbles. When there are no marbles, so¬†LaTeX: n=0,¬†there is only one way to order them: do nothing, and go watch Netflix.¬†

Let’s stick with Bill’s interpretation for a moment:¬†the fact that some things need to be defined in order to make an observation about the real world work. In this case, the real world is defined as “algebra that makes some goddamn sense”. My explanation is more esoteric. You could say:¬†“What do you mean there’s only one way to arrange zero things? I don’t understand, if there are zero things and there’s nothing to do, shouldn’t there be, like, 0 ways to arrange them?”.¬†So, let’s stick with Bill’s interpretation to explain something that I attempted to explain to a group of students after our first lecture this semester:¬†Why do negative numbers even exist?

Here’s one such utilitarian explanation:¬†Because without negative numbers, Newtonian Physics, with their tremendous application in the real world, would not work.¬†That is, the model of Newtonian Kinematics with its three basic laws, which has been¬†empirically proven¬†to¬†describe very well¬†things that we¬†observe in the real world,¬†needs the framework of negative¬†numbers in order to, well, work. So, if you’re not ok with the existence of negative numbers,¬†you had better also be able to describe to me a framework that explains a bunch of observations on the real world in some way that doesn’t use them. For example, you probably all remember the third law of Newtonian motion: For every¬†action¬†LaTeX: \color{red}{\vec{F}}, there exists an¬†equal¬†and opposite¬†reaction¬†LaTeX: \color{red}{-\vec{F}}:

Recall that force is a vectoral quantity since it is the case that LaTeX: \vec{F} = m \cdot \vec{a}, and acceleration LaTeX: \vec{a} is clearly vectoral, as the second derivative of transposition LaTeX: \vec{x}. 

The only way for Newton’s third law of motion can work is if¬†LaTeX: \vec{F} + (-\vec{F}) = \vec{0}. This is only achievable if the two vectors have the same magnitude but exactly opposite directions. No other way. Hence the need to define the magnitudes as follows:

LaTeX: | |\vec{F}|| = \frac{1}{2} \cdot m \cdot a^2,\ | |\vec{\color{red}{-}F}|| = \color{red}{-}\frac{1}{2} \cdot m \cdot a^2

and the necessity for negative numbers becomes clear. Do you guys think the ancient Greeks or Egyptians cared much for negative numbers? They were building their theories in terms of things they could touch, and things that you can touch have positive mass, length, height…

Mathematics is not science.¬†It is an agglomeration of models that try to axiomatize things that occur in the real world.¬†For another example, ZFC Theory was developed in place of Cantorian Set Theory because Cantorian Set Theory can lead to crazy things such as¬†Russel’s Paradox. Therefore, ZFC had to add more things to Set Theory to make sure that people can’t do crazy stuff like this.¬†If we discover contradictions with the real world given our mathematical model,¬†we have to refine our model by adding more constraints to it.¬†Less constraints, more generality, potential for more contradictions. More constraints, less generality, less contradictions, but also more complexity.

So when discussing the cardinality of¬†LaTeX: \mathbb{N} ¬†and¬†LaTeX: \mathbb{Z} ¬†and finding it equal to¬†LaTeX: \aleph_0, we are faced with a problem with our model:¬†the fact that¬†LaTeX: \color{magenta}{\mathbb{N} \subset \mathbb{Z}}¬†(I have used the notation of¬†proper subset¬†here deliberately). Now, I just had a look at our cardinality slides, and it is with joy that I noticed that¬†we don’t use the subset / superset notation¬†anywhere.¬†That’s gonna prove a point for us.

So, back to the original problem: intuitively understanding why the hell LaTeX: \mathbb{N}   and LaTeX: \mathbb{Z}  have the same cardinality when, if I think of them on the real number line, I clearly have LaTeX: \mathbb{N} \subset \mathbb{Z}:

LaTeX: \underbrace{\dots, -4, -3 , -2, -1, \underbrace{0, 1, 2, 3, 4, \dots}_{\mathbb{N}} }_{\mathbb{Z}}

The trouble here is that¬†we have all been conditioned from childhood to think about the negative integers as “minus the corresponding natural”. This conditioning is not something bad:¬†it makes a ton of sense when modeling the real world,¬†but when comparing cardinalities between infinite sets, that is, sets that will never be counted entirely in finite time,¬†we distance ourselves from the real world a bit, so we need a different mathematical model. To that end, let’s build a new model for the naturals. Here are the naturals under our original model:

LaTeX: 0, 1, 2, 3, \dots

This digits that we have all agreed to be using¬†have not been around forever. The ancient Greeks used lowercase versions of their alphabet:¬†LaTeX: \alpha, \beta, \gamma, \delta , \epsilon, \sigma \tau ', \zeta,\ \dots\ \omega ¬†to name a total of 25 “digits”, while the Romans used a subset of their alphabet “stacked” in a certain way:¬†LaTeX: I, II, III, IV, V, VI, \dots, X, XI\dots . These “stacked” symbols¬†cannot be really called¬†digits¬†the way that we understand them, especially since new symbols appear long down the line (LaTeX: C, M) etc. These symbols¬†we actually owe to the Arabic Renaissance of the early Middle Ages.

The point is that¬†I can rename every single one these numbers¬†in a unique¬†way and still end up with a set that has the exact same properties (e.g closure of operations, cardinality, ordinality) as¬†LaTeX: \color{red}{\mathbb{N}}. This is formally defined as the¬†Axiom of Replacement. So, let’s go ahead and describe¬†LaTeX: \mathbb{N} ¬†by assigning a¬†random string¬†for every single number,¬†assuming that no string is inserted twice:

LaTeX: foo, bar, otra, zing, tum, ghi,\dots

Which corresponds to our earlier

LaTeX: 0, 1, 2, 3, 4, 5,\dots

Cool! Now the axiom of replacement clearly applies to LaTeX: \mathbb{Z}  as well, so I will rewrite

LaTeX: \dots, \color{blue}{-5, -4, -3, -2, -1,}\ \color{magenta}{0, 1, 2, 3, 4, 5,}\dots

into:

LaTeX: \dots, \color{blue}{qwerty, forg, vri, zaq,  nit,}\ \color{magenta}{bot, ware, yio, bunkm, ute, kue,}\dots

Call these “transformed” sets¬†LaTeX: \mathbb{N}_{new}¬†and¬†LaTeX: \mathbb{Z}_{new}¬†respectively. Under this encoding, guys, I believe it’s a lot more obvious that¬†LaTeX: \mathbb{N}_{new} \not\subset \mathbb{Z}_{new}¬†in the general case.¬†LaTeX: \mathbb{N}_{new} \subset \mathbb{Z}_{new}¬†under these random encodings is so not-gonna-happenish that its probability¬†is not even axiomatically defined. Therefore, now we can view¬†LaTeX: \mathbb{N} ¬†and¬†LaTeX: \mathbb{Z} ¬†as¬†infinite lines floating around space, lines that¬†we have to somehow put next to each other¬†and see¬†whether we can line them up¬†exactly. If you tell me that even under this visualization, the line that represents¬†LaTeX: \mathbb{Z} _{new}¬†is¬†infinite in both directions, whereas that of¬†LaTeX: \mathbb{N}_{new} ¬†has a starting point (0), then I would tell you that I can effectively “break” the line that represents¬†LaTeX: Z_{new}¬†in the middle (0) and¬†then mix the two lines together according to the mapping that corresponds to:

LaTeX: 0, 1, -1, 2, -2, 3, -3, \dots

Now we no longer have the pesky notation of the minus sign, which pulls us to scream “But the naturals are a subset of the integers! Look! If we just take a copy of the naturals and put a minus in front of them, we have the integers!”.¬†We only have two infinite lines, that start from somewhere, extend infinitely, and it is up to us to find a 1-1 and onto¬†mapping between them.¬†That is, it is up to us find a 1-1 mapping between:

LaTeX: foo, bar, otra, zing, tum, ghi,\dots

and

LaTeX: bot, ware, nit, yio, zaq, bunkm\dots

(Note that I re-ordered the previous encoding¬†LaTeX: \dots, \color{blue}{qwerty, forg, vri, zaq,  nit,}\ \color{magenta}{bot, ware, yio, bunkm, ute, kue,}\dots¬†according to the “hopping” map into ¬†LaTeX: \color{magenta}{bot}, \color{magenta}{ware}, \color{blue}{nit,} \color{magenta}{yio}, \color{blue}{zaq}, \color{magenta}{bunkm},\dots¬†.)

Under this “visual”, you guys,¬†it makes a lot of sense to try to estimate if the two sets have the same cardinality and, guess what, they do ūüôā

Not much else to say on this topic everyone. We can have a bunch of applications of the axiom of replacement to prove, for example, that the cardinality of the integers, LaTeX: \aleph_0, is also the cardinality of LaTeX: \mathbb{N} \times \mathbb{N}, LaTeX: \mathbb{Q} , etc. It is only when we start considering sets such as LaTeX: \mathbb{R} , \mathcal{P}(\mathbb{N}) and LaTeX: \{0, 1 \}^\omega  that this idea that we can be holding two infinite lines in space fails.

Ordinality

There’s not much to say here except that the easiest way to understand how an order differs from a set is to¬†consider an ordering exactly as such: an order of elements!¬†Think in terms of “first element less than second less than third less than …. “.¬†The simplest way possible. It is then that we can prove rather easily that¬†LaTeX: \omega \prec y \prec \zeta .

Things only become a bit more complicated when considering the ordering LaTeX: \omega + \omega:

LaTeX: 0 < \frac{1}{2} < \frac{3}{4} < \frac{5}{6} < \dots <1 < \frac{3}{2} < \frac{4}{3} <\dots <2 <\dots

Please note that this ordering is clearly not the same as LaTeX: \eta, the ordering of LaTeX: \mathbb{Q} . Between the first and the second element, for instance, there are countably many infinite rationals: LaTeX: \frac{1}{100}, \frac{2}{5}, \dots , \frac{3}{7}\dots  which are not included in the ordering. 

Finally, realize the meaning of “incomparable” orderings: a pair of orderings¬†LaTeX: \alpha, \beta ¬†will be called incomparable if, and only if:

LaTeX: (\alpha \npreceq \beta) \wedge (\beta \npreceq \alpha).

So please realize that this is not the same as saying, for instance, LaTeX: \beta \nprec\alpha .

I think this is all, I am bothered when I can’t explain something well to a student so I thought I’d share my views on countability in case the subject becomes easier to grasp.


As new data arrives, the covariance matrix takes notice.

The problem

I recently read a paper on distributed multivariate linear regression. This paper essentially deals with the problem of¬†when to update the global multivariate linear regression model in a distributed system, when the observations available to the system arrive in different computer nodes, at different times and, usually, at different rates. In the monolithic, single node case, the problem’s of course been solved in closed form, since for dependent variables y¬†and design matrix¬†X with examples in rows, the parameter vector¬†ő≤¬†can be found as per:

The linear regression solution.

This is a good paper, and anybody with an interest in distributed systems and / or ¬†linear algebra should probably read it. One of the interesting things (for me) was the authors’ explanation that, as more data arrives at the distributed nodes, a certain constraint on the spectral norm of a¬†matrix product that contains information about a node’s data becomes harder to satisfy. It was not clear to me why this was the case and, in the process of convincing myself, I discovered something that is probably obvious to everybody else in the world, yet I still opted to make a blog post about it, because why the hell not.

When designing any data sensor, it is reasonable to assume that the incoming multivariate data tuples will all have a non-trivial covariance. For example, in the case of two-dimensional data, it is reasonable to assume that all the incoming data points will not all lie on a straight line (which denotes full inter-dimensional correlation in the two-dimensional case). In fact, it is reasonable to assume that as more data tuples arrive, the covariance of the entire data tends to increase. We will examine this assumption again in this text, and we will see that it does not always hold water.

This hypothesized increase in the data’s covariance can be mathematically captured by the spectral (or “operator”) norm of the data’s covariance matrix. For symmetric matrices, such as the covariance matrix, the spectral norm is equal to the largest absolute eigenvalue of the matrix. If¬†a¬†matrix is viewed as a linear operator in multi-dimensional cartesian space, its¬†largest absolute eigenvalue tells us how much the matrix can “stretch” a vector in the space. So it gives us an essence of how “big” the matrix is in that sense, hence its incorporation into a norm formulation.

The math

We will now give a mathematical intuition about how the incorporation of new data in a sensor leads to a likelihood of increase of its spectral norm, or, as we now know, its dominant eigenvalue. For simplicity, let us assume that the data is mean centered, such that we don’t need to complicate the mathematical presentation with mean subtraction.¬†Let¬†őĽ¬†be the covariance matrix’s dominant eigenvalue¬†and u be a unitary eigenvector in the respective eigenspace. ő§hen, from the relationship between eigenvalues and eigenvectors, we obtain:

Derivation 1

with the second line being a result¬†of the fact that¬†u is assumed unitary. It is therefore obvious that, in order to gauge how the value of¬†őĽ¬†varies, we must examine the 2-norm (Euclidean norm) of the vector¬†on the right-hand side of the final equals sign.

Let’s try unwrapping the product that makes up this vector:

Derivation 2

Now, let us focus on the first element of this vector. If we unwrap it we obtain:

Derivation 3

The crimson¬†factors really let us know what’s going on here, since the summations in the parenthesis involve the “filling up” of values in the covariance matrix that lie beyond the main diagonal. For fully correlated data, those values are all zero. On the other extreme, they are all non-zero. It is natural to assume that, as more data arrives, all those values tend to deviate from zero, since some inter-dimensional uncorrelation is, stochastically,¬†bound to occur. On the other hand, if new data is such that it causes an increased inter-dimensional correlation, then the sum will tend towards zero, and the covariance matrix’s spectral norm will actually decrease!

The¬†second vector element deals with the correlation between the second dimension and the rest, and so on and so forth. Therefore, the larger the values of these elements, the larger the value of the 2-norm || X’ X u ||¬† is going to be and vice versa.

Some code

We can demonstrate all this in practice with some MATLAB code. The following function will generate some random data for us:

function X = gen_data(N, s)
%GEN_DATA Generate random two-dimensional data.
% N: Number of samples to generate.
% s: Standard deviation of Gaussian noise to add to the y dimension.
x = rand(N, 1);
y = 2 * x + s.* randn(N, 1); % Adding Gaussian noise
X = [x,y];
end

This function will generate the covariance matrix of the input data and return its spectral norm:

function norm = cov_spec_norm(X)
% COV_SPEC_NORM: Estimate the spectral norm of the covariance matrix of the
% data matrix given. 
%   X: An N x 2 matrix of N 2-dimensional points.

COV = cov(X);
[~, S, ~] = svd(COV);
norm = S(1,1).^2;
end

Then we can use the following top-level script to create some initial, perfectly correlated data, plot it, estimate the covariance matrix’s spectral norm, and then examine what happens as we add chunks of data, with increasing amounts of Gaussian noise:

% A vector of line specifications useful for plotting stuff later
% in the script.
linespecs = cell(4, 1);
linespecs{1} = 'rx';linespecs{2} = 'g^';
linespecs{3} = 'kd'; linespecs{4} = 'mo';

% Begin with a sample of 300 points perfectly
% lined up...
X = gen_data(300, 0);
figure;
plot(X(:, 1), X(:, 2), 'b.');  title('Data points'); hold on;
norm = cov_spec_norm(X);
fprintf('Spectral norm of covariance = %.3f.\n', norm)

% And now start adding 50s of noisy points.
for i =1:4
    Y = gen_data(50, i / 5); % Adding Gaussian noise 
    plot(Y(:,1), Y(:, 2), linespecs{i}); hold on;
    norm = cov_spec_norm([X;Y]);
    fprintf('Spectral norm of covariance = %.3f.\n', norm);
    X = [X;Y]; % To maintain current data matrix
end
hold off;

(Note that in the script above, every new batch of data gets an inreased amount of noise, as can be seen in the call to gen_data.)

One output of this script is:

>> plot_norms
Spectral norm of covariance = 0.191.
Spectral norm of covariance = 0.200.
Spectral norm of covariance = 0.200.
Spectral norm of covariance = 0.220.
Spectral norm of covariance = 0.275.

A plot of our dataInterestingly, in this example, the spectral norm did not change after incorporation of the second noisy data. Can it ever be the case that we can have a decrease of the spectral norm? Of course! We already said that the crimson summations above, corresponding to summations over cells of the covariance matrix beyond the first diagonal, can fall closer to zero after we incorporate new data whose dimensions are more correlated with¬†the existing data’s. Therefore, in the following run, the incorporation of the first noisy set actually increased the amount of inter-dimensional correlation, leading to a smaller amount of covariance (informally speaking).

>> plot_norms
Spectral norm of covariance = 0.177.
Spectral norm of covariance = 0.174.
Spectral norm of covariance = 0.179.
Spectral norm of covariance = 0.220.
Spectral norm of covariance = 0.248.

Another data plot.

Discussion

The intuition is clear: as new data arrives in a node, observing the fluctuation of the spectral norm of its covariance matrix can tell us some things about how “noisy” our data is, where “noisiness” in this context is defined as “covariance”. I guess the question to be made here is what to expect of one’s data. If we run a sensor long enough without throwing away archival data vectors, it’s unclear whether we can expect the spectral norm to continuously¬†increase (at least not by a significant margin). We should expect a sort of “saturation” of the spectral norm around a limiting value. This can be empirically shown by a modification of our top-level script, which runs for 50 iterations (instead of 4) but generates batches of data with standard Gaussian noise, i.e the noise does¬†not increase with every new batch:

% Begin with a sample of 300 points perfectly
% lined up...
X = gen_data(300, 0);
norm = cov_spec_norm(X);
fprintf('Spectral norm of covariance = %.3f.\n', norm)

% And now start adding 50s of noisy points.
for i =1:50
    Y = gen_data(50, 1); % Adding Gaussian noise 
    norm = cov_spec_norm([X;Y]);
    fprintf('Spectral norm of covariance = %.3f.\n', norm);
    X = [X;Y]; % To maintain current data matrix
end

Notice how the call to gen_data now adds normal Gaussian noise by keeping the standard deviation static to 1. One output of this script is the following:

>> toTheLimit
Spectral norm of covariance = 0.203.
Spectral norm of covariance = 0.341.
Spectral norm of covariance = 0.388.
Spectral norm of covariance = 0.439.
Spectral norm of covariance = 0.535.
Spectral norm of covariance = 0.635.
Spectral norm of covariance = 0.677.
Spectral norm of covariance = 0.744.
Spectral norm of covariance = 0.818.
Spectral norm of covariance = 0.842.
Spectral norm of covariance = 0.881.
Spectral norm of covariance = 0.913.
Spectral norm of covariance = 0.985.
Spectral norm of covariance = 1.030.
Spectral norm of covariance = 1.031.
Spectral norm of covariance = 1.050.
Spectral norm of covariance = 1.097.
Spectral norm of covariance = 1.148.
Spectral norm of covariance = 1.154.
Spectral norm of covariance = 1.186.
Spectral norm of covariance = 1.199.
Spectral norm of covariance = 1.280.
Spectral norm of covariance = 1.318.
Spectral norm of covariance = 1.323.
Spectral norm of covariance = 1.325.
Spectral norm of covariance = 1.344.
Spectral norm of covariance = 1.346.
Spectral norm of covariance = 1.373.
Spectral norm of covariance = 1.397.
Spectral norm of covariance = 1.447.
Spectral norm of covariance = 1.436.
Spectral norm of covariance = 1.466.
Spectral norm of covariance = 1.466.
Spectral norm of covariance = 1.482.
Spectral norm of covariance = 1.500.
Spectral norm of covariance = 1.498.
Spectral norm of covariance = 1.513.
Spectral norm of covariance = 1.518.
Spectral norm of covariance = 1.518.
Spectral norm of covariance = 1.499.
Spectral norm of covariance = 1.492.

It’s not hard to see that after a while the value of the spectral norm tends to fluctuate around 1.5. Under the given noise model (Gaussian noise with standard deviation = 1), we cannot expect any major surprises. Therefore, if we were to keep a sliding window over our incoming data chunks, and (perhaps asynchronously!) estimate the standard deviation of the spectral norm’s values, we could maybe estimate time intervals during which we received a lot of noisy data, and act accordingly, based on our system specifications.