Simply Scala – foldLeft & foldRight

This requires some basic knowledge about Scala. This time I’ll try to explain a bit about foldLeft and foldRight. These functions are meant to derive a result out of a collection of values. As a start I will give three examples in Java:

  • Summing all the values inside a collection of numbers.
    	int sumIntegers(List<Integer> list) {
    	  int accumulator = 0;
    	  for (int item: list) {
    	    accumulator = accumulator + item;
    	  }
    	  return accumulator;
    	}
    	
  • Multiplying all the values inside a collection of numbers.
            int productIntegers(List<Integer> list) {
              int accumulator = 1;
              for (int item: list) {
                accumulator = accumulator * item;
              }
              return accumulator;
            }
            
  • Finding the shortest string inside a collection of strings.
    	// For the sake of simplicity we are assuming that the list is not empty and has no 'null' contents.
    	String findShortestString(List<String> list) {
    	  String accumulator = list.get(0);
    	  for (String item: list) {
    	    accumulator = item.length() < accumulator.length() ? item : accumulator;
    	  }
    	  return accumulator;
    	}
    	

They look very much alike, don’t you think?

Deriving a single value out of a collection of values is a common scenario, so the solutions end up looking very similar. They differ mainly in two things: the initial accumulator value, and how the accumulator is recalculated against every item. If only we could generalise the three functions above into one generic function that takes those two things as parameters, the code would be much simpler. More or less it would look like the code below. Please note that this code won’t compile, since in Java (before version 8) we cannot pass a function as a parameter. I’m just trying to convey my point here.

<T> T deriveSomethingOutOfThisList(List<T> list, T accumulator, Function f) {
  for (T item : list) {
    accumulator = f(accumulator, item);
  }
  return accumulator;
}
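For what it’s worth, from Java 8 onwards the idea above can be written for real. Here is a minimal sketch, assuming Java 8 or later; the class name FoldSketch and the method name foldLeft are my own picks for illustration, not a standard Java API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper class, for illustration only.
class FoldSketch {
    // Generic left fold: start from `initial` and combine it with each item in order.
    static <A, T> A foldLeft(List<T> list, A initial, BiFunction<A, T, A> f) {
        A accumulator = initial;
        for (T item : list) {
            accumulator = f.apply(accumulator, item);
        }
        return accumulator;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        List<String> words = Arrays.asList("pear", "fig", "banana");

        int sum = foldLeft(numbers, 0, (acc, item) -> acc + item);
        int product = foldLeft(numbers, 1, (acc, item) -> acc * item);
        String shortest = foldLeft(words, words.get(0),
                (acc, item) -> item.length() < acc.length() ? item : acc);

        System.out.println(sum);      // 15
        System.out.println(product);  // 120
        System.out.println(shortest); // fig
    }
}
```

The three Java functions from the beginning of this post collapse into three calls that differ only in the initial value and the combining function.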

How do we implement these functions in Scala?

Scala provides a built-in feature for such scenarios: foldLeft and foldRight. This is how those three functions are defined in Scala:

  // For the sake of simplicity we are assuming that there are no 'null' contents in the list.

  // Initial accumulator value = 0. 
  // The accumulator is recalculated with an addition in every iteration.
  val sumIntegers = list.foldLeft(0)((accumulator, item) => accumulator + item)

  // Initial accumulator value = 1.
  // The accumulator is recalculated with a multiplication in every iteration.
  val productIntegers = list.foldLeft(1)((accumulator, item) => accumulator * item)

  // Initial accumulator value = the first item in the list.
  // The accumulator is compared and updated if the iterated item is shorter.
  val shortestString = list.foldLeft(list(0))((accumulator, item) => {
                         if (item.length<accumulator.length) item
                         else accumulator
                       })

I’ll walk through the first one. Once you understand how it works, the other two should be easy enough to follow. Take a look at this line:

  val sumIntegers = list.foldLeft(0)((accumulator, item) => accumulator + item)

Method foldLeft belongs to the collection instance. It takes two parameters, each in its own parameter list:

  • The initial accumulator value, which is 0, and
  • The function to recalculate the accumulator. This function takes two parameters: the accumulator and the list item. In this case we want the accumulator to be the sum of every list item, hence the addition.

The other two examples follow the same behaviour as the first one. I’m leaving them to you to play around with the values and give it a try.

BTW, the naming is not very strict in the function definition above. I’m just naming them accumulator and item for the sake of readability. If you want, you can name them differently as shown below:

  val sumIntegers = list.foldLeft(0)((a, b) => a + b)

Wait, where is foldRight?

In all the examples stated above we always use foldLeft. What about foldRight? What’s the difference between them? Why am I not using foldRight at all in these examples?

There are a few differences between them:

  • Method foldLeft traverses the list items from left to right, while foldRight goes from right to left, as shown below:
    • foldLeft: ((((1 + 2) + 3) + 4) + 5)
    • foldRight: (1 + (2 + (3 + (4 + 5))))
  • Method foldRight may throw StackOverflowError if there are too many items in the list. That’s not the case for foldLeft. More detailed explanations below.

Method foldLeft’s implementation uses a while loop to visit every list item, while method foldRight visits every list item recursively. Recursing over the list has a drawback: it consumes more memory, stack memory to be exact. If the list contains too many items, the JVM is likely to throw StackOverflowError. That’s why foldLeft is preferred to foldRight in most cases. (Newer versions of the Scala standard library work around this for List by reversing the list and folding from the left, so the overflow mostly concerns older versions and hand-written recursive folds.)

But why are they different?

The answer lies in the way a List is structured in Scala. Scala’s List is different from Java’s. I’ll just briefly say that when Scala defines a list such as List(1, 2, 3), behind the scenes it’s conceptually structured as List(1, List(2, List(3, Nil))), which Scala writes as 1 :: 2 :: 3 :: Nil. Each cell holds one item plus the rest of the list. Yes, by its very nature List is nested in Scala. Pretty strange, huh?

This demands another article for more thorough explanation. I have no plan to cover this in the future. I’m sure other people can explain it better than I do.
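That said, a small Java sketch can give a rough feel for why the direction matters on such a nested structure. Mind that ConsList is a made-up class that only imitates the shape of Scala’s List, with null standing in for Nil; it is not how the Scala library itself is written:

```java
import java.util.function.BiFunction;

// Hypothetical minimal singly linked ("cons") list, loosely modelled on the shape of Scala's List.
final class ConsList {
    final int head;
    final ConsList tail; // null plays the role of Nil in this sketch

    ConsList(int head, ConsList tail) {
        this.head = head;
        this.tail = tail;
    }

    // foldLeft walks from the head with a plain loop, so the stack depth stays constant.
    static <A> A foldLeft(ConsList list, A initial, BiFunction<A, Integer, A> f) {
        A accumulator = initial;
        for (ConsList node = list; node != null; node = node.tail) {
            accumulator = f.apply(accumulator, node.head);
        }
        return accumulator;
    }

    // A naive foldRight has to reach the last item before it can combine anything,
    // so it recurses once per item and the stack depth grows with the list length.
    static <A> A foldRight(ConsList list, A initial, BiFunction<Integer, A, A> f) {
        if (list == null) {
            return initial;
        }
        return f.apply(list.head, foldRight(list.tail, initial, f));
    }

    public static void main(String[] args) {
        ConsList list = new ConsList(1, new ConsList(2, new ConsList(3, null)));
        // prints (((0 + 1) + 2) + 3)
        System.out.println(foldLeft(list, "0", (acc, item) -> "(" + acc + " + " + item + ")"));
        // prints (1 + (2 + (3 + 0)))
        System.out.println(foldRight(list, "0", (item, acc) -> "(" + item + " + " + acc + ")"));
    }
}
```

Because only the head of each cell is directly reachable, going left to right is a simple walk, while going right to left needs either recursion or a reversed copy of the list.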

So basically foldRight is useless?

Yes. Most of the time. I have not yet found a scenario where it is better to use foldRight rather than foldLeft. And I don’t think this behaviour will change in the near future. It’s just the way it is natively structured in Scala.

And now, the not-so-sweet syntactic sugar.

There is syntactic sugar for methods foldLeft and foldRight. Personally, I don’t like it because it is uncommon and confusing to read (and Scala 2.13 has deprecated both operators). Anyway, you can define sumIntegers above by using either foldLeft or foldRight like this:

  • foldLeft:
        val sumIntegers = (0 /: list)((accumulator, item) => accumulator + item)
        
  • foldRight:
        val sumIntegers = (list :\ 0)((item, accumulator) => accumulator + item)
        

Please be careful not to mix up the parameter order. Not very sweet syntactic sugar, is it? 🙂 This concludes my post for today. Please leave feedback below if you have any.

Simply Scala – foreach

In this post I just want to give a very small example of Scala’s conciseness. Since Scala is often described as an evolved version of Java, the code will be compared against Java. The example simply creates a List containing three values and then displays each one of them.

Java code:

List<Integer> list = new ArrayList<Integer>(){{
 add(1);
 add(3);
 add(5);
}};
for (int l: list) {
  System.out.println(l);
}

Scala code:

// Initiate the list.
val l = List(1,3,5)

// For every item in the list, represented as 'n', display its value.
l.foreach { n =>
  println(n)
}

At a glance you can see that the latter is more concise than the former. But hey, it’s actually still a bit redundant. Why do we still need to redefine the variable inside the loop? The code can still be simplified a bit more.

// Initiate the list.
val l = List(1,3,5)

// Display every value inside the list.
l.foreach { println(_) }

Inside the braces passed to foreach, Scala treats the underscore (_) as a placeholder for the current item of the list. But, seriously, do we even need the underscore? I mean, we just want to println the value for each item in the list, right? So…

// Initiate the list.
val l = List(1,3,5)

// Display every value inside the list.
l.foreach { println }

But… wait, for this example we can actually remove the braces, since only one expression is executed for each item (this also applies to the Java code above, BTW). Also, Scala has syntactic sugar that allows you to leave out the dot (.) when calling an object’s method. So, let’s apply these to the code:

// Initiate the list.
val l = List(1,3,5)

// Display every value inside the list.
l foreach println

Oh gosh! It’s even simpler! It cannot possibly go much further than this, right? Well, not really. You can simplify the code above a bit more by inlining the List directly into the loop.

// Create a list and display each value.
List(1,3,5) foreach println

There you go. One line. And I would argue that it’s even more readable than the 1st one.

Simply Scala – Introduction

Being a relatively new programming language, Scala has been gaining popularity since its first public release in 2004 by Martin Odersky, its inventor. It’s often considered an evolved version of Java. There are a number of reasons why this is so:

1. It’s concise.

Martin took notice of the common design patterns encountered during development and designed the language to cut down on boilerplate. In general, one line of Scala may replace 5-8 lines of Java, or even more.

2. It’s interoperable with Java programs.

One critical advantage of Scala is its interoperability with Java programs. Since it’s built on top of the JVM, compiled Scala files are ordinary .class files that work together with the .class files generated from Java programs.

3. It supports hybrid functional-imperative paradigm.

I’m still trying to find a way to explain what the functional paradigm is. If you have been programming for a while, you have most likely encountered, or even used, this model.

A quick way to compare the imperative and functional paradigms is by the question that forms your base mindset as you develop the system:

  1. Imperative: What do you want to do?
  2. Functional: What do you want?

I won’t go deep into this. This not-so-new programming paradigm deserves an article of its own.

There are still other strong advantages offered by Martin through this language. And of course, just as a coin has two sides, Scala also has its disadvantages, which are mostly rooted in this: it has a very complex type system, which admittedly seems rather intimidating at first. This leaves room for improvement, especially in its documentation.

Don’t get me wrong. It already has decent community support and comprehensive documentation. I’m not talking about the quantity of documentation, but its quality (simplicity). Those playing around with Scala in Eclipse may sometimes cringe at the compile error messages. It’s not always the case, but it does happen.

I won’t get too far here since this is just my simple introduction of what Scala is about.

Parameter Mutation

In short, do not mutate the parameter variables. It is very error-prone and it makes bug-tracing more difficult.

This is what I meant with parameter mutation.

public int[] sortNumber(int[] input) {
  for (int i=0; i<input.length-1; i++) {
    for (int j=i+1; j<input.length; j++) {
      if (input[i]>input[j]) {
        // Swapping in place overwrites the caller's array.
        int temp = input[i];
        input[i] = input[j];
        input[j] = temp;
      }
    }
  }
  return input;
}

The code shown above is an example of bad code. Now, let’s look at the code below:

int[] numbers= new int[]{1,3,5,4,2};
int[] sortedNumber = sortNumber(numbers);

// Display the contents
for (int number: numbers) {
  System.out.print(number + " "); // "1 2 3 4 5 " will be printed.
}

As you can see, the sortNumber function above mutates the original data. This might not seem like a problem if you are the one programming the API. But when another programmer steps in and tries to use the same function, it’s likely to cause a headache.
An example is always the easiest way to clear things up:

int[] numbers = new int[]{1,3,5,4,2};
int[] sortedNumbers = sortNumber(numbers); // sortedNumbers = {1,2,3,4,5}
int[] reversedNumbers = reverseNumber(numbers); // reversedNumbers = {5,4,3,2,1}

As you see, the original variable (numbers) contains the unsorted value. When the sortNumber function is called and {1,3,5,4,2} is passed in, the method will return the expected array of sorted integers {1,2,3,4,5}.

The problem arises when the new programmer tries to use another function (reverseNumber in this example) on the same variable (numbers) right after using your function. He is expecting to see {2,4,5,3,1} as the result of reversing the values {1,3,5,4,2}. Oh dear, he is likely to be surprised when he sees that the returned value is {5,4,3,2,1} instead.

Let’s have a quick look at some Math lesson regarding function (mapping) operation.

y1 <-- f1(x)
y2 <-- f2(x)

Now, when you call function f1 using parameter x, is it going to change the parameter x? No? Bingo, you’re right! It doesn’t make sense if the x value changes after f1 function is called.

The same thing should apply in programming as well. When you apply a function to a parameter, do not change the parameter value in your method. If it’s really necessary, make a defensive copy (or clone) of the parameter and assign it to a new variable. Then you’ll be able to play around with that new variable freely without worrying about spoiling the original parameter values.
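As a sketch of the defensive-copy approach, assuming the same simple swap-based sort as above, the function could copy the array first and only touch the copy (the class name SortWithoutMutation is just for illustration):

```java
import java.util.Arrays;

// Hypothetical class name, for illustration only.
class SortWithoutMutation {
    // Sorts a copy of the input, so the caller's array is left untouched.
    static int[] sortNumber(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length); // defensive copy
        for (int i = 0; i < copy.length - 1; i++) {
            for (int j = i + 1; j < copy.length; j++) {
                if (copy[i] > copy[j]) {
                    int temp = copy[i];
                    copy[i] = copy[j];
                    copy[j] = temp;
                }
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 3, 5, 4, 2};
        int[] sorted = sortNumber(numbers);
        System.out.println(Arrays.toString(sorted));  // [1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(numbers)); // [1, 3, 5, 4, 2]
    }
}
```

With this version, a later call that reverses numbers still starts from the original {1,3,5,4,2}.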

There are a few exceptions when the passed parameter variables are mutated. But in general, preventing the variable mutation is a better choice.

Tips for J2EE optimisation

There was one time when I had finished coding and verifying my project module on my local computer. I passed it to UAT. The UAT team had also verified the functionalities. Everything seemed to work pretty well. “So, it is now ready to be deployed to Production, right?” was what I thought, until they mentioned that there was still another test called Load / Stress Testing. What the heck is that?

I wasn’t familiar with such terms while I did my coding, since I was still fairly new as a programmer (about a year) at that time. To make the story short, it failed. It was supposed to be strong enough to handle 140 users at the same time. My project module could handle 5. Only FIVE users at the same time! Oh dear…

I learned then that it’s not just about making sure that all the functionalities work properly, with no critical bugs and no error pages, and that’s it. There is another dimension that should be considered to make sure that the project is ready to face the world. You’re doing no good if you show your Masterpiece Project to the WORLD only to find out that just five people can actually use it at the same time.

I learned that Performance is an important factor.

There are many tips & tricks for boosting J2EE performance scattered all over the web. It might be too much to compile them all into one blog post. Some of them are no longer relevant considering all the tweaks / fixes the marvelous Java team has delivered to the Java community. Nevertheless I am still eager to share those that I have gathered and learned so far.

I’d like to start with my favourite principle from Kevlin Henney. I personally consider it a very basic principle of performance optimisation:

There is no code faster than no code
Kevlin Henney

That is, don’t put unnecessary lines into your program. Regularly review your code, or have someone else review it, to make sure that every line has a purpose.

Other than the basic principle from Kevlin Henney above, this is a list of some of the rules that I have learned so far:

1. String operations.

What is the best way to concatenate Strings together? Is it by using ‘+’ operator or by using StringBuffer, or even StringBuilder?

The answer is: All of them. We just need to know how and when to use them properly, based on these rules below:

  • Use the ‘+’ operator to combine two String constants together, e.g.:
    "SELECT * FROM SHOP_STOCK "
    + "WHERE QTY<QTY_THRESHOLD;" // good example.
  • Don’t use the ‘+’ operator to combine a constant with a variable, or a variable with a variable, e.g.:
    "SELECT * FROM SHOP_STOCK "
    + "WHERE QTY<" + var_qty_threshold; // bad example

    Use either StringBuffer or StringBuilder instead.

  • Use StringBuffer when at least one of the Strings to be combined is a variable and you need thread-safety, e.g.:
    StringBuffer sb = new StringBuffer("SELECT * FROM SHOP_STOCK "
    + "WHERE QTY<").append(var_qty_threshold);
  • Use StringBuilder when at least one of the Strings to be combined is a variable and there is no thread-safety issue involved. It suits most cases, e.g.:
    StringBuilder sb = new StringBuilder("SELECT * FROM SHOP_STOCK "
    + "WHERE QTY<").append(var_qty_threshold);
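As a rough illustration of the StringBuilder rule: the difference tends to matter most when Strings are combined repeatedly, for example inside a loop. The sketch below assumes there is no thread-safety concern; the class name, the method name and the PRICE column are made up for this example:

```java
// Hypothetical class name, for illustration only.
class ConcatSketch {
    // Joins the parts with ", " using a single StringBuilder for the whole loop.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] columns = {"DESCRIPTION", "QTY", "PRICE"};
        // prints SELECT DESCRIPTION, QTY, PRICE FROM SHOP_STOCK;
        System.out.println("SELECT " + joinWithBuilder(columns) + " FROM SHOP_STOCK;");
    }
}
```

One builder is reused for every append, instead of a new intermediate String being created on each pass through the loop.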

 

2. Combine several EJB method calls into one method call.

EJB method invocation is a heavy process. Try not to call the EJB methods too many times.

So, instead of doing this:

// Don't do this!!!
for (Document doc: documents) {
  getRemoteEJB().saveDocument(doc); // the EJB method is invoked for every document.
}

do this instead:

// the EJB method is invoked only once for all documents.
getRemoteEJB().saveDocuments(documents);

 

3. Avoid subqueries. Use “JOIN” instead.

A subquery in the select list that refers to the outer row (a correlated subquery) may be executed once for each row. This is not a good idea, especially if there is a huge amount of data to be processed in the database. By joining the tables instead, only one query is executed to combine them.

This is a bad example:

select DT.DESCRIPTION, DT.QTY,
  (select MASTER.DESCRIPTION from MASTER_INFO MASTER where CODE = DT.EQ_CODE and MASTER.CATEGORY='EQUIPMENT') EQUIPMENT,
  (select MASTER.DESCRIPTION from MASTER_INFO MASTER where CODE = DT.BRAND_CODE and MASTER.CATEGORY='BRAND') BRAND
from DETAIL DT;

and this is the better example:

select DT.DESCRIPTION, DT.QTY, MASTER_EQ.DESCRIPTION EQUIPMENT, MASTER_BRAND.DESCRIPTION BRAND
from DETAIL DT
  left join MASTER_INFO MASTER_EQ on DT.EQ_CODE = MASTER_EQ.CODE and MASTER_EQ.CATEGORY='EQUIPMENT'
  left join MASTER_INFO MASTER_BRAND on DT.BRAND_CODE = MASTER_BRAND.CODE and MASTER_BRAND.CATEGORY='BRAND';

 

4. Put a value into a variable if it’s going to be used several times.

Avoid putting any computation in the conditional part of the for-loop if you know the value is always fixed throughout the iterations.

Bad example:

// Let's say there are 100 items in the database.
for (int i=0; i<getAllItemsFromDatabase().size(); i++) { // 100 times
  Item item = getAllItemsFromDatabase().get(i); // 100 times
  // Do whatever it is..
}

Note that the conditional section in the for-loop, “i<getAllItemsFromDatabase().size()”, is evaluated on every iteration.

If the getAllItemsFromDatabase() method returns, for example, 100 items, then the code above ends up executing getAllItemsFromDatabase() about 200 times! And that is before counting whether the method involves an EJB call or any database process.

It’s like going back and forth between heaven and hell 200 times. Not my favourite exercise. I mean, seriously…

and for the better example:

List<Item> items = getAllItemsFromDatabase(); // called only once
for (int i=0; i<items.size(); i++) {
  Item item = items.get(i);
  item.setTempIdx(i+1);  // Do whatever it is..
}

In this sample, the method getAllItemsFromDatabase() will be called just once, and the result will be reused for all the iterations.

 

5. Use “cheap” objects for EJB method calls.

All the parameters and return objects used in an EJB method invocation will be serialized. The heavier the object is, the longer it takes for the system to serialize the object.

When a client invokes an EJB method, this is what happens in sequence:

  1. Client invokes the method.
  2. The parameters are serialized and then sent as a stream to the server.
  3. The server de-serializes the stream back into the parameters.
  4. Inside the EJB, all the processes are completed and the result object is returned.
  5. This return object is serialized into a stream and then sent to the client.
  6. The client de-serializes the stream to obtain the result object again.

As seen above, there are a couple of serialization and de-serialization processes going on for all the objects used in the method invocation. And it takes time! That’s why it’s good to review the objects (both the parameters and the return objects): are they really necessary, and is it possible to simplify them?

Take a look at the sample code below:

// Don't do this!!!
public List<SuperComplexDocument> updateDocumentSiblings(SuperComplexDocument doc) {
  // I'm oversimplifying here for the sake of clarity.
  return getDocumentDAO().updateSiblings(doc.getDocumentCode());
}

In the example shown above, we only need to access the documentCode attribute of the SuperComplexDocument object. If that’s the case, the method parameter should be simplified as follows:

public List<SuperComplexDocument> updateDocumentSiblings(String documentCode) {
  // I'm oversimplifying here for the sake of clarity.
  return getDocumentDAO().updateSiblings(documentCode);
}

After the modification, the serialization & de-serialization processes will be lighter, since the SuperComplexDocument parameter object no longer needs to be serialized; only the String object does.
The same thing also applies to the return objects (List<SuperComplexDocument> in this example). If you notice that the return objects can be simplified, then simplify them! You should even remove them if they’re unnecessary.

 

6. System.out.println < Logger < no-log-at-all.

Printing out the variables to the Production console for the sake of debugging is not a good idea. It’s time-consuming, especially when the application is about to be used by a number of users.

If you are deploying a program into a Production environment and using the System.out.println command in your application, you will likely get into a problem. If, for example, there are 100 users accessing the same module at the same time, then this very simple command will be executed 100 times. This is very inefficient because the server is spending its resources just to do what is actually not to be done in Production environment.

If you do need to print out a value for the sake of debugging in your local environment but not in the Production environment, use a Logger instead. A Logger checks the configured log level and decides whether a message should be written or not. So, not everything will be printed to the console for all users. The popular one for now is Log4J, and it has been sufficient for me so far.

The best scenario is definitely not logging at all (take a look at the quote I put at the beginning of this post). Whether or not the message ends up filtered by the Logger, the filtering process itself takes time, although not as much as System.out.println. That’s why excessive logging will also affect the performance of the application.
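Here is a minimal sketch of that filtering idea. To keep it self-contained it uses java.util.logging from the JDK rather than Log4J, and the describe method and its counter are made-up stand-ins for an expensive log message:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class, for illustration only.
class LoggingSketch {
    static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    // Counts how often the (pretend) expensive message is actually built.
    static int messagesBuilt = 0;

    static String describe(int itemCount) {
        messagesBuilt++;
        return "Loaded " + itemCount + " items";
    }

    static void process(int itemCount) {
        // The guard skips building the message when FINE logging is switched off.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(itemCount));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // a Production-like setting: FINE messages are filtered out
        process(100);
        System.out.println(messagesBuilt); // 0
        LOG.setLevel(Level.FINE); // a debugging setting
        process(100);
        System.out.println(messagesBuilt); // 1
    }
}
```

The level check itself is still executed on every call, which is the small remaining cost mentioned above.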

 

7. Favor Stateless Bean over Stateful Bean.

A Stateful Bean is heavier than a Stateless Bean, since the server has to keep the conversational state of every client in memory. If there are 100 users calling an EJB method on a Stateful Bean, there will be 100 unique Bean instances held by the server.

A Stateless Bean keeps no state that is unique to a user, so the server can share instances among callers. If there are 100 users calling an EJB method on a Stateless Bean, the server can serve them from a small pool of reusable instances instead of creating one per user.

 

8. Adjust the Transaction attributes as necessary.

One aspect of EJB that makes it heavier is the usage of transactions. If an EJB method is meant only to retrieve a result without modifying anything, then a transaction is not necessary. Adjust the transaction attribute accordingly for all the EJB methods based on this behaviour.

You can set the transaction attributes in the ejb-jar.xml deployment descriptor file. If you don’t need a transaction for a certain method call, you can set the value to be either ‘NotSupported’ or ‘Never’. You can check the manual for more details.

 

9. Use ‘transient’ modifier to reduce serialization overheads.

As mentioned above, serialization and de-serialization are heavy tasks. One way to lighten the burden is to tell the system which attributes do not need to be transferred to a file, over the network, etc. By marking an attribute with the ‘transient’ modifier, we’re telling the system that this attribute does not need to undergo the serialization process.
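A minimal sketch of the idea, assuming a made-up Document class where cachedPreview is something that can be rebuilt on the other side. The in-memory round trip merely stands in for a file or network transfer:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes, for illustration only.
class TransientSketch {
    static class Document implements Serializable {
        private static final long serialVersionUID = 1L;
        final String code;
        transient String cachedPreview; // skipped by serialization

        Document(String code, String cachedPreview) {
            this.code = code;
            this.cachedPreview = cachedPreview;
        }
    }

    // Serializes and de-serializes the document in memory.
    static Document roundTrip(Document doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(doc);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Document) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document copy = roundTrip(new Document("DOC-1", "a very long preview..."));
        System.out.println(copy.code);          // DOC-1
        System.out.println(copy.cachedPreview); // null
    }
}
```

The transient field comes back as null, so the receiving side has to be able to live without it or rebuild it.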

 

10. Don’t reinvent the wheel.

If you’re using a HashMap object that is frequently accessed by concurrent threads, you might as well replace it with ConcurrentHashMap. That way you don’t need to manually decide how to lock / unlock the key, where to synchronize, etc. The classes / APIs provided by the Java community have been tweaked, tested and optimized again and again to work as well as they possibly can.

Another example is when you want to check whether a String starts with certain characters. Instead of writing your own substring, iteration, etc., you can simply use the method startsWith(String prefix).

Make sure that you’re not reinventing the wheel. Check first whether such functionality already exists natively before deciding to make one.
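Both examples can be sketched in a few lines. The class and method names here are made up; ConcurrentHashMap.merge and String.startsWith are the standard JDK methods doing the actual work:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class, for illustration only.
class ReuseSketch {
    // Counts a word from several threads without any hand-written locking.
    static Map<String, Integer> countConcurrently(String word, int threads, int perThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge(word, 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently("hit", 4, 1000).get("hit"));    // 4000
        System.out.println("SELECT * FROM SHOP_STOCK".startsWith("SELECT")); // true
    }
}
```

With a plain HashMap and no synchronization, the same counting loop could lose updates.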

Source:

1. http://www.javaperformancetuning.com/

2. http://javaboutique.internet.com/tutorials/tuning/

3.  http://www.javaworld.com/javaworld/jw-05-2004/jw-0517-optimization.html

For-Loop

As we know, we can do iterations in Java in at least two ways: For-Loop or While-Loop. We can actually do it with recursion as well, but let’s set that aside for now. For this post, I just want to focus on the For-Loop.

With the For-Loop itself, we can do the looping in at least three ways:

1. Common For-Loop.

This is the most common way of doing iteration in Java. This is the basic format:

List<Book> books = new ArrayList<Book>();
// Set<Book> books = new HashSet<Book>();
books.add(new Book("One", 1));
books.add(new Book("Two", 2));
books.add(new Book("One", 1));
for (int i=0; i<books.size(); i++) {
 Book book = books.get(i);
 // Do something with var book here.
}

This iteration is good enough if your project uses a Collection with a List implementation, or an Array. But it’s not good enough for other Collection types such as HashSet or TreeSet, since they don’t provide a getter method for a specific index.

If, let’s say, one day your supervisor decides to change the data type from ArrayList to HashSet in order to prevent duplicates in the Books collection, the code above will no longer compile because HashSet doesn’t have a ‘get’ method.

The code above is not flexible for modifications.

2. With Iterator.
Iterator will help to solve the flexibility issue found in the common For-Loop. This is how the code will look if you convert it to apply Iterator.

List<Book> books = new ArrayList<Book>();
// Set<Book> books = new HashSet<Book>();
books.add(new Book("One", 1));
books.add(new Book("Two", 2));
books.add(new Book("One", 1));
for (Iterator<Book> it = books.iterator(); it.hasNext(); ) {
 Book book = it.next();
 // Do something with var book here.
}

The code above in this 2nd example is more flexible than the 1st one. It’s flexible because whether the books collection will use List data type or Set data type, the code will still compile and run just fine. It solves the flexibility issue found when you use the common For-loop.

Can it get even better than this? Yes, but no longer in terms of flexibility. It’s just in terms of readability.

3. Enhanced For-Loop.
Since JDK 5, Java has offered a new form of loop, called the Enhanced For-Loop. For collections, this loop uses an Iterator under the hood, just as shown in the previous example, only implicitly. With this new for-loop format, the code looks better.

This is how the code will look if you use the Enhanced For-Loop instead.

List<Book> books = new ArrayList<Book>();
// Set<Book> books = new HashSet<Book>();
books.add(new Book("One", 1));
books.add(new Book("Two", 2));
books.add(new Book("One", 1));
for (Book book: books) {
 // Do something with var book here.
}

Now, don’t you think that this one looks much nicer than the previous one? It’s more readable, and at the very least it saves you one line of code for each iteration.

The constraint here is that, to declare the loop variable as Book, the Collection variable must be declared with Generics. Anyway, it’s not really fair to call it a constraint; it’s more correct to consider it a feature. Using Generics is a very good Java practice that saves you from compiling, deploying, and running the program just to encounter a ClassCastException.

Indeed there are cases where it is necessary to use the classic For-Loop (the 1st option) instead of the others. One example is when the loop index is going to be used somewhere in the iteration. It usually happens with collections that maintain the sequence of their data, such as ArrayList, Vector, or a plain Array. If the index isn’t needed, it’s advised to follow this formula:

Enhanced For-loop > For-Loop with Iterator > Common For-Loop.
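As an example of the index being needed, here is a small sketch that numbers each title by its position (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class, for illustration only.
class IndexedLoopSketch {
    // The index is part of the output, so the classic For-Loop is a natural fit here.
    static List<String> numbered(List<String> titles) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.size(); i++) {
            result.add((i + 1) + ". " + titles.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1. One, 2. Two]
        System.out.println(numbered(Arrays.asList("One", "Two")));
    }
}
```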