Landa Function

Technological decisions that successful startups make


Many tech founders wonder which technological decisions will help them build a successful company. Founders who sold their company or took it to an IPO tend to talk about culture, spirit, and the core team, but they usually leave technical decisions out of the story. After 15 years of working in the Israeli tech industry in different roles, including as a consultant and at Armis Security, I started to see patterns in the decisions that successful startups tend to make.

Start with DevOps practices from day one

You have an idea and you start prototyping around it. Sooner or later you will have an application (mobile, web, or backend) that runs somewhere and has some impact on the world. Even if it’s only 1,000 lines of code, you should start using source control (Git) and keep consistent naming conventions for branches. It will help you prototype faster, and once you have a working prototype you can throw this Git repository away and start a new one from scratch.

Source control is not enough: you should also have an automated build and deployment. “Dockerizing” the application (packaging it in a Docker image) is even better. Nowadays you don’t even need your own CI server; use managed services like Travis CI or CircleCI, and GitHub or GitLab for Git hosting. Why? Because a one-click demo reduces stress and helps ensure stability before you meet potential investors. It also makes life much easier for the non-technical founders.

Small data, big insights: take care of BigData later

Building BigData infrastructure is complicated and costly. It requires experienced developers, the right choice of data store, and complicated code that works around that store’s limitations. If you are not building a BigData service per se, don’t go there. Handling data at small scale is a solved problem: there is a wide variety of databases, such as MySQL, PostgreSQL, and even document databases like MongoDB, that will handle your data well. These databases are battle-proven, come with mature tooling, and many developers already know how to use them.

Monolith first, microservices later

Microservices have been one of the hottest buzzwords of the last few years. When starting a new company, it is tempting to write microservices from day one, defining a clean domain and separating concerns. What most developers forget is that microservices come at a price in complexity: they require common infrastructure, orchestration, and dedicated deployment tooling. When your domain is not yet fully defined and your startup has ten developers or fewer, microservices will probably hinder more than help. That doesn’t mean the code you write shouldn’t be modular; we wrote modular code long before microservices existed. I recommend not bothering with microservices until you have a good grasp of what you are aiming to build.
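To illustrate what “modular without microservices” can look like, here is a minimal sketch of a modular monolith in Scala: each module hides behind a narrow trait, and the modules are wired together in a single process. The module names (`UserModule`, `BillingModule`, `Wiring`) and their methods are hypothetical, chosen only for illustration.

```scala
// Hypothetical modules of a single deployable application (a "modular monolith").
// Each module exposes a narrow trait; other modules depend only on that trait.
trait UserModule {
  def userExists(userId: Long): Boolean
}

trait BillingModule {
  def charge(userId: Long, amountCents: Long): Either[String, Long]
}

// In-process implementations: plain method calls, no network or service discovery.
final class InMemoryUserModule(knownUsers: Set[Long]) extends UserModule {
  def userExists(userId: Long): Boolean = knownUsers.contains(userId)
}

final class SimpleBillingModule(users: UserModule) extends BillingModule {
  def charge(userId: Long, amountCents: Long): Either[String, Long] =
    if (!users.userExists(userId)) Left(s"unknown user $userId")
    else if (amountCents <= 0) Left("amount must be positive")
    else Right(amountCents)
}

// All wiring happens in one place.
object Wiring {
  val users: UserModule = new InMemoryUserModule(Set(1L, 2L))
  val billing: BillingModule = new SimpleBillingModule(users)
}
```

Because `SimpleBillingModule` depends only on the `UserModule` trait, a module that later turns out to need independent scaling could be moved behind a remote client implementing the same trait, once the domain is better understood.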

Understanding Scala call-by-name


Call-by-name is a powerful Scala feature that many Scala developers miss. Using a common use case, logging, we will see what call-by-name gives us and what the alternatives are. When we call a logger, we usually specify a log level and a message:

class Logger {
  // ... other members elided
  def log(logLevel: Int, message: String): Unit = ??? // implementation elided
}

A well-known best practice is to use the logging level appropriate to each message. However, log levels don’t prevent us from computing the message for a log line. In my experience, many debug messages require heavy computation that is not part of the main business logic. If the configured log level is higher than debug, these debug messages should not be computed at all. A common pattern I saw at several companies was the following:

def log(logLevel: Int, message: String): Unit = {
  if (logLevelEnabled(logLevel)) actualLogging(message)
}

// in another file
val message = someHeavyComputation() // a call to an external service or a Spark action
log(1, message) // message was already computed, even if level 1 is disabled
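To make the cost concrete, here is a minimal, self-contained sketch of the by-value version. The names `EagerLogger`, `minLevel`, `written`, and `heavyCalls` are hypothetical, `someHeavyComputation` just increments a counter in place of a real external call, and a level is assumed to be enabled when it is at or above `minLevel`.

```scala
object EagerLogger {
  val minLevel: Int = 2        // hypothetical: only levels >= 2 are written
  var heavyCalls: Int = 0      // counts how often the "expensive" work runs
  val written = scala.collection.mutable.ArrayBuffer.empty[String]

  def logLevelEnabled(logLevel: Int): Boolean = logLevel >= minLevel
  def actualLogging(message: String): Unit = written += message

  // By-value parameter: the argument is evaluated before log's body runs.
  def log(logLevel: Int, message: String): Unit =
    if (logLevelEnabled(logLevel)) actualLogging(message)

  // Stand-in for a call to an external service or a Spark action.
  def someHeavyComputation(): String = { heavyCalls += 1; "expensive debug details" }
}

// Level 1 is disabled, so nothing is written, yet the message is still computed.
EagerLogger.log(1, EagerLogger.someHeavyComputation())
```

After the last line, `written` is empty but `heavyCalls` is already 1.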

The log implementation ensures that messages below the configured log level are not logged, but the message is computed anyway. Another solution is to use Scala’s higher-order functions and pass the message as a function of type () => String:

def log(logLevel: Int, messageFunc: () => String): Unit = {
  if (logLevelEnabled(logLevel)) {
    val message = messageFunc() // invoked only when the level is enabled
    actualLogging(message)
    // ...
  }
}
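A self-contained sketch of the function-parameter version shows both the benefit and the awkward call sites. As before, `ThunkLogger`, `minLevel`, `written`, `heavyCalls`, and the counter-based `someHeavyComputation` are hypothetical stand-ins.

```scala
object ThunkLogger {
  val minLevel: Int = 2        // hypothetical: only levels >= 2 are written
  var heavyCalls: Int = 0
  val written = scala.collection.mutable.ArrayBuffer.empty[String]

  def logLevelEnabled(logLevel: Int): Boolean = logLevel >= minLevel
  def actualLogging(message: String): Unit = written += message

  // The caller passes a zero-argument function; it runs only if the level is enabled.
  def log(logLevel: Int, messageFunc: () => String): Unit =
    if (logLevelEnabled(logLevel)) actualLogging(messageFunc())

  def someHeavyComputation(): String = { heavyCalls += 1; "expensive debug details" }
}

// Every call site has to wrap the message in a lambda.
ThunkLogger.log(1, () => ThunkLogger.someHeavyComputation()) // disabled: never computed
ThunkLogger.log(3, () => ThunkLogger.someHeavyComputation()) // enabled: computed once
```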

Even though this works, the syntax doesn’t feel right: every caller has to wrap the message in () => .... In functional programming we assume referential transparency, so a function that takes no arguments can be treated just like the expression it returns. This is exactly what the call-by-name feature provides, using the syntax:
def func(expression: => Type)

Let’s look at our log function using a call-by-name parameter:

def log(logLevel: Int, message: => String): Unit = {
  if (logLevelEnabled(logLevel)) {
    actualLogging(message) // message is evaluated here, only when the level is enabled
    // ...
  }
}

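With the syntax above, the message expression is evaluated every time the parameter is referenced inside the method body, just like calling a function with no parameters, and it is not evaluated at all when the log level is disabled. At the call site, the message is still written as a plain expression, for example log(1, someHeavyComputation()).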
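The same self-contained setup with a call-by-name parameter keeps the laziness of the function version while the call sites look like the by-value one. `ByNameLogger`, `minLevel`, `written`, `heavyCalls`, and the counter-based `someHeavyComputation` are hypothetical stand-ins.

```scala
object ByNameLogger {
  val minLevel: Int = 2        // hypothetical: only levels >= 2 are written
  var heavyCalls: Int = 0
  val written = scala.collection.mutable.ArrayBuffer.empty[String]

  def logLevelEnabled(logLevel: Int): Boolean = logLevel >= minLevel
  def actualLogging(message: String): Unit = written += message

  // `=> String` makes message a by-name parameter: the argument expression is
  // evaluated each time `message` is referenced in the body, and not at all otherwise.
  def log(logLevel: Int, message: => String): Unit =
    if (logLevelEnabled(logLevel)) actualLogging(message)

  def someHeavyComputation(): String = { heavyCalls += 1; "expensive debug details" }
}

// No lambda wrapper at the call site.
ByNameLogger.log(1, ByNameLogger.someHeavyComputation()) // disabled: never computed
ByNameLogger.log(3, ByNameLogger.someHeavyComputation()) // enabled: computed once
```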

A final example emphasizes the power of call-by-name. In the snippet below, log is called 10 times and, provided the log level is enabled, “foo” is printed 10 times as well, because the block passed to logMultiple is re-evaluated on every iteration.

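def logMultiple(nTimes: Int, message: => String): Unit = {
  // message is passed on by name, so it is re-evaluated on every iteration
  for (_ <- 1 to nTimes) log(1, message)
}

logMultiple(10, {
  println("foo") // side effect runs once per evaluation of the block
  "test"
})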
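To check the “10 times” claim without relying on console output, here is a self-contained variant in which a counter replaces println. `RepeatDemo`, `evaluations`, and `written` are hypothetical names, and this simplified log writes every level.

```scala
object RepeatDemo {
  var evaluations: Int = 0     // counts evaluations of the by-name argument
  val written = scala.collection.mutable.ArrayBuffer.empty[String]

  // Hypothetical simplified logger: every level is enabled here.
  def log(logLevel: Int, message: => String): Unit = written += message

  // message is forwarded by name, so each iteration re-evaluates the caller's block.
  def logMultiple(nTimes: Int, message: => String): Unit =
    for (_ <- 1 to nTimes) log(1, message)
}

RepeatDemo.logMultiple(10, {
  RepeatDemo.evaluations += 1  // plays the role of println("foo")
  "test"
})
```

Afterwards `evaluations` is 10 and `written` holds ten copies of "test".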

To summarize: when I want to pass an expression instead of a value, I use a call-by-name parameter. Every access to a call-by-name parameter evaluates the expression again. For a real-world example, look at Twitter’s Logger implementation.
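Because every access re-evaluates the expression, a method that references a call-by-name parameter more than once also pays for it more than once. A common idiom, sketched here with hypothetical names (`EvaluateOnce`, `twiceByName`, `twiceCached`), is to cache the parameter in a local `lazy val`, which evaluates it at most once and only if it is actually used.

```scala
object EvaluateOnce {
  var evaluations: Int = 0     // counts how often the argument expression runs

  // References the by-name parameter twice: the expression is evaluated twice.
  def twiceByName(message: => String): String = message + message

  // Caches the by-name parameter: evaluated at most once, and only when first used.
  def twiceCached(message: => String): String = {
    lazy val cached = message
    cached + cached
  }
}
```

Calling `twiceByName { evaluations += 1; "a" }` returns "aa" and increments the counter twice, while the same call to `twiceCached` increments it once.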

30 times faster


As developers, we all have our war stories: fixing a critical bug in the middle of the night, or finding an awesome solution to a really complicated performance issue. This war story is about the old “Introduction to CS” problem: create all possible pairs from a given collection of items that match a predefined filter. It was interesting to encounter this problem again during a load test, when a single method took several hours instead of seconds. That method did exactly what we were asked to do in class!

So I had this inefficient code:

def groupCombinations(groupMembers: Seq[GroupMember]): Array[(Member, Member)] = {
  // combinations(2) yields a new Seq for every pair, even the ones the filter drops
  groupMembers.combinations(2).collect {
    case first +: second +: _ if first.groupId == second.groupId =>
      if (first.member.id > second.member.id) {
        (second.member, first.member)
      } else {
        (first.member, second.member)
      }
  }.toArray
}
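A small, self-contained sketch makes the waste visible by counting how many candidate Seqs `combinations(2)` produces versus how many survive the group filter. The `Member` and `GroupMember` definitions are assumptions (the post does not show them), as are the sample sizes.

```scala
// Hypothetical data model: the post does not show these types.
case class Member(id: Long)
case class GroupMember(groupId: Int, member: Member)

object CombinationCost {
  // Returns (pairs generated by combinations(2), pairs that pass the same-group filter).
  def candidatesVsKept(groupMembers: Seq[GroupMember]): (Int, Int) = {
    var candidates = 0
    var kept = 0
    groupMembers.combinations(2).foreach { pair =>
      candidates += 1          // one freshly built Seq per pair
      if (pair(0).groupId == pair(1).groupId) kept += 1
    }
    (candidates, kept)
  }
}

// 1,000 members spread over 100 groups of 10.
val sample = (0 until 1000).map(i => GroupMember(i % 100, Member(i.toLong)))
val (candidates, kept) = CombinationCost.candidatesVsKept(sample)
```

With this sample, 499,500 pairs are generated but only 4,500 are kept.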


Using a thread dump, I saw that this method created numerous Seq objects. Diving into the code of groupMembers.combinations(2) made it clear that this line was the root cause of our performance problem: it created an iterator that produced a new Seq object for every single pair, even though most of these pairs were filtered out later on. As a result, both the running time and the number of allocations of the method were O(n^2), no matter how few pairs survived the filter.
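The size of that waste can be written out. Assuming the n members are split into m groups of sizes g_1, …, g_m (notation introduced here for illustration), combinations(2) generates every pair, while the filter keeps only the pairs inside a group:

```latex
\text{generated} = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\text{kept} = \sum_{k=1}^{m} \binom{g_k}{2} = \sum_{k=1}^{m} \frac{g_k(g_k-1)}{2}
```

For example, with n = 1000 members in 100 groups of 10, that is 499,500 generated Seq objects for only 100 × 45 = 4,500 kept pairs, so more than 99% of the allocations are thrown away.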

To reduce the running time, I decided to sort the group members by their group. After sorting, members of the same group sit at adjacent indexes, so there is no need to examine all O(n^2) pairs of groupMembers: only pairs inside the same group’s run have to be visited.

groupMembers.sortBy(_.groupId)
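A minimal sketch of why sorting helps: after `sortBy`, each group occupies one contiguous index range, so only pairs whose two indexes fall into the same range need to be generated. `GroupRanges` and the `Member`/`GroupMember` definitions are assumptions for illustration.

```scala
// Hypothetical data model: the post does not show these types.
case class Member(id: Long)
case class GroupMember(groupId: Int, member: Member)

object GroupRanges {
  // Returns (groupId, first index, one past the last index) for each group run.
  def ranges(groupMembers: Seq[GroupMember]): Vector[(Int, Int, Int)] = {
    val sorted = groupMembers.toVector.sortBy(_.groupId)
    // A run starts at index 0 or wherever the groupId changes.
    val starts = sorted.indices.filter(i => i == 0 || sorted(i).groupId != sorted(i - 1).groupId)
    starts.zipWithIndex.map { case (start, k) =>
      val end = if (k + 1 < starts.length) starts(k + 1) else sorted.length
      (sorted(start).groupId, start, end)
    }.toVector
  }
}
```

For members in groups 2, 1, 2, 1, 3 the result is `Vector((1, 0, 2), (2, 2, 4), (3, 4, 5))`.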

Sorting helps, but it’s not enough: I also wanted an efficient representation of a pair. Scala’s Vector class came to the rescue. Because it offers fast (effectively constant-time) random access, any pair could be represented by just two Int indexes pointing to the relevant positions in the vector. Sorting is not always necessary: when all the group members belong to the same group, the sort is skipped.

val firstGroupId = groupMembers.head.groupId
val severalGroupsExist = groupMembers.tail.exists(_.groupId != firstGroupId)
// Skip the sort when every member belongs to the same group
val sorted = if (severalGroupsExist) {
  groupMembers.toVector.sortBy(_.groupId)
} else {
  groupMembers.toVector
}

To wrap up the solution, I used a Scala for comprehension to generate only the relevant pairs on the fly:

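def optimizedCombinations(groupMembers: Seq[GroupMember]): Array[(Member, Member)] = {
  // groupMembers.head below would throw on an empty input, so handle it first
  if (groupMembers.isEmpty) Array.empty[(Member, Member)]
  else {
    // Sort only when the members belong to more than one group
    val firstGroupId = groupMembers.head.groupId
    val severalGroupsExist = groupMembers.tail.exists(_.groupId != firstGroupId)
    val sorted = if (severalGroupsExist) {
      groupMembers.toVector.sortBy(_.groupId)
    } else {
      groupMembers.toVector
    }

    // Index just past the end of sorted(currInd)'s group run
    def getNextIndex(currInd: Int): Int = {
      val nextInd = sorted.indexWhere(_.groupId != sorted(currInd).groupId, currInd + 1)
      if (nextInd > 0) nextInd else sorted.length // -1: the run extends to the end
    }

    // Pair each member only with the members after it in the same group's run
    (for {
      currInd <- 0 until sorted.size - 1
      currItem = sorted(currInd)
      nextInd <- currInd + 1 until getNextIndex(currInd)
      nextItem = sorted(nextInd)
    } yield {
      // Keep the member with the smaller id first, as in the original method
      if (currItem.member.id > nextItem.member.id)
        (nextItem.member, currItem.member)
      else
        (currItem.member, nextItem.member)
    }).toArray
  }
}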
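Before trusting a rewrite like this, it is worth checking that it returns the same pairs as the original. The sketch below puts the post’s original method next to a condensed version of the optimized one (it always sorts, dropping the skip-sort shortcut) so the two can be compared on sample data. `PairsCheck` and the `Member`/`GroupMember` definitions are assumptions; the results are compared as sets because the two methods emit pairs in different orders.

```scala
// Hypothetical data model: the post does not show these types.
case class Member(id: Long)
case class GroupMember(groupId: Int, member: Member)

object PairsCheck {
  // Original approach: generate every 2-combination, then filter.
  def groupCombinations(groupMembers: Seq[GroupMember]): Array[(Member, Member)] =
    groupMembers.combinations(2).collect {
      case first +: second +: _ if first.groupId == second.groupId =>
        if (first.member.id > second.member.id) (second.member, first.member)
        else (first.member, second.member)
    }.toArray

  // Condensed optimized approach: sort by group, then pair indexes inside each run.
  def optimizedCombinations(groupMembers: Seq[GroupMember]): Array[(Member, Member)] = {
    val sorted = groupMembers.toVector.sortBy(_.groupId)
    def getNextIndex(currInd: Int): Int = {
      val nextInd = sorted.indexWhere(_.groupId != sorted(currInd).groupId, currInd + 1)
      if (nextInd > 0) nextInd else sorted.length
    }
    (for {
      currInd <- 0 until sorted.size - 1
      currItem = sorted(currInd)
      nextInd <- currInd + 1 until getNextIndex(currInd)
      nextItem = sorted(nextInd)
    } yield {
      if (currItem.member.id > nextItem.member.id) (nextItem.member, currItem.member)
      else (currItem.member, nextItem.member)
    }).toArray
  }
}
```

A quick property-style check over a few generated inputs (for example, 200 members spread over 7 groups) is a cheap safeguard before measuring speed.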


When we ran our load tests again, the optimized code was 30 times faster than before; in isolated benchmarks it was even faster, 600 times faster than before.

I don’t recommend optimizing your code before you know where the bottleneck is, but when you get to the point where you have to optimize, start with the basics.