Sunday, February 21, 2010

Setting up Clojure 1.1.0 on Mac OSX

As part of the Lambda Lounge, established by Alex Miller (Thanks!!), we have started a group to study the SICP. We just started and are going virtual... if you would like to participate, email or tweet me (@kensipe). In the process of studying SICP and LISP, I plan on focusing on Clojure. Clojure was previously on my machine as I was reading Stu's book Programming Clojure, however increased usage would require some maturing of my tools. This led to the discovery of Mark Reid's blog on Setting up Clojure for Mac OS X, however I assume based on updates and changes over the last year, it requires some updates.

Read Mark's blog first... getting clojure, the clojure-contrib and jline. Here are the differences:
  • Create an environment variable CLOJURE_HOME assigned to the directory for clojure
  • Create the /bin directory for the clj script included below
  • Add $CLOJURE_HOME/bin in the $PATH
  • Clojure is distributed with clojure.jar in the CLOJURE_HOME... so I maintain that. It is expected that contrib and jline are in the same directory.
  • Mark's clj script has been updated below... The previous script invokes classes which are depreciated.
Here is the updated script:

#!/bin/bash
# Runs Clojure using the classpath specified in the `.clojure` file of the
# current directory.
#
# Ken Sipe 2010-02-20
# Inspired by Mark Reid
# original: http://github.com/mreid/clojure-framework/blob/e1c80cc650f448713243be8272dba1fa3c1a7cea/clj
#
# This scripts expects $JAVA_HOME and $CLOJURE_HOME to be defined for the system it is running on
# The clojure.jar is standardly in the clojure_home dir (and not in the lib directory), so
# the expectation is that clojure-contrib.jar and the jline.jar are placed in the same dir

CLJ_DIR=$CLOJURE_HOME
CLOJURE=$CLJ_DIR/clojure.jar
CONTRIB=$CLJ_DIR/clojure-contrib.jar
JLINE=$CLJ_DIR/jline-0_9_5.jar
CP=$PWD:$CLOJURE:$JLINE:$CONTRIB

# Add extra jars as specified by `.clojure` file
if [ -f .clojure ]
then
CP=$CP:`cat .clojure`
fi

if [ -z "$1" ]; then
java -server -cp $CP \
jline.ConsoleRunner clojure.lang.Repl
else
java -server -cp $CP clojure.main -i $*
fi
Love the TextMate support and the history support in JLine!

Wednesday, February 17, 2010

Reporting from SpeakerConf 2010

I was able to attend speakerconf 2010 this year in Aruba. It was an amazing gathering of talented leaders in the software development space. Each day of the conference started with several 15 – 30 minute talks lasting roughly 3 hours. According to the retro at the end of the conference, most people enjoyed and preferred these short talks as a primer to longer conversations in the afternoon along the pool or beach. Most talks were met with the struggle to maintain time often due to the many follow on questions generated by the subject. As far as I know the subjects were selected by the speaker and they were generally not preannounced. It is impossible to capture the essence of the conference and I make no effort to cover the most valuable part of it… the evening free form discussions. Most of the information went so fast that it was difficult to mentally be engaged and take notes. Here is a glimpse of what was discussed.

Day 1
Stu Halloway
Stu introduced the ADD development… which is alcohol driven. Stu then jumped into clojure, introducing a private testing project he is working on called circumspect. Stu offered up some advice for working in clojure:
* be less clever
* idiomatic clojure
* make adjectives not nouns

The focus on circumspect was to provide an idiomatic testing option for functional testing in clojure. Stu mentioned that through his testing exploration that an test he created actually detected a bug in the clojure code.

Dennis Byrne
Dennis is the first to discuss a topic on hardware and general process constraints in a talk titled Memory Barriers. The challenge is that virtual machines and hardware interfaces reorder code in ways most developers are not aware of. Fences or memory barriers define for the VM what can and can’t be reordered. There was some discussion on the expensive nature of these barriers.

Matt Deiters
Matt introduces a challenging problem that he has been working on in a story about vertex. The problem stems from social network analysis… it doesn’t work in SQL. Their previous implementation in SQL might take an hour to find all the details need for just 2 degrees of separation in a social network. Matt introduces neo4j an open source nosql graph database and neo4jr-simple, which is a RESTful ruby API. I enjoyed the fact that Matt wasn’t a No SQL bigot… He espoused a “Not all SQL” approach, using alternatives where SQL databases fall down.

Fred George

Fred displayed a triangle with management, story and programmers (or workers) at the vertexes. He wanted to focus on the relationship of the story to the programmer. Fred focuses on a Lean process and raises the question if acceptance tests are really lean? This stirred some debate. In the end, It appeared to me that what was being argued in most cases was the definition of acceptance tests vs. other types of tests. I don’t know if I agreed with Fred’s position, which we discussed at length the rest of the conference. My favorite part of his talk was a list of words or phrases that are flags that your project isn’t Agile. They include: Review, I need it in writing, write me a ticket, plan, checklist, code freeze, gant chart.

Brian Goetz
Brian’s talk is on how we find ourselves in this mess with multi-cores. Brian gives a wonderful history of processors defining them in eras: CISC era, RISC era, and Multi-Core era. The big take-aways for me include:
* a move from explicit to implicit parallelism
* moore’s law favors bandwidth over latency
* cost of memory access

With the speed of today’s processors, it takes 200-300 clock cycles to get 1 memory fetch, where it used to be 1 cycle. As latency increases we need more caches. Memory access times by clock cycle are roughly:
* Register ~ 1 cycle
* L1 ~ 3
* L2 ~ 15
* RAM ~ 200 – 300

Brian states “Memory is the new disk”. Clearly memory access is the new problem to solve in computer hardware. Brian also states that there is no programming model for this new multi-core era. Based on the talk, it appears to be that the industry has a new chicken or the egg problem. Hardware is struggling to evolve, because there is no software development model. Software is struggling because there isn’t a new standard. Brian concludes with latency is the enemy and the industries need to provide development teams with new tools. There will be a growing need to understand the memory access model of a running program, which can only be determined at run-time at the chip level. It will be necessary to understand the Integer pipe and memory cache misses. At the end of Brian’s talk, Nygard mentioned Chuck Moore’s Multi-Core chip where each core runs a tiny forth program. It is something I’ve added to the list of things to look into.

Michael Feathers
Michael discusses challenges to object oriented and functional language hybrids. There is background that I didn’t record, but one of the great observations is that functional programming is a “tell, don’t ask” model and object oriented programming is a “ask, don’t tell” model. In the FP approach, programming logic and data seem to bubble up the call tree and in the OO approach the opposite is true, logic seems to be pushed down the tree. Michael then looks at what it looks like conceptually if FP is on top of OO or OO is on top of FP. Michael then introduces the group to an algorithm used to solve morphological watersheds :)

Day 2
Steve Vinoski
Steve introduced us to a product his startup company Verivue, Inc is producing for the big media companies for streaming video. The IO throughput numbers were unbelievable. The software at the core of the product… Erlang!

Obie Fernadez
Obie and Desi introduced us to the agile practices of Hashrocket, indicating that they have evolved past iterations and they question the value added by iteration activities. They develop in one-week windows and use Pivotal Tracker for tracking stories and progress. The term “Iteration” isn’t even in their vocabulary. Later in the week, I challenged Obie to see if the word “week” replaces the word “iteration” as a convenience and to see if they were doing all things an ”iteration” would entail. The short answer is No! Clearly pivotal tracker desires a good look. Another tool Obie mentioned later in the week was Balsamiq... a tool used for screen mocking. Yet another tool to checkout.

Phillip Hanrigou
Phillip starts with the phrase “any sufficiently advance technology should feel like magic”. He continues with the fact that 30% of Amazon’s sales is based on recommendations. This was an interesting talk that had me flashing back to Matt’s talk on neo4j looking for synergies.

Oren Eini
Oren suggests that TDD doesn’t scale… with a beautiful picture of a motorcycle with training wheels. The major point is that unit tests are bad and scenario based tests are good.

Brian Marick
The take away… “I don’t want to care about anything until I absolutely have to care about it”… Love it!

Dave Thomas
Dave introduces us to vector functional languages such as KDB and k. These specialized languages are the workhorses behind querying ½ terabyte of data in 1 second for the hedge fund industry. Dave claims this is the most interesting technology to come along in 25 years. If interested, he suggests getting started with j from jsoftware.com

Eric Yew
Eric gives us a deep dive into GPUs. He was on the NVidia team that built the technology. Eric suggests that GPUs are making the industry rethink previous assumptions that are not correct in the GPU world. How would we code differently if:
  1. There were no costs to threads
  2. And no costs to context switching
The approach suggests that memory lookup is too expensive (seems like we keep hearing this :). It is cheaper to recalculate a value on a 100 threads then to retrieve from memory.

This model supports scale and throughput and not latency. Latency once again is sacrificed.

The memory model is upside down as well from the traditional general processors. There are 3 memory access points:
  1. global - ~ 100 cycles to get to it
  2. shared
  3. local – stacks
It appears that the old paradigm of being abstracted from the hardware is over. The benefit is that the price of 2 Teraflops is near nothing… thank you computer gaming community!

Day 3
Neal Ford
Neal did a wonderful job presenting a presentation on presentation :) at a conference on conferences… it was a very meta way to start the morning. The big take away is http://presentationpatterns.com/ a site and forth-coming book on presentation styles, techniques and anti-patterns. The biggest benefit of the book is its focus on the practical. It will include templates for good practices.

Amanda Laucher
Amanda starts by defining the classifications of assholes and morons which got a lot of mileage the rest of the day. Amanda focused on educating us on terminologies in the category theory space with examples in F#. Here was what I was able to capture:

* Catamorphism - a fold
* Anamorphism – unfold
* Endomorphis – takes a type and returns the same type
* Homomorphism – takes a structure and returns the same structure or shape
* Monoid – A Functor
* Monad – Applied context
* A Monad is a monoid in the category endofunctors

Dave Hoover
Dave discussed the value of Resque and how it help in the development of Groupon.com

George Malamidis
George continues the conversation on memory access times and threads, introducing us to Node.js as a high scale solution for the web.

Michael Nygard
Thoughts on consistency – what a great talk! Michael introduces the concept of an observer and a super observer in a system through an object lesson. He then follows it up with the fact that with in a database there are often inconsistent states… the point is at the end of a transaction there cannot be any inconsistent states. For example, it is possible that an insert into the database would result in a violation of referential integrity… this is ok and necessary until a commit occurs.

Consistency is based on time. Frozen time means no change.

Another practical example:
2 ebay users making the same query at roughly the same time get different query results. This is viewed as being ok, because from the perspective of the observer it is consistent (they have no idea). From the perspective of a super observer there are some inconsistencies. This led to some discussion in mainly of the corporate failures stemming from the complexity created when trying to maintain consistency from the perspective of a super observer (which is a fallacy and trap).

Aslask Hellesoy
Another agile talk… this one focused on visualization of the return on development through CFD graphs. The suggestion was that velocity is a questionable valuation.

References included: CFD Details and a google spreadsheet
Robert Martin
I don’t think you can summarize Uncle Bob… you have to experience him :)


Summary
I made a couple of interesting observation / assessments through the conference. The first is there was a large percentage of the speakers / attendees that had a strong EE background. I truly enjoyed revisiting challenges at this level again.

Here are some of my takeaways from the conference:
  1. We are at the beginning of a new era in field of computing. One in which we do not have a standardize model or tools
  2. There is a need for programming solutions which reduce (or leak) the abstraction from the memory model that is generally preferable in the previous programming model. It will be necessary to indicate that values in memory are near, far or very far. It will be necessary to be able to measure and understand the cache miss rate.
  3. We appear to be at our limits in our pursuit to improve latency. Worse…. Latency is the component that is consistently sacrificed to increase throughput.