Wednesday, February 17, 2010

Reporting from SpeakerConf 2010

I was able to attend speakerconf 2010 this year in Aruba. It was an amazing gathering of talented leaders in the software development space. Each day of the conference started with a roughly three-hour block of 15–30 minute talks. According to the retro at the end of the conference, most people enjoyed and preferred these short talks as primers for the longer conversations in the afternoon by the pool or on the beach. Most talks struggled to stay within their time slots, often because of the many follow-on questions the subjects generated. As far as I know, the subjects were selected by the speakers and were generally not announced in advance. It is impossible to capture the essence of the conference, and I make no effort to cover its most valuable part… the evening free-form discussions. The information flowed so fast that it was difficult to stay mentally engaged and take notes at the same time. Here is a glimpse of what was discussed.

Day 1
Stu Halloway
Stu introduced us to ADD development… which is alcohol driven. Stu then jumped into Clojure, introducing a private testing project he is working on called Circumspec. Stu offered some advice for working in Clojure:
* be less clever
* write idiomatic Clojure
* make adjectives, not nouns

The focus of Circumspec is to provide an idiomatic option for functional testing in Clojure. Stu mentioned that through this testing exploration, a test he created actually detected a bug in the Clojure code itself.

Dennis Byrne
Dennis was the first to discuss hardware and general process constraints, in a talk titled Memory Barriers. The challenge is that virtual machines and hardware reorder code in ways most developers are not aware of. Fences, or memory barriers, define for the VM what can and can’t be reordered. There was some discussion of how expensive these barriers are.
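
To make the hazard concrete, here is a minimal Java sketch of my own (not from Dennis’s talk). The volatile keyword is one place the JVM inserts the kind of barrier he described:

```java
public class Reordering {
    private int data = 0;
    private volatile boolean ready = false; // the volatile acts as a fence

    public void writer() {
        data = 42;    // without the fence below, this write could be reordered
        ready = true; // volatile write: data = 42 must be visible before it
    }

    public void reader() {
        if (ready) {                  // volatile read pairs with the write above
            System.out.println(data); // guaranteed to print 42, never 0
        }
    }
}
```

If ready were a plain boolean, the JIT or the CPU would be free to reorder the two writes, and reader() could observe ready == true while data is still 0.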

Matt Deiters
Matt introduced a challenging problem he has been working on, in a story about vertices. The problem stems from social network analysis… it doesn’t work well in SQL. Their previous SQL implementation might take an hour to find all the details needed for just 2 degrees of separation in a social network. Matt introduced Neo4j, an open source NoSQL graph database, and neo4jr-simple, a RESTful Ruby API for it. I enjoyed the fact that Matt wasn’t a NoSQL bigot… he espoused a “Not all SQL” approach, using alternatives where SQL databases fall down.
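
To see why a graph model fits, here is a toy Java sketch of my own (Matt’s actual stack was Neo4j plus the neo4jr-simple Ruby API): two degrees of separation is just following edges twice, while SQL needs a self-join per degree.

```java
import java.util.*;

public class TwoDegrees {
    // Degree 1: direct friends; degree 2: friends of those friends.
    // In a graph store this is two edge hops; in SQL it is a self-join per degree.
    static Set<String> friendsOfFriends(Map<String, List<String>> graph, String start) {
        Set<String> result = new HashSet<>();
        for (String friend : graph.getOrDefault(start, List.of()))
            for (String fof : graph.getOrDefault(friend, List.of()))
                if (!fof.equals(start)) result.add(fof);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "alice", List.of("bob"),
                "bob", List.of("alice", "carol"),
                "carol", List.of("bob"));
        System.out.println(friendsOfFriends(graph, "alice")); // [carol]
    }
}
```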

Fred George

Fred displayed a triangle with management, story, and programmers (or workers) at the vertices. He wanted to focus on the relationship of the story to the programmer. Fred focuses on a Lean process and raised the question of whether acceptance tests are really lean. This stirred some debate. In the end, it appeared to me that what was being argued in most cases was the definition of acceptance tests vs. other types of tests. I don’t know if I agreed with Fred’s position, which we discussed at length for the rest of the conference. My favorite part of his talk was a list of words and phrases that are flags that your project isn’t Agile. They include: review, “I need it in writing,” “write me a ticket,” plan, checklist, code freeze, Gantt chart.

Brian Goetz
Brian’s talk was about how we find ourselves in this mess with multi-cores. Brian gave a wonderful history of processors, dividing them into eras: the CISC era, the RISC era, and the multi-core era. The big takeaways for me include:
* a move from implicit (hardware-managed) to explicit (programmer-visible) parallelism
* Moore’s Law favors bandwidth over latency
* the cost of memory access

With the speed of today’s processors, it takes 200–300 clock cycles to complete one main-memory fetch, where it used to take a single cycle. As latency increases we need more caches. Memory access times by clock cycle are roughly:
* Register ~ 1 cycle
* L1 ~ 3
* L2 ~ 15
* RAM ~ 200 – 300

Brian states “memory is the new disk.” Clearly memory access is the new problem to solve in computer hardware. Brian also states that there is no programming model for this new multi-core era. Based on the talk, it appears the industry has a new chicken-and-egg problem: hardware is struggling to evolve because there is no software development model, and software is struggling because there isn’t a new standard. Brian concluded that latency is the enemy and that the industry needs to provide development teams with new tools. There will be a growing need to understand the memory access behavior of a running program, which can only be determined at run-time at the chip level. It will be necessary to understand the integer pipeline and memory cache misses. At the end of Brian’s talk, Nygard mentioned Chuck Moore’s multi-core chip where each core runs a tiny Forth program. It is something I’ve added to the list of things to look into.
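
Those cache numbers are easy to make visible. Here is a small experiment of my own (not Brian’s) that sums the same array twice; both loops do identical arithmetic, but the strided walk loads each 64-byte cache line 16 times instead of once:

```java
public class CacheDemo {
    public static void main(String[] args) {
        int[] a = new int[1 << 25]; // 128 MB, far larger than any cache
        long sum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < a.length; i++) sum += a[i]; // cache-friendly scan
        long t1 = System.nanoTime();

        // Same additions, but stride 16 ints (64 bytes) means a new cache line
        // on every read; the array is too big for lines to stay resident.
        for (int j = 0; j < 16; j++)
            for (int i = j; i < a.length; i += 16) sum += a[i];
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, strided: %d ms (sum=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum);
    }
}
```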

Michael Feathers
Michael discussed the challenges of hybrid object-oriented and functional languages. There is background that I didn’t record, but one of the great observations is that functional programming is a “tell, don’t ask” model and object-oriented programming is an “ask, don’t tell” model. In the FP approach, logic and data seem to bubble up the call tree, and in the OO approach the opposite is true: logic is pushed down the tree. Michael then explored what it looks like, conceptually, to layer FP on top of OO versus OO on top of FP. Michael then introduced the group to an algorithm used to compute morphological watersheds :)

Day 2
Steve Vinoski
Steve introduced us to a product his startup company, Verivue, Inc., is producing for the big media companies for streaming video. The I/O throughput numbers were unbelievable. The software at the core of the product… Erlang!

Obie Fernandez
Obie and Desi introduced us to the agile practices of Hashrocket, indicating that they have evolved past iterations and question the value added by iteration activities. They develop in one-week windows and use Pivotal Tracker for tracking stories and progress. The term “iteration” isn’t even in their vocabulary. Later in the week, I challenged Obie to see whether the word “week” simply replaces the word “iteration” as a convenience and whether they were doing all the things an “iteration” would entail. The short answer is no! Clearly Pivotal Tracker deserves a good look. Another tool Obie mentioned later in the week was Balsamiq... a tool used for screen mocking. Yet another tool to check out.

Philippe Hanrigou
Philippe started with the phrase “any sufficiently advanced technology should feel like magic.” He continued with the fact that 30% of Amazon’s sales are driven by recommendations. This was an interesting talk that had me flashing back to Matt’s talk on Neo4j, looking for synergies.

Oren Eini
Oren suggested that TDD doesn’t scale… illustrated with a beautiful picture of a motorcycle with training wheels. His major point: unit tests are bad and scenario-based tests are good.

Brian Marick
The takeaway… “I don’t want to care about anything until I absolutely have to care about it”… Love it!

Dave Thomas
Dave introduced us to vector functional languages such as k and its database, KDB. These specialized languages are the workhorses behind querying half a terabyte of data in one second for the hedge fund industry. Dave claims this is the most interesting technology to come along in 25 years. If interested, he suggests getting started with J from jsoftware.com.
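
For a rough feel of why the vector style is fast (my illustration, not Dave’s), picture each column of a trade table as one contiguous primitive array; a query becomes a single sequential pass over exactly the data it needs:

```java
public class ColumnScan {
    public static void main(String[] args) {
        // Column-oriented layout: each column is one contiguous array,
        // so a scan runs at memory bandwidth and touches no other fields.
        int n = 5_000_000;
        double[] price = new double[n]; // "price" column
        double[] qty = new double[n];   // "qty" column
        java.util.Arrays.fill(price, 10.0);
        java.util.Arrays.fill(qty, 2.0);

        // Roughly: select sum price*qty from trades
        double notional = 0;
        for (int i = 0; i < n; i++) notional += price[i] * qty[i];
        System.out.println(notional); // 1.0E8
    }
}
```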

Eric Yew
Eric gave us a deep dive into GPUs. He was on the NVIDIA team that built the technology. Eric suggested that GPUs are making the industry rethink previous assumptions that no longer hold in the GPU world. How would we code differently if:
  1. there were no cost to threads
  2. and no cost to context switching
The approach suggests that memory lookup is too expensive (seems like we keep hearing this :). It is cheaper to recalculate a value on 100 threads than to retrieve it from memory (a rough sketch of this tradeoff follows at the end of this section).

This model favors scale and throughput over latency. Latency, once again, is sacrificed.

The memory model is also upside down compared to traditional general-purpose processors. There are 3 memory access points:
  1. global – ~100 cycles away
  2. shared
  3. local – stacks
It appears that the old paradigm of being abstracted from the hardware is over. The benefit is that the price of 2 teraflops is near nothing… thank you, computer gaming community!
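
Here is the recompute-vs-fetch tradeoff sketched in Java, as a thought experiment of my own (not Eric’s code). On a CPU the table fetch usually wins; Eric’s point is that on a GPU, with thousands of threads hiding ALU latency, the recompute style wins:

```java
import java.util.stream.IntStream;

public class RecomputeVsFetch {
    static double f(int i) { return Math.sin(i) * Math.cos(i); } // cheap ALU work

    public static void main(String[] args) {
        final int n = 10_000_000;
        double[] table = IntStream.range(0, n)
                .mapToDouble(RecomputeVsFetch::f).toArray(); // precomputed values

        // Fetch style: every access is a load from a large in-memory table.
        double fetched = IntStream.range(0, n).parallel()
                .mapToDouble(i -> table[i]).sum();

        // Recompute style: no table at all; each thread redoes the arithmetic.
        double recomputed = IntStream.range(0, n).parallel()
                .mapToDouble(RecomputeVsFetch::f).sum();

        System.out.println(fetched + " vs " + recomputed); // same value, up to rounding
    }
}
```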

Day 3
Neal Ford
Neal did a wonderful job presenting a presentation on presentations :) at a conference on conferences… it was a very meta way to start the morning. The big takeaway is http://presentationpatterns.com/, a site and forthcoming book on presentation styles, techniques, and anti-patterns. The biggest benefit of the book is its focus on the practical. It will include templates for good practices.

Amanda Laucher
Amanda started by defining the classifications of assholes and morons, which got a lot of mileage the rest of the day. Amanda focused on educating us on terminology from category theory, with examples in F#. Here is what I was able to capture (a small code sketch of the first two follows the list):

* Catamorphism - a fold
* Anamorphism – unfold
* Endomorphism – takes a type and returns the same type
* Homomorphism – takes a structure and returns the same structure or shape
* Monoid – A Functor
* Monad – Applied context
* A monad is a monoid in the category of endofunctors
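
Amanda’s examples were in F#; here is my own Java rendering of the first two terms, a generic fold (catamorphism) and unfold (anamorphism):

```java
import java.util.*;
import java.util.function.*;

public class Morphisms {
    // Catamorphism: a fold collapses a structure down to a value.
    static <A, B> B fold(List<A> xs, B zero, BiFunction<B, A, B> step) {
        B acc = zero;
        for (A x : xs) acc = step.apply(acc, x);
        return acc;
    }

    // Anamorphism: an unfold grows a structure out of a seed.
    static <A, B> List<A> unfold(B seed, Predicate<B> done,
                                 Function<B, A> emit, Function<B, B> next) {
        List<A> out = new ArrayList<>();
        for (B s = seed; !done.test(s); s = next.apply(s)) out.add(emit.apply(s));
        return out;
    }

    public static void main(String[] args) {
        List<Integer> oneToFive = unfold(1, n -> n > 5, n -> n, n -> n + 1);
        int sum = fold(oneToFive, 0, Integer::sum);
        System.out.println(oneToFive + " sums to " + sum); // [1, 2, 3, 4, 5] sums to 15
    }
}
```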

Dave Hoover
Dave discussed the value of Resque and how it helped in the development of Groupon.com.

George Malamidis
George continued the conversation on memory access times and threads, introducing us to Node.js as a high-scale solution for the web.

Michael Nygard
Thoughts on consistency – what a great talk! Michael introduced the concept of an observer and a super observer in a system through an object lesson. He then followed it up with the fact that within a database there are often inconsistent states… the point is that at the end of a transaction there cannot be any inconsistent state. For example, it is possible that an insert into the database would result in a violation of referential integrity… this is OK, and even necessary, until a commit occurs.

Consistency is based on time. Frozen time means no change.

Another practical example:
Two eBay users making the same query at roughly the same time get different query results. This is viewed as OK, because from the perspective of each observer the results are consistent (they have no idea about each other). From the perspective of a super observer there are some inconsistencies. This led to some discussion, mainly of the corporate failures stemming from the complexity created when trying to maintain consistency from the perspective of a super observer (which is a fallacy and a trap).
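
A toy Java model of my own (not Michael’s) of the eBay example: each user reads one replica and sees a self-consistent snapshot; only a super observer comparing both replicas can see the skew:

```java
import java.util.Map;

public class ObserverConsistency {
    public static void main(String[] args) {
        // Two replicas of the same item; an update has reached only one of them.
        Map<String, Integer> replicaA = Map.of("bids", 7);
        Map<String, Integer> replicaB = Map.of("bids", 8);

        // Each observer reads one replica and sees a self-consistent snapshot.
        System.out.println("user 1 sees " + replicaA.get("bids"));
        System.out.println("user 2 sees " + replicaB.get("bids"));

        // Only a super observer, reading both at once, can detect the skew.
        System.out.println("replicas agree: " + replicaA.equals(replicaB)); // false
    }
}
```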

Aslak Hellesøy
Another agile talk… this one focused on visualizing the return on development through cumulative flow diagram (CFD) graphs. The suggestion was that velocity is a questionable measure of value.

References included: CFD Details and a Google spreadsheet

Robert Martin
I don’t think you can summarize Uncle Bob… you have to experience him :)


Summary
I made a couple of interesting observations/assessments through the conference. The first is that a large percentage of the speakers and attendees had a strong EE background. I truly enjoyed revisiting challenges at this level.

Here are some of my takeaways from the conference:
  1. We are at the beginning of a new era in the field of computing, one in which we do not yet have a standardized model or tools.
  2. There is a need for programming solutions that reduce (or deliberately leak) the abstraction over the memory model that was preferable in the previous programming era. It will be necessary to indicate whether values in memory are near, far, or very far, and to be able to measure and understand the cache miss rate.
  3. We appear to be at our limits in our pursuit of improved latency. Worse… latency is the component that is consistently sacrificed to increase throughput.

4 comments:

Anonymous said...

Nice summary and some great tidbits to chew on. Wish I could have been there!

Michael Easter said...

What a lineup of speakers! Thanks for the review, Ken...

Chris Bird said...

From Michael Nygard, "This led to some discussion, mainly of the corporate failures stemming from the complexity created when trying to maintain consistency from the perspective of a super observer (which is a fallacy and a trap)."

In principle this is right, but there are situations where the "super transaction" depends on a whole lot of autonomous actions taking place. Imagine someone with a clip board (the super observer) whose job it is to know that all the other autonomous and parallel jobs have been done. Definitely don't want normal transaction semantics here, yet the "super observer" must make the ultimate decision. Complex Event Processing (where the events are the completions - normal or otherwise - of the individual autonomous jobs) might bring some thinking to bear.

Ken Sipe said...

@Chris
Certainly there are cases or contexts where consistency from the perspective of a super observer is necessary... that too was discussed. I do think that, in general, most corporate architects do not recognize the distinction between when super-observer consistency is appropriate to a context and when it is not... or, in more challenging organizations, they lack any understanding of the different perspectives of observation.
