Well, I was lucky enough to get a chance to hear Tommy talk about his data science journey at Plenty of Fish (POF).

The conversation started with the PROBLEM first. Yes, we all have to understand what problem we are solving, and why it is a problem. Sometimes the basic question surprises us, but how many of us can't wait to roll up our sleeves and get into tactics and details without asking deep questions about why we need it? (And the 5 whys, man; I don't think many things could stand up to 5 whys.) Anyway, the problem Tommy needs to solve is about user behavior, in particular boosting user engagement on the site. By how much? You name it… yet to be discovered.

Then he went into the technical details of the implementation: finding raw data and computing power, then building up the technology foundation around the problem and its constraints (reminds me of playing within limits). He chose #Postgres for data storage and #R for statistics and mining, and from there he dived deep into each technical issue he faced and how he solved it with existing packages. I would like to ask him to share the slides, or maybe write a blog post about the journey including all the package details, so that others could use it as a reference.

The other good idea is about providing clear and simple action recommendations to the management team based on his research results, rather than providing a beautiful chart. For instance: if we implement feature A, we will see an x% engagement increase; if we ditch page B, we will see y% fewer people leaving the site; etc. Well, this serves the research program well, as he mentioned; for daily operations, the shiny interactive charts and the up/down trends are still helpful, in my view. But he does not seem to be bothered by the day-to-day operations stuff. Which is another good skill we all should learn: STAY FOCUSED. There are a million battles, and we should pick only one.

I do agree with the advice: start simple, start small, and start by sampling the data. We don't need all the petabytes, or all the billions of records, or to wait for #hadoop before analyzing the data. It is always better to start by looking at a small data set and make it meaningful in business terms, then scale when needed.
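To make the "start sampling" point concrete, here is a minimal sketch in Python (my choice of language; the talk itself was about #Postgres and #R, and this is not Tommy's code). It uses reservoir sampling to pull a fixed-size uniform random sample out of a stream of records without loading the whole thing into memory. The function name `reservoir_sample` and the fake `events` data are made up for illustration.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random from any iterable.

    Works in a single pass, so the full data set never has to fit in memory.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


# Hypothetical stand-in for a large event log: (user_id, pages_viewed) pairs.
events = ((user_id, user_id % 7) for user_id in range(100_000))
subset = reservoir_sample(events, k=1000, seed=42)
print(len(subset))
```

Once a small sample like `subset` tells a meaningful story in business terms, the same analysis can be pointed at the bigger data set.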

Other lines, either original or made up by me:

Technology is there to solve a particular problem; every technology is. So don't hammer everything like a child.

Be wrong all the time. Being wrong all the time is not a bad thing; every failure brings you closer to the final glory.

I find he has no problem discussing the beautiful things that lie in the future, i.e. giving a vision while talking about the reality.