I work on a variety of projects, generally one or two at a time. Most projects last from 1 to 3 months, although there are occasionally small testing pieces and ad hoc requests that minimally impact the overarching project schedule. There is very little day-to-day repetition; however, techniques and tools are reapplied whenever possible.
Projects come in and are specified with the client. Possible data sources are specified. Sometimes external data groups are tapped to provide an extract, and sometimes I pull the data from a database (although generally that doesn’t happen, despite the title ‘database’). If the project is large enough, I’ll track what stage the project is in using OneNote and Project, throwing in research papers, prior work, schedules, scratch work, validation, and bits and pieces that might be useful for a final presentation.
Assuming an extract is required, I’ll usually specify a file format for a DBA to pull for me. In the interim, I’ll build a dummy dataset and start to write some SAS code (generally in Enterprise Guide, since that allows for some pretty quick manipulation). I will generally run a few standard procedures to test and validate the data, and possibly aggregate it differently than the original format, since there are generally a few things SAS can do better than SQL. If I have to go out and pull other data, maybe SiteCatalyst data, NPD data, or some economic data, I might get those datasets squared away while I wait for my extract. Between data specification, extract requests, and validation, I work with most datasets for a few days through this point. The bigger the project, generally the more datasets. Some datasets may need to be synthesized (indicator variables are routinely built as on/off flags). Finding that most parts of the business only work on ‘today’ and give very little thought to the past generally makes me sad… Once I have the data formatted properly for modeling, it is generally an asset, and I pass the asset on to the DBAs for loading.
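As a rough illustration of the dummy-dataset, flag, and validate steps, here is a minimal sketch in Python rather than SAS. Everything in it is an assumption made for the example: the column names (`week`, `units`, `promo_code`), the made-up rows, and the helper functions `add_flags` and `validate` are hypothetical and are not taken from any real extract.

```python
from datetime import date

# Hypothetical dummy rows standing in for an extract that has not arrived yet.
rows = [
    {"week": date(2013, 1, 7), "units": 120, "promo_code": "BOGO"},
    {"week": date(2013, 1, 14), "units": 95, "promo_code": None},
    {"week": date(2013, 1, 21), "units": None, "promo_code": "TV"},
]


def add_flags(row):
    """Return a copy of the row with 0/1 indicator variables added."""
    out = dict(row)
    out["promo_on"] = 1 if row["promo_code"] else 0
    out["units_missing"] = 1 if row["units"] is None else 0
    return out


def validate(rows):
    """Basic checks: row count, missing values per column, duplicate keys."""
    missing = {}
    for row in rows:
        for col, val in row.items():
            if val is None:
                missing[col] = missing.get(col, 0) + 1
    weeks = [r["week"] for r in rows]
    return {
        "n_rows": len(rows),
        "missing": missing,
        "duplicate_weeks": len(weeks) - len(set(weeks)),
    }


flagged = [add_flags(r) for r in rows]
report = validate(rows)
```

Writing the checks against a dummy dataset first means the same code can be pointed at the real extract the moment it shows up.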
After I’ve got all my data together and validated, 70% of the project is done. I generally build a decomposition: just a brief 10-15 minute presentation that gives a good idea of what the data looks like it is telling me. Sometimes this means I build the graphs in SAS; sometimes it means I write an interpreter and build the graphs in a different stat package called EViews, or load the data into some GIS software. Sometimes it’s Tableau time. At the end of it, I generally have a concise story of the market and history of whatever I am studying. The project is now 80% complete.
The next piece is generally building a model. Depending on the project, this may be multiple models using different techniques; it really depends on the question being asked. If I’m calculating time to contact, I might model contact like a repair problem. If I’m calculating effectiveness, I’m probably doing an OLS model, which may mean I start with some techniques to eliminate collinear factors and then use a stepwise regression to see what a rough model looks like. After that, it’s smoke, mirrors, and manipulation of form (log-log, seasonal adjustments, anything to help make the model work). I’m generally making sure coefficient signs make sense and checking a few key stats that indicate what I’m left with is random noise. I generally don’t care about my R^2. After the model is built, sometimes it’s a ‘solve the past with an alternate past’ scenario, and sometimes it’s ‘forecast the future’. Any way it gets sliced, the goal is generally to make sure the model makes sense.
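To make the log-log idea concrete, here is a minimal sketch in plain Python of a single-regressor OLS fit on logged data, followed by a sign check and one residual diagnostic. The price and unit numbers are made up, the helper names are hypothetical, and the Durbin-Watson statistic is only my assumption of one possible ‘key stat’ for leftover noise; the collinearity screening and stepwise selection mentioned above are left out entirely.

```python
import math


def ols_simple(x, y):
    """Ordinary least squares for y = a + b*x with a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest little autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up illustrative data: units sold at each price point.
price = [10.0, 9.0, 8.0, 7.5, 7.0, 6.0]
units = [100.0, 115.0, 128.0, 140.0, 147.0, 175.0]

# In a log-log model the slope reads directly as an elasticity.
log_price = [math.log(p) for p in price]
log_units = [math.log(u) for u in units]
intercept, elasticity = ols_simple(log_price, log_units)

residuals = [ly - (intercept + elasticity * lp)
             for lp, ly in zip(log_price, log_units)]
dw = durbin_watson(residuals)

# Sanity check on the sign: higher price should not raise units sold.
sign_ok = elasticity < 0
```

The point of the log-log form is interpretability: a slope of -1 would mean a 1% price increase goes with roughly a 1% drop in units.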
Model results are then added to the decomposition. A few slides get yanked. I’ll generally add a piece on contribution (Dorfman-Steiner) and possibly calculate a nice ROI. I might do a sub-analysis on a special case. I may seek out exceptions to the rules of the model, just to poke holes in things. At the end of this, I write up a nice presentation. At that point, I vet the document with a few coworkers and then start running it as a roadshow, presenting the paper to the principal client as well as a few other key folks. The roadshow is generally about a third the length of the project, so a 3-month project generally takes about a month of roadshow.
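For readers who haven’t met it, the Dorfman-Steiner condition says the profit-maximizing ratio of advertising spend to sales revenue equals the advertising elasticity divided by the absolute price elasticity. The sketch below shows that ratio next to a plain ROI calculation; the function names and the input numbers are hypothetical, and the ROI definition used here (net gain over spend) is one common convention, not necessarily the one used in any given presentation.

```python
def dorfman_steiner_ad_share(ad_elasticity, price_elasticity):
    """Optimal advertising-to-sales ratio under the Dorfman-Steiner condition."""
    if price_elasticity == 0:
        raise ValueError("price elasticity must be non-zero")
    return ad_elasticity / abs(price_elasticity)


def roi(incremental_profit, spend):
    """Return on investment as net gain divided by spend."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return (incremental_profit - spend) / spend


# Hypothetical elasticities, e.g. read off a log-log model's coefficients.
ad_share = dorfman_steiner_ad_share(ad_elasticity=0.1, price_elasticity=-2.0)

# Hypothetical campaign: 150 of incremental profit on 100 of spend.
campaign_roi = roi(incremental_profit=150.0, spend=100.0)
```

With these inputs the condition suggests spending about 5% of sales revenue on advertising, and the campaign returns 50% on its spend.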
Other things that might get done: geolocational insight, purchasing habits, promotion effectiveness, calculating test results; basically everything and anything they throw at me.
I should also note that sometimes projects may take up to 3 or 4 people. Generally I direct the analysis aspect of the project when that is the case. My strengths are generally in cutting through the filler and getting down to a subset of data that the client can chew on. I’m starting to get into data visualizations that none of my current software can do… so that means I’ll either be learning a new piece of software soon, or I’ll be finding another way to do it.
If I didn’t like data, this would be hell. If I didn’t like thinking, this would be hell. If I didn’t like my coworkers, this would be hell. I’ve been lucky.