Building a Modern Data Dream Team
It is quite different from what you think!
Every day, new buzzwords and startups around data are springing up like mushrooms!
The graph above represents an exponential increase in job postings on Indeed. I expect there are similar trends on other platforms. So, in this world of chaos, how do you find order and build your modern stack rockstar data team? Let’s find out.
Where are you in this journey?
Firstly, let us understand your position in the modern world of data. This is an important step because it will help your engineers prioritise your stack.
This assessment is essential for developing the roadmap for your data journey. Let us say that your primary need is to understand how your marketing efforts are paying off, that is, which channels (Google Ads, Facebook Ads) are leading to more conversions. This will put you in the middle of the block, and your data engineers would need to build the ETL/ELT pipelines accordingly. Similarly, if you want to understand why your customers are dropping off, historical data analysis is more important. Only once this step is clear should we proceed to predictions.
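To make the marketing question concrete, here is a minimal sketch of the kind of aggregation those pipelines would eventually feed. The records, the channel names and the `conversion_rate_by_channel` helper are all hypothetical and made up for illustration; real rows would come from your ad platforms and warehouse.

```python
from collections import defaultdict

# Hypothetical visit-level records: (channel, did the visit convert?)
# In practice these rows would be loaded by your pipeline, not hard-coded.
visits = [
    ("google_ads", True),
    ("google_ads", False),
    ("google_ads", False),
    ("google_ads", True),
    ("facebook_ads", True),
    ("facebook_ads", False),
    ("facebook_ads", False),
    ("facebook_ads", False),
]


def conversion_rate_by_channel(rows):
    """Return {channel: conversions / visits} for each channel seen."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for channel, converted in rows:
        totals[channel] += 1
        if converted:
            conversions[channel] += 1
    return {channel: conversions[channel] / totals[channel] for channel in totals}


rates = conversion_rate_by_channel(visits)

# Print channels from best to worst conversion rate.
for channel, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{channel}: {rate:.0%}")
```

Even a toy version like this shows why the data has to be in place first: the question cannot be answered unless every visit is recorded with its channel and its outcome.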
Who will be the core players in the team?
Usually a modern data team has the following core players -
Data Engineer - They should be the first hire in this team if your data is not at all in place. They are responsible for building data ETL/ELT pipelines, maintaining them and handling various data stores like Amazon S3 buckets, a Data Warehouse (Redshift, for example) and Data Marts. They will be the ones who can speed up your APIs by optimising SQL queries, tuning your databases etc.
Data Analyst - They create actionable reports/insights for your startup/organisation, based on the data which you already have and which has been prepared by the data engineers. Now, the choice of report is a combined effort of business and engineering. Don't think that the DA can do this alone.
Data Scientist - They add the most value when they are hired around the time an organisation is ready to do predictions. They use advanced mathematics and statistics, and programming tools to build predictive models.
How to hire these key players?
Usually the skills to look for in a DE are the following -
Data Processing Framework - It can be either Spark or Flink. It depends entirely on what kind of processing needs to happen. If you want real-time streaming analytics then Apache Flink makes more sense; otherwise Apache Spark is a very mature tool.
Knowledge of an orchestrator - Airflow, Luigi etc. The ETL/ELT jobs or the report generation jobs need to be scheduled properly. These tools come in handy there.
Knowledge of AWS EMR, AWS Glue, KDA etc. (depending on your cloud provider).
Knowledge of different databases like OLAP - Redshift etc., OLTP - AWS RDS (Postgres) etc.
Various file formats - Parquet, ORC, Arrow, Avro.
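At its core, an orchestrator runs jobs in dependency order. The sketch below shows only that idea, using Python's standard-library `graphlib`. It is not Airflow or Luigi code, and the job names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
jobs = {
    "extract_ads": set(),
    "extract_orders": set(),
    "load_warehouse": {"extract_ads", "extract_orders"},
    "build_marketing_mart": {"load_warehouse"},
    "refresh_dashboard": {"build_marketing_mart"},
}


def run_order(dependencies):
    """Return one valid order in which every job comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(jobs)
for step, job in enumerate(order, start=1):
    print(f"{step}. {job}")
```

A real orchestrator adds scheduling, retries and monitoring on top of this ordering, which is why hands-on experience with one matters in a DE hire.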
Similarly, a Data Analyst with the following skills should suffice, unless you have special requirements -
Good Communicator
EDA skills - Ability to explore the data efficiently and gather insights from it.
BI Tool skills - Superset, Tableau etc. - Essential for generating quality dashboards and reports.
For a Data Scientist it is tricky. Any Data Scientist should have the following skills -
Good programming understanding. It is a myth that a DS need not be a good programmer; a DS who codes badly creates a lot of technical debt. I will soon write a post about hidden technical debt in machine learning systems.
Classical ML knowledge - Decision Trees, Linear Regression, etc. - Most of the business problems can be tackled with classical ML and statistics.
Exposure to tools like sklearn, pandas, numpy, etc.
Statistics and Mathematics.
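As a small illustration of how far classical methods go, here is a least-squares line fitted with plain Python. The ad-spend numbers and the `fit_line` helper are made up for this example; in practice a DS would likely reach for sklearn or numpy instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, constructed so that signups = 2 * ad_spend + 10 exactly.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
signups = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, signups)
predicted = slope * 6.0 + intercept  # forecast for a spend of 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

A candidate who can write and explain something like this from first principles usually has the statistics and the programming basics covered.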
Beyond these, there will be special requirements depending on the data you want to work with, for example computer vision, natural language processing, automatic speech recognition etc. Reach out to me if you want to talk more about it.
A Data Team should be a widget
A data team should ideally be run as a product team. The primary problem a data team faces is “analysing something which is not present in the data”. To avoid such conflicts, a data team should work closely with product managers.
Every report the data team generates is a product. It requires the same amount of effort and testing as any website. The impact of a report is immense: a wrong report can negatively affect the business.
A data team can be thought of as a widget which can be plugged into various teams in your organisation. They can work with marketing, finance, content, operations etc.
Project Management of the Data Team
Project management of a data science team is a task! Scrum methodology can be easily applied to the Data Engineering and Data Analysis aspects of the data team, but data science can sometimes fall behind.
Data science is an exploratory field. This team will make progress in bursts and will often produce nothing; as a project manager, one needs to build in a cushion for this.
I can mention a few reasons why scrum may not be a good idea -
Scrum leads to narrow focus - Data Teams flourish when they are cross-pollinated with other teams and projects.
Hard to give estimates - Data Scientists will often be solving a lot of open-ended questions. It can take multiple iterations for them to provide a solid outcome. Hence continuous delivery is a much better fit than end-of-sprint releases.
I have seen some successful applications of the Kanban methodology in data science. Hence a good approach would be to use Kanban across the whole data team.
Final Thoughts
I think this post might have given you a good starting point to think about building a modern data team in your organisation. The more honest we become in understanding where we stand and what we want to analyse, the better the data team becomes. The data team is there to give you meaningful answers from the data you possess, but we need to learn to ask the right questions too!
I hope that you enjoyed this read. The next post will be more technical, so I hope that you will share this newsletter with your techie friends and this post with your CEO/CIO/VP buddies. I will meet you next Sunday in your inbox. See you!