# Deep dive into Phase 4: project definition {#definition}
If you have followed the steps outlined in Phases 1–3, you will now have determined that the project is viable. You will have confirmed that, in principle, the project makes contextual sense and that it is likely to bring value to your stakeholder, and you are now ready to think about the specifics of how you will deliver the work. Congratulations! This represents a major crossroads for any project, and you should be proud of yourself for having had the discipline to work through Phases 1–3 carefully. Now you are ready to start your project in earnest!
But hold on! Before you start writing code and delving into the work, you must plan the project – that is what Phase 4 is about. This is the design phase, where you lay out in detail the approach you will take, the steps that will be included and the resources that will be required. We often hear people say things such as, “Data science is research...how can I say how long it will take if I don’t know what I’m going to find?” This is a valid concern – in an ideal world, you would have all the time and money you need. Sadly, this is an example of the difference between the ideal world and reality: if you are working with a client/manager, you will need to inform that client/manager what will be required in terms of time and money. Can you be certain that your estimations are sufficient? No. But you can make reasonable estimations with the information you have, and this is where the art of project design comes in.
It isn’t always possible to have a robust plan, but having a flawed plan is always better than having none at all. A plan breaks the work down into smaller, more manageable tasks that can be completed one by one, so that progress can be shown. It also helps everyone understand the deliverables and the course of action. A plan can always be changed, but operating without a plan will always be slower than operating with one.
This phase uses a different type of thinking than is normally found in scientific research. While research generally uses deductive reasoning (the reasoning of logic) and inductive reasoning (the reasoning of forming general rules from observations), the synthesis required for design thinking uses intuition, or more accurately, abductive reasoning – the reasoning of developing one of several possible solutions for a problem. More specifically, it seeks the simplest solution to a problem without any formal validation. For a more detailed explanation of how designers think, we recommend the book *Design Thinking* by Nigel Cross. For the purposes of this writing, what is important is to understand that this relates to the process of identifying a set of required functionalities or purposes and crafting a solution that satisfies those purposes.
::: {.infobox}
**What is abductive reasoning?**
Abductive reasoning starts with an observation or set of observations and then seeks to find the simplest and most likely conclusion from the observations. This process, unlike deductive reasoning, yields a plausible conclusion but does not positively verify it. Abductive conclusions are thus qualified as having a remnant of uncertainty or doubt, which is expressed in terms such as "best available" or "most likely".
:::
For many, project design is difficult, ambiguous, stressful and downright unpleasant. Indeed, this phase requires you to commit to delivering a given bit of work in exchange for a certain budget, and you want to get it right – who wouldn’t find that responsibility stressful? You inherently can’t know exactly what it will take to deliver a successful outcome because that insight only really comes from the work itself. But if you have done a good job of scoping the project in Phases 2 and 3, then you should have a fairly good idea of the complexity of the task in front of you.
## Developing a project plan
We suggest you begin by thinking at a high level. Where are you starting from, where do you want to be when the project is completed, and what are the logical steps you need to take to get there? What are the functionalities that are required, and which ones would be nice to have but are not critical? What are the novel concepts and approaches you will have to develop? Are there logical intermediate steps along the way? Does something depend on something else being completed first?
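The question of what depends on what can be made concrete with a small sketch. The milestone names and dependencies below are hypothetical and chosen purely for illustration; Python's standard-library `graphlib` module then returns an order in which every milestone comes after the ones it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical milestones; each maps to the set of milestones it depends on.
dependencies = {
    "exploratory analysis": set(),
    "data cleaning pipeline": {"exploratory analysis"},
    "predictive model": {"data cleaning pipeline"},
    "dashboard": {"data cleaning pipeline", "predictive model"},
}

# static_order() yields each milestone only after all of its prerequisites.
plan_order = list(TopologicalSorter(dependencies).static_order())

for step, milestone in enumerate(plan_order, start=1):
    print(f"{step}. {milestone}")
```

If two milestones do not depend on each other, either order is valid, which is often a hint that the work could run in parallel.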
In most cases, we break a project down into stages with aims and milestones. We find it useful to clearly define what work will be carried out in each stage and what will be delivered at the end of it. Sometimes a “deliverable” is merely a short report, a small presentation or a conversation with your client/manager. In other cases, it can be a piece of software or a tangible functionality that can be demonstrated. Exactly what the deliverable at the end of each milestone looks like is project-dependent.
In the best cases, each milestone in itself is a valuable step forward for the stakeholder. This de-risks the entire project. For example, a project may be aimed at building an interactive data analytics dashboard with four milestones along the way. Milestone 1 may be an in-depth exploratory analysis that can yield important insights about the data. If the project were to end there, the work would still have brought value. While this is not always possible, we suggest you keep this in mind as something to aim for when crafting your project timeline and stages/milestones.
Throughout this process, it can be useful to consider pivot points or alternative outcomes. For example, many projects have milestones that are inherent points for decision-making. If that is the case for your project, be sure to communicate that clearly in your project plan. When making these decisions, try to balance flexibility and open-mindedness with a clear view of business value.
In the earlier scoping phases, you will have determined what the requirements of the project are in terms of outcomes and deliverables. If these include data products that require engineering or deployment (as opposed to projects primarily focused on the generation of insights and models), you will need to include corresponding milestones. Data cleaning, code refactoring, optimisation and productisation can be time-consuming, as can deployment of your product and the development of sound ETL pipelines – be sure to budget for these in your project plan and think carefully about what a sensible development plan looks like.
## Skills/expertise required
As you plan your project, the required technical and non-technical skills should start to become clear. Many skills are fairly general, such as coding, machine learning or being results-orientated. Others are more specialised, such as natural language processing or graph theory. Harder still to come by is domain-specific expertise. Either make sure you have the industry-specific experience to deliver the project, or make sure that you have willing sponsors within the company who will set aside time to help you understand any industry quirks and nuances.
Your approach to meeting the skills required for a project is, of course, dependent upon the composition of your team. If you are designing a project that you will work on individually, then it is up to you to make sure you have the skills required in your repertoire. If you don’t, you may need to consider bringing in outside help or budgeting for the time it will take to make up the shortfall. Often projects are designed for teams of data scientists, in which case you will want to make sure that the skills required are matched collectively by the team. Data science, like software development, is a team sport.
> “If you want to go fast, go alone. If you want to go far, go together.” -- African Proverb
## Determining the cost
You will almost certainly have to put a price tag on your project. For us, this is essentially a calculation based on how long we think the project will take and the billing rate for the staff. Junior/mid-level/senior data scientists all have different rates to consider in this calculation. The staff billing rate is normally constant, so we tend to focus primarily on the duration of the project.
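As a minimal sketch of this calculation, the example below multiplies each person's estimated days by their billing rate and sums the results. The roles, day rates and durations are invented for illustration and are not real figures; `StaffAllocation` and `project_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StaffAllocation:
    role: str
    day_rate: float  # hypothetical billing rate per day
    days: float      # estimated days this person spends on the project


def project_cost(allocations):
    """Return the total cost: the sum of day_rate * days over all staff."""
    return sum(a.day_rate * a.days for a in allocations)


# Illustrative team; the numbers are assumptions, not real rates.
team = [
    StaffAllocation("senior data scientist", day_rate=800.0, days=10),
    StaffAllocation("junior data scientist", day_rate=400.0, days=30),
]

total = project_cost(team)
print(f"Estimated cost: {total:,.0f}")
```

Because the rates are usually fixed, the `days` values are where the estimating effort goes, and they are the numbers worth revisiting as your understanding of the project improves.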
As discussed above, getting this right can be difficult. If your estimate is too low, you may not have the time needed to complete the work; if it’s too high, you risk losing the contract. The latter case is especially relevant if you are competing against other providers, for example in the case of a proposal written in response to an RFP (request for proposals). For other projects, the cost factor might not be important at all, as employee projects often overrun without any negative consequences. For those to whom the cost factor is important, we don’t have a magic formula that tells you how to strike the right middle ground. What we can do is highlight some considerations that we find helpful in the process and offer our reassurance that it gets easier to make this judgement as you become more experienced.
Breaking the project into milestones as described above can help: it’s easier to estimate the time required for smaller tasks than for larger ones. Understanding the supporting data will also help, which is a major reason for the scoping work described in Phase 3. This is also a place where having a solid network to turn to for advice can be very useful. At Pivigo, for example, our data team discusses each project plan that is being written to get as many different points of view as possible. If you have access to a network of peers, going through such a sense-check can be a very valuable process. Experience is key: look at case studies, or ask colleagues about similar projects and how long they took.
It can help to anticipate places where the work is at a higher risk for delays. For example, you may have made assumptions about the data structure based on your scoping work, only to find out that some of these assumptions are not correct. The data coming in may have changed in structure or location, or unanticipated factors may have worked to introduce more missing values than you expected. You can’t anticipate every possible roadblock, but if you can identify places where your progress is most vulnerable to problems, you can make a conservative estimate.
It is also useful to consider the business when budgeting for a project. Have you worked with this company or person before? If you have, and if the project was successful, you probably have earned a degree of trust that can go a long way in convincing them that your proposal is reasonable. In contrast, if the business is new to data science, or new to you, you have yet to earn their trust and may want to be more cautious in how you cost your project. Similarly, if you feel that your client/manager has the potential for being demanding and hard to please, it may be a good idea to err on the side of caution by budgeting a bit generously. In these cases, the costs of underestimating the required time are high.
On the other hand, if the project is exploratory, if your client/manager is just looking to see the “art of the possible” or if your project is restricted to creating insights, you may feel a bit braver and choose a slightly lower budget: the costs of underestimating the time required are lower. Or, to think of it in another way, even if you only accomplish 95% of what you would have liked, that is still very valuable for the business.
:::{.smaller}
```{r echo=FALSE}
mytable = data.frame(
Pricing = c("Under-budgeting", "Over-budgeting"),
Potential_benefits = c('- Your proposal may be more attractive to the stakeholders. \n- The potential to show your client/manager good value-for-money \n- Get a “foot in the door”/opportunity to prove yourself', '- Extra time padding can give you a greater chance of success \n- Greater chance of overdelivering and delighting the client/manager \n- Greater chance of more work, due to happy client/manager'),
Risks = c("- Your client/manager may be sceptical that you can deliver \n - You may not be able to deliver \n - You are vulnerable to unexpected problems \n - Setting the expectation too low for the follow-on work (assuming you get it)","- Your price may be too high, so you risk losing the work"))
pander::pander(mytable, keep.line.breaks = TRUE, style = 'grid', justify = 'left', split.cells = c(5,25,25))
```
:::
<br/>
It’s also useful to think about aspects of the project that are “must-haves” versus those that are “nice-to-haves”. At the very least, your budget should give you enough time to safely deliver the must-haves. Taking a slightly riskier approach to the nice-to-haves may be more aligned with your client’s or manager’s appetite, especially if they are on a tight budget. In such cases, an approach we often take is to write a proposal with several costs: a cost for the essential (must-have) work and additional add-on costs for the nice-to-have bits. We find this resonates with clients/managers who are nervous about spending a lot of money on a project that has yet to show good ROI (return on investment). We also recommend treating your proposal as a step in a back-and-forth conversation with the client/manager. Invite them to comment on the content and the plan, and be open to changing the plan if they are not comfortable with the initial version. While you may not be willing to make sacrifices in your rate (and generally you should not be), you can adjust the scope of the project to better align with your client’s/manager’s needs, wishes, concerns and budget. This will not only help you to find the right balance, but it will also build trust by showing that you are trying to work with them to produce something that is useful and good value for money.
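One way to picture a proposal with several costs is sketched below: a core price covering the must-have work, plus a separate price for each nice-to-have add-on. The work packages, day estimates and the blended day rate are all hypothetical values chosen for illustration.

```python
# Hypothetical work packages: (name, estimated days, is_must_have)
work_packages = [
    ("data cleaning and validation", 8, True),
    ("baseline model", 10, True),
    ("interactive dashboard", 6, False),
    ("automated retraining", 5, False),
]

DAY_RATE = 600.0  # assumed blended day rate, for illustration only

# The core price covers only the must-have packages.
core_cost = sum(days * DAY_RATE for _, days, must in work_packages if must)

# Each nice-to-have is priced separately so the client can opt in.
add_ons = {
    name: days * DAY_RATE
    for name, days, must in work_packages
    if not must
}

print(f"Core (must-have) cost: {core_cost:,.0f}")
for name, cost in add_ons.items():
    print(f"  Optional add-on, {name}: +{cost:,.0f}")
```

Keeping the rate fixed and varying only which packages are included mirrors the advice above: the scope is negotiable, the rate generally is not.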
We also encourage you to mention any other costs that the client/manager may incur. For instance, if your work requires the use of a virtual machine or the creation of a database that will be hosted on the cloud, these are costs that the client/manager will want to know about. Hosting your solution can also bring with it security concerns and other maintenance costs or complications. Be sure to be as thorough as possible. This will help your client/manager to understand the total expense of the project. It will also help in your efforts to build up a trusting relationship with your client/manager. Or, to put it another way, surprising your client/manager with unexpected expenses can undermine the trust that you are striving to build. In short, view your work with your client/manager as a relationship that has to be built and nurtured, and do your best to approach it with empathy for all the people who are involved.
## How to manage the project
At this point, you have a project plan and you have accounted for the technical skills that will be needed to bring it to fruition. However, a final consideration is still outstanding: how will you manage the project?
As above, if you are working on a project independently, then this is probably a fairly easy question to answer. If you are designing the project for a team, then planning for the project’s management will be critically important. Either way, we suggest that you take some time to consider exactly how you will work and how you will interact with your client/manager.
Project management is a large field – we cannot do it justice in this small section. When done well, a project flows smoothly and has a clear road-map and an efficient system for sharing the workload. Everyone is happy with the work and feels that they are contributing towards a common goal. When done poorly or not at all, a project can stagnate and become aimless, deadlines are frequently missed and the project outcomes may not align with the project goals. Good project management cannot turn every project into a masterpiece, but it can go a long way in keeping a project on-track, focused and successful.
For some teams, a Scrum approach can be useful, although you should bear in mind that it was designed for software development and that there are significant differences between that field and data science. In our work, we tend to adopt an agile methodology (Scrum or Kanban), in which we work in discrete sprints and organise our tasks in the form of discrete issues. Other approaches also exist, such as the more traditional waterfall model. Tools such as Kanban boards and Gantt charts can be very helpful in planning out your project and breaking down the major phases of the project into tangible, bite-sized pieces. Even if you are a team of one, we have found that the formality of organising our work in this way can be very helpful in keeping the focus on project priorities.
In addition to planning out how you will work and communicate with your team, you should also make a plan for how you will interact with your client/manager. You should consider how you will communicate with your client/manager on a day-to-day basis (we have found Slack to be very useful) as well as how and when you will give progress reports and project updates.
We also encourage you to think about how you intend to structure your codebase. We generally use Git and GitHub for version control and code sharing, and we would encourage you to build your repository’s directory structure before any code is added. We highly recommend the [“cookiecutter” family of project templates](https://github.com/cookiecutter/cookiecutter){target="_blank"} and have found the [Python-based data science template](http://drivendata.github.io/cookiecutter-data-science/){target="_blank"} to be a good fit for most of our Python-based projects. It includes a guide to best practices for directory structure and naming conventions and contains a built-in make functionality that can be useful for managing the steps required to build the requisite datasets and functions for your project. For R-based projects, one option is the [ProjectTemplate](http://projecttemplate.net/){target="_blank"}, although others also exist. At the minimum, we encourage you to work within the [framework of an R Project](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects){target="_blank"}.
## Evaluate the plan
If you have followed the steps above, you will now have defined your project. You will have a project plan that includes milestones, a timeline, potential pivot points and clearly-defined deliverables. You will have determined the composition of the team needed to execute the project and have thought about how the team will work together and liaise with the client/manager. You will have also created a budget for the project based on the time required, cost of staff and any other costs required. You probably can’t wait to send your proposed plan off to the client/manager and get to work!
But before you do, we encourage you to look back at the four levels of project evaluation outlined previously. Think about the business case and the larger context of the project in relation to the business strategy, and ask yourself if what you have planned will be impactful. If you are not convinced that it will be, you may need to reconsider your plan.
As a parting piece of advice, we suggest that you take the time to make a shortlist of ways that the project is likely to be impactful and to add a sentence or two to your proposal highlighting this. While it may seem obvious to you, and you may feel that it should be obvious to the client/manager, it can help to remind them about why your project will bring value to their business and reassure them that this proposal is a worthwhile investment.
Bear in mind that if your client is a company, the person you have been liaising with may not be the final decision-maker or the one who controls the money. The proposal you write may be passed on to executives in the company whom you have never met and who don’t have any understanding of the project. Explicitly explaining why your proposed project is likely to bring value to them on both the business and contextual levels will make it easier for those decision-makers to say “yes”.