Academia is a thankless profession. You spend your life becoming highly trained and a world expert in your field. However, the constant moving around, the instability and the uncertainty become less palatable as family and priorities change. At some stage you make the decision to jump to industry, with data science well suited to your analytical/coding/statistical/quantitative background. How does one change careers? And what can you do to get your foot in the door? I’ve been there; it can be challenging and there is no one sure path. I wanted to provide some thoughts and set expectations as you embark on your new career.
Introduction
Hello weary reader. My name is George, and I approximate a data scientist. If you asked me what my profession is, I would still consider myself an astrophysicist – it’s what I always dreamed about growing up, what I spent most of my life studying towards, and what I still think about most days. But it’s been over five years since I last picked up a paper on the topic or ran a stellar model. When I speak to my former colleagues, they’re curious about my day-to-day work. In academic circles, I describe my job as solving optimisation problems in business, with data science as one of the many tools I use. However, without any formal training in the area, I don’t ‘feel’ like a real data scientist. Like many of you, I don’t expect that imposter syndrome to ever go away. It’s common and a healthy acknowledgment of how complicated and diverse the field is. It’s just simpler to call myself a data scientist, as apart from luck and hard work, one needs to find their niche and market themselves to employers. The data scientist label has the widest appeal in a competitive workforce that values specialisation and experience.
I have agreed to regularly contribute to the Konnect with Data Writers Hub. There’s a lot of data science content out there. Some people write because they are extremely passionate about the field; others use it for self-marketing purposes to help themselves stand out. For me, I have truly transitioned from academia to industry and hope some of my learnings and experiences might help some of you on your career journey. I’ve lived overseas for long periods, changed careers, and dealt with the personal and professional hardships life throws at us all. I don’t consider data science my passion or calling – in short, I am too old to have anything to prove. I am hopeful that my career path offers a pragmatic and measured point of view and demonstrates that a straight line is not the only way to travel between two points.
I wanted the opening of my first article to convey a little about myself and my motives so that you can quickly determine whether, as an author, I might resonate with you. I’d like to stress that the following opinions are based on my experience and the small sliver of the industry I’ve worked in. These are not absolute truths – it’s okay to disagree with what’s written and bring a different perspective.
Transitioning from Academia to Industry
There is a high attrition rate in academia due to a funding model favouring post-doctoral positions and PhD stipends. In Australia, we are lucky to have government and faculty-supported PhD scholarship schemes, but elsewhere, PhDs may only open up if the group leader has secured a research grant. In my field of astrophysics, there are about ten times as many PhDs as there are professorships. This was the statistic I heard back in my day, and I wouldn’t be surprised if the numbers are even more skewed today. It’s for this reason we find many scientists with quantitative backgrounds transitioning to data science.
So how does one start out in the field? The answer depends on who you ask, their psychology (locus of control), and personality. My experiences and upbringing have me perceive events in the world as:
1. Able to be described by a distribution
2. Able to be framed as an optimisation problem
Like any good academic, I will start with some background reading to understand the problem, then look at how these two points of view can help us optimise the chances of success.
What are Companies Looking For?
Although most people don’t explicitly think about recruiting this way, your CV, your interviews (cultural and technical), and your competitors all combine to establish an optimisation problem for the company. They are seeking to minimise risk and optimise efficiency (which is usually a proxy for gross profit).
If possible, companies do not want to spend time training people. Not only do they want an expert, but they also want someone who is an expert at everything. The people interviewing you are likely overworked and under-resourced. If they can find some efficiency gains through a hire, they will jump at it. When constructing a role description, they will describe their ideal candidate, who very rarely exists. If you are applying for a job where you tick every box, you are probably overqualified. If you tick 50% of the boxes, you are probably uncompetitive. I leave it as an exercise to the reader to find the optimal condition.
What are You Competing Against?
Market forces, obviously. While I firmly believe data science should sit centrally in an organisation (i.e., under Finance and the CFO), it will likely fall under technology or find itself closely intertwined with IT. Job supply will ebb and flow accordingly. Right now, there is an overall shrinkage in the IT sector. Revenue is generally up, but costs for companies are high. Times are tough for developers and software engineers, but data engineers are in high demand. Data scientists are expensive; their ROI timescale might be longer than many companies are willing to consider at the moment. Sometimes data scientists struggle to quantify the value they bring to the table, leading companies to want to outsource this capability to consulting firms. This provides opportunities for those seeking a career in consulting, but that’s not for everyone. Right now, there is a small data science bubble on the back of generative AI and the large language model hype. However, my personal view is that data science is finding its equilibrium, and many companies that thought they needed data scientists are realising the insights they are seeking can be answered through business analysts and data analysts.
The supply of data science applicants does not decrease. Through hype and marketing, it has become one of the ‘it’ roles of our age. Students think that the title will guarantee high-paying salaries and that they will be forever in demand. Educational institutions have caught wind of this and are offering data science training and degrees of varying rigour. The pools have become diluted, and it’s not clear what abilities these courses deliver. There is more to data science than feeding perfectly curated training sets through a library. Having said that, I feel that a bifurcation in the field is on the horizon, leading to a new type of data scientist – one that does not require deep knowledge of LLMs or statistics, but rather the ability to service and tweak LLMs for business needs. This is probably evident from the amount of generative AI consulting, start-ups, and marketing.
Who are You Competing Against?
• People with industry data science experience
• Data Science/Computer Science graduates
• Other quantitative PhDs looking to make a jump
• Graduates in other fields with some form of data science accreditation
A recruiter might get 200-300 applications, 80% of which will be non-competitive. On paper, and all things being equal (e.g., market-competitive salary), my list ranks the strength of the priors recruiters apply when selecting candidates. Recruiters are the broad-spectrum filter for any role. They need to get that large applicant pool down to 3 or 4 candidates for those under-resourced and overworked people to interview.
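To put rough numbers on that filter, the figures above can be turned into a back-of-envelope calculation. This is only a sketch: it assumes, purely for illustration, that interview slots are spread evenly across the competitive applicants, which is not how recruiters actually choose. The function name and that uniformity assumption are mine.

```python
def shortlist_odds(applications, non_competitive_share, interview_slots):
    """Return (competitive_pool, chance_of_interview) for one advertised role.

    Illustrative assumption: every competitive applicant is equally likely
    to be shortlisted. Real recruiters rank candidates, so this is a baseline.
    """
    competitive_pool = round(applications * (1 - non_competitive_share))
    chance = interview_slots / competitive_pool
    return competitive_pool, chance


# Figures quoted in the text: 200-300 applications, 80% non-competitive,
# 3 or 4 candidates passed on for interview.
for applications, slots in [(200, 4), (300, 3)]:
    pool, chance = shortlist_odds(applications, 0.80, slots)
    print(f"{applications} applications -> {pool} competitive, "
          f"{chance:.0%} chance of an interview")
```

Under these assumptions a competitive applicant has somewhere around a 5-10% chance of reaching interview for a given role. The baseline matters less than the ranking, which is the part you can influence.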
Working in Industry
Industry will present a new way of working and generally will require an adjustment period. No two data science roles are the same, and no two companies are the same. How far outside your data science wheelhouse you need to go will depend on the size and resources of the company hiring. You may need to:
• Understand that solving a problem is not necessarily an enterprise solution.
• Deliver in sprints and work in an agile manner (this is not a good delivery methodology for data science, but that’s another story).
• Conform to company coding best practices and management. Understand productionised code.
• Understand code deployment and release cycles.
• Accept that perfection gets in the way of good enough.
• Accept you don’t need to understand everything (just enough) and you don’t need to do everything yourself.
• Be able to multitask and meet tight deadlines.
• Understand the modern business cloud environment (AWS, Azure, GCP).
• Understand DevOps, MLOps, and Infrastructure as Code.
• Understand principles of data engineering, data warehouses, data lakes, and lakehouses.
• Understand principles of data governance and management.
• Have good stakeholder management and communication skills. Explain ideas, concepts, and results to both technical and non-technical audiences.
• Draw on the skills of business analysts and data analysts. Understand business processes and tease out what the user is asking for from ill-defined requirements.
• Manage projects and prepare business cases and documentation.
• Have knowledge of security, APIs, and model serving and integration patterns.
There’s probably more that could be added to the list, but that’s what came to mind without thinking too deeply about it. The point here is that data scientists are expected to be versatile and agile in their role. Many of these skills you will carry over from academia, so make sure you can demonstrate that succinctly.
Bringing it All Together
We now understand the state of play. The truth of the matter is it can be difficult to get that foot in the door. Having said that, you come from a research background; you are smart, capable and self-driven. There are things you can do to optimise your chances of landing that first role.
As I mentioned early in this piece, I tend to look at things probabilistically and/or as an optimisation problem. Your first hurdle is to get past the recruiters. They are helping the company with the risk minimisation component of their hiring. Applicants already in roles with a proven track record will pass this filter. It’s easy to see why. A large portion of the risk has already been carried by their current employer. People with an educational background in data science will sometimes pass this filter. They tend to speak the language and there is a one-to-one mapping to the role requirements. So how does an academic get themselves into the pool?
I can’t stress this enough: make it easy for the recruiters. Those with a data science background do this by directly demonstrating that their history fits the position description. The recruiters need to go through hundreds of applications. They work with companies in business and IT recruitment, so they should not be expected to interpret an esoteric academic CV (your CV is esoteric to everyone outside your field). Having published X papers with an h-index of Y, having won this fellowship and worked on problem Z doesn’t tell the recruiter how well you meet the requirements of the advertised job. It tells them that you are probably smart, but they already assumed this by virtue of you having a PhD. You are entering their domain, not vice-versa. Do not expect your academic track record to speak for itself.
In the job market there is a modern trend to do away with cover letters. You will need to make use of a cover letter to give context to your CV. This will serve two functions:
1. Tell your story: where you’ve been, where you want to get to and why. Set a narrative. Recruiters and hirers are humans. Help them relate to you and see you as a person. This also helps them evaluate whether you will be a good cultural fit.
2. More importantly, this is your opportunity to describe how your experiences map back to the job requirements. You will need to be short, sharp, punchy. As a postdoc you possibly helped supervise PhD students or groups of students. Link that back to responsibilities in business such as management, budget, timelines, training. Demonstrate how powerful data science or statistics have been in your research and how you are highly proficient. You may have used supercomputers or clusters, so you know code optimisation and high-throughput computing. Have I said this before? Make it easy for them to see why you are a strong candidate. Don’t expect recruiters/businesses to make the links.
One mistake I made when starting out, and something I noticed when helping colleagues transition to industry, is that academic CVs (in my field anyway) tend to be written in an understated tone. We rely on our track record, networking and references. It’s considered poor form to talk yourself up too much given the giants in the field. My old group leader used to say, ‘George, sometimes in life you need a spew bucket. You need to sing your own praises on stage then spew into the bucket backstage because of how disgusting you feel afterwards’. Whilst I don’t recommend that you claim to be the next big thing in data science or that it would be a monumental mistake for the company not to hire you, the chances are that a more aggressive spruiking of your ability and experience is needed in your CV.
If you get to the interview stage, obviously, prepare. It can be easy to get fixated on the technical aspect and on studying lists of the top x data science questions, particularly for junior roles. I studied technical questions too much when I started out; this was my imposter syndrome anxiety taking over. I personally don’t ask these questions as I am more interested in how a candidate thinks and in their processes. It will take someone 5 minutes to look up the best metric for model z. I’m not saving lives, so I don’t need people under pressure (interview nerves) to recall information on the spot (such a requirement can bias you against personality types, genders, or cultures). Have a good narrative ready that explains your story and be ready to discuss your work. Why did you make decision X? Would you do the same in retrospect? Keep within the STAR (Situation, Task, Action, Result) framework if this is all new to you.
Learn about the company and try to understand the problem they will need you to solve. I was able to land my current role because I planned out how I would run the project, the project streams, and the types of data science analysis and engineering that would be needed to get there. This can be high risk, high reward, particularly if you miss the mark. But if you get it right, it helps allay risk by demonstrating you understand the problem and the requirements. Remember to keep things general as you are not expected to be across the details e.g., ‘in general there are several ways to approach the problem, without being across the data you could…’.
We’ve spoken about the optimisation part. What about the probability part? There are many factors outside your control when applying for that first job (or any role for that matter). The right role, right time, right competition, right culture, the right networking. Landing that first job requires some or all of these factors to align. Some of you will walk into the right place at the right time with the right CV and think this is all too easy. For those not in extreme percentiles, control the things you can control and do not take rejection personally. It’s a statistical problem and may require several attempts. I know this is hard because when applying you are under a lot of stress, you invest so much energy and feel there is a lot at stake. Realise the factors at play. You are highly capable and would succeed in the role, but statistically the numbers won’t always come out in your favour. Reflect, learn and improve.
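The ‘several attempts’ point can be made concrete with a small calculation. The sketch below assumes each application is an independent trial with the same success probability p, which is a simplification (your CV improves between attempts, and no two roles are alike). The 5% and 10% values are illustrative placeholders, not measured rates.

```python
import math


def chance_of_at_least_one_success(p, attempts):
    """Probability of one or more successes in `attempts` independent tries."""
    return 1 - (1 - p) ** attempts


def attempts_needed(p, target):
    """Smallest number of attempts giving at least `target` overall probability."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# p is a hypothetical per-application success rate, chosen for illustration.
for p in (0.05, 0.10):
    after_ten = chance_of_at_least_one_success(p, 10)
    needed = attempts_needed(p, 0.5)
    print(f"p={p:.2f}: 10 attempts -> {after_ten:.0%} overall, "
          f"{needed} attempts for a 50% overall chance")
```

Even with modest per-application odds, the cumulative chance climbs quickly with the number of attempts, and doubling p roughly halves the attempts needed. That is the quantitative version of ‘control what you can control and don’t take a single rejection personally’.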
Some Final Thoughts
Remember, you are competing against people who have been doing this all day every day for many years. It might take several iterations of your cover letter/CV to turn it into a competitive industry version and get to the interview stage. I think mine changed with every job I applied for. Reach out to former colleagues and your network. Ask for feedback on your CV and advice in general. Sometimes they might have contacts who need a short-term fill, which helps get you a track record. A combination of nerves and inexperience may also mean you need a few attempts at interviews. That’s fine as long as you reflect and improve.
We have seen that working as a data scientist can require a diverse skill set. These areas can be hard to gain experience in while working in academia. No one is saying you can’t do it once you take the time to sit down and learn. Unlike in academia, the documentation for major codes and services is good. But you are competing against people who already know this. I recommend that my colleagues open an AWS Free Tier account, build some trivial apps and play with the services. Knowledge of cloud-based infrastructure is very helpful as it impacts algorithm and solution design. Questions about cloud services will usually be quite superficial in a data science interview, but it’s always good to have some idea of what they are talking about if it comes up. If you haven’t already, learn (Py)Spark and play around with Databricks. Become familiar with any of the dashboarding services (Power BI, Tableau, QuickSight). Plotly and Dash are free but require much lower-level code than these offerings. A good argument can be made that dashboarding principles are all similar and you’ll quickly pick up the preferred tech stack.
And finally, be kind to yourself. You are changing professions. The work is new, the technology is new, the way of working is new. You probably have very high expectations of yourself, but it’s ok if mistakes are made. You might find yourself in a place that doesn’t feel right or is not working out. If this happens it’s ok to reset and move on.
I hope this article has somewhat helped prepare you for what to expect. I am more than happy to answer questions and provide support to those starting out. Thank you for reading, and I look forward to engaging with the community.
About George:
I am an astrophysicist at heart but have been finding technological and optimisation problems in industry to keep me challenged over the last few years. I have worked as an office cleaner, supermarket night fill, trivia host, lecturer/unit coordinator and as a research fellow at a couple of Max Planck Institutes. More recently I have worked in tech enablement/data science/data engineering across various roles and industries (insurance, defence, automotive, logistics). I am not exactly excited by data or data science, but rather find simple pleasures in learning and the thrill of problem solving. There never seems to be enough time to study and learn everything, but thankfully I am handed around to my younger reports to be trained in their modern techniques. I care about my people and love seeing them flourish.
See George’s profile here.