last week at NeurIPS’24, one extremely salient thing was the anxiety and frustration expressed by late-year PhD students and postdocs who were confused by a job market that looks and feels so different from what they expected when they applied to PhD programs five or so years ago. some of these PhD students and postdocs are under my own supervision. this makes me reflect on what is going on, or what has been going on, in artificial intelligence research and development. this post will be more or less a stream of thoughts rather than a well-structured piece (though, as if i ever wrote a well-structured, well-thought-out, well-prepared blog post.)
the past decade or so has been an interesting time for machine learning, or more broadly artificial intelligence. starting with speech recognition around 2010, deep learning showed dramatic improvements over the then-existing states of the art on a variety of challenging and practical problems, such as object recognition from images and machine translation. by 2014, it was pretty clear that something big was going to happen and that every major company, tech or not, wanted to ensure it was part of this ongoing revolution and would benefit from it.
because deep learning had been far from mainstream for many years, there was almost no undergraduate curriculum where the basic ideas and techniques behind it were taught seriously. in fact, artificial neural nets were barely mentioned in passing in many machine learning and artificial intelligence courses back then. this created a great discrepancy between the supply of and demand for deep learning talent, forcing these companies, who saw this revolution coming earlier than others could, to aggressively recruit PhD students from a small number of labs across the world.
because there were only a handful of labs in the world seriously pursuing deep learning (unlike now,) there was fierce competition over the graduates, and even the professors, of these labs. this competition naturally led to a greatly increased level of compensation for PhD’s with experience and expertise in artificial neural networks. this made the gap between academic and industry compensation even greater in this particular area of artificial intelligence, making it extremely challenging for universities to recruit any of them to educate their students. in fact, i was one of the very few who graduated with a PhD between 2010 and 2015, having worked on artificial neural nets during those years, and joined a university as a tenure-track faculty member. this of course led to a great delay in ramping up the supply of talent, while the demand continued to soar.
an interesting side effect of such fierce competition was that these companies were recruiting these PhD’s even if they might not contribute directly to either the top or bottom line. the companies were hiring them to prepare for the inevitable and imminent revolution that would change everything they do. a lot of the PhD’s hired back then were therefore asked to, and free to, do research; that is, they chose what they wanted to work on and published what they wanted to publish. it was just like an academic research position, but with 2-5x better compensation and external visibility, and without teaching duties, administrative overhead or the pressure to constantly write grant proposals. what a fantastic opportunity!
i suspect that this wasn’t lost on students, both college and high school (if not middle school). there was an opportunity with amazing financial compensation, cushy benefits and the freedom to choose their favourite topic to work on, as long as it was within the realm of artificial intelligence (to be frank, who doesn’t work on AI, i guess?) this opportunity seemed to be available, however, only to PhD’s who had published academic papers on artificial neural nets. this led to a flood of applicants hoping to become (now so-called) AI PhD students.
the flood of applicants does not necessarily mean that we would end up with a large number of PhD students, since the constraint is not the number of applicants but the number of available faculty advisors. although i mentioned earlier that there were only a handful of labs working on artificial neural nets 15 or so years ago, by 2016 a lot of professors had already pivoted their labs to become deep learning labs and were aggressively expanding them by admitting large numbers of PhD students.
so, it seemed like we had created a great pipeline of AI talent. a large number of brilliant students apply to PhD programs. a large number of professors in AI admit and train these brilliant students to become next-generation PhD’s. a small number of major tech companies and others recruit them with unimaginably good compensation and research freedom.
this wasn’t, however, sustainable, perhaps obviously in hindsight. the only way for this to continue was for deep learning to remain something that would revolutionize the industry (if not the whole society) in five years, and it had to stay five years away every year. as mentioned earlier, companies were recruiting these talents and investing in the environments in which they conduct research, in anticipation of this inevitable change. in other words, it had to remain the future they were preparing for, in order for this pipeline to continue.
after the first generation of lucky PhD’s (including me!), who were there not out of career prospects but mostly out of luck (or bad luck), we started to see a series of much more brilliant and purpose-driven PhD’s working on deep learning. because these people were extremely motivated and were selected not by luck but by their merit and zeal, they made much faster and more visible progress. soon afterward, this progress started to show up as actual products. in particular, large-scale models, represented by but not exclusive to large-scale conversational language models, began to show that they are truly revolutionary products that can both change the future and produce economic value in the present. in other words, these new generations of brilliant PhD’s successfully brought the future into the present by productizing deep learning in the form of large-scale conversational language models and their variants.
productization implies a lot of things, but two aspects are particularly important to this note. first, productization requires some kind of standardization of development and deployment processes. such process standardization is however antithetical to scientific research. it calls not for a constant and frequent stream of creative and disruptive innovations but for incremental and stable improvements built on standardized processes. PhD’s are lousy at this, because it is precisely the opposite of what PhD programs are designed to train them for. PhD’s are supposed to come up with innovative ideas (yes, it is debatable whether every idea is innovative, but it tends to be at least innovative with a lot of noise,) validate these ideas either theoretically or empirically, report the findings to the community by writing papers and then move on. once something becomes an actual product (or a product category,) we cannot simply innovate and move on, but need to stick with it and support it continuously. with a well-established system of processes, the necessity of PhD degrees disappears rapidly.
second, productization creates a visible and concrete path to revenue. this is a great thing for the companies who have invested in recruiting these amazing talents and providing them resources to innovate within their organizations rather than elsewhere. unfortunately, once there is a concrete path to revenue (and ultimately to profit), it becomes increasingly difficult for researchers to continue to ask for full research freedom. many will be asked to contribute directly to products (or product categories) and justify their compensation, as well as their employment overall, and only a few will be allowed continued freedom in research. this is only natural, and probably why in most organizations (for-profit, non-profit, government, etc.) research teams are often significantly smaller, and given fewer resources, than product teams.
furthermore, during the past few years, universities have somewhat caught up with the demand and started to educate, train and graduate undergraduate and master’s students on the fundamentals and practical ideas behind these new technologies. these graduates know how to train, test and deploy these models, in addition to the theoretical ideas behind them. even better, they are on average less egotistical than PhD’s and often more open-minded.
together, these factors completely shatter the AI talent pipeline outlined earlier. companies do not need as many PhD’s as before, since they can recruit bachelor’s or master’s graduates who can contribute immediately and directly to AI-based products following the standardized process. students do not need to enter PhD programs to learn the necessary skills, since universities can teach them as part of the undergraduate curriculum. the current crop of PhD students, who joined their programs even slightly because of the positive career prospects based on this AI talent pipeline, are being left out of this big reorganization.
at this point, it is perhaps unsurprising that students near the end of their PhD programs are feeling a greatly heightened level of anxiety and frustration. they looked up to people of my generation (still relatively young and junior, but in this field, probably on the more senior side) and thought they would enjoy similar career prospects, becoming extremely highly paid research scientists at big tech companies with a great degree of research freedom, as long as their PhD degrees were somewhat relevant to the broad field of machine learning and adjacent areas. from their perspective, the job market suddenly asks them to show credentials in the much narrower domain of large-scale language models and their variants, and to contribute directly to the products built on top of these large-scale models. there are far fewer opportunities for those who do not want to work on productizing large-scale language models, and those positions are disappearing quickly.
that said, i must emphasize that this does not mean at all that research topics in AI outside these large-scale models are not important or sought after. for instance, at Prescient Design, we have been continuously hiring PhD-level research scientists who specialize in uncertainty quantification, causal machine learning, geometric deep learning, computer vision and more, because research and development in these areas are directly relevant to what we do, that is, lab-in-the-loop antibody design.
large-scale models are just one particular sub-area of AI that has received a lot of attention in recent years. i am incredibly excited by the advances and progress from these large-scale models, but they are not the only area that deserves attention and investment. such an outrageously heightened level of attention to large-scale language models and their variants, however, easily blinds us, in particular those who are still students, as well as faculty members at these so-called elite universities. regardless, attention, however well justified or not, comes with a greater level of opportunity, and it is only natural for students to feel anxious if they have not produced and are not producing papers on large-scale models, as these opportunities may not be available to them.
at this point, it feels like the heightened anxiety and frustration i sensed from talking to and hearing from senior PhD students and postdocs at NeurIPS’24 last week is well justified. some of them probably feel betrayed, as the gap between what they were promised earlier and what they see now is growing rapidly. some of them probably feel helpless, as their choice of research topics and their work on these topics seem less welcome at these companies. some of them probably feel defeated, as bachelor’s or master’s students seem to be better versed in training and deploying these large-scale models and appear to be considered more valuable than they are. it must be frustrating and anxiety-inducing.
unfortunately, i could only work my way toward (partly) understanding the source of the anxiety and frustration i sensed from these immensely brilliant students, but i cannot think of a way to help alleviate it. after all, it looks like i may have greatly, though unintentionally, contributed to the situation that makes them frustrated and anxious about their careers and future. sorry!