How Institutions Can Prepare Their IT Environments for a Data Science Program
Data scientist is a hot job, with a median six-figure salary and projected 36% growth in positions over the next several years. Recent advances in artificial intelligence (AI) and other forms of data analytics only expand the potential opportunities for budding data scientists.
These trends have captured the attention of colleges and universities. By one count, more than 1,000 institutions now offer undergraduate or graduate degrees in data science.
There are two aspects of data science institutions should focus on, each of which can benefit the other. First is the actual data science education program that schools offer students. Colleges and universities that don’t yet offer such a degree — or that offer only older forms of data-focused study, such as statistics — should consider launching a data science program. A robust program can help them attract and educate high-caliber students who can excel in tomorrow’s jobs.
The second aspect is the data science projects that institutions conduct themselves as part of their business operations. Increasingly, schools need data science to sift through troves of student, market, financial, and other data to gain insights that can contribute to more effective education and stronger competitive advantage.
Both aspects of data science require some foundational resources and capabilities. Beyond fielding faculty and staff with data science expertise, institutions should invest in two specific areas.
First is to develop an IT department with an adaptive culture. Data science is a relatively new discipline that’s advancing rapidly. The types of data analytics that the IT team must support are evolving at a dizzying pace. The tools that researchers and students must be educated on could be different next year than they are today — as evidenced by the sudden interest in generative AI tools like ChatGPT. Those realities mean the IT department must be agile enough to support this rate of change.
Second is to maintain a stable IT environment where security is prioritized. Schools of all types have become prime targets of cyberattacks, making strong data security an imperative. Data science by its nature involves large quantities of data, further raising the security stakes. Two strategies can help:
Place compute and data close together. In the past, organizations operated datacenters on site, where computer servers and data storage were housed behind locked doors. More recently, workloads and data have migrated to public clouds. Today, institutions are rethinking where they maintain their most sensitive information, with some returning certain data on premises.
Keeping compute and data in one central location, whether on premises or in the cloud, presents challenges, however. More data is being generated at the edge of the network. For data science researchers, this could be in labs and satellite facilities. For data science practitioners, it could be on internet of things (IoT) devices. In either case, transmitting vast data streams to a central location can be costly, and it risks exposing large quantities of sensitive information.
Rather than transfer data to centralized compute resources, place the compute close to the data. This is achievable today using a containerized IT architecture. A container is a lightweight, standalone package that combines an application with its necessary files and settings.
Containers give institutions the ability to run data analytics applications on small devices. Analysis of the data can take place at the edge, and only the output needs to be transmitted, which reduces the amount of data that must be transferred. NASA, for instance, is using containers to conduct scientific analysis on the International Space Station.
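To make the idea of analyzing data at the edge concrete, here is a minimal Python sketch of the kind of job a container on a lab device might run. It is an illustration only, not a description of any institution's or NASA's actual setup: the device name "lab-sensor-01", the function names, and the simulated temperature readings are all hypothetical, and a real deployment would also need to handle authentication and encrypted transport.

```python
import json
import statistics


def summarize_readings(readings):
    """Reduce a list of raw numeric readings to a few summary statistics."""
    if not readings:
        raise ValueError("readings must not be empty")
    return {
        "count": len(readings),
        "mean": statistics.fmean(readings),
        "min": min(readings),
        "max": max(readings),
    }


def build_payload(device_id, readings):
    """Build the small JSON message that would leave the edge device."""
    return json.dumps({
        "device_id": device_id,  # hypothetical identifier
        "summary": summarize_readings(readings),
    })


# Simulated raw sensor data that stays on the device (assumed values).
raw_readings = [20.0 + (i % 10) * 0.1 for i in range(10_000)]

payload = build_payload("lab-sensor-01", raw_readings)
raw_size = len(json.dumps(raw_readings))

print(f"Raw data: {raw_size} characters; transmitted summary: {len(payload)} characters")
```

In this sketch, the 10,000 raw readings never leave the device. Only a short summary is sent to the central system, which is the property that reduces both transfer costs and the volume of sensitive data exposed in transit.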
Secure the technology supply chain. Following the infamous Sunburst supply chain hack of 2020 — an attack that spread to thousands of organizations through popular IT monitoring software — many institutions now worry about supply chain security. And for good reason.