2020 was a year like no other. The global pandemic brought countless challenges for governments, health organizations, and frontline workers. As the virus spread rapidly across continents and lockdown measures tightened week after week, the entire world joined the fight. Academic labs, companies, startups, foundations, NGOs, and tech communities united on a global level in an unprecedented display of collaborative problem-solving.
Data science tools, ML-powered algorithms, and AI solutions played a crucial role in responding to the global pandemic. Tech communities turned the global challenge into an opportunity to innovate and tackled a wide variety of difficult problems. Some of the resulting software initiatives deserve a second look.
All global efforts to combat the spread of COVID-19 in 2020 had one thing in common: the need to collect, analyze, and make decisions based on real-time outbreak data. Whether for biomedical, epidemiological, or socioeconomic purposes, data science was one of the main tools helping researchers and policymakers tackle pandemic-related challenges. At the same time, ML-powered software took the spotlight in all kinds of innovative applications.
Arguably the most researched COVID-19-related application of 2020 was medical imaging for screening and detecting infected patients. Since ML algorithms were already popular in the health sector, it was only natural for scientists to develop ML models that identify different stages of the disease. According to a survey published in late September 2020, more than two dozen ML approaches were being used to diagnose COVID-19 from X-ray and CT scan images.
AI-powered solutions were also widely used in efforts to bring life back to normal and keep frontline workers safe. IoT-based thermal screening applications and contactless screening solutions built on cloud technologies became common in hospitals around the world. Social distancing monitors based on computer vision methods were also researched extensively, given their usefulness in both public spaces and workplaces.
Lastly, and perhaps most importantly, real-time data acquired throughout the pandemic was crucial for policymakers, governments, and organizations in monitoring, managing, and communicating the impact of the outbreak. Without proper statistical tools and data science capabilities, making sense of outbreak data would have been impossible. 2020 showed more clearly than ever that data and data-driven decisions can have immediate political and economic consequences.
2020 was the year that proved how important open cooperation is: despite challenges on a global scale, people of different cultures and backgrounds still came together to produce innovative, sometimes life-saving solutions.
What better way to celebrate the spirit of cooperation than by celebrating the open-source momentum of 2020?
Researchers, scientists, makers, and developers all across the world joined forces in one of the most productive years the open-source community has seen. Maybe don’t quote us on this one, but trust us: 2020 saw some truly amazing and powerful open-source initiatives. A quick search on GitHub turns up at least a few thousand of them, and there are many more out there.
The data scientists at Penn Medicine’s Predictive Healthcare team applied predictive analytics to build the COVID-19 Hospital Impact Model for Epidemics (CHIME). With support and contributions from the CodeForPhilly community, the application was “designed to assist hospitals and public health officials with understanding hospital capacity needs as they relate to the COVID pandemic”. The tool used an epidemiological modeling technique to estimate daily capacity planning needs (i.e., short-term forecasting). The predicted numbers included daily hospitalizations, patients needing ventilation, and ICU admissions. With this tool, hospitals could better understand and plan for the fluctuating demand caused by the virus and avoid equipment shortages.
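To our understanding, CHIME is built on the classic SIR (susceptible-infected-recovered) compartmental model. The sketch below is a toy discrete-time SIR projection of daily hospital admissions, not CHIME’s actual code. The helper names and every parameter value (transmission rate `beta`, recovery rate `gamma`, hospitalized share of new infections `hosp_rate`) are hypothetical illustrations, not CHIME’s defaults.

```python
from dataclasses import dataclass


@dataclass
class SIRState:
    """Number of people in each compartment of a simple SIR model."""
    s: float  # susceptible
    i: float  # currently infected
    r: float  # recovered (no longer infectious)


def sir_step(state: SIRState, beta: float, gamma: float, n: float) -> tuple[SIRState, float]:
    """Advance the model by one day; return the new state and that day's new infections."""
    new_infections = beta * state.s * state.i / n
    new_recoveries = gamma * state.i
    next_state = SIRState(
        s=state.s - new_infections,
        i=state.i + new_infections - new_recoveries,
        r=state.r + new_recoveries,
    )
    return next_state, new_infections


def project_admissions(population: float, infected: float, beta: float,
                       gamma: float, hosp_rate: float, days: int) -> list[float]:
    """Hypothetical helper: daily new admissions as a fixed share of daily new infections."""
    state = SIRState(s=population - infected, i=infected, r=0.0)
    admissions = []
    for _ in range(days):
        state, new_infections = sir_step(state, beta, gamma, population)
        admissions.append(new_infections * hosp_rate)
    return admissions


if __name__ == "__main__":
    # Illustrative inputs only: beta / gamma = 3 means a basic reproduction number of 3.
    daily = project_admissions(population=1_000_000, infected=100, beta=0.3,
                               gamma=0.1, hosp_rate=0.025, days=120)
    peak_day = max(range(len(daily)), key=daily.__getitem__)
    print(f"peak admissions on day {peak_day}: {daily[peak_day]:.0f}")
```

Real capacity planning also needs a length of stay to turn admissions into occupied beds, plus separate ICU and ventilator shares; CHIME reported those quantities, while this sketch stops at daily admissions.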
On the old continent, the European COVID-19 Data Platform sought to enable rapid collection and sharing of research data in an effort to accelerate coronavirus research. In Paris, a newly founded research laboratory called Just One Giant Lab put open-source tools and data at the core of its OpenCovid19 mission: bringing together developers, engineers, data scientists, and healthcare experts to develop low-cost solutions against the pandemic.
Opening up data for scientific research was the main trend of 2020. From the early days of the COVID-19 spread, Johns Hopkins University made infection data freely available to the world, together with its now popular interactive dashboard. At the same time, journals and publishers started making their articles openly available, while independent research laboratories became less secretive about their results.
Fighting COVID-19 brought the open-source software and hardware communities together around a greater goal. Whether for assisting frontline workers, providing governments and policymakers with real-time data, or supporting research, 2020 made it clear: open-source software and open data are key to collective innovation, especially during a global crisis.
Looking back at 2020 and at the impressive contributions of the tech community, we can draw the following conclusions for the years to come.
We didn’t really need a global crisis to realize this, but here it is: all data science and research efforts against the pandemic shared a common denominator. They all relied heavily on multiple data sources, collected and communicated in real time. Virus data powered mobile apps, spread-tracking dashboards, and state-of-the-art research. The same data quality rule applied to all of them: garbage in, garbage out. When building data-driven solutions, the quality of the data is the most important aspect of the whole development process.
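As a small illustration of "garbage in, garbage out", here is a sketch of sanity checks one might run on an outbreak data feed before it reaches a dashboard or a model. The record layout (a hypothetical `DailyReport` holding cumulative case counts per region and day) and the specific checks are assumptions for illustration, not taken from any of the projects mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DailyReport:
    """One hypothetical row of an outbreak feed: cumulative cases per region and day."""
    day: date
    region: str
    cumulative_cases: Optional[int]  # None models a missing value


def find_problems(reports: list[DailyReport]) -> list[str]:
    """Describe suspicious rows instead of silently dropping them."""
    problems: list[str] = []
    last_seen: dict[str, tuple[date, int]] = {}
    # Sort by region, then day, so consecutive rows of the same region can be compared.
    for rep in sorted(reports, key=lambda r: (r.region, r.day)):
        label = f"{rep.region} {rep.day.isoformat()}"
        if rep.cumulative_cases is None:
            problems.append(f"{label}: missing case count")
            continue
        if rep.cumulative_cases < 0:
            problems.append(f"{label}: negative case count")
            continue
        previous = last_seen.get(rep.region)
        if previous is not None:
            prev_day, prev_cases = previous
            if rep.day == prev_day:
                problems.append(f"{label}: duplicate report for the same day")
            elif rep.cumulative_cases < prev_cases:
                # Cumulative totals should never shrink unless the source revised its data.
                problems.append(f"{label}: cumulative count dropped from {prev_cases}")
        last_seen[rep.region] = (rep.day, rep.cumulative_cases)
    return problems


if __name__ == "__main__":
    feed = [
        DailyReport(date(2020, 4, 1), "A", 10),
        DailyReport(date(2020, 4, 2), "A", 8),
        DailyReport(date(2020, 4, 3), "A", None),
        DailyReport(date(2020, 4, 1), "B", -1),
    ]
    for problem in find_problems(feed):
        print(problem)
```

Flagging rather than deleting is a deliberate choice here: real feeds sometimes revise cumulative counts downward after corrections, so a drop is a prompt for human review rather than proof of an error.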
As Francesca Dominici, co-director of the Harvard Data Science Initiative, puts it:
“I think that the pandemic has definitely increased the appreciation of data science as an important discipline that can help us solve enormous challenges impacting society.”
Understanding, acknowledging, and dealing with uncertainty have become part of our daily lives. Given the tough circumstances of 2020, data science and statistical modeling became go-to tools for understanding uncertainty and making the right decisions in the face of it. We have all witnessed firsthand the many consequences of a global crisis. It is time to acknowledge the power that lies in our hands and to do what the tech community did best in 2020: transform a global challenge into an opportunity for innovation.
At the beginning of a new decade, we can come together and truly celebrate innovation. Technology has advanced to a point where it has a real chance to make a difference and change our lives for the better.
These inspiring use cases keep us in awe and strengthen our trust in the community, the tools, and the overall tech ecosystem. ML, AI, and data science have proven to be extremely useful tools for humanity and society, and the trend points in a positive direction: they will only get better, more performant, and more ethical.
Bring it on, 2021!