New data models need to be built rapidly to respond to the COVID-19 pandemic. Matt Jones explains how to do it.
In their drive to develop new therapeutics and vaccines for COVID-19, researchers are building, training and deploying data models at an unprecedented speed.
However, the data they are using is still very new and full of uncertainties. As a result, experts designing drugs and vaccines are spending a lot of time engineering data and rebuilding and validating models. This takes up valuable research time, so it needs to be done quickly. Yet if models are rushed through based on old assumptions, rather than being explicitly tailored to the problem at hand, they may not produce the right results and may cause problems down the line.
The following four areas should help to guide COVID-19 task forces when responding to the unprecedented challenges before them.
The core of the current challenge is data uncertainty. Most disease research is built on years of study. Now, we are dealing with data on biological mechanisms and patient responses where our understanding is still evolving. Investigating potential secondary indications of existing drugs, for example, involves data based on subjective assessments by doctors who are still getting to grips with and learning about the disease.
The result is that open source data or data from clinical trials and hospitals may include bias or misreporting, and various sources may use different labels and data capture mechanisms. Models built on such data will not produce reliable results.
As data enters databases, it must be assessed by subject matter experts for errors and bias. Data scientists must make necessary changes to ensure consistency. They must also remove confounding elements, eg, labels added to scans by physicians, which will confuse models.
Metadata should be added on what the data represents – eg, type of molecule or toxicology, but also provenance, timestamps, usage licences, etc. There must also be a consistent taxonomy established for naming things so models (and humans) can find and make sense of the data.
Once complete, data must be fed into central and accessible data stores, while tools and integrators should be set up to feed data to data science teams.
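As an illustration of the curation steps above, the following is a minimal sketch, not a production pipeline. The taxonomy mapping, the field names (`patient_id`, `label`) and the `curate` function are all hypothetical, chosen only to show labels being harmonised and provenance metadata being attached before a record reaches a central store.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source-specific labels to one agreed taxonomy.
LABEL_TAXONOMY = {
    "covid": "COVID-19",
    "covid19": "COVID-19",
    "sars-cov-2 positive": "COVID-19",
    "neg": "negative",
    "negative": "negative",
}


def curate(record, source, licence):
    """Return a harmonised copy of `record` with provenance metadata.

    Returns None when the label is not in the taxonomy, so the record can be
    routed to a subject matter expert instead of being guessed at.
    """
    raw_label = str(record.get("label") or "").strip().lower()
    label = LABEL_TAXONOMY.get(raw_label)
    if label is None:
        return None
    return {
        "patient_id": record["patient_id"],
        "label": label,
        "metadata": {
            "source": source,          # provenance
            "licence": licence,        # usage licence
            "ingested_at": datetime.now(timezone.utc).isoformat(),  # timestamp
        },
    }


curated = curate({"patient_id": "p1", "label": " Covid19 "}, "hospital_a", "CC-BY-4.0")
print(curated["label"], curated["metadata"]["source"])
```

Returning `None` for unknown labels is a deliberate choice in this sketch: it keeps ambiguous records out of the store until an expert has reviewed them.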
At Tessella, we have seen many projects derailed because modellers drew invalid conclusions from data sets which contained errors or bias, or lacked contextual information. In a project with a pharma company – which used pre-clinical data to predict late-stage failures – we found that the underlying problem was that modellers considered the data hard to find, difficult to understand, laborious to use and risky to draw conclusions from. By addressing this, we significantly reduced failures. If this happens during normal times, it will doubtless be a major problem in the current rush for answers.
2. Choosing the right models
The next potential sticking point is building the right model. There is no rule for which approach is best for a particular problem. The nature and context of the issue, data quality and quantity, computing power needs, speed and intended use, all feed into model choice and design. A model which analyses lung scans to monitor disease progression will look very different to a model which analyses molecule libraries to identify likely candidates for new drug targets.
Start by understanding the type of problem. Is it classification or regression, supervised or unsupervised, predictive, statistical, physics-based, etc? Do not just settle for the approach you are most familiar with.
Screen data to understand what is possible. Perform rapid and agile early explorations using simple techniques to spot the correlations that will guide your plan. From this analysis, identify candidate modelling techniques (eg, empirical, physical, stochastic, hybrid) before narrowing down to the most suitable model for that specific problem.
‘Most powerful’ is not the same as ‘most suitable’. Techniques such as machine learning need lots of well understood data and so are ill-suited to most COVID-19 challenges at this stage. Approaches such as Bayesian uncertainty quantification may be better where limited trusted data is available.
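To make the idea of Bayesian uncertainty quantification concrete, here is a minimal sketch using a Beta-Binomial model, one of the simplest Bayesian techniques. It is an illustrative assumption rather than a method described in this article: the numbers (7 responders out of 10 patients), the uniform prior and the function names are all hypothetical. The point is that the output is a range of plausible values, not a single number.

```python
import random


def beta_posterior(successes, trials, prior_alpha=1.0, prior_beta=1.0):
    """Update a Beta prior with binomial data; returns posterior (alpha, beta)."""
    return prior_alpha + successes, prior_beta + (trials - successes)


def credible_interval(alpha, beta, level=0.95, draws=20000, seed=0):
    """Approximate an equal-tailed credible interval by sampling the posterior."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    return lower, upper


# Hypothetical small study: 7 of 10 patients respond to a repurposed drug.
alpha, beta = beta_posterior(successes=7, trials=10)
mean = alpha / (alpha + beta)
low, high = credible_interval(alpha, beta)
print(f"posterior mean {mean:.2f}, 95% interval ({low:.2f}, {high:.2f})")
```

With so few patients the interval is wide, which is the honest answer. Running the same code on a larger hypothetical study with the same response rate gives a visibly narrower interval.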
3. Ensure your answers are trusted
The best model in the world will fall down if users do not trust it. Trust requires more than just a working model. Over-complicated or frustrating user interfaces, or models which break after a few months, undermine trust and reduce uptake. We are seeing this right now in track and trace apps, but it is equally true of drug discovery platforms.
So does a lack of explainability. If users cannot understand why the model reached a result, they will end up having to repeat work manually. A good model contains tools to analyse what data was used, its provenance and how the model weighed different inputs, then report on that conclusion in clear language.
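As a simple illustration of reporting how a model weighed its inputs, the sketch below explains the output of a linear scoring model in plain language. It assumes a linear model, where each input's contribution is simply weight times value; more complex models need dedicated explanation techniques. The feature names and weights are hypothetical.

```python
def explain_linear(weights, inputs, bias=0.0):
    """Return the score of a linear model and a plain-language breakdown.

    Each feature's contribution is weight * value, which is only a valid
    explanation for linear models.
    """
    contributions = {name: weights[name] * inputs[name] for name in weights}
    score = bias + sum(contributions.values())
    # List the most influential inputs first, regardless of sign.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"Score {score:.2f} (baseline {bias:.2f})"]
    for name, value in ranked:
        direction = "raised" if value >= 0 else "lowered"
        lines.append(f"- {name} {direction} the score by {abs(value):.2f}")
    return score, lines


# Hypothetical weights and inputs, for illustration only.
score, report = explain_linear(
    weights={"solubility": 2.0, "toxicity_flag": -1.0},
    inputs={"solubility": 1.0, "toxicity_flag": 3.0},
)
print("\n".join(report))
```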
Privacy and ethical concerns also undermine trust. Patient data must be freely given and kept securely. If, as some suspect, there is variability in disease response between ethnicities, this must be reliably accounted for. Models which only work for white people will quickly be shelved.
4. Deploying models at scale
Models must work in the enterprise, not just for the data scientists.
Usually that involves engineering the final model into a piece of software and integrating it into a mobile or web app, or a bespoke piece of technology. This requires an understanding of the rules and complexities of enterprise IT or edge computing where the model must operate.
This may involve wrapping models in software (‘containers’) which translate incoming and outgoing data into a common format, to allow them to slot into an IT ecosystem. It will require allocating computing power appropriate to the demands of the application. It also means planning for ongoing maintenance, support and retraining. This is where a lot of models face big hold-ups, since pharma researchers and even data scientists are not usually software engineers.
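The translation layer described above can be sketched as a thin wrapper that accepts and returns JSON, so other systems never need to know how the model works internally. This is a minimal sketch, not a container or a deployment recipe: `toy_model`, its scoring rule, the field names and the version string are hypothetical stand-ins for a real trained model.

```python
import json


def toy_model(features):
    """Stand-in for a trained model: a hypothetical scoring rule, not a real one."""
    return 0.5 * features["logp"] + 0.1 * features["mol_weight"] / 100


def handle_request(payload: str) -> str:
    """Translate a JSON request into model inputs and the result back into JSON."""
    try:
        data = json.loads(payload)
        features = {
            "logp": float(data["logp"]),
            "mol_weight": float(data["mol_weight"]),
        }
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing fields or wrong types all produce a clear error.
        return json.dumps({"status": "error", "detail": f"bad input: {exc!r}"})
    return json.dumps({
        "status": "ok",
        "score": round(toy_model(features), 4),
        "model_version": "0.1-demo",  # helps with later maintenance and retraining
    })


print(handle_request('{"logp": 2.0, "mol_weight": 300}'))
print(handle_request("not json"))
```

Validating inputs at the boundary and returning a version string are small choices, but they are the kind of software engineering detail that makes a model easier to support once it is in use.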
If all goes well, the user is presented with a clear interface. They enter the relevant inputs, eg, desired pharmacological properties. The model runs and presents the resulting insight in an easy-to-understand way that the user is comfortable acting upon.
Bringing it all together for rapid results
Time can be saved by identifying your end objective and being laser-focused on capturing and curating the most relevant data. However, rigour is needed throughout and there are few shortcuts to take. Speed is not about cutting corners; it is about doing things right first time, so you do not have to abandon projects and start again.
That means efficient allocation of resources – selecting the right skills for the right job: getting data experts to handle the data, modellers to build the models and software engineers to manage the software. Critically, it means giving COVID-19 experts the tools and time to focus where their true expertise lies – understanding the disease and developing drugs and vaccines.
Dr Matt Jones holds a PhD in synthetic organic chemistry and has over 20 years of experience in pharmaceutical R&D. He has been at Tessella since 2014, before which he held a number of technical and management roles at GlaxoSmithKline (GSK). In 2015 he was elected to the board of directors of Pistoia Alliance.