Coronavirus: Modelling the world of COVID-19

Jones, Matt

Posted: 6 July 2020 | Matt Jones (Tessella)

New data models need to be built rapidly to respond to the COVID-19 pandemic. Matt Jones explains how to do it.

In their drive to develop new therapeutics and vaccines for COVID-19, researchers are building, training and deploying data models at an unprecedented speed.

However, the data they are using is still very new and full of uncertainties. As a result, experts designing drugs and vaccines are spending a lot of time engineering data and rebuilding and validating models. This takes up valuable research time, so it needs to be done quickly. Yet if models are rushed through based on old assumptions, rather than being explicitly tailored to the problem at hand, they may produce the wrong results and cause problems down the line.

The following four areas should help to guide COVID-19 task forces when responding to the unprecedented challenges before them.

1. Accessing the right data

The core of the current challenge is data uncertainty. Most disease research is built on years of study. Now, we are dealing with data on biological mechanisms and patient responses where our understanding is still evolving. Investigating potential secondary indications of existing drugs, for example, involves data based on subjective assessments by doctors who are still getting to grips with the disease.

The result is that open source data, or data from clinical trials and hospitals, may include bias or misreporting, and different sources may use different labels and data capture mechanisms. Models built on such data will not produce reliable results.

As data enters databases, it must be assessed by subject matter experts for errors and bias. Data scientists must make necessary changes to ensure consistency. They must also remove confounding elements, eg, labels added to scans by physicians, which will confuse models.

Metadata should be added on what the data represents – eg, type of molecule or toxicology, but also provenance, timestamps, usage licences, etc. There must also be a consistent taxonomy established for naming things so models (and humans) can find and make sense of the data.
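
As a rough illustration of the two steps above, the sketch below maps source-specific labels onto one agreed taxonomy and attaches metadata to each record. The label mapping, the field names and the `DatasetRecord` type are all hypothetical, chosen only for this example; a real taxonomy would be agreed with subject matter experts.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical mapping from source-specific labels to one agreed taxonomy term.
LABEL_TAXONOMY = {
    "covid": "COVID-19",
    "covid-19": "COVID-19",
    "sars-cov-2 positive": "COVID-19",
    "neg": "negative",
    "negative": "negative",
}


@dataclass(frozen=True)
class DatasetRecord:
    """One data point plus the metadata a modeller needs to trust it."""
    label: str             # harmonised label from the taxonomy
    data_type: str         # what the data represents, eg "toxicology"
    provenance: str        # where the data came from
    licence: str           # usage licence
    captured_at: datetime  # timestamp of capture


def harmonise_label(raw: str) -> str:
    """Map a raw label to the taxonomy, or fail loudly so an expert can review it."""
    key = raw.strip().lower()
    if key not in LABEL_TAXONOMY:
        raise ValueError(f"Unmapped label {raw!r}: needs expert review")
    return LABEL_TAXONOMY[key]


def make_record(raw_label: str, data_type: str, provenance: str, licence: str,
                captured_at: Optional[datetime] = None) -> DatasetRecord:
    """Build a record with a harmonised label and complete metadata."""
    return DatasetRecord(
        label=harmonise_label(raw_label),
        data_type=data_type,
        provenance=provenance,
        licence=licence,
        captured_at=captured_at or datetime.now(timezone.utc),
    )
```

Rejecting unmapped labels, rather than guessing, keeps the decision with the subject matter experts.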

Once complete, data must be fed into central and accessible data stores, while tools and integrators should be set up to feed data to data science teams.

At Tessella, we have seen many projects derailed because modellers drew invalid conclusions from data sets which contained errors or bias, or lacked contextual information. In a project with a pharma company – which used pre-clinical data to predict late-stage failures – we discovered that the underlying problem was that modellers found the data hard to find, difficult to understand, laborious to use and risky to draw conclusions from. By addressing this, we significantly reduced failures. If this happens during normal times, it will doubtless be a major problem in the current rush for answers.

2. Choosing the right models

The next potential sticking point is building the right model. There is no rule for which approach is best for a particular problem. The nature and context of the issue, data quality and quantity, computing power needs, speed and intended use all feed into model choice and design. A model which analyses lung scans to monitor disease progression will look very different to a model which analyses molecule libraries to identify likely candidates for new drug targets.

Start by understanding the type of problem. Is it classification or regression, supervised or unsupervised, predictive, statistical, physics-based, etc? Do not just settle for the approach you are most familiar with.

Screen data to understand what is possible. Perform rapid and agile early explorations using simple techniques to spot the correlations that will guide your plan. From this analysis, identify candidate modelling techniques (eg, empirical, physical, stochastic, hybrid) before narrowing down to the most suitable model for that specific problem.
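
One simple way to run such an early exploration is to rank candidate variables by how strongly they correlate with the outcome of interest. The sketch below is a minimal example using Pearson correlation. The function names are illustrative, and a strong correlation is only a prompt for further investigation, not evidence of a causal link.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / (spread_x * spread_y)


def screen_features(features: Dict[str, Sequence[float]],
                    target: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank features by absolute correlation with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Ranking by absolute value keeps strong negative relationships near the top, which matters when a variable is protective rather than harmful.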

‘Most powerful’ is not the same as ‘most suitable’. Techniques such as machine learning need lots of well understood data and so are ill-suited to most COVID-19 challenges at this stage. Approaches such as Bayesian uncertainty quantification may be better where limited trusted data is available.
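
To give a flavour of what a Bayesian treatment of limited data can look like, the sketch below estimates a response rate from a small cohort using a Beta-Binomial model. This is one textbook technique, offered as an assumption rather than the method used in any particular project. The uniform prior and the cohort numbers in the usage note are hypothetical.

```python
import random
from typing import Tuple


def beta_posterior(successes: int, failures: int,
                   prior_a: float = 1.0, prior_b: float = 1.0) -> Tuple[float, float]:
    """Update a Beta(prior_a, prior_b) prior with observed binary outcomes."""
    if successes < 0 or failures < 0:
        raise ValueError("counts must be non-negative")
    return prior_a + successes, prior_b + failures


def summarise(a: float, b: float, level: float = 0.95,
              n_samples: int = 20000, seed: int = 0) -> Tuple[float, float, float]:
    """Return the posterior mean and an approximate credible interval.

    The mean is exact; the interval is estimated by sampling from Beta(a, b).
    """
    mean = a / (a + b)
    rng = random.Random(seed)  # fixed seed so the summary is reproducible
    draws = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    tail = (1.0 - level) / 2.0
    low = draws[int(tail * n_samples)]
    high = draws[int((1.0 - tail) * n_samples) - 1]
    return mean, low, high
```

For example, `summarise(*beta_posterior(7, 3))` describes what 7 responders out of 10 patients can and cannot tell us. With so few patients the interval is wide, and reporting that width is the point of the exercise.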

3. Ensure your answers are trusted

The best model in the world will fall down if users do not trust it. Trust requires more than just a working model. Over-complicated or frustrating user interfaces, or models which break after a few months, undermine trust and reduce uptake. We are seeing this right now in track and trace apps, but it is equally true of drug discovery platforms.

So does a lack of explainability. If users cannot understand why the model reached a result, they will end up having to repeat work manually. A good model contains tools to analyse what data was used, its provenance and how the model weighed different inputs, then report on that conclusion in clear language.
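
For the simplest case, a linear scoring model, a report of how the inputs were weighed can be produced directly from the weights. The sketch below assumes such a model; the feature names and weights in the usage note are hypothetical, and more complex models would need dedicated explanation techniques.

```python
from typing import Dict, List, Tuple


def explain_linear_prediction(weights: Dict[str, float],
                              inputs: Dict[str, float],
                              bias: float = 0.0) -> Tuple[float, List[str]]:
    """Score the inputs and describe each input's contribution in plain language."""
    # In a linear model, each input's contribution is simply weight * value.
    contributions = {name: weights[name] * inputs[name] for name in weights}
    score = bias + sum(contributions.values())

    # List the largest contributions first, whatever their sign.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"Predicted score: {score:.2f}"]
    for name, value in ranked:
        if value > 0:
            lines.append(f"- {name} raised the score by {value:.2f}")
        elif value < 0:
            lines.append(f"- {name} lowered the score by {abs(value):.2f}")
        else:
            lines.append(f"- {name} had no effect on the score")
    return score, lines
```

Calling it with weights `{"age": 0.5, "crp": 2.0}` and inputs `{"age": 2.0, "crp": -1.0}` lists `crp` first, because it moved the score most.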

Privacy and ethical concerns also undermine trust. Patient data must be freely given and kept securely. If, as some suspect, there is variability in disease response between ethnicities, this must be reliably accounted for. Models which only work for white people will quickly be shelved.
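
One basic check is to compare model performance across patient subgroups before release. The sketch below computes accuracy per group and flags groups that lag well behind the best one. The group names and the 10-percentage-point threshold are assumptions made for illustration, and accuracy is only one of several metrics worth comparing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute accuracy per subgroup from (group, true_label, predicted_label) tuples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def flag_underserved(accuracy: Dict[str, float], max_gap: float = 0.10) -> List[str]:
    """Return groups whose accuracy trails the best group by more than max_gap."""
    if not accuracy:
        return []
    best = max(accuracy.values())
    return sorted(g for g, acc in accuracy.items() if best - acc > max_gap)
```

A flagged group is a signal to go back to the data, since the usual cause is under-representation in the training set.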

4. Deploying models at scale

Models must work in the enterprise, not just for the data scientists.

Usually that involves engineering the final model into a piece of software and integrating it into a mobile or web app, or a bespoke piece of technology. This requires an understanding of the rules and complexities of enterprise IT or edge computing where the model must operate.

This may involve wrapping models in software (‘containers’) which translate incoming and outgoing data into a common format, allowing the model to slot into an IT ecosystem. It will require allocating computing power appropriate to the demands of the application. It also means planning for ongoing maintenance, support and retraining. This is where a lot of models face big hold-ups, since pharma researchers and even data scientists are not usually software engineers.
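
The translation layer can be pictured as a thin wrapper that accepts and returns a common format, here JSON, so that the surrounding IT systems never need to know how the model works internally. The sketch below is a simplified, hypothetical illustration: the `ModelWrapper` class, its field names and the response layout are assumptions, and a production deployment would add packaging, logging and monitoring.

```python
import json
from typing import Callable, Dict, Sequence


class ModelWrapper:
    """Wrap a prediction function behind a JSON-in, JSON-out interface."""

    def __init__(self, predict: Callable[[Dict[str, float]], float],
                 required_fields: Sequence[str], model_version: str) -> None:
        self._predict = predict
        self._required = list(required_fields)
        self._version = model_version

    def _error(self, message: str) -> str:
        return json.dumps({"status": "error", "message": message})

    def handle(self, request_json: str) -> str:
        """Validate the request, run the model and return a JSON response."""
        try:
            payload = json.loads(request_json)
        except json.JSONDecodeError:
            return self._error("request is not valid JSON")
        if not isinstance(payload, dict):
            return self._error("request must be a JSON object")

        missing = [field for field in self._required if field not in payload]
        if missing:
            return self._error("missing fields: " + ", ".join(missing))

        try:
            inputs = {field: float(payload[field]) for field in self._required}
        except (TypeError, ValueError):
            return self._error("all fields must be numeric")

        # Report the model version with every answer so results stay traceable
        # after the model has been retrained.
        return json.dumps({
            "status": "ok",
            "model_version": self._version,
            "prediction": self._predict(inputs),
        })
```

Validating inputs at the boundary means a malformed request produces a clear error message rather than a silent wrong answer.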

If all goes well, the user is presented with a clear interface. They enter the relevant inputs, eg, desired pharmacological properties. The model runs and presents the resulting insight in an easy-to-understand way that the user is comfortable acting upon.

Bringing it all together for rapid results

Time can be saved by identifying your end objective and being laser-focused on capturing and curating the most relevant data. However, rigour is needed throughout and there are few shortcuts to take. Speed is not about cutting corners; it is about doing things right first time, so you do not have to abandon projects and start again.

That means efficient allocation of resources – selecting the right skills for the right job: getting data experts to handle the data, modellers to build the models and software engineers to manage the software. Critically, it means giving COVID-19 experts the tools and time to focus where their true expertise lies – understanding the disease and developing drugs and vaccines.

This article is based on Tessella’s whitepaper, COVID-19: Effective Use of Data and Modelling to Deliver Rapid Responses, developed with input from a range of modelling experts.

About the author

Dr Matt Jones holds a PhD in synthetic organic chemistry and has over 20 years of experience in pharmaceutical R&D. He has been at Tessella since 2014, before which he held a number of technical and management roles at GlaxoSmithKline (GSK). In 2015 he was elected to the board of directors of Pistoia Alliance.

Related organisations

Tessella

Related diseases & conditions

Coronavirus, Covid-19
