NIR spectroscopy is well known for its sensitivity to water, which can be useful for detecting moisture variation in the sample matrix. In many other applications, however, water uptake by the samples can be a problem and must be closely monitored to gauge the predictive performance of calibration models. This article looks at how moisture fluctuations in drug products affect the performance of classification models. To this end, we built hit quality index (HQI) and principal component analysis (PCA) identification models on pharmaceutical finished products and examined how humidity variations affect their robustness.
While near-infrared spectroscopy (NIRS) is a highly valued tool with many applications in industry due to its fast and noninvasive nature, a significant drawback is its extreme susceptibility to humidity changes in the environment. In this study, we evaluate the influence of humidity variation on the predictive performance of NIR‑based multivariate calibration models.
The pros and cons of the most common algorithms used in commercial NIR and Raman spectroscopy systems have been discussed in the open literature.1 Here, a spectral correlation technique, specifically the hit quality index (HQI) algorithm, and principal component analysis (PCA) are evaluated to see how their predictive performance is affected by changes in the moisture content of the samples.
All drug products in this study were purchased from a licensed pharmacy. All samples, in tablet and capsule form, were measured in diffuse reflectance, averaging 32 scans over the wavelength range of 1,600-2,400nm. These measurements were used as the spectral library for the HQI model and as the training set for the PCA model.
Moisture effect on HQI model
HQI, which is derived from the cosine similarity of spectral vectors, is a common method for library matching of unknown materials. It compares the shape of the unknown test spectrum with each spectrum in the library in turn and computes a similarity index ranging from zero to one (Equation 1). High values indicate high similarity between the unknown and a library entry.
Equation 1
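Since Equation 1 is not reproduced here, the following is only a minimal sketch of one widely used form of HQI, the squared cosine similarity between two spectral vectors, which is bounded between zero and one as described above. The function names `hqi` and `best_match`, and the choice not to mean-centre the spectra, are assumptions made for illustration and may differ from the exact formulation used in this study.

```python
import numpy as np


def hqi(unknown, reference):
    """Squared cosine similarity of two spectra (assumed HQI form).

    Returns a value between 0 (orthogonal spectra) and 1 (identical shape).
    """
    u = np.asarray(unknown, dtype=float)
    r = np.asarray(reference, dtype=float)
    numerator = np.dot(u, r) ** 2
    denominator = np.dot(u, u) * np.dot(r, r)
    return numerator / denominator


def best_match(unknown, library, threshold=0.90):
    """Compare an unknown spectrum with every library entry.

    `library` is a hypothetical dict mapping product name -> spectrum.
    Returns the best-scoring name, its HQI, and whether it passes the threshold.
    """
    scores = {name: hqi(unknown, spectrum) for name, spectrum in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name], scores[name] >= threshold
```

Because the index depends only on the angle between the two vectors, multiplying a spectrum by a constant does not change its HQI; only changes in spectral shape do.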
The effect of humidity on HQI identification models was investigated here. A mini spectral library of 12 drug products was created; the products are listed in Table 1. All measurements for the spectral library were taken as soon as the products were removed from their packaging, ensuring they were not exposed to humidity. After the library measurements were taken in May, the samples were kept in well plates with the lid on under room conditions for a further six months.
Figure 1: Average relative humidity (RH %) change of the room from June to November. Humidity levels show a decrease in the months of October and November as the heat was turned on in the room.
Periodic measurements on these samples over this time were conducted as part of validation studies to evaluate the predictive performance of the HQI model.
Figure 2: Spectra of the same levofloxacin tablet taken in October and November. Tablet was first removed from its packaging in May and left exposed to room conditions. Both spectra are SNV-normalised.
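The SNV (standard normal variate) normalisation mentioned in the caption of Figure 2 can be sketched as follows. This is the textbook row-wise form, in which each spectrum is centred on its own mean and scaled by its own standard deviation; the function name `snv` and the use of the sample standard deviation (`ddof=1`) are assumptions, since the article does not state the exact settings used.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: normalise each spectrum (row) individually."""
    x = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = x.mean(axis=1, keepdims=True)          # per-spectrum mean
    std = x.std(axis=1, ddof=1, keepdims=True)    # per-spectrum sample std (assumed)
    return (x - mean) / std
```

After SNV, two spectra that differ only by a constant offset or a multiplicative scale become identical, so the remaining differences reflect changes in band shape such as the water region.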
All tests carried out over the five-month period correctly identified all products with an HQI threshold value of 0.90. Some samples, however, started failing in the sixth month's run (see blue line in Figure 3). Although the samples were kept covered during this six-month period, they were still exposed to the atmosphere to some degree. The change in the spectral signature of a levofloxacin tablet over the course of a month, for example, can be seen in Figure 2. The water peak region, marked by the dotted box in Figure 2, shows the difference in moisture between October and November; the heat in the room was turned on in November and humidity levels decreased (Figure 1). The blue curve in Figure 3 represents the HQI values computed from the spectra of the samples that had been exposed to the atmosphere for six months.
Figure 3: HQI values of validation tests against the spectral library. The blue line shows the results for tablets left in a well plate with the cover on for about six months and then tested. The red line shows the HQI values recalculated after the water region (shown in the dotted box in Figure 2) is removed from the spectra. The HQI values clearly increase. The threshold (black dashed line) is set at 0.90.
The red line is constructed by eliminating the water region between 1,850 and 2,050nm from these same spectra. Without the water region, the model prediction improves. Figure 3 also indicates that the samples are affected differently by humidity under room conditions. Furthermore, the HQI model is found to be quite robust to minor spectral changes caused by room conditions: only two samples out of 12 are identified incorrectly after six months of exposure to room conditions.
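The recalculation described above can be illustrated with a short sketch that masks the 1,850-2,050nm window before computing the HQI. The spectra below are synthetic Gaussian bands invented purely for illustration, not data from this study, and `drop_band` and `hqi` are hypothetical helper names; the HQI form (squared cosine similarity) is likewise an assumption.

```python
import numpy as np


def hqi(a, b):
    """Squared cosine similarity of two spectra (assumed HQI form)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.dot(a, b) ** 2 / (np.dot(a, a) * np.dot(b, b))


def drop_band(wavelengths, spectra, low=1850.0, high=2050.0):
    """Remove all points whose wavelength lies inside [low, high] nm."""
    wl = np.asarray(wavelengths, dtype=float)
    x = np.asarray(spectra, dtype=float)
    keep = (wl < low) | (wl > high)
    return wl[keep], x[..., keep]


# Synthetic example only: two "drug" bands plus a water band near 1,940nm.
wl = np.arange(1600.0, 2401.0, 2.0)
base = np.exp(-((wl - 1700.0) / 40.0) ** 2) + 0.8 * np.exp(-((wl - 2300.0) / 50.0) ** 2)
water = np.exp(-((wl - 1940.0) / 30.0) ** 2)
library_spectrum = base + 0.2 * water   # measured straight from the pack
exposed_spectrum = base + 1.0 * water   # same product after moisture uptake

hqi_full = hqi(exposed_spectrum, library_spectrum)

wl_cut, library_cut = drop_band(wl, library_spectrum)
_, exposed_cut = drop_band(wl, exposed_spectrum)
hqi_cut = hqi(exposed_cut, library_cut)

print(f"HQI with water region:    {hqi_full:.3f}")
print(f"HQI without water region: {hqi_cut:.3f}")
```

The same mask must be applied to both the library and the test spectra so that the two vectors keep the same length and wavelength axis.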
Moisture effect on the PCA model
Both spectral correlation and factor-based multivariate approaches have their advantages and disadvantages. While the HQI method has the advantage of being simple and allows for rapid screening of materials against a verified spectral library, it is not particularly sensitive to small spectral changes. As a result, if the information sought lies in subtle variances between samples then multivariate techniques, such as PCA, may be more appropriate. PCA has the added advantage of being able to discriminate between samples with close formulations. For example, with the HQI model we were not able to precisely differentiate between the levofloxacin tablets based on their formulations or suppliers. PCA, on the other hand, produced a model with clear grouping of each formulation, as can be seen in Figure 4.
Figure 4: PCA plot of levofloxacin tablets with five different formulations. Black scores represent the samples that the calibration model is built on. After the same samples were exposed to the atmosphere for a week, their spectra were recollected and projected onto the PCA plot (red scores).
PCA is an unsupervised technique that assesses the variances in the training set of spectra and ranks these variances as factors, assigning each spectrum a score on each factor calculated by the algorithm. Details of PCA are thoroughly documented in the literature.2 Previous studies have shown that factor-based models are highly sensitive to variances, an important one being humidity fluctuations.3 We demonstrate this effect in Figure 4 with levofloxacin tablets from five different suppliers. The PCA model in Figure 4 was constructed on samples that were measured immediately after they were removed from their packaging. The model shows five distinct clusters, one for each of the five formulations. Black scores in the figure represent the calibration samples. The same samples were then exposed to the atmosphere for a week, and their spectra were recollected and projected onto the PCA plot (red scores).
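The fit-then-project workflow described above can be sketched with a plain SVD-based PCA. This is a generic illustration rather than the software or settings used in the study: the names `fit_pca` and `project`, the choice of two components, and the random arrays standing in for spectra are all assumptions.

```python
import numpy as np


def fit_pca(calibration, n_components=2):
    """Fit PCA on calibration spectra (rows = samples, columns = wavelengths)."""
    x = np.asarray(calibration, dtype=float)
    mean = x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(spectra, mean, loadings):
    """Project spectra onto an existing PCA model without refitting it."""
    return (np.asarray(spectra, dtype=float) - mean) @ loadings.T


# Placeholder data only: random numbers standing in for measured spectra.
rng = np.random.default_rng(0)
calibration = rng.normal(size=(10, 50))                          # fresh samples
exposed = calibration + rng.normal(scale=0.1, size=(10, 50))     # re-measured samples

mean, loadings = fit_pca(calibration, n_components=2)
calibration_scores = project(calibration, mean, loadings)   # black scores
exposed_scores = project(exposed, mean, loadings)           # red scores
```

The key design point is that the exposed samples are centred with the calibration mean and multiplied by the calibration loadings; refitting the model on the exposed spectra would hide the drift that the score plot is meant to reveal.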
Figure 5: The water region of 1,850-2,050nm is removed from both calibration samples and the test samples used in Figure 4 and then the PCA model is reconstructed. Black scores are for the calibration set, while red scores represent the test samples projected on the PCA model. Without the water region the model prediction improves.
Figure 4 shows that the model fails to predict these exposed samples correctly; however, when the water region between 1,850 and 2,050nm is eliminated from both the calibration and test samples, the model predicts correctly (Figure 5). All test samples are correctly grouped in their respective formulation classes in Figure 5. Thus, although HQI models are more robust against variations in moisture content and did not fail when small amounts of water were taken up, the robustness of NIR-based PCA models is strongly influenced by humidity.
Concluding remarks
Our periodic tests over the course of several months showed that small changes in spectra due to humidity can be tolerated; however, too great a change in spectral signatures over time causes the HQI model to fail. In addition, we found that PCA models are far more sensitive to changes caused by humidity: a one-week exposure to the atmosphere was enough for the model to fail (Figure 4). Therefore, all products should be measured as soon as they are removed from their packs to minimise the effect of atmospheric humidity. An alternative may be to remove the water region from the spectra before building the calibration models, to limit the effect of moisture variation on the accuracy of model predictions.
Derya Cebeci
Derya currently works at PortMera Corp (Technopark Istanbul) as a Research Chemist, developing PAT tools as real-time chemical measurement systems and at-field screening applications. She has a PhD in Analytical Chemistry from Purdue University and holds an MBA degree from Ball State University. Prior to joining PortMera, she worked at the US Food and Drug Administration (FDA) as a postdoctoral fellow, generating spectroscopy methods for at-field counterfeit drug screening applications.
References
1. Bakeev KA, Chimenti RV. Pros and cons of using correlation versus multivariate algorithms for material identification via handheld spectroscopy. European Pharmaceutical Review. 2013.
2. Smith LI. A tutorial on Principal Component Analysis. 2002.
3. Yoon WL, Jee RD, Charvill A, Lee G, Moffat AC. Application of near-infrared spectroscopy to the determination of the sites of manufacture of proprietary products. Journal of Pharmaceutical and Biomedical Analysis. 2004;34(5):933-44.