FDA requires usability testing for new or modified medical devices
Are you bringing a new or modified medical device to the U.S. market? Then you probably know that guidelines and standards must be followed to demonstrate that your product is safe and effective for the intended users in their intended use environments.
FDA requires usability testing, which is called human factors validation testing, “to demonstrate that the device can be used by the intended users without serious errors or problems, for the intended uses, and under the expected use conditions.”
This primer gives you a basic understanding of the steps and processes involved in human factors validation testing, adhering to FDA guidance, which sets forth the following requirements:
To identify your actual users, you need to consider all types of users who will engage with the product, such as patients, professional and lay caregivers, doctors, and service technicians. Each type of user represents a user group. User groups are defined by their engagement with your product and their knowledge, training, or experience with devices like yours.
Here are two examples of the user groups for a human factors validation study.
This cardiac monitoring device is to be prescribed by a doctor for patients to use in their home over a period of weeks, in which cardiac monitoring will take place, then returned to the medical device manufacturer for cleaning, inspection, and subsequent reuse. The following three user groups are identified:
This device is a stent graft to be inserted by a physician for the endovascular treatment of abdominal aortic aneurysms. The single user group comprises vascular surgeons and other cardiovascular clinicians who have experience performing this type of surgery.
For each user group, a “screener” needs to be developed. The screener provides the criteria for selecting or rejecting potential participants.
Once the user groups are identified, training requirements need to be determined. If training for any of the user groups is expected, then training should be included in the study. If training is expected but may not be available in actual use in all cases, a worst-case scenario could be identified to conduct testing without training. In this case, a separate, untrained user group may be needed for the study, along with a user group that receives training.
Generally, when training is expected, it should be delivered in the study by a qualified trainer (such as a nurse or company representative) who represents the training users would receive in real life. For example, patients are generally trained for in-home use of a medical device by the nurse who prepares them to go home after a hospitalization. The nurses have to be trained so that they can then train the patients.
Nurses are often trained in small groups, representing the in-service training sessions they would likely receive from the device manufacturer’s representative. Physicians are typically trained individually by a company sales representative or technical support staff.
When training is part of the process for preparing users to use the medical device, the training needs to be provided in advance of the testing session. Importantly, a “decay period” needs to be included to provide a break between the training session and the testing session.
FDA does not specify the length of the decay period, but it suggests that one hour may be acceptable in some cases. In practice, a minimum one-hour decay period is common, since it approximates a user’s lunch break.
The more time that can be scheduled between training and testing, the more realistic the training/testing scenario will be. However, it may be challenging to recruit some clinicians, such as skilled surgeons, if they must schedule time for training, take a break of a day or a week or more, and then schedule another time for the testing session.
In our practice, we schedule each physician in their own training session, followed by a break of 1 hour on the same day, and then their testing session. Participants receive an incentive for their total time for the study, which includes training, decay period, and testing.
For nurses, we schedule 3 or 4 participants for an in-service group training in the morning, followed by a decay period of at least 1 hour. We then schedule the nurses in back-to-back individual sessions in the afternoon and evening of the same day.
Fewer cancellations result when all 3 elements (training, decay, and testing) are scheduled on the same day.
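The same-day pattern described above (group training, a decay period, then back-to-back individual sessions) can be laid out with a few lines of Python. This is only a planning sketch: the function name, the date, the 2-hour training length, the 90-minute session length, and the participant labels are illustrative assumptions, not values from any FDA guidance. Only the 1-hour minimum decay period comes from the practice described here.

```python
from datetime import datetime, timedelta


def same_day_schedule(training_start, training_length, session_length,
                      participants, decay=timedelta(hours=1)):
    """Return (training_end, sessions) for one same-day training/testing block.

    sessions is a list of (participant, start, end) tuples. The first session
    starts after the minimum decay period; later participants simply wait
    longer, so their decay period is longer than the minimum.
    """
    training_end = training_start + training_length
    session_start = training_end + decay
    sessions = []
    for participant in participants:
        session_end = session_start + session_length
        sessions.append((participant, session_start, session_end))
        session_start = session_end  # back-to-back individual sessions
    return training_end, sessions


# Hypothetical example: three nurses trained together at 9:00 a.m.
training_end, sessions = same_day_schedule(
    training_start=datetime(2024, 1, 15, 9, 0),   # arbitrary date
    training_length=timedelta(hours=2),           # assumed length
    session_length=timedelta(minutes=90),         # assumed length
    participants=["Nurse A", "Nurse B", "Nurse C"],
)
for participant, start, end in sessions:
    print(f"{participant}: {start:%H:%M} to {end:%H:%M}")
```

A layout like this makes it easy to see how late the final session ends before committing to a group size.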
For each user group, all critical tasks must be identified and validated. A critical task is defined as “a user task which, if performed incorrectly or not performed at all, would or could cause serious harm to the patient or user, where harm is defined to include compromised medical care.”
Critical risks can be evaluated through performance tasks and knowledge tasks. Performance tasks document the number of use errors, close calls, and difficulties experienced by users. Knowledge tasks document users’ understanding of safe use of the product beyond what can be observed in actual use. Knowledge tasks typically include questions around critical risks, such as proper storage, disposal, and reuse. A knowledge task question might be, “How long can you safely store the device?” or “What conditions are required for safe storage?” Knowledge tasks generally require the user to find the information in the instructions for use (IFU) or the labeling or packaging. When users need to find information in the IFU, the goal is not only to confirm that they can find the information, but also that the information is understandable to them.
The critical tasks are derived from the URRA (Use Related Risk Analysis). The FDA identifies use-related risk based on “the combined probability, occurrence, and severity of harm.”
A URRA sets forth the systematic use of available information to identify use-related hazards and to estimate use-related risk. To do that, regulatory compliance staff at the medical device manufacturer need to establish, for each task, a level of severity (typically on a scale from 1, negligible, to 5, catastrophic) along with the frequency and likelihood of occurrence. Tasks rated as serious (severity 3 or higher) are designated critical tasks.
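The severity rule described above can be expressed as a simple filter. The sketch below assumes the 1-to-5 scale and the "3 or higher" threshold mentioned in this primer. The task names and ratings are invented for illustration, and a real URRA would also record frequency, likelihood, and hazards, which are omitted here.

```python
from dataclasses import dataclass

# Threshold taken from the text: severity 3 or higher marks a critical task.
CRITICAL_SEVERITY_THRESHOLD = 3


@dataclass
class UseTask:
    name: str
    severity: int  # 1 (negligible) to 5 (catastrophic)


def critical_tasks(tasks, threshold=CRITICAL_SEVERITY_THRESHOLD):
    """Return the tasks whose severity rating meets or exceeds the threshold."""
    for task in tasks:
        if not 1 <= task.severity <= 5:
            raise ValueError(f"Severity out of range for task: {task.name}")
    return [task for task in tasks if task.severity >= threshold]


# Hypothetical tasks and ratings, for illustration only.
tasks = [
    UseTask("Attach monitor to chest", 4),
    UseTask("Check battery indicator", 2),
    UseTask("Package device for return", 3),
]
critical = critical_tasks(tasks)
print([task.name for task in critical])
```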
A new FDA Draft Guidance document (December 9, 2022) provides the following table for use in presenting the critical risks identified in the URRA. The last column, the validation method, is generally indicated as the use scenario/performance task # or knowledge task #. These numbers come from the test plan presented in the test protocol.
Once the user groups, session lengths based on critical tasks, and training requirements are determined, recruiting can begin, using the screeners developed for each user group.
Recruiting participants for studies is one of the most important and challenging parts of planning for usability testing, as some participants may be difficult to locate and recruit, while others, such as specialized surgeons, may require hefty stipends to secure their commitment to participate.
If you do not have the resources to recruit the participants yourself, you will need to work with a company that specializes in recruiting participants who meet your requirements. It is important to note that participants cannot be employees of the device manufacturer, except where the user group consists exclusively of employees (such as service technicians who reprocess devices). It is also important to note that participants must be U.S. residents.
For each user group, FDA recommends a minimum of 15 completed usability testing sessions. This means that you will need to recruit and schedule additional users (generally 18 – 20 participants per user group) to allow for cancellations while still meeting the 15-person minimum.
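The over-recruiting arithmetic can be sketched as follows. The cancellation rate is an assumption you would estimate from your own recruiting history. The 18 to 20 range above corresponds to expecting roughly 17% to 25% of scheduled participants to cancel. The function name is hypothetical.

```python
import math

# Minimum completed sessions per user group, as stated in the text.
MIN_COMPLETED_PER_GROUP = 15


def participants_to_recruit(expected_cancellation_rate,
                            min_completed=MIN_COMPLETED_PER_GROUP):
    """Return how many participants to schedule for one user group.

    expected_cancellation_rate is an assumed fraction (0 <= rate < 1) of
    scheduled participants who will cancel or not show up.
    """
    if not 0 <= expected_cancellation_rate < 1:
        raise ValueError("Cancellation rate must be at least 0 and below 1")
    return math.ceil(min_completed / (1 - expected_cancellation_rate))


# Assumed rates, for illustration only.
for rate in (0.0, 0.2, 0.25):
    print(f"{rate:.0%} cancellations: schedule {participants_to_recruit(rate)}")
```

For a study with several user groups, the same calculation applies to each group separately, since the 15-session minimum is per group.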
Once you have done the planning to identify user groups, training requirements, and the critical tasks, you can document all aspects of the human factors validation test in a test protocol. Topics typically include the following:
Test protocols can vary widely, and many device manufacturers have templates in place for both the test protocol and the test report. In this primer, I will focus on a few key topics in the test protocol.
For the procedural/hands-on part of the study, a use scenario provides context for the user. The use scenario describes a situation that results in the participant interacting with the medical device in a natural workflow. For example, a use scenario for a clinician using a cardiac monitoring device and the associated app might be this:
“You are preparing to see your patient [patient name with patient information provided in a handout], who will be fitted with a cardiac monitoring device. In preparation for seeing the patient, use the app on the mobile device [provided as part of the study] to confirm the patient’s details in the mobile application.”
The medical device needs to be in a production-ready state, but the device itself may need to be modified for simulated use. The reason is to protect the participants from any risk while using the device. Simulated use may also be needed to provide realistic, but not actual, data on a patient. Simulated use may also include the use of a manikin in place of an actual patient.
Simulated use also means that the testing environment must simulate, to the extent possible, the environment in which the device will be used. For instance, if there are noises in the environment, such as voices in a clinical setting or music in a home setting, these can be simulated with an audio recording of people talking or a radio playing music in the room. If lighting conditions will be low in actual use, these should be simulated. If the space in actual use is constricted, similar space restrictions need to be set up for simulated use.
Some products are best tested in a simulated surgical suite, which some recruiting companies can provide as part of their testing facility. In most cases, a dedicated suite is not needed if the test protocol documents how the simulated environment will be set up. For example, a hospital bed for a participant or a draped table with a manikin representing the patient can be used.
All use errors for all critical tasks must be identified for later analysis.
FDA defines a use error as “an action or lack of action that was different from that expected by the manufacturer and caused a result that (1) was different from the result expected by the user and (2) was not caused solely by device failure and (3) did or could result in harm.”
In addition to identifying use errors, FDA wants to know about any close calls or difficulties observed and experienced by users. These are not defined as use errors, but the study needs to identify and analyze them, particularly when more than one user experiences a close call or difficulty. The test protocol needs to define a close call or difficulty, along these lines:
Other key definitions typically include:
A moderator's guide (also called a Facilitator's Guide) is needed for each unique user group. The guide provides a script for the moderator to use in each testing session. The script includes the welcome, introduction to the study, pre-test questions about the participant’s experience/background, the scenarios/task flows being tested, post-task questions, and post-test interview questions.
The uniformity of sessions is achieved by the moderator's close adherence to the guide.
A human factors validation test puts the test protocol plan into action in individual test sessions with each user in every user group.
Testing sessions (which typically can be as short as 15 minutes or as long as 2 hours or more) provide the user experience required to determine whether the device is safe to use. The logger/notetaker uses the logging template (typically in the form of an Excel spreadsheet) to note use errors, close calls, difficulties, and any assistance needed for all critical tasks, along with participants’ answers to post-task questions and post-study interview questions.
For the post-study subjective interview, FDA suggests the following questions:
1. What did you think of the device overall?
2. Did you have any trouble using it? If so, what kind of trouble did you have?
3. Was anything confusing?
4. Please tell me about this [use error or problem observed]. What happened? How did it happen?
In our studies, we typically ask participants for subjective feedback on use errors, close calls, and difficulties after participants complete the procedure tasks, while the experience is still fresh. These would include questions 2 – 4 in the FDA suggested questions noted above.
Then, in the post-test subjective interview, we focus on the participant’s overall experience with questions like the first one in the FDA-suggested list above, as well as these questions:
Once testing is complete, the findings must be documented and the root cause determined for every use error, close call, and difficulty on a critical task. This requires a review of the notetaker’s logs and any notes taken by the moderator and observers.
The report provides quantitative findings, counting how many participants experienced each use error, close call, and difficulty. The report also provides subjective data in the form of relevant comments made by participants with respect to each critical task in which a use error, close call, or difficulty was discussed, as well as participants’ subjective feedback comments in the post-test interview. For each error, close call or difficulty and each error in answering a knowledge task question, the report provides an analysis of the cause of the error to determine the root cause.
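The quantitative counts can be tallied directly from the notetaker’s log. The sketch below assumes the log has been exported as rows with `participant`, `task`, and `finding` fields. Those field names, the task IDs, and the sample rows are hypothetical, not a standard logging template. It counts how many distinct participants experienced each kind of finding on each task, so a participant who repeats the same error is counted once.

```python
from collections import Counter

# Hypothetical log rows, for illustration only.
log_rows = [
    {"participant": "P01", "task": "T1", "finding": "use_error"},
    {"participant": "P01", "task": "T1", "finding": "use_error"},  # repeat
    {"participant": "P02", "task": "T1", "finding": "use_error"},
    {"participant": "P02", "task": "T2", "finding": "close_call"},
    {"participant": "P03", "task": "T2", "finding": "difficulty"},
]


def count_participants_by_finding(rows):
    """Count distinct participants per (task, finding) pair."""
    seen = set()
    counts = Counter()
    for row in rows:
        key = (row["participant"], row["task"], row["finding"])
        if key in seen:
            continue  # same participant, same task, same finding type
        seen.add(key)
        counts[(row["task"], row["finding"])] += 1
    return counts


counts = count_participants_by_finding(log_rows)
for (task, finding), n in sorted(counts.items()):
    print(f"{task} {finding}: {n} participant(s)")
```

The counts only feed the quantitative part of the report. The root cause analysis still depends on reviewing each participant’s comments.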
FDA states that “the primary purpose of the analysis is to determine whether [any] part of the user interface could and should be modified to reduce or eliminate the use problem and reduce the use-related risks to acceptable levels. An essential secondary purpose of the analysis is to develop a modified design that would not cause the same problem or a new problem.”
Changes to the device design and any associated elements are called “mitigations,” as they are intended to reduce the errors noted.
However, FDA does not expect that every use error can be addressed by mitigation. If, for example, the user did not read the instructions during testing, resulting in a use error, the moderator will want to explore the participant's actions in the post-task or post-test interview. The moderator may ask the participant to read the relevant step in the instruction. The participant may state that the instruction was clear, but they chose not to read it. The goal of the analysis of root causes of errors is to minimize the risk as far as possible. In this case, there might not be any further mitigation possible.
In other cases, the report may indicate that minor changes to the IFU or labeling, such as adding directional arrows to the interface, can aid the user's understanding of what they need to do, whether or not they read the instructions.
Mitigations are often presented as changes needed to the instructions for use, labeling, or training. When numerous issues are identified, FDA will likely expect that further testing is needed to confirm the reduction of risk following mitigation.
In this situation, the current human factors validation test becomes a formative usability test. Retesting in a follow-up summative human factors validation study need only focus on the critical tasks that received a high number of issues in the current test.
This primer on preparing for human factors validation testing is intended to give you a general idea of the process. The actual steps in planning, preparing, testing, analyzing the findings, determining root causes for critical task failures, and writing the report take time and require many more considerations than I can cover here.
How much time? On a fast track, a study with a single user group and testing sessions of one hour or less can be completed in 10 weeks, provided everything is ready at the start of the planning process and the team can give the requirements its full attention.
To get a basic understanding of usability testing, check out our primer on usability testing.
Carol brings her academic background and years of teaching and research to her work with clients to deliver the best research approaches that have proven to produce practical solutions. Carol’s many publications (6 books and more than 50 articles) have made a substantial contribution to the body of knowledge in the UX field. The 2nd edition of her award-winning handbook Usability Testing Essentials is now available.