• Sanjukta Moorthy

Types of Sampling

When planning your data analysis project, you'll need to find a good way to structure your sampling approach, your validation, and the sample itself. Having a sample, and a strategy for gathering and analysing it, allows you to make generalisations about the whole data set. If you're studying people, you can find a representative sample of them and use the results to infer how others are likely to think, feel, and behave. It's also time- and resource-efficient, since you don't need to speak with everyone in your community. As with anything related to data, there are many issues and potential risks to consider.


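There are broadly two types of sampling: probability and non-probability. These are also known as random and non-random sampling. Non-probability sampling as a whole is sometimes loosely called purposive sampling, although purposive sampling is also the name of one specific type, described further down.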


Generally speaking, probability sampling selects at random, which allows you to make strong inferences about the whole data set.


Non-probability sampling is not random. Selection is based on convenience or on deliberate choice, and reflects the most important features of the datasets you want to look at.


The type you choose depends on where you work, your dataset, the needs of your study, your expertise, and dozens of other factors. Make sure you consult with your partners, communities, and others before you commit to one type, and that your reasoning and methodology are carefully documented somewhere. If your research gives you interesting insights, you may be able to use a similar methodology in the future.


Let's look at some examples. Say you're sampling people in a community.


Four of the most common types of probability sampling are:


Simple random sample

Every member of your group has an equal chance of being chosen. Here, your sampling frame is the whole community.
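As a rough illustration, here is a minimal Python sketch of a simple random sample. The community list, the IDs in it, and the function name are all made up for the example; in practice your sampling frame would come from a register, a household list, or something similar.

```python
import random

# Hypothetical sampling frame: one ID per community member.
community = [f"person_{i}" for i in range(1, 201)]

def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; everyone has an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

sample = simple_random_sample(community, 20, seed=42)
print(len(sample), "people selected")
```

Fixing the seed is optional, but it lets you document and repeat exactly the same draw later.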


Systematic sample

As in simple random sampling, each person is given a number. Instead of being drawn at random, though, people are chosen at regular intervals (every fourth person, for example), ideally from a randomly chosen starting point.
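A minimal sketch of a systematic sample in Python might look like this. The community list and the interval of 4 are assumptions for the example, and the random starting point is a common convention rather than something the definition above requires.

```python
import random

# Hypothetical numbered list of community members.
community = [f"person_{i}" for i in range(1, 201)]

def systematic_sample(frame, interval, seed=None):
    """Pick a random starting point, then take every interval-th member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start between 0 and interval - 1
    return frame[start::interval]

sample = systematic_sample(community, interval=4, seed=1)
print(len(sample), "people selected")
```

One thing to watch for: if your list has a hidden pattern that lines up with the interval (say, every fourth household is on a street corner), the sample can end up skewed.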


Stratified sample

Here you divide the community into groups that share a distinguishing feature, such as age, gender, or location. These groups are called strata. Then you calculate how many people you need to sample from each group to get a good representation of your community, and select people from within each group, typically at random.
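To make the "how many from each group" step concrete, here is a small Python sketch using proportional allocation, where each stratum's share of the sample matches its share of the community. The age groups, their sizes, and the function name are invented for the example, and proportional allocation is only one of several ways to split a sample across strata.

```python
import random
from collections import defaultdict

# Hypothetical community of 200 people, tagged with an age group.
community = (
    [(f"youth_{i}", "youth") for i in range(100)]
    + [(f"adult_{i}", "adult") for i in range(60)]
    + [(f"elder_{i}", "elder") for i in range(40)]
)

def stratified_sample(frame, total_n, seed=None):
    """Sample at random within each stratum, in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person, group in frame:
        strata[group].append(person)
    sample = {}
    for group, members in strata.items():
        # Proportional allocation; with awkward sizes, rounding can make
        # the total differ slightly from total_n.
        n = round(total_n * len(members) / len(frame))
        sample[group] = rng.sample(members, n)
    return sample

sample = stratified_sample(community, total_n=20, seed=7)
for group, people in sample.items():
    print(group, len(people))
```

With these made-up numbers, a sample of 20 works out to 10 youth, 6 adults, and 4 elders.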


Cluster sample

Here you also divide your community into groups, but each group should be a microcosm of the whole community. So instead of choosing individuals from every group, as you would with strata, you randomly choose entire groups, known as clusters, and sample the people in them.
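The difference from stratified sampling is easier to see in code. In this minimal Python sketch, the clusters are hypothetical villages with invented residents. Whole villages are drawn at random and everyone in a chosen village is included.

```python
import random

# Hypothetical clusters: eight villages with 25 residents each.
villages = {
    f"village_{v}": [f"v{v}_person_{i}" for i in range(25)]
    for v in range(1, 9)
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

sample = cluster_sample(villages, n_clusters=2, seed=3)
for name, residents in sample.items():
    print(name, len(residents))
```

This only works well if each cluster really does resemble the wider community. If the villages differ a lot from one another, two of them won't tell you much about the rest.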


Within non-probability sampling, four of the most common are:


Convenience sampling

Sampling people who are convenient for you and your team to reach - for example, because you are of the same gender, age group, etc. This is a good way to get data quickly and could give you insider information, depending on how well you interact with them. But it has flaws, because there's no guarantee that your sample is a good representation of the community.


Voluntary response

People volunteer to be chosen instead of being chosen by the team. So if in a focus group you ask for a show of hands of people who will help you monitor the school they built recently, that's a voluntary response sample.


Purposive sampling

Also known as judgement sampling - this is when you and the team select your sample based on what you know of the community and your needs, so it can be very specific. If it's done well, it can help you have a conversation within the team about what a 'good' sample looks like and can help you get a deeper understanding of the data. If I choose to speak to all the women, this isn't very objective, but it does mean that I'm more likely to get to know their needs and realities better than I would from a random sample.


Snowball sampling

Remember the phone tree scene in 'Legally Blonde 2', minus the mild transphobia? That's similar to snowball sampling: each person you reach puts you in touch with the next. If your target group is hard to access, either logistically or due to language or other barriers, you may be able to speak to groups they're associated with. For example, if you want to speak to school children but there are security concerns, you may want to speak first with their teachers, parents, or community leaders. Then they would act as intermediaries to the children.


So that's a rough overview of the main types you'll often find. I've been doing this for more than a decade, and I haven't found a single tool that perfectly fills all my needs, so I always combine these and create my own methodologies to suit the research.


This is one of the joys of data and research. I always begin with my community's needs and realities, then try to see if there's anything in my wheelhouse that I know can work. If not, I create something that works, combining some traditional ideas with new and creative methods, indigenous knowledge, or something from another sector. That's why I love working with bespoke ideas. It allows you to get to the heart of what you want to know more about, what your data should help you answer, and how you can better serve your people.


What do you normally use? What would you be interested in using and getting to know better? What do you need more support on?