Sampling and drawing in my area

Example: Using the random number table, I select the numbers 2, 7, 17, 67, 68, 75, 77, 87, 92, 101, 145, 201, 222, 232, 311, 333, 376, 401, 478, and 489.


Simple Random Sampling: 6 Basic Steps With Examples

Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master’s in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.

Updated March 19, 2023
Reviewed by Michael J Boyle

Michael Boyle is an experienced financial professional with more than 10 years working with financial planning, derivatives, equities, fixed income, project management, and analytics.

Fact checked by Kimberly Overcast

Kimberly Overcast is an award-winning writer and fact-checker. She has ghostwritten political, health, and Christian nonfiction books for several authors, including several New York Times bestsellers. Kimberly also holds a Class C private investigator license.

What Is a Simple Random Sample?

A simple random sample is a subset of a statistical population in which each member of the population has an equal probability of being chosen. A simple random sample is meant to be an unbiased representation of a group.

Key Takeaways

  • A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.
  • Researchers can create a simple random sample using methods like lotteries or random draws.
  • A sampling error can occur with a simple random sample if the sample does not end up accurately reflecting the population it is supposed to represent.
  • Simple random samples are determined by assigning sequential values to each item within a population, then randomly selecting those values.
  • Simple random sampling provides a different sampling approach compared to systematic sampling, stratified sampling, or cluster sampling.


Understanding a Simple Random Sample

Researchers can create a simple random sample using a couple of methods. With a lottery method, each member of the population is assigned a number, after which numbers are selected at random.

An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen. Random sampling is used in science to conduct randomized controlled trials or blinded experiments.

The example in which the names of 25 employees out of 250 are chosen out of a hat is an example of the lottery method at work. Each of the 250 employees would be assigned a number between 1 and 250, after which 25 of those numbers would be chosen at random.

Because individuals who make up the subset of the larger group are chosen at random, each individual in the large population set has the same probability of being selected. This creates, in most cases, a balanced subset that carries the greatest potential for representing the larger group as a whole.

For larger populations, a manual lottery method can be quite onerous. Selecting a random sample from a large population usually requires a computer-generated process, by which the same methodology as the lottery method is used, only the number assignments and subsequent selections are performed by computers, not humans.
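The computer-generated version of the lottery method can be sketched in a few lines of Python. This is a minimal illustration, assuming the employees have already been numbered 1 to 250; the function name `lottery_sample` and the seed value are hypothetical and only there to make the draw repeatable.

```python
import random

def lottery_sample(population_size, sample_size, seed=None):
    """Draw ID numbers 1..population_size at random, without replacement."""
    rng = random.Random(seed)              # separate generator so the draw is repeatable
    ids = range(1, population_size + 1)    # every member gets one sequential number
    return sorted(rng.sample(ids, sample_size))

# 25 employees out of 250, as in the hat example
chosen = lottery_sample(250, 25, seed=42)
print(chosen)
```

Because `random.sample` draws without replacement, no employee number can appear twice in the result.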

Room for Error

With a simple random sample, there is always room for error, expressed as a plus-or-minus margin (sampling error). For example, suppose a survey is taken in a high school of 1,000 students to determine how many students are left-handed, and eight of the 100 students sampled turn out to be left-handed. The conclusion would be that 8% of the student population of the high school are left-handed, when in fact the global average would be closer to 10%.

The same is true regardless of the subject matter. A survey on the percentage of the student population that has green eyes or has a physical disability would produce an estimate based on a simple random sample, but always with a plus-or-minus margin. The only way to have a 100% accuracy rate would be to survey all 1,000 students which, while possible, would be impractical.
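One common way to put a number on that plus-or-minus is the normal-approximation margin of error for a proportion, with a finite population correction because the sample is a sizeable share of the school. The sketch below assumes a 95% confidence level (z = 1.96); the function name and parameters are illustrative and do not come from the text above, and the normal approximation is only rough when so few left-handed students are in the sample.

```python
import math

def margin_of_error(p_hat, n, N=None, z=1.96):
    """Approximate margin of error for a sample proportion.

    p_hat: sample proportion, n: sample size,
    N: population size (optional, enables the finite population correction),
    z: critical value (1.96 is the assumed 95% level).
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of a proportion
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))       # finite population correction
    return z * se

# 8 left-handed students out of 100 sampled, school of 1,000
moe = margin_of_error(0.08, 100, N=1000)
print(f"estimate: 8%, margin of error: {moe:.3f}")
```

Under these assumptions the margin comes out at roughly five percentage points, so an estimate of 8% is compatible with a true share of 10%.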

Although simple random sampling is intended to be an unbiased approach to surveying, sample selection bias can occur. When a sample set of the larger population is not inclusive enough, representation of the full population is skewed and requires additional sampling techniques.


Simple random sampling

In simple random sampling (SRS), each sampling unit of a population has an equal chance of being included in the sample. Consequently, each possible sample also has an equal chance of being selected. To select a simple random sample, you need to list all of the units in the survey population.

To draw a simple random sample from a telephone book, each entry would need to be numbered sequentially. If there were 10,000 entries in the telephone book and if the sample size was 2,000, then 2,000 numbers between 1 and 10,000 would need to be randomly generated by a computer. All numbers would have the same chance of being generated by the computer. The 2,000 telephone entries corresponding to the 2,000 computer-generated random numbers would make up the sample.

SRS can be done with or without replacement. An SRS with replacement means that there is a possibility that the sampled telephone entry may be selected twice or more. Usually, the SRS approach is conducted without replacement because it is more convenient and gives more precise results. In the rest of the text, SRS will be used to refer to SRS without replacement, unless stated otherwise.
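The difference between the two variants is easy to see in code. The following sketch uses Python's standard `random` module on the telephone-book example (10,000 numbered entries, sample of 2,000); the seed is an arbitrary choice made only so the run is repeatable.

```python
import random

rng = random.Random(0)          # arbitrary seed, for repeatability only
entries = range(1, 10_001)      # telephone entries numbered 1..10,000

# Without replacement: an entry can be selected at most once.
without_replacement = rng.sample(entries, 2_000)

# With replacement: each draw is independent, so repeats are possible.
with_replacement = rng.choices(entries, k=2_000)

print("distinct entries without replacement:", len(set(without_replacement)))
print("distinct entries with replacement:   ", len(set(with_replacement)))
```

The sample drawn with replacement typically contains fewer distinct entries than its nominal size, which is one way to see why sampling without replacement tends to be more precise.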

SRS is the most commonly used method. The advantage of this technique is that it does not require any information on the survey frame other than the complete list of units of the survey population along with contact information. Also, since SRS is a simple method and the theory behind it is well established, standard formulas exist to determine the sample size, the estimates and so on, and these formulas are easy to use.

On the other hand, this technique necessitates a list of all units of the population. If such a list doesn’t already exist and the target population is large, it can be very expensive or unrealistic to create one. If a list already exists and includes auxiliary information on the units, then the SRS is not taking advantage of information that allows other methods to be more efficient (like stratified sampling, for example). If collection has to be made in-person, SRS could give a sample that is too spread out across multiple regions, which could increase costs and duration of the survey.

Imagine that you own a movie theatre and you are offering a special horror movie film festival next month. To decide which horror movies to show, you survey moviegoers to ask them which of the listed movies are their favorites. To create the list of movies needed for your survey, you decide to sample 10 of the 100 best horror movies of all time. One way of selecting a sample would be to write all of the movie titles on slips of paper and place them in an empty box. Then, draw out 10 titles and you will have your sample. By using this approach, you will have ensured that each movie had an equal probability of selection. You could even calculate this probability of selection by dividing the sample size (n=10) by the population size of the 100 best horror movies of all time (N=100). This probability would be 0.10 (10/100) or 1 in 10.
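The 1-in-10 figure can also be checked empirically. The sketch below repeats the draw many times and counts how often one particular movie ends up in the sample; the movies are simply labelled 0 to 99, and the function name, the number of trials and the seed are all illustrative assumptions.

```python
import random

def inclusion_frequency(N, n, unit, trials, seed=0):
    """Share of repeated simple random samples (size n from N) that contain `unit`."""
    rng = random.Random(seed)
    hits = sum(unit in rng.sample(range(N), n) for _ in range(trials))
    return hits / trials

N, n = 100, 10
print("theoretical probability:", n / N)

# How often does movie number 0 land in the box draw?
freq = inclusion_frequency(N, n, unit=0, trials=20_000)
print("observed frequency:     ", freq)
```

The observed frequency should settle close to n/N as the number of trials grows, whichever movie is tracked.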

Systematic sampling

Systematic sampling means that there is a gap, or interval, between each selected unit in the sample. For instance, you could follow these steps:

  1. Number the units on your frame from 1 to N (where N is the total population size).
  2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size. For example, to select a sample of 100 from a population of 400, you would need a sampling interval of 400/100 = 4. Therefore, K = 4. You will need to select one unit out of every four units to end up with a total of 100 units in your sample.
  3. Select a number between one and K at random. This number is called the random start and it would be the first number included in your sample. If you choose 3, the third unit on your frame would be the first unit included in your sample; if you choose 2, your sample would start with the second unit on your frame.
  4. Select every Kth (in this case, every fourth) unit after that first number. For example, the sample might consist of the following units to make up a sample of 100: 3 (the random start), 7, 11, 15, 19 … 395, 399 (up to N, which is 400 in this case).
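The four steps above can be sketched as a short Python function. This is a minimal version that assumes N is an exact multiple of the sample size, as in the 400/100 example; the function name and its `start` and `seed` parameters are hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1..N) picked by systematic sampling."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                                   # step 2: sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)  # step 3: random start in 1..K
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and K")
    return list(range(start, N + 1, k))          # step 4: every Kth unit

sample = systematic_sample(400, 100, start=3)
print(sample[:5], sample[-2:])   # [3, 7, 11, 15, 19] [395, 399]
```

Leaving `start` unset lets the function pick the random start itself, which is how the method would normally be used.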

In the example above, you can see that there are only four possible samples that can be selected, corresponding to the four possible random starts:

1, 5, 9, 13 … 393, 397

2, 6, 10, 14 … 394, 398

3, 7, 11, 15 … 395, 399

4, 8, 12, 16 … 396, 400

Each member of the population belongs to only one of the four samples and each sample has the same chance of being selected. From that, we can see that each unit has a one in four chance of being selected in the sample. This is the same probability as if a simple random sample of 100 units was selected. The main difference is that with SRS, any combination of 100 units would have a chance of making up the sample, while with systematic sampling, there are only four possible samples. The units’ order on the frame will determine the possible samples for systematic sampling. If the population is randomly distributed on the frame, then systematic sampling should yield results that are similar to simple random sampling.

This method is often used in industry, where an item is selected for testing from a production line to ensure that machines and equipment are of a standard quality. For example, a tester in a manufacturing plant might perform a quality check on every 20th product in an assembly line. The tester might choose a random start between the numbers 1 and 20. This will determine the first product to be tested; every 20th product will be tested thereafter.

Interviewers can use this sampling technique when questioning people for a sample survey. The market researcher might select, for example, every 10th person who enters a particular store, after selecting the first person at random. The surveyor may interview the occupants of every fifth house on a street, after randomly selecting one of the first five houses.

The main advantage of systematic sampling is that sample selection could hardly be easier: you only need one random number, the random start, and the rest of the sample follows automatically. The biggest drawback is that if the population is arranged on the list with some periodic pattern, and that pattern coincides in some way with the sampling interval, the possible samples may not be representative of the population. This can be seen in the following example:

Suppose you run a large grocery store and have a list of the employees in each section. The grocery store is divided into the following 10 sections: deli counter, bakery, cashiers, stock, meat counter, produce, pharmacy, photo shop, flower shop and dry cleaning. Each section has 10 employees, including a manager (making 100 employees in total). Your list is ordered by section, with the manager listed first, followed by the other employees in descending order of seniority.

If you wanted to survey your employees about their thoughts on their work environment, you might choose a small sample to answer your questions. If you use a systematic sampling approach and your sampling interval is 10, then you could end up selecting only managers or only the newest employees in each section. This type of sample would not give you a complete or appropriate picture of your employees’ thoughts.
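The grocery store problem can be reproduced with a small frame built in Python. In this sketch each employee is represented by a (section, rank) pair, where rank 1 stands for the manager and rank 10 for the newest employee; that encoding and the helper name `systematic` are assumptions made for illustration.

```python
sections = ["deli counter", "bakery", "cashiers", "stock", "meat counter",
            "produce", "pharmacy", "photo shop", "flower shop", "dry cleaning"]

# Frame ordered by section; within a section, rank 1 is the manager
# and ranks 2..10 follow in descending order of seniority.
frame = [(section, rank) for section in sections for rank in range(1, 11)]

def systematic(frame, k, start):
    """Take every k-th unit of the frame, beginning at position `start` (1-based)."""
    return frame[start - 1::k]

managers_only = systematic(frame, k=10, start=1)
newest_only = systematic(frame, k=10, start=10)

print(managers_only[:3])   # every selected unit has rank 1
print(newest_only[:3])     # every selected unit has rank 10
```

With an interval of 10 on this frame, whatever the random start, all ten selected employees share the same seniority rank, which is exactly the periodicity trap described above.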

Sampling with probability proportional to size

Probability sampling requires that each member of the survey population has a known probability of being included in the sample, but it does not require that this probability be the same for everyone. If there is information available on the frame about the size of each unit (e.g. number of employees for each business) and if those units vary in size, this information can be used in the sampling selection in order to increase the efficiency. This is known as sampling with probability proportional to size (PPS). With this method, the bigger the size of the unit, the higher the chance of being included in the sample. For this method to bring increased efficiency, the measure of size needs to be accurate. This is a more complex sampling method that will not be discussed in further detail here.
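For a rough feel of the idea, here is one simple variant only: drawing with replacement, where each draw picks a unit with probability proportional to its size. The business names and employee counts are made up for illustration, and practical PPS designs (for example, without replacement) are more involved than this sketch.

```python
import random

# Hypothetical frame: business -> number of employees (the size measure)
businesses = {"A": 500, "B": 50, "C": 200, "D": 250}

def pps_with_replacement(sizes, n, seed=None):
    """Draw n units with replacement, each draw proportional to unit size."""
    rng = random.Random(seed)
    units = list(sizes)
    weights = [sizes[u] for u in units]
    return rng.choices(units, weights=weights, k=n)

# Probability that a single draw selects each business
total = sum(businesses.values())
draw_prob = {u: s / total for u, s in businesses.items()}
print(draw_prob)

sample = pps_with_replacement(businesses, n=1_000, seed=1)
print({u: sample.count(u) for u in businesses})
```

In repeated draws, the largest business turns up far more often than the smallest, in line with their sizes.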

Stratified sampling

When using stratified sampling, the population is divided into homogeneous, mutually exclusive groups called strata, and then independent samples are selected from each stratum. Any of the sampling methods mentioned in this section can be used to sample within each stratum. The sampling method can vary from one stratum to another. A population can be stratified by any variable for which a value is available for all units on the sampling frame prior to sampling (e.g. age, sex, province of residence, income).
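As a minimal sketch, stratified sampling with a simple random sample inside each stratum might look like the following. The frame, the two province codes and the per-stratum sample sizes are hypothetical, and the function name `stratified_sample` is not taken from any library.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Split the frame into strata, then draw an independent SRS in each one.

    frame: list of units, stratum_of: function mapping a unit to its stratum,
    sizes: dict of stratum name -> sample size for that stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical frame: (student id, province), one large and one small province
frame = [(i, "ON") for i in range(900)] + [(i, "PE") for i in range(900, 1000)]

samples = stratified_sample(frame, stratum_of=lambda unit: unit[1],
                            sizes={"ON": 50, "PE": 50}, seed=3)
print({name: len(units) for name, units in samples.items()})
```

Fixing the sample size per stratum guarantees the small province the same number of respondents as the large one, which a single simple random sample over the whole frame would not.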

Why create strata? There are many reasons, the main one being that it can make the sampling strategy more efficient. It was mentioned in the previous section that in order to obtain an estimate of a given precision, a larger sample size is needed for a characteristic that varies greatly from one unit to the other than for a characteristic with smaller variability. For example, if every person in a population had the same salary, then a sample of one individual would be enough to get a precise estimate of the average salary.

This is the idea behind the efficiency gain obtained with stratification. If you create strata within which units share similar characteristics and are considerably different from units in other strata, then you would only need a small sample from each stratum to get a precise estimate of total income for that stratum. Then you could combine these estimates to get a precise estimate of total income for the whole population. If you were to use an SRS in the whole population without stratification, the sample would need to be larger than the total of all stratum sample sizes to get an estimate of total income with the same level of precision.
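A small simulation can illustrate the efficiency gain. The sketch below builds a hypothetical population of two equal-sized strata with very different incomes, then compares how much the estimated mean income varies across repeated samples of 20 people, once with SRS and once with 10 people per stratum. It estimates the mean rather than the total (the two differ only by a constant factor), and all figures and names are invented for the illustration.

```python
import random
import statistics

rng = random.Random(1)   # arbitrary seed, for repeatability only

# Hypothetical population: two strata of 500 people with very different incomes
low = [rng.gauss(30_000, 2_000) for _ in range(500)]
high = [rng.gauss(100_000, 5_000) for _ in range(500)]
population = low + high

def srs_mean(n):
    """Mean income from a simple random sample of the whole population."""
    return statistics.fmean(rng.sample(population, n))

def stratified_mean(n_per_stratum):
    """Mean income from independent samples in each stratum.

    The strata are the same size, so the overall mean is the plain
    average of the two stratum means.
    """
    low_mean = statistics.fmean(rng.sample(low, n_per_stratum))
    high_mean = statistics.fmean(rng.sample(high, n_per_stratum))
    return (low_mean + high_mean) / 2

reps = 2_000
sd_srs = statistics.stdev([srs_mean(20) for _ in range(reps)])
sd_strat = statistics.stdev([stratified_mean(10) for _ in range(reps)])

print(f"spread of SRS estimates:        {sd_srs:,.0f}")
print(f"spread of stratified estimates: {sd_strat:,.0f}")
```

With the same total sample size, the stratified estimates cluster much more tightly around the true mean, because the large gap between the two groups no longer contributes to the sampling error.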

Another advantage is that stratified sampling ensures an adequate sample size for subgroups of interest in the population. When a population is stratified, each stratum becomes an independent population and a sample size is calculated for each of them.

Suppose you want to estimate how many high school students have part-time jobs at the national level and provincial level. If you were to select a simple random sample of 25,000 people from a list of all high school students in Canada (assuming such a list was available for selection), you would end up with just a little over 100 people from Prince Edward Island, since they account for less than 0.5% of the Canadian population. This sample would probably not be large enough for the kind of detailed analysis you were planning for. Stratifying your list by province and then determining a sample size needed in each province would allow you to get the required level of precision for Prince Edward Island and for each of the other provinces as well.

Stratification is most useful when the stratifying variables are

  • simple to work with,
  • easy to observe,
  • closely related to the topic of the survey.

Author: Colin Wynn
