DATA QUALITY
Evaluations, and the findings derived from them, can only ever be as good as the underlying data. Surveys offer many opportunities for errors to enter a raw data set. In general, we can distinguish between sampling errors, coding errors and distortion errors (bias). The sampling error is a statistical quantity that arises because only a sample, not the entire population, can be included in the survey. This error is usually unavoidable. Coding errors can occur when the survey results are transferred to electronic form. However, they can be avoided or remedied by means of suitable controls.
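As an illustration of the sampling error, the following minimal sketch computes the approximate 95% margin of error for a proportion estimated from a simple random sample. The function name `margin_of_error` and the example sample sizes are assumptions made for this illustration; the formula is the usual normal approximation and ignores design effects and finite-population corrections.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a confidence level of about 95%.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie between 0 and 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical sample sizes; p_hat = 0.5 is the worst case for a proportion.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))
```

Under these assumptions, quadrupling the sample size halves the margin of error. The sampling error can therefore be reduced, but never removed entirely, by drawing a larger sample.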
Distortion errors can be systematically summarised as follows:
- Errors from sampling (sampling errors)
- Errors from failing to cover, or wrongly excluding, one or more groups (coverage error)
- Errors due to non-respondents (non-response error)
- Errors in the implementation and execution of the survey (survey error)
The following distortion errors are the most common, can influence the results very strongly and should be kept as low as possible:
Interviewer influence / interviewer bias (survey error):
For example, the result can depend on who conducted an interview or a measurement. Characteristics of the interviewer such as gender, age, accuracy and conscientiousness often play a very important role here. The subject of the survey should be as independent as possible of the characteristics of the survey personnel.
Coverage problem:
In an ideal (random) sample, each person (survey unit) should have the same chance of being drawn. Many registers of a population are (systematically) incomplete: not every person has internet access for an online survey, not every person is listed in a telephone directory, etc. If the response behaviour of unreachable people differs from that of reachable people, coverage problems can severely distort the results.
Unit non-response:
This error is a source of distortion because the behaviour of the respondents need not be the same as that of the non-respondents. The effect depends strongly on the type of survey and, above all, on the response rate. For this reason in particular, the response rate of a survey must always be greater than 50%; the higher, the better. If the sample size is sufficient, surveys with a very low response rate often produce results that are statistically very precise but completely wrong.
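The last point can be made concrete with a small calculation. The sketch below uses purely hypothetical numbers (100,000 people invited, a 10% response rate, 60% agreement among respondents and 40% among non-respondents) and hypothetical function names. It compares the bias of the respondent-only estimate with the width of its approximate 95% confidence interval.

```python
import math


def nonresponse_bias(response_rate: float, mean_resp: float, mean_nonresp: float) -> float:
    """Bias of the respondent mean relative to the mean of everyone invited.

    Follows from: overall mean = rate * mean_resp + (1 - rate) * mean_nonresp.
    """
    return (1.0 - response_rate) * (mean_resp - mean_nonresp)


def ci_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% half-width for a proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical survey: all figures below are assumptions for illustration.
invited = 100_000
rate = 0.10                    # only 10% of those invited respond
p_resp, p_nonresp = 0.60, 0.40  # share answering "yes" in each group

n_resp = round(invited * rate)
true_share = rate * p_resp + (1.0 - rate) * p_nonresp
bias = nonresponse_bias(rate, p_resp, p_nonresp)
half_width = ci_half_width(p_resp, n_resp)

print(f"respondents:          {n_resp}")
print(f"estimate from survey: {p_resp:.3f} +/- {half_width:.3f}")
print(f"share among invited:  {true_share:.3f}")
print(f"non-response bias:    {bias:.3f}")
```

With these assumed numbers, the interval around the survey estimate is less than one percentage point wide on either side, while the estimate itself is off by about 18 percentage points. A large number of respondents narrows the interval but does nothing to reduce the bias.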
Not responding to parts of a survey / item non-response:
Item non-response is the absence of individual answers or pieces of information in an otherwise completed survey. Such missing data could be considered "ignorable" if it is purely random and therefore has no effect on the results of the survey. In this case, the missing items only affect the economic efficiency of the survey. In general, however, it is difficult to estimate what influence the missing items have on the result of the survey. The following are possible reasons for missing items:
- Forgetting or not remembering, ...
- Problems of understanding in the context of the survey
- Overload of respondents
- Other personal reasons
- Intentionally wrong answers to make a good impression
It is important to note that influencing factors such as the design of the questionnaire or the interviewer are particularly relevant for item non-response. A number of measures can be taken as part of a survey to help minimize the extent of this type of non-response:
- Improvement of the layout of self-administered questionnaires
- Better training of interviewers
- Careful pre-tests
- Careful handling of collected data
- Reduction of recall problems, for example by using memory joggers
- Reduction of comprehension problems
- Reduction of the respondent's subjective perception that their privacy is being violated
When conducting a survey, maximum attention must be paid to reducing the item non-response effect. However, since this effect can never be excluded entirely, the following two steps are essential:
- analysis of the effect of the missing items and, if necessary,
- generation of the missing data by data imputation.
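The two steps can be sketched as follows. The data, the item names `age` and `income`, and all function names are hypothetical. Mean imputation is used here only because it is the simplest method to show: it understates the variability of the imputed item, and methods such as hot-deck or multiple imputation are common alternatives.

```python
from statistics import mean

# Hypothetical responses; None marks an item the respondent did not answer.
responses = [
    {"age": 34, "income": 3200},
    {"age": 51, "income": None},
    {"age": None, "income": 2800},
    {"age": 45, "income": 3600},
    {"age": 29, "income": None},
]


def item_nonresponse_rates(rows):
    """Share of missing answers per item."""
    return {
        item: sum(r[item] is None for r in rows) / len(rows)
        for item in rows[0]
    }


def compare_by_missingness(rows, item, other):
    """Mean of `other` where `item` is missing vs. where it is observed.

    A clear difference suggests the missing answers are not purely random.
    """
    missing = [r[other] for r in rows if r[item] is None and r[other] is not None]
    observed = [r[other] for r in rows if r[item] is not None and r[other] is not None]
    return (
        mean(missing) if missing else None,
        mean(observed) if observed else None,
    )


def impute_mean(rows, item):
    """Return copies of the rows with missing `item` values set to the observed mean."""
    observed = [r[item] for r in rows if r[item] is not None]
    if not observed:
        raise ValueError(f"no observed values for item {item!r}")
    fill = mean(observed)
    return [{**r, item: fill if r[item] is None else r[item]} for r in rows]


# Step 1: analyse the missing items.
rates = item_nonresponse_rates(responses)
age_comparison = compare_by_missingness(responses, "income", "age")

# Step 2: impute, leaving the original records untouched.
completed = impute_mean(responses, "income")

print(rates)
print(age_comparison)
print([r["income"] for r in completed])
```

Keeping the original records unchanged and writing the imputed values to a copy makes it possible to tell observed answers from imputed ones later in the analysis.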