VCE IT Lecture Notes by Mark Kelly, McKinnon Secondary College

Data Validation

Slideshow linkSee my validation slideshow

Types of validation

  • Electronic validation (performed by the RDBMS, spreadsheet, program etc)
    • range checking in database (e.g. age between 5 and 25)
    • existence checking (is box filled in?) [may be referred to as 'null testing']
    • spell checking (treat electronic spellchecking with great suspicion - treat it more like a "typing error checker"!)
    • validation rules in databases, spreadsheets (rejects or queries dubious input)
    • spreadsheet formulas to check values in other cells (e.g. =IF(AND(A10< >"M",A10< > "F","Gender must be M or F","")
    Note: Using drop-down limited lists to prevent erroneous data entry (e.g. must select state of Australia from a list rather than being allowed to type one in) is not really a validation technique which checks for invalid data : it is a tool to prevent invalid data being entered.

    NOTE: do not use electronic validation for data that must be checked with human judgement (e.g. people's surnames, dates of events). For that you need manual validation...

  • Manual validation (performed by humans)
    • proof-reading a document to see if it makes sense
    • checking the placement of graphics and text on a page/screen
    • items that require human common sense to tell if they are valid or not
    • checking if an image is too blurry or obscure
    • checking colour combinations for readability and attractiveness
    • checking spelling of words or names that are not in a dictionary
    • checking that data is complete (e.g. that "Printing" is included in a set of instructions)

 

Validation

Checking the reasonableness of the data before it is processed. Remember GIGO: garbage in, garbage out. Processing rubbish data gives rubbish information. If all the foremen in the factory just made up the numbers on the workers' time cards, working on the pay cheques would be totally pointless.

Validation is ensuring inputted data is of the right type (e.g. numeric) and within reasonable limits (e.g. ages between 1 and 130). Databases and spreadsheets can have validation rules built into data fields to reject impossible entries. Validation can include:

  • Existence: is some essential data simply missing, such as a name?
  • Reasonableness: does it seem logical? Checking hours are within reasonable limits (e.g. anything over say 50 hours a week may be questioned)
  • Consistency: Checking for inconsistencies in surveys (e.g. a person claims to be unemployed but earns $80,000 a year.) Some surveys ask similar questions in different parts of the survey to check whether people are lying when they answer.
  • Type check: e.g. have letters been entered instead of digits?
  • Format check: e.g. an ID must be three letters followed by 4 digits. Is date entered as dd/mm/yy?
  • Range check: is a date in August between 1 and 31?
  • Sending data back to its source for confirmation before it is entered into a system (e.g. people joining a club might be sent back the data they put on their application forms: they have to confirm it is correct before the data is entered into the club database)
  • Using a check digit to validate a credit card number

Note: Validation does not check that data are accurate (e.g. that when Fred says he's 23, he actually is 23) but it can highlight suspicious entries.

Remember: data can be valid, but inaccurate!

Some definitions of data validation:

  • A set of rules you can apply to form fields to restrict the type of information site visitors enter into forms. For example, you can set rules so that only letters, and not numbers, can be entered into a "name" field on a form. www.west.asu.edu/achristie/555/glossary.htm
  • A process used to determine if data are incomplete, or unreasonable. The process may include format checks, completeness, checks, check key tests, reasonableness checks, and limit checks. www.labcompliance.com/glossary/c-d-glossary.htm
  • Data validation is a process to ensure the quality of data by checking they have been entered correctly. wps.prenhall.com/wps/media/objects/505/517554/glossary.html
  • A systematic effort to review data to identify any outliers or errors and thereby cause deletion or flagging of suspect values to assure the validity of the data to the user. www.sbeach.navy.mil/Programs/Environmental/IR/Reading_Room/Glossary/G_D.htm
  • A term used to describe the process of evaluating data once it has been entered into a software program. Using a set of rules, which may contain a range of acceptable values, the evaluation results in either the entry being accepted or rejected. highered.mcgraw-hill.com/sites/0072470941/student_view0/glossary.html

Some methods for enforcing valid data entry include:

Verification - the process of entering data twice, and comparing the two entries to find differences. Aa common example of this is when you are asked to enter a new password twice to ensure it has been entered accurately.

Limited lists (value lists) - Some controls that help enforce validation rules are limited lists (or value lists in Filemaker Pro talk) where the user must choose data from a list of options. This is perfect to ensure they enter only legitimate values in a form that can be understood (e.g. what state of Australia they are from). Calendar controls are good for preventing the entry of invalid dates.

Note that limited lists are NOT a validation measure - they do not check for invalid data; they prevent invalid data being entered in the first place.

NOTE: Don't abuse limited lists by using them where the possible entries are not actually limited

e.g. If you forced users to select their title from a limited list and you offered the options of Mr, Mrs, Miss and Dr - what about professors, reverends, Ms etc? You could never guarantee you included every title in existence... and even if you DID list every title, a new one may be invented tomorrow... and your "limited" list would be so large it would be impossible to work with.

BE CAREFUL ABOUT...

  • Validation only involves checking the reasonableness (not the accuracy) of input data - usually checks of existence, type and range. e.g. if a student in year 7 says he's 13, the data is valid (unlike an age of 99 or -3), but not necessarily accurate - he could be 12.
  • Validation is not the same as testing. Testing ensures the output is accurate.
  • Efficient processing means manipulating data in a way that is not wasting time, money or effort.
  • Effective validation is that which works well and leads to valuable, reliable output.

Remember: you validate input data, and you test output information (e.g. the answers produced by calculations, the readability of printed text)

 

Back to the IT Lecture Notes index

Back to the last page you visited

Broken off the IPC page 23 August 2007

Last changed: November 23, 2011 12:02 PM

VCE IT Lecture notes copyright © Mark Kelly 2001-