VCE IT Lecture Notes by Mark Kelly, McKinnon Secondary College

Formatting and structure of input and output

(2011-2014) SD U3O2 KK05- formatting and structural characteristics of efficient and effective input and output

This is a dense piece of key knowledge! If you unpack it, you get these:

Formatting characteristics of efficient input.
Formatting characteristics of efficient output.
Formatting characteristics of effective input.
Formatting characteristics of effective output.
Structural characteristics of efficient input.
Structural characteristics of efficient output.
Structural characteristics of effective input.
Structural characteristics of effective output.

Remember that efficient refers to not wasting time, money or effort.
Effective refers to how well something does its job - e.g. accuracy, attractiveness, accessibility, relevance, usability, functionality, maintainability, robustness, reliability, compatibility etc.

Working out what is relevant to each item is the tricky bit.

Let's assume input is data, and output is information.
Let's also assume formatting refers to the appearance of stuff, and structure refers to how it is built internally.

So let's move on...

 

Formatting characteristics of efficient input

Time and effort can be saved if input data are logically formatted.

e.g. if Australian money values are entered with 2 decimal places and commas to separate thousands (e.g. $12,345.60 instead of 12345.6) there are fewer chances of confusion and errors that could slow down data input.

Entering months as 3-character codes instead of their full names can save time and effort.

Formatted input can be achieved by using input masks, which control what characters can be entered and where - e.g. a credit card number might be constrained to have the format ####-####-####-#### where # is a digit.
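Here's a rough Python sketch of what an input mask does behind the scenes, assuming the mask is four groups of four digits separated by hyphens. The names CARD_MASK and matches_mask are made up for this example; real input masks are usually a property of the form control rather than hand-written code.

```python
import re

# Assumed mask: four groups of four digits separated by hyphens.
CARD_MASK = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")

def matches_mask(text):
    """Return True only if the whole string fits the mask."""
    return CARD_MASK.fullmatch(text) is not None

print(matches_mask("1234-5678-9012-3456"))  # True
print(matches_mask("1234 5678 9012 3456"))  # False: spaces, not hyphens
```

The check only looks at the format. It says nothing about whether the card number is real.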

Simple examples on the interface can also tell users what format is expected for input, e.g. Birth Date (dd/mm/yyyy)? This is particularly important where different cultures use different formats, which could lead to ambiguous or unreliable data. e.g. the 1st of February 2003 is written "1/2/03" in Australia (day/month/year), "2/1/03" in the USA (month/day/year) and "03/2/1" in countries that put the year first, such as Japan and China. Just looking at the date, you could not tell which reading is right unless formatted input had guided the user into using the right format.
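To see how real the ambiguity is, here's a small Python sketch that reads the same text under two of those conventions. The variable names are just for illustration.

```python
from datetime import datetime

text = "1/2/03"

# The same characters read under two different conventions.
as_day_first = datetime.strptime(text, "%d/%m/%y")    # day/month/year
as_month_first = datetime.strptime(text, "%m/%d/%y")  # month/day/year

print(as_day_first.date())    # 2003-02-01
print(as_month_first.date())  # 2003-01-02
```

Both readings are perfectly valid dates, so the program cannot detect the mistake by itself. Only the agreed input format settles it.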

Using constrained controls such as drop-down calendars for date entry or limited lists to enter Australian states can also enforce data input formats because the user has no choice but to enter properly formatted data.

 

 

Formatting characteristics of efficient output

Formatting of output is not very strongly connected to saving time, money or effort.

If dates are formatted as short dates instead of long dates (e.g. '2 Feb 2012' instead of 'Thursday February 2, 2012') they would be quicker to print.

If the efficiency of the people using the output is considered, there's more scope for an answer. For example, if a mass of numeric data were output in a neat table, correctly labelled and aligned, nicely paginated, and colour coded it would save the user time and effort in interpreting the output. (??)

Formatted output is produced by statements like C's printf or BASIC's PRINT USING, e.g.

printf ("Processing of `%s' is %d%% finished.\nPlease be patient.\n",
        filename, pct);  /* illustrative arguments: a file name string and an integer percentage */

Formatting a catalogue product's SKU as a barcode will be vastly more efficient at the cash register than is printing the SKU as a number and having to type it in.

 

Formatting characteristics of effective input

Related to "Formatting characteristics of efficient input" (above), if input is well formatted it not only saves time, but also aids validation by making invalid data easier to reject, thereby improving the integrity of the data being entered.

 

 

Formatting characteristics of effective output

As suggested above, well-formatted output is a strong contributor to effectiveness.

Good formatting of output makes information easier to find, easier to read, and easier to interpret.

Simple things like right-aligning columns of numbers (or aligning on the decimal point) make it easier to compare the sizes of numbers at a glance.

Judicious use of bold, italics etc. makes the meaning of different parts of the output clearer.

Compare these columns - which is easier to interpret?

1.13       1.130
2.456      2.456
11.3      11.300
24.56     24.560
113      113.000
245.6    245.600
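A column aligned on the decimal point like the second one above can be produced with a format specification. This is a minimal Python sketch; the function name aligned and the width of 8 characters are my own choices for the example.

```python
def aligned(values, width=8, places=3):
    """Format each number right-aligned with a fixed number of decimal places."""
    return [f"{value:{width}.{places}f}" for value in values]

for line in aligned([1.13, 2.456, 11.3, 24.56, 113, 245.6]):
    print(line)
```

Because every value gets the same width and the same number of decimal places, the decimal points end up in the same column.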

Large numbers (e.g. in computer storage and communication figures) also benefit from effective output formatting...

Downloaded       Downloaded
141503142       141,503,142
380728863       380,728,863
693395291       693,395,291
702647942       702,647,942
1226157909    1,226,157,909
1173310581    1,173,310,581
807593341       807,593,341
721611069       721,611,069
3142442145    3,142,442,145
7200195974    7,200,195,974
9632103009    9,632,103,009
7374781560    7,374,781,560
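The comma-separated column takes very little effort to produce in most languages. Here's how it might look in Python; with_commas is a name I've made up for the example.

```python
def with_commas(n):
    """Return n as text with commas separating the thousands."""
    return f"{n:,}"

for n in [141503142, 1226157909, 9632103009]:
    # Right-align in a 15-character column so the digits line up too.
    print(f"{with_commas(n):>15}")
```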

 

 

Structural characteristics of efficient input

This has many SD teachers confused about its meaning. A structural characteristic of input probably refers to the way the input data is arranged, as opposed to how it looks when presented.

I'd say that it could include:

  • data types (e.g. choosing between text, integer or single precision variables)
  • data structures (e.g. variables, arrays, linked lists)
  • file structure (e.g. random vs serial)

Efficiency is important when doing input from files. Random file access is faster than serial access because the starting point of any record in a random file can be calculated and accessed immediately without having to read through all the intervening records. Random files are like old LP records and serial files are like cassettes: with a record you can lift the arm and drop it anywhere you like; with a cassette you have to wind through all the tape between the current spot and the desired spot.

On the other hand, random files can only work if all records have exactly the same length, which can impose considerable restrictions on later free data entry. If one leaves enough space in a record for the longest conceivable piece of valid data, file sizes quickly grow.
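The "calculate the starting point" idea can be sketched in Python like this. I'm assuming a made-up record layout of a 10-byte given name, a 10-byte surname and a 4-byte postcode, and using io.BytesIO to stand in for a real file opened in binary mode. The names RECORD, pack and read_record are just for illustration.

```python
import io
import struct

# Assumed layout: 10-byte given name, 10-byte surname, 4-byte postcode.
RECORD = struct.Struct("<10s10s4s")

def pack(gname, sname, postcode):
    # struct pads short values with zero bytes and truncates long ones.
    return RECORD.pack(gname.encode(), sname.encode(), postcode.encode())

def read_record(f, n):
    """Jump straight to record n (counting from 0) and decode its fields."""
    f.seek(n * RECORD.size)  # start = record number x record length
    fields = RECORD.unpack(f.read(RECORD.size))
    return [field.rstrip(b"\0").decode() for field in fields]

# io.BytesIO stands in for a file opened in binary mode.
f = io.BytesIO(pack("John", "Smith", "3204") + pack("Mary", "Brown", "3192"))
print(read_record(f, 1))  # ['Mary', 'Brown', '3192']
```

Notice the cost: every record takes 24 bytes no matter how short the names are, and a name longer than 10 bytes is silently cut off.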

Serial files, while slower to access, have virtually no restrictions on the length of a record, which gives considerable freedom of data entry and no wasted storage space. An improvement to a plain serial file would be an indexing system whereby the starting point of each record in the file is recorded in a separate index, so a record can be accessed relatively quickly at the cost of having to maintain the index as records' lengths change.
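An index for a serial file could be built something like this in Python. It's only a sketch: I'm assuming one record per line, keeping the index in memory rather than in a separate file, and the names build_index and fetch are invented for the example.

```python
import io

def build_index(f):
    """Scan the file once, noting the byte offset where each record starts."""
    index = []
    f.seek(0)
    while True:
        start = f.tell()
        if not f.readline():  # empty result means end of file
            break
        index.append(start)
    return index

def fetch(f, index, n):
    """Use the index to jump to record n (counting from 0)."""
    f.seek(index[n])
    return f.readline().decode().rstrip("\n")

# io.BytesIO stands in for a serial file with variable-length records.
f = io.BytesIO(b"John,Smith\nMary,Brown\nMasahiro,Kobe\n")
index = build_index(f)
print(index)               # [0, 11, 22]
print(fetch(f, index, 2))  # Masahiro,Kobe
```

If any record changes length, every offset after it is wrong and the index has to be rebuilt. That is the maintenance cost mentioned above.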

So random files could be said to be more efficient (in terms of speed), but serial files may be more effective in terms of flexibility and lack of wasted storage space.

 

 

Structural characteristics of effective/efficient input/output

Structured output includes producing output like strings, graphs, trees, and time series.

It might also refer to: sorting; categorising and grouping; and summarising information. (??)

Outputting to XML gives output a structure that allows the data to be used as organised input later.

Data can be structured in several different ways for input, such as:

- raw serial data stream (one datum per line), e.g.

John
Smith
4 Jones St
McKinnon
3204
Mary
Brown
5 Red St
Cheltenham
3192

- CSV (comma separated values) with a complete set of fields in one line e.g.

"John", "Smith", "4 Jones St", "McKinnon", "3204"
"Mary", "Brown", "5 Red St", "Cheltenham", "3192"

- structured records (random files with fixed-length fields)

- highly structured files (e.g. Excel file format)

- XML data collection, e.g.

<?xml version="1.0" encoding="UTF-8" ?>
<METADATA>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="address" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="gname" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="postcode" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="sname" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="suburb" TYPE="TEXT"/>
</METADATA>
<RESULTSET FOUND="2">
<ROW MODID="1" RECORDID="1">
<COL> <DATA>5 Red St</DATA> </COL>
<COL> <DATA>Mary</DATA> </COL>
<COL> <DATA>3192</DATA> </COL>
<COL> <DATA>Brown</DATA> </COL>
<COL> <DATA>Cheltenham</DATA> </COL>
</ROW>
<ROW MODID="1" RECORDID="2">
<COL> <DATA>4 Jones St</DATA> </COL>
<COL> <DATA>John</DATA> </COL>
<COL> <DATA>3204</DATA> </COL>
<COL> <DATA>Smith</DATA> </COL>
<COL> <DATA>McKinnon</DATA> </COL>
</ROW>
</RESULTSET>
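Of the structures listed above, CSV is one of the easiest to read back in as organised input. Here's a short Python sketch using the two CSV lines from the example. The skipinitialspace option is needed because that sample puts a space after each comma.

```python
import csv
import io

text = (
    '"John", "Smith", "4 Jones St", "McKinnon", "3204"\n'
    '"Mary", "Brown", "5 Red St", "Cheltenham", "3192"\n'
)

# skipinitialspace=True ignores the space that follows each comma,
# so the quotes are treated as quotes and not as part of the data.
rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))

for gname, sname, address, suburb, postcode in rows:
    print(sname, postcode)
```

Each line arrives as a complete record with its fields already separated, which the raw one-datum-per-line stream cannot offer without counting lines.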


Linked lists - read up on linked list details. Linked lists are interesting because they are like the indexed serial file I mentioned above.

Another use of a linked list is to create an internal index for a serial data file. It works something like this:

Each group of fields making up a record is preceded by a pointer to the start of the next record. That is a singly-linked list (i.e. you can only navigate forwards through records.) A doubly-linked list would have another pointer aimed at the starting byte of the preceding record. This would let you seek backwards.

Start position   Pointer to next record   Record data
1                12                       John,Smith
12               23                       Mary,Brown
23               37                       Masahiro,Kobe
37               51                       Bourke,Hunter

(Positions run from 1 to 50. Each pointer is shown as occupying one position, followed by the characters of its record.)

To find record 3, you'd read the first pointer (12) and jump to that location where you'd read 23 and jump to that - and record 3 is right there! It's pretty fast but does involve seeking through a file, and the pointers need to be updated when any value's length is changed. Random file navigation is faster and does not need to maintain indexes, but the price you pay is inflexible field lengths.
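The pointer-hopping just described can be sketched in Python. To keep it short I've modelled the file as a dictionary that maps each record's start position to its pointer and data, using the four records from the layout above. A real program would seek within the file instead, and find_record is a made-up name.

```python
# The file modelled as: start position -> (pointer to next record, data).
records = {
    1: (12, "John,Smith"),
    12: (23, "Mary,Brown"),
    23: (37, "Masahiro,Kobe"),
    37: (51, "Bourke,Hunter"),
}

def find_record(n, start=1):
    """Follow n - 1 pointers from the first record to reach record n."""
    position = start
    for _ in range(n - 1):
        position = records[position][0]  # jump to where the pointer says
    return records[position][1]

print(find_record(3))  # Masahiro,Kobe
```

Finding record n takes n - 1 jumps, so it is quicker than reading every character but still slower than calculating the position directly as a random file does.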


Binary trees - see details on binary trees

Similar to a linked list, a binary tree is often used to store massive amounts of data arriving in random order (e.g. counting the occurrences of each word in a novel). The tree is made of nodes, each of which can have up to two children: one holding a value less than the parent node's value and one holding a value greater than it. Traversing the tree is simply a matter of following the "Child" and "Parent" pointers of each node.

[Figure: a binary tree]
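Here's a minimal Python sketch of the word-counting idea. It's my own simplified version: each node keeps only "child" links (left and right) and uses recursion instead of "parent" pointers, and the sample sentence is invented.

```python
class Node:
    def __init__(self, word):
        self.word = word
        self.count = 1
        self.left = None   # words that sort before this one
        self.right = None  # words that sort after this one

def insert(node, word):
    """Add word to the tree rooted at node and return the root."""
    if node is None:
        return Node(word)
    if word < node.word:
        node.left = insert(node.left, word)
    elif word > node.word:
        node.right = insert(node.right, word)
    else:
        node.count += 1  # word already in the tree: just count it
    return node

def in_order(node):
    """Yield (word, count) pairs in alphabetical order."""
    if node is not None:
        yield from in_order(node.left)
        yield node.word, node.count
        yield from in_order(node.right)

root = None
for word in "the cat saw the dog chase the cat".split():
    root = insert(root, word)

print(list(in_order(root)))
```

The words arrive in any order, yet walking the tree left-to-right gives them back sorted, with no separate sorting step.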

 

 

Structured & unstructured data files

When reading and writing settings files (e.g. .ini), there are a couple of approaches:

1) Read and write the data in a structured order. e.g.

c:\games\trident
340
506
FF0000
000000
29
13948

2) Use an unstructured ordering with tags to identify the data being read. e.g.

BackColour=000000
SaveFolder=c:\games\trident
LastX=340
Score=13948
ForeColour=FF0000
LastY=506
Speed=29

The fixed-order method is quicker and easier to implement, but if the data order isn't correct, it collapses. It's very rigid, but suitable when data I/O is totally predictable and controlled.
The tags let you accept data in any order, but it takes extra coding to identify the tags, parse out [separate] the data part, and store the values in the right places.

A pseudocode example to implement it...

Begin
Open inifile
While Not EOF(inifile)
  Read dataline
  EqualsPosition <-- position of "=" sign in dataline
  If EqualsPosition <> 0 Then 'there is an "=" sign
    Tag <-- text to the left of the "=" sign
    Value <-- Text to the right of the "=" sign
    SELECT CASE Tag
      Case "BackColour" : store value in BackColour variable
      Case "Speed": store value in Speed variable
      etc etc - one case for each tag
      Case Else : no tag match? complain about unknown tag
    END SELECT
  End If
End While
End

Another advantage of the tagged input method is that there is no limit to the number of times a value can be read. For example, if a settings file could contain zero or many "Friend" values, a fixed-order list of data could not cope: it would have to specify a fixed number of values that could be accepted. A tagged system, however, could read values as often as needed.
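Here's how the tagged approach, including the repeating "Friend" tag, might look in Python. Treat it as a sketch: the names KNOWN_TAGS and parse_settings and the sample friends are invented, and all values are kept as text rather than converted to numbers.

```python
KNOWN_TAGS = {"BackColour", "SaveFolder", "LastX", "LastY",
              "Score", "ForeColour", "Speed"}

def parse_settings(lines):
    """Read Tag=Value lines in any order; 'Friend' may appear any number of times."""
    settings = {"Friend": []}
    for line in lines:
        # partition splits at the first "=" only, so values may contain "=".
        tag, equals, value = line.strip().partition("=")
        if not equals:
            continue  # no "=" sign: ignore the line
        if tag == "Friend":
            settings["Friend"].append(value)  # collect every occurrence
        elif tag in KNOWN_TAGS:
            settings[tag] = value
        else:
            print("Unknown tag:", tag)
    return settings

result = parse_settings(["Speed=29", "Friend=Mary", "LastX=340", "Friend=John"])
print(result)
```

A dictionary replaces the long SELECT CASE: the tag itself becomes the key, so adding a new setting only means adding its name to KNOWN_TAGS.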

HTML - structured input/output

HTML is a markup language used to create webpages, and it is full of structure that is understood and expected by browsers, editors and webservers.

The most basic structured output of a webpage editor would consist of:

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Untitled Document</title>
  </head>
  <body>
  This is the body section.
</body>
</html>

The browser expects to see the <html>, <head> and <body> tags open and close in the right order so it can interpret the page properly. Badly structured HTML files may display strangely, or not at all.

Tags are also used to structure data in "rich text format" (RTF), a widely supported way of transferring formatted text between applications.

 

 

Anything I've missed? Please let me know.


Created 17 Sep 2010

Last changed: May 17, 2012 3:20 PM

VCE IT Lecture notes copyright © Mark Kelly 2001-