Introduction
This section presents the fundamentals of constructing COBOL programs. It explains the notation used in COBOL syntax diagrams and enumerates the COBOL coding rules. It shows how user-defined names are constructed and examines the structure of COBOL programs.
COBOL syntax
COBOL syntax is defined using a particular notation, sometimes called the COBOL metalanguage.
In this notation, words in uppercase are reserved words. When underlined they are mandatory. When not underlined they are "noise" words, used for readability only, and are optional. Because COBOL statements are supposed to read like English sentences there are a lot of these "noise" words.
Words in mixed case represent names that must be devised by the programmer (like data item names).
When material is enclosed in curly braces { }, a choice must be made from the options within the braces. If there is only one option, then that item is mandatory.
Material enclosed in square brackets [ ] is optional, and may be included or omitted as required.
The ellipsis symbol ... (three dots) indicates that the preceding syntax element may be repeated at the programmer's discretion.
COBOL coding rules
Traditionally, COBOL programs were written on coding forms and then punched on to punch cards. Although nowadays most programs are entered directly into a computer, some COBOL formatting conventions remain that derive from its ancient punch-card history.
On coding forms, the first six character positions are reserved for sequence numbers. The seventh character position is reserved for the continuation character, or for an asterisk that denotes a comment line.
The actual program text starts in column 8. The four positions from 8 to 11 are known as Area A, and positions from 12 to 72 are Area B.
Although many COBOL compilers ignore some of these formatting restrictions, most still retain the distinction between Area A and Area B.
When a COBOL compiler recognizes the two areas, all division names, section names, paragraph names, FD entries and 01 level numbers must start in Area A. All other sentences must start in Area B.
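As a rough illustration of these column rules, here is a sketch of a small fixed-format program. The program name AreaDemo and the paragraph name ShowAreas are made up for this example, the sequence numbers in columns 1 to 6 are arbitrary, and it assumes a compiler running in fixed (column-sensitive) source format.

```cobol
000100 IDENTIFICATION DIVISION.
000200 PROGRAM-ID. AreaDemo.
000300* An asterisk in column 7 marks a comment line.
000400 PROCEDURE DIVISION.
000500 ShowAreas.
000600     DISPLAY "Division and paragraph names start in Area A".
000700     DISPLAY "Statements such as DISPLAY start in Area B".
000800     STOP RUN.
```

The division headers and the paragraph name begin in column 8 (Area A), while the DISPLAY and STOP RUN statements begin in column 12 (Area B).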
In our example programs we use the compiler directive (available with the NetExpress COBOL compiler) - $ SET SOURCEFORMAT"FREE" - to free us from these formatting restrictions.
Ancient COBOL coding form
Name construction
All user-defined names, such as data names, paragraph names, section names, condition names and mnemonic names, must adhere to the following rules:
· They must contain at least one character, but not more than 30 characters.
· They must contain at least one alphabetic character.
· They must not begin or end with a hyphen.
· They must be constructed from the characters A to Z, the numbers 0 to 9, and the hyphen.
· They must not contain spaces.
· Names are not case-sensitive: TotalPay is the same as totalpay, Totalpay or TOTALPAY.
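The rules above can be tried out with a short sketch like the one below. All of the names in it (NameRules, TotalPay, Line-2-Count, Tax-Rate-2, ShowNames) are invented for illustration, and the invalid names appear only inside comments so that the program should still compile. It assumes a compiler that accepts the *> comment marker, such as GnuCOBOL.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NameRules.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Valid user-defined names (all hypothetical):
       01  TotalPay          PIC 9(5)V99 VALUE ZEROS.
       01  Line-2-Count      PIC 99      VALUE ZEROS.
       01  Tax-Rate-2        PIC V99     VALUE ZEROS.
      *> Invalid names, shown only as comments:
      *>   -NetPay     begins with a hyphen
      *>   Gross Pay   contains a space
      *>   12345       has no alphabetic character
       PROCEDURE DIVISION.
       ShowNames.
      *> Names are not case-sensitive, so both spellings below
      *> refer to the item declared as TotalPay.
           MOVE 1500.25 TO TOTALPAY
           DISPLAY "TotalPay = " totalpay
           STOP RUN.
```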
The structure of COBOL programs
COBOL programs are hierarchical in structure. Each element of the hierarchy consists of one or more subordinate elements.
The hierarchy consists of Divisions, Sections, Paragraphs, Sentences and Statements.
A Division may contain one or more Sections, a Section one or more Paragraphs, a Paragraph one or more Sentences and a Sentence one or more Statements.
We can represent the COBOL hierarchy using the COBOL metalanguage as follows;
Divisions
A division is a block of code, usually containing one or more sections, that starts where the division name is encountered and ends with the beginning of the next division or with the end of the program text.
Sections
A section is a block of code usually containing one or more paragraphs. A section begins with the section name and ends where the next section name is encountered or where the program text ends.
Section names are devised by the programmer, or defined by the language. A section name is followed by the word SECTION and a period.
See the two example names below -
SelectUnpaidBills SECTION.
FILE SECTION.
Paragraphs
A paragraph is a block of code made up of one or more sentences. A paragraph begins with the paragraph name and ends with the next paragraph or section name or the end of the program text.
A paragraph name is devised by the programmer or defined by the language, and is followed by a period.
See the two example names below -
PrintFinalTotals.
PROGRAM-ID.
Sentences and statements
A sentence consists of one or more statements and is terminated by a period.
For example:
MOVE .21 TO VatRate
MOVE 1235.76 TO ProductCost
COMPUTE VatAmount = ProductCost * VatRate.
A statement consists of a COBOL verb and an operand or operands.
For example:
SUBTRACT Tax FROM GrossPay GIVING NetPay
The Four Divisions
At the top of the COBOL hierarchy are the four divisions. These divide the program into distinct structural elements. Although some of the divisions may be omitted, the sequence in which they are specified is fixed, and must follow the order below.
General Layout
IDENTIFICATION DIVISION.
Contains program information
ENVIRONMENT DIVISION.
Contains environment information
DATA DIVISION.
Contains data descriptions
PROCEDURE DIVISION.
Contains the program algorithms
The IDENTIFICATION DIVISION
The IDENTIFICATION DIVISION supplies information about the program to the programmer and the compiler.
Most entries in the IDENTIFICATION DIVISION are directed at the programmer. The compiler treats them as comments.
The PROGRAM-ID clause is an exception to this rule. Every COBOL program must have a PROGRAM-ID because the name specified after this clause is used by the linker when linking a number of subprograms into one run unit, and by the CALL statement when transferring control to a subprogram.
The IDENTIFICATION DIVISION has the following structure:
IDENTIFICATION DIVISION.
PROGRAM-ID. ProgramName.
[AUTHOR. ProgrammerName.]
other entries here
The keywords - IDENTIFICATION DIVISION - represent the division header, and signal the commencement of the program text.
PROGRAM-ID is a paragraph name that must be specified immediately after the division header.
ProgramName is a name devised by the programmer, and must satisfy the rules for user-defined names.
Here's a typical program fragment:
The ENVIRONMENT DIVISION
The ENVIRONMENT DIVISION is used to describe the environment in which the program will run.
The purpose of the ENVIRONMENT DIVISION is to isolate in one place all aspects of the program that are dependent upon a specific computer, device or encoding sequence.
The idea behind this is to make it easy to change the program when it has to run on a different computer or one with different peripheral devices.
In the ENVIRONMENT DIVISION, aliases are assigned to external devices, files or command sequences. Other environment details, such as the collating sequence, the currency symbol and the decimal point symbol may also be defined here.
The DATA DIVISION
As the name suggests, the DATA DIVISION provides descriptions of the data-items processed by the program.
The DATA DIVISION has two main sections: the FILE SECTION and the WORKING-STORAGE SECTION. Additional sections, such as the LINKAGE SECTION (used in subprograms) and the REPORT SECTION (used in Report Writer based programs) may also be required.
The FILE SECTION is used to describe most of the data that is sent to, or comes from, the computer's peripherals.
The WORKING-STORAGE SECTION is used to describe the general variables used in the program.
The DATA DIVISION has the following structure and syntax:
IDENTIFICATION DIVISION.
PROGRAM-ID. SequenceProgram.
AUTHOR. XXXXX.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 Num1 PIC 9 VALUE ZEROS.
01 Num2 PIC 9 VALUE ZEROS.
01 Result PIC 99 VALUE ZEROS.
The PROCEDURE DIVISION
The PROCEDURE DIVISION contains the code used to manipulate the data described in the DATA DIVISION. It is here that the programmer describes the algorithm.
The PROCEDURE DIVISION is hierarchical in structure and consists of sections, paragraphs, sentences and statements.
Only the section is optional. There must be at least one paragraph, sentence and statement in the PROCEDURE DIVISION.
Paragraph and section names in the PROCEDURE DIVISION are chosen by the programmer and must conform to the rules for user-defined names.
SAMPLE COBOL PROGRAM
IDENTIFICATION DIVISION.
PROGRAM-ID. SequenceProgram.
AUTHOR. Michael Coughlan.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 Num1 PIC 9 VALUE ZEROS.
01 Num2 PIC 9 VALUE ZEROS.
01 Result PIC 99 VALUE ZEROS.
PROCEDURE DIVISION.
CalculateResult.
ACCEPT Num1.
ACCEPT Num2.
MULTIPLY Num1 BY Num2 GIVING Result.
DISPLAY "Result is = ", Result.
STOP RUN.
CONDITIONAL STATEMENTS:
Conditional Processing
This is where a lot of uneducated programmers come unstuck! Even though COBOL allows the following:
IF <condition> [THEN] <statement-1> ELSE <statement-2> [END-IF].
there are some basic guidelines which can be applied in order to make the code more readable and easier to maintain. These are:
· Each portion (condition, ELSE, statement-1, statement-2, END-IF) should be on a separate line. This allows for future additions or deletions without having to modify more lines than is necessary.
· The word ELSE should be aligned in exactly the same column as the IF with which it is associated. This makes the association more obvious in the listing, especially with multiple or nested IFs.
· COBOL'85 allows each condition to be terminated with an END-IF. Its use should be encouraged as it makes it absolutely clear where each condition is supposed to end, thus avoiding the possibility of confusion and mistakes. Like the ELSE, the END-IF should be aligned in exactly the same column as the IF with which it is associated.
· Statement-1 and statement-2 should be indented, usually by four character positions. This allows the IF, ELSE and END-IF to be more distinctive in the listing.
This now gives us the following construction:
IF <condition>
    <statement-1>
ELSE
    <statement-2>
END-IF.
Here are some extra guidelines for nested IFs:
· For each level of nested IF, indent all associated lines by four characters. This gives the following:
IF <condition-1>
    IF <condition-2>
        <statement-1>
    ELSE
        <statement-2>
    END-IF
ELSE
    <statement-3>
END-IF.
· Don't ever use more than three levels of nested IF - they are extremely difficult to debug and maintain.
· Remember that each ELSE is paired with the nearest preceding IF that does not yet have an ELSE or END-IF of its own, not necessarily the one under which it is aligned. Take the following example:
IF <condition-1>
    IF <condition-2>
        <statement-2>
ELSE
    <statement-1>.
According to the indentation, <statement-1> is supposed to be executed if <condition-1> is false, but COBOL follows its own rules and executes <statement-1> if <condition-1> is true and <condition-2> is false. This type of error is more avoidable if the END-IF is used, as in the following example:
IF <condition-1>
    IF <condition-2>
        <statement-2>
    END-IF
ELSE
    <statement-1>
END-IF.
or...
IF <condition-1>
    IF <condition-2>
        <statement-2>
    ELSE
        <statement-1>
    END-IF
END-IF.
· In the case where an ELSE is immediately followed by an IF without any intervening statements (i.e. where only one out of a series of conditions will be TRUE), it is not necessary to indent at each new IF, otherwise you will quickly fall off the page. Consider the following example:
IF X-VALUE = 1
    <statement-1>
ELSE
    IF X-VALUE = 2
        <statement-2>
    ELSE
        IF X-VALUE = 3
            <statement-3>
        ELSE
            IF X-VALUE = 4
                <statement-4>
            ELSE
                IF X-VALUE = 5
                    <statement-5>
                etc.
can be laid out without the extra indentation as:
IF X-VALUE = 1
    <statement-1>
ELSE
IF X-VALUE = 2
    <statement-2>
ELSE
IF X-VALUE = 3
    <statement-3>
ELSE
IF X-VALUE = 4
    <statement-4>
ELSE
IF X-VALUE = 5
    <statement-5>
etc.
With the arrival of COBOL'85 this should be written as follows:
EVALUATE X-VALUE
    WHEN 1 <statement-1>
    WHEN 2 <statement-2>
    WHEN 3 <statement-3>
    WHEN 4 <statement-4>
    WHEN 5 <statement-5>
    WHEN OTHER .....
END-EVALUATE.
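To see the EVALUATE form in a complete program, here is a minimal sketch. The program name EvaluateDemo, the paragraph name ClassifyValue, the starting value of 3 and the DISPLAY texts are all assumptions made for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EvaluateDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  X-VALUE           PIC 9 VALUE 3.
       PROCEDURE DIVISION.
       ClassifyValue.
      *> Only the first matching WHEN branch is executed.
           EVALUATE X-VALUE
               WHEN 1     DISPLAY "One"
               WHEN 2     DISPLAY "Two"
               WHEN 3     DISPLAY "Three"
               WHEN OTHER DISPLAY "Something else"
           END-EVALUATE
           STOP RUN.
```

With X-VALUE set to 3 the program displays Three. Changing the VALUE clause to a digit not listed takes the WHEN OTHER branch.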
Here are even more guidelines for complex conditions:
· Enclose each individual condition in parentheses.
· If several conditions combine to form a group condition (i.e. all conditions have to be true in order to make the group condition true), then enclose the whole group in parentheses as well.
· By having each condition on a separate line, and by careful alignment of ANDs and ORs, it is possible to make absolutely clear whether conditions are linked or are alternatives.
These guidelines should produce something like this:
IF ((A = 1 OR 2 OR 3)
    AND
    (B NOT = 4))
OR ((C = "A" OR "Z")
    OR
    (D < E))
    <statement>
END-IF.
This example, however, is rapidly approaching the stage at which it becomes too unwieldy to be maintainable. Don't be afraid to split a complex condition into its component parts, even if it involves the use of the GO TO statement. Don't try to prove how clever you can be - keep it simple and straightforward.
COBOL Data Types
Introduction
There are three categories of data item used in COBOL programs:
· Variables.
· Literals.
· Figurative Constants.
A data-name or identifier is the name used to identify the area of memory reserved for a variable. A variable is a named location in memory into which a program can put data, and from which it can retrieve data.
Variables
Every variable used in a COBOL program must be described in the DATA DIVISION.
In addition to the data-name, a variable declaration also defines the type of data to be stored in the variable. This is known as the variable's data type.
Variable Data types
Some languages, like Modula-2, Pascal or Ada, are described as being strongly typed. In these languages there are a large number of different data types and the distinction between them is rigorously enforced by the compiler. For instance, the compiler will reject a statement that attempts to assign a character value to an integer data item.
In COBOL, there are really only three data types:
· Numeric
· Alphanumeric (text/string)
· Alphabetic
The distinction between these data types is a little blurred and only weakly enforced by the compiler. For instance, it is perfectly possible to assign a non-numeric value to a data item that has been declared to be numeric.
The problem with this lax approach to data typing is that, since COBOL programs crash (halt unexpectedly) if they attempt to do computations on items that contain non-numeric data, it is up to the programmer to make sure this never happens.
COBOL programmers must make sure that non-numeric data is never assigned to numeric items intended for use in calculations. Programmers who use strongly typed languages don't need this level of discipline because the compiler ensures that a variable of a particular type can only be assigned appropriate values.
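One common way to apply this discipline is to test a value with the NUMERIC class condition before using it in arithmetic. The sketch below is only an illustration: the program name NumericCheck and the data names QuantityText, QuantityNum and Total are hypothetical, and the value "12A" stands in for bad input.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NumericCheck.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  QuantityText      PIC X(3) VALUE "12A".
       01  QuantityNum       PIC 9(3) VALUE ZEROS.
       01  Total             PIC 9(4) VALUE ZEROS.
       PROCEDURE DIVISION.
       CheckBeforeCompute.
      *> Only move the text into the numeric item, and only do
      *> arithmetic on it, when every character is a digit.
           IF QuantityText IS NUMERIC
               MOVE QuantityText TO QuantityNum
               ADD QuantityNum TO Total
               DISPLAY "Total is " Total
           ELSE
               DISPLAY "Quantity is not numeric; skipped"
           END-IF
           STOP RUN.
```

Because "12A" contains a letter, the class test fails and the program takes the ELSE branch instead of attempting the ADD.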
Literals
A literal is a data-item that consists only of the data-item value itself. It cannot be referred to by a name. By definition, literals are constant data-items.
There are two types of literal:
· String/Alphanumeric Literals
· Numeric Literals
String Literals
String/Alphanumeric literals are enclosed in quotes and consist of alphanumeric characters.
For example: "Michael Ryan", "-123", "123.45"
Numeric Literals
Numeric literals may consist of numerals, the decimal point, and the plus or minus sign. Numeric literals are not enclosed in quotes.
For example: 123, 123.45, -256, +2987.
Figurative Constants
Unlike most other programming languages, COBOL does not provide a mechanism for creating user-defined constants, but it does provide a set of special constants called Figurative Constants.
A Figurative Constant may be used wherever it is legal to use a literal but, unlike literals, when a Figurative Constant is assigned to a data-item it fills the whole item, overwriting everything in it.
The Figurative Constants are:
· SPACE or SPACES: acts like one or more spaces
· ZERO or ZEROS or ZEROES: acts like one or more zeros
· QUOTE or QUOTES: used instead of a quotation mark
· HIGH-VALUE or HIGH-VALUES: uses the maximum value possible
· LOW-VALUE or LOW-VALUES: uses the minimum value possible
· ALL literal: allows an ordinary literal to act as a Figurative Constant
Figurative Constant Notes
· When the ALL Figurative Constant is used, it must be followed by a one character literal. The designated literal then acts like the standard Figurative Constants.
· ZERO, ZEROS and ZEROES are synonyms, not separate Figurative Constants. The same applies to SPACE and SPACES, QUOTE and QUOTES, HIGH-VALUE and HIGH-VALUES, LOW-VALUE and LOW-VALUES.
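The way a Figurative Constant fills the whole item can be seen in a small sketch like this one. The program name FigurativeDemo and the data names CustomerName, ItemCount and RuleLine are made up for the example; the square brackets in the DISPLAY statements are there only to make the item boundaries visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FigurativeDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CustomerName      PIC X(10) VALUE "Ryan".
       01  ItemCount         PIC 9(4)  VALUE 1234.
       01  RuleLine          PIC X(10).
       PROCEDURE DIVISION.
       FillItems.
      *> Each MOVE overwrites every character position of the
      *> receiving item, whatever it held before.
           MOVE SPACES  TO CustomerName
           MOVE ZEROS   TO ItemCount
           MOVE ALL "-" TO RuleLine
           DISPLAY "[" CustomerName "]"
           DISPLAY "[" ItemCount "]"
           DISPLAY "[" RuleLine "]"
           STOP RUN.
```

After the MOVEs, CustomerName holds ten spaces, ItemCount holds 0000 and RuleLine holds ten hyphens.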
Sequential files
COBOL is generally used in situations where the volume of data to be processed is large. These systems are sometimes referred to as “data intensive” systems. Generally, large volumes arise not because the data is inherently voluminous but because the same items of information have been recorded about a great many instances of the same object. Record-based files are used to record this information.
Files, Records, Fields
In record-based files:
· We use the term file to describe a collection of one or more occurrences (instances) of a record type (template).
· We use the term record to describe a collection of fields which record information about an object.
· We use the term field to describe an item of information recorded about an object (e.g. StudentName, DateOfBirth).
Record instance vs Record type
It is important to distinguish between a record occurrence (i.e. the values of a record) and the record type or template (i.e. the structure of the record).
Each record occurrence in a file will have a different value but every record in the file will have the same structure.
For instance, in the student details file, illustrated below, the occurrences of the student records are actual values in the file. The record type/template describes the structure of each record occurrence.
The record buffer
Before a computer can do any processing on a piece of data, the data must be loaded into main memory (RAM). The CPU can only address data that is in RAM.
A record-based file may consist of hundreds of thousands, millions or even tens of millions of records, and may require gigabytes of storage. Files of this size cannot be processed by loading the whole file into memory in one go. Instead, files are processed by reading the records into memory, one at a time.
To store the record read into memory and to allow access to the individual fields of the record, a programmer must declare the record structure (see the diagram above) in the program. The computer uses the programmer's description of the record (the record template) to set aside sufficient memory to store one instance of the record. The memory allocated for storing a record is usually called a "record buffer".
A record buffer is capable of storing the data recorded for only one instance of the record. To process a file a program must read the records one at a time into the record buffer. The record buffer is the only connection between the program and the records in the file.
If a program processes more than one file, a record buffer must be defined for each file.
To process all the records in an INPUT file, we must ensure that each record instance is copied (read) from the file into the record buffer when required.
To create an OUTPUT file containing data records, we must ensure that each record is placed in the record buffer and then transferred (written) to the file.
To transfer a record from an input file to an output file we must read the record into the input record buffer, transfer it to the output record buffer and then write the data to the output file from the output record buffer. This type of data transfer between ‘buffers’ is quite common in COBOL programs.
Creating a record
To create a record buffer large enough to store one instance of a record, containing the information described above, we must decide on the type and size of each of the fields.
· The student identity number is 7 digits in size so we need to declare the data-item to hold it as PIC 9(7).
· To store the student name, we will assume that we require only 10 characters. So we can declare a data-item to hold it as PIC X(10).
· The date of birth is 8 digits long so we declare it as PIC 9(8).
· The course code is 4 characters long so we declare it as PIC X(4).
· Finally, the gender is only one character so we declare it as PIC X.
The fields described above are individual data items but we must collect them together into a record structure as follows:
01 StudentRec.
    02 StudentId    PIC 9(7).
    02 StudentName  PIC X(10).
    02 DateOfBirth  PIC 9(8).
    02 CourseCode   PIC X(4).
    02 Gender       PIC X.
The record description above is correct as far as it goes. It reserves the correct amount of storage for the record buffer. But it does not allow us to access all the individual parts of the record that we might require.
For instance, the name is actually made up of the student's surname and initials, while the date consists of 4 digits for the year, 2 digits for the month and 2 digits for the day.
To allow us to access these fields individually we need to declare the record as follows:
01 StudentRec.
    02 StudentId       PIC 9(7).
    02 StudentName.
        03 Surname     PIC X(8).
        03 Initials    PIC XX.
    02 DateOfBirth.
        03 YOBirth     PIC 9(4).
        03 MOBirth     PIC 99.
        03 DOBirth     PIC 99.
    02 CourseCode      PIC X(4).
    02 Gender          PIC X.
In this description, StudentName is a group item consisting of Surname and Initials, and DateOfBirth consists of YOBirth, MOBirth and DOBirth.
Declaring a record buffer in your program
The record type/template/buffer of every file used in a program must be described in the FILE SECTION by means of an FD (file description) entry. The FD entry consists of the letters FD and an internal name that the programmer assigns to the file.
So the full file description for the students file might be:
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
    02 StudentId       PIC 9(7).
    02 StudentName.
        03 Surname     PIC X(8).
        03 Initials    PIC XX.
    02 DateOfBirth.
        03 YOBirth     PIC 9(4).
        03 MOBirth     PIC 99.
        03 DOBirth     PIC 99.
    02 CourseCode      PIC X(4).
    02 Gender          PIC X.
Note that we have assigned the name StudentFile as the internal file name. The actual name of the file on disk is Students.Dat.
The SELECT and ASSIGN clause
Although the name of the students file on disk is Students.Dat, we are going to refer to it in our program as StudentFile. How can we connect the name we are going to use internally with the actual name of the file on disk?
The internal file name used in a file's FD entry is connected to an external file (on disk, tape or CD-ROM) by means of the SELECT and ASSIGN clause. The SELECT and ASSIGN clause is an entry in the FILE-CONTROL paragraph in the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile
        ASSIGN TO "STUDENTS.DAT".
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
    02 StudentId       PIC 9(7).
    02 StudentName.
        03 Surname     PIC X(8).
        03 Initials    PIC XX.
    02 DateOfBirth.
        03 YOBirth     PIC 9(4).
        03 MOBirth     PIC 99.
        03 DOBirth     PIC 99.
    02 CourseCode      PIC X(4).
    02 Gender          PIC X.
SELECT and ASSIGN syntax for Sequential files
The Micro Focus COBOL compiler recognizes two kinds of Sequential File organization: LINE SEQUENTIAL and RECORD SEQUENTIAL.
LINE SEQUENTIAL files are files in which each record is followed by the carriage return and line feed characters. These are the kind of files produced by a text editor such as Notepad.
RECORD SEQUENTIAL files are files where the file consists of a stream of bytes. Only the fact that we know the size of each record allows us to retrieve them. Files that are not record based can be processed by defining them as RECORD SEQUENTIAL.
The ExternalFileReference can be a simple file name, or a full or partial file specification. If a simple file name is used, the drive and directory where the program is running is assumed, but we may choose to include the full path to the file. For instance, we could associate the StudentFile with an actual file using statements like:
SELECT StudentFile
ASSIGN TO "D:\Cobol\ExampleProgs\Students.Dat"
SELECT StudentFile
ASSIGN TO "A:\Students.Dat"
Introduction
Sequential files are uncomplicated. To write programs that process Sequential Files you only need to know four new verbs: OPEN, CLOSE, READ and WRITE.
You must ensure that (before terminating) your program closes all the files it has opened. Failure to do so may result in data not being written to the file or users being prevented from accessing the file.
The OPEN verb
Before your program can access the data in an input file or place data in an output file, you must make the file available to the program by OPENing it.
When you open a file you have to indicate how you intend to use it (e.g. INPUT, OUTPUT, EXTEND) so that the system can manage the file correctly. Opening a file does not transfer any data to the record buffer; it simply provides access.
OPEN notes
When a file is opened for INPUT or EXTEND, the file must exist or the OPEN will fail.
When a file is opened for INPUT, the Next Record Pointer is positioned at the beginning of the file.
When the file is opened for EXTEND, the Next Record Pointer is positioned after the last record in the file. This allows records to be appended to the file.
When a file is opened for OUTPUT, it is created if it does not exist, and is overwritten if it already exists.
The CLOSE verb
CLOSE InternalFileName...
You must ensure that, before terminating, your program closes all the files it has opened. Failure to do so may result in some data not being written to the file or users being prevented from accessing the file.
The READ verb
Once the system has opened a file and made it available to the program, it is the programmer's responsibility to process it correctly. To process all the records in the file we have to transfer them, one record at a time, from the file to the file's record buffer. The READ is provided for this purpose.
The READ copies a record occurrence/instance from the file and places it in the record buffer.
READ notes
When the READ attempts to read a record from the file and encounters the end of file marker, the AT END is triggered and the StatementBlock following the AT END is executed.
Using the INTO Identifier clause causes the data to be read into the record buffer and then copied from there to the Identifier, in one operation. When this option is used, there will be two copies of the data: one in the record buffer and one in the Identifier. Using this clause is the equivalent of executing a READ and then moving the contents of the record buffer to the Identifier.
How the READ works
When a record is read it is copied from the backing storage file into the record buffer in RAM. When an attempt to READ detects the end of file, the AT END is triggered and the condition name EndOfFile is set to true. Since the condition name is set up as shown below, setting it to true fills the whole record with HIGH-VALUES.
FD StudentFile.
01 StudentRec.
    88 EndOfFile  VALUE HIGH-VALUES.
    02 StudentId  PIC 9(7).
    etc.
The WRITE verb
WRITE RecordName [FROM Identifier]
The WRITE verb is used to copy data from the record buffer (RAM) to the file on backing storage (disk, tape or CD-ROM).
To WRITE data to a file we must move the data to the record buffer (declared in the FD entry) and then WRITE the contents of the record buffer to the file.
When the WRITE..FROM is used, the data contained in the Identifier is copied into the record buffer and is then written to the file. The WRITE..FROM is the equivalent of a MOVE Identifier TO RecordBuffer statement followed by a WRITE RecordBuffer statement.
Read a file, Write a record
If you were paying close attention to the syntax diagrams above you probably noticed that while we READ a file, we must WRITE a record.
The reason we read a file but write a record is that a file can contain a number of different types of record. For instance, if we want to update the students file we might have a file of transaction records that contained Insertion records and Deletion records. While the Insertion records would contain all the student record fields, the Deletion records only need the StudentId.
When we read a record from the transaction file we don't know which of the types will be supplied; so we must - READ Filename. It is the programmer's responsibility to discover what type of record has been supplied.
When we write a record to a file we have to specify which of the record types we want to write; so we must - WRITE RecordName.
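Pulling the four verbs together, the following is a minimal sketch of a program that copies every record from the students file to a second file. Several things here are assumptions made for the example: the program name CopyStudents, the output file BackupFile and its external name "BACKUP.DAT", the choice of LINE SEQUENTIAL organization, and the use of PERFORM UNTIL and SET ... TO TRUE to drive the loop. The input file STUDENTS.DAT must already exist, otherwise the OPEN will fail.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CopyStudents.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT StudentFile ASSIGN TO "STUDENTS.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT BackupFile ASSIGN TO "BACKUP.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentFile.
       01  StudentRec.
           88  EndOfFile     VALUE HIGH-VALUES.
           02  StudentId     PIC 9(7).
           02  StudentName   PIC X(10).
           02  DateOfBirth   PIC 9(8).
           02  CourseCode    PIC X(4).
           02  Gender        PIC X.
       FD  BackupFile.
      *> 30 characters = 7 + 10 + 8 + 4 + 1, the size of StudentRec.
       01  BackupRec         PIC X(30).
       PROCEDURE DIVISION.
       CopyAllRecords.
           OPEN INPUT StudentFile
           OPEN OUTPUT BackupFile
      *> We READ the file name but WRITE the record name.
           READ StudentFile
               AT END SET EndOfFile TO TRUE
           END-READ
           PERFORM UNTIL EndOfFile
               WRITE BackupRec FROM StudentRec
               READ StudentFile
                   AT END SET EndOfFile TO TRUE
               END-READ
           END-PERFORM
           CLOSE StudentFile BackupFile
           STOP RUN.
```

The first READ is done before the loop so that an empty input file is handled without writing anything; each pass through the loop then writes the record just read and fetches the next one.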
Tables and Occurs
A powerful feature of COBOL is the use of tables, via the "OCCURS" and "OCCURS DEPENDING ON" clauses. This section describes COBOL Tables and the OCCURS and OCCURS DEPENDING ON clauses, both of which cause fields or groups to repeat some number of times.
Tables and the OCCURS clause
Suppose you wanted to store your monthly sales figures for the year. You could define 12 fields, one for each month, like this:
05 MONTHLY-SALES-1 PIC S9(5)V99.
05 MONTHLY-SALES-2 PIC S9(5)V99.
05 MONTHLY-SALES-3 PIC S9(5)V99.
...
05 MONTHLY-SALES-11 PIC S9(5)V99.
05 MONTHLY-SALES-12 PIC S9(5)V99.
But there's an easier way in COBOL. You can specify the field once and declare that it repeats 12 times. You do this with the OCCURS clause, like this:
05 MONTHLY-SALES OCCURS 12 TIMES PIC S9(5)V99.
(By now you should also know that this can be written on two lines, like this:)
05 MONTHLY-SALES OCCURS 12 TIMES
PIC S9(5)V99.
This specifies 12 fields, all of which have the same PIC, and is called a table (also called an array). The individual fields are referenced in COBOL by using subscripts, such as "MONTHLY-SALES(1)". This table occupies 84 bytes in the record (12 * (5+2)). (The sign is embedded, not separate, and the decimal is implied.)
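As a sketch of how subscripts are used in practice, the program below fills the table with a sample figure and then adds up the twelve months. The names SalesTable, SALES-RECORD, MONTH-NO, YEAR-TOTAL and YEAR-TOTAL-OUT are hypothetical, as is the sample value of 100.50, and the table is placed in WORKING-STORAGE under an 01 group because OCCURS cannot be used on an 01 level item.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SalesTable.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SALES-RECORD.
           05  MONTHLY-SALES OCCURS 12 TIMES PIC S9(5)V99.
       01  MONTH-NO          PIC 99.
       01  YEAR-TOTAL        PIC S9(7)V99 VALUE ZEROS.
      *> Edited item so the total displays with a decimal point.
       01  YEAR-TOTAL-OUT    PIC -(7)9.99.
       PROCEDURE DIVISION.
       TotalTheYear.
      *> Give every month the same sample figure of 100.50.
           PERFORM VARYING MONTH-NO FROM 1 BY 1 UNTIL MONTH-NO > 12
               MOVE 100.50 TO MONTHLY-SALES (MONTH-NO)
           END-PERFORM
      *> Add each occurrence, selected by subscript, to the total.
           PERFORM VARYING MONTH-NO FROM 1 BY 1 UNTIL MONTH-NO > 12
               ADD MONTHLY-SALES (MONTH-NO) TO YEAR-TOTAL
           END-PERFORM
           MOVE YEAR-TOTAL TO YEAR-TOTAL-OUT
           DISPLAY "Year total: " YEAR-TOTAL-OUT
           STOP RUN.
```

Twelve months at 100.50 give a year total of 1206.00.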
The OCCURS can also be at the group level, and this is the most useful application of OCCURS. For example, all 25 line items on an invoice (75 fields) could be held in this group:
05 LINE-ITEMS OCCURS 25 TIMES.
    10 QUANTITY PIC 9999.
    10 DESCRIPTION PIC X(30).
    10 UNIT-PRICE PIC S9(5)V99.
Notice the OCCURS is listed at the group level, so the entire group occurs 25 times. The order of the data in the file is as if you had specified multiple groups, like this:
05 LINE-ITEMS-1.
    10 QUANTITY PIC 9999.
    10 DESCRIPTION PIC X(30).
    10 UNIT-PRICE PIC S9(5)V99.
05 LINE-ITEMS-2.
    10 QUANTITY PIC 9999.
    10 DESCRIPTION PIC X(30).
    10 UNIT-PRICE PIC S9(5)V99.
...
05 LINE-ITEMS-25.
    10 QUANTITY PIC 9999.
    10 DESCRIPTION PIC X(30).
    10 UNIT-PRICE PIC S9(5)V99.
There can be nested occurs -- an occurs within an occurs. In the next example, suppose we stock ten products and we want to keep a record of the monthly sales of each product for the past 12 months. We could do just that with this table:
01 INVENTORY-RECORD.
    05 INVENTORY-ITEM OCCURS 10 TIMES.
        10 MONTHLY-SALES OCCURS 12 TIMES PIC 999.
In this case, "INVENTORY-ITEM" is a group composed only of "MONTHLY-SALES", which occurs 12 times for each occurrence of an inventory item. This gives an array (table) of 10 * 12 fields. The only information in this record is the 120 monthly sales figures -- 12 months for each of 10 items.
We could also have a description for each item. The description would go under the 05 level INVENTORY-ITEM group, at the 10 level, the same as the monthly sales. Further, we could track, say, the sale price of each item for each month. A record which will do these things is:
01 INVENTORY-RECORD.
    05 INVENTORY-ITEM OCCURS 10 TIMES.
        10 ITEM-DESCRIPTION PIC X(30).
        10 MONTHLY-SALES OCCURS 12 TIMES.
            15 QUANTITY-SOLD PIC 999.
            15 UNIT-PRICE PIC 9(5)V99.
Notice we have made MONTHLY-SALES a group, which now contains two fields, and the whole group repeats 12 times for each instance of INVENTORY-ITEM. This short layout has 250 fields: two fields (QUANTITY-SOLD and UNIT-PRICE) that repeat 12 times for each inventory item, times 10 items, plus the ITEM-DESCRIPTION field for each of the 10 items. Fields and groups can be nested several levels deep, and it's possible to have thousands of fields in a layout only a couple of pages long.
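Fields inside a nested table need one subscript for each OCCURS level above them. The sketch below uses the inventory layout to show this; the program name InventoryTable, the paragraph name SetOneCell and the sample values (item 3, month 7, "Widget", 25 and 9.99) are invented for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. InventoryTable.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INVENTORY-RECORD.
           05  INVENTORY-ITEM OCCURS 10 TIMES.
               10  ITEM-DESCRIPTION  PIC X(30).
               10  MONTHLY-SALES OCCURS 12 TIMES.
                   15  QUANTITY-SOLD PIC 999.
                   15  UNIT-PRICE    PIC 9(5)V99.
       PROCEDURE DIVISION.
       SetOneCell.
      *> ITEM-DESCRIPTION sits under one OCCURS, so it takes one
      *> subscript. QUANTITY-SOLD and UNIT-PRICE sit under two, so
      *> they take two: the item number, then the month number.
           MOVE "Widget" TO ITEM-DESCRIPTION (3)
           MOVE 25       TO QUANTITY-SOLD (3, 7)
           MOVE 9.99     TO UNIT-PRICE (3, 7)
           DISPLAY ITEM-DESCRIPTION (3)
           DISPLAY "Month 7 quantity: " QUANTITY-SOLD (3, 7)
           STOP RUN.
```

Only the third item and its seventh month are set here; the rest of the table is left untouched.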
Occurs Depending On
One really great feature of COBOL tables, and a really nasty one to convert to other languages, is the "OCCURS DEPENDING ON". This is an OCCURS, like above, but the number of times it occurs in a particular record can vary (between some limits). The number of times it actually occurs in any particular record will be given by a value in another field of that record. This creates records that vary in size from record to record.
The OCCURS-DEPENDING-ON can include many subordinate fields and groups, all of which occur multiple times. Further, most compilers allow one or more (fixed) OCCURS to be nested within an OCCURS-DEPENDING-ON, and some compilers allow multiple OCCURS-DEPENDING-ON to be nested, or to occur in succession. This can get pretty involved, so we will only give one simple example, that of a patient's medical treatment-history record.
01 PATIENT-TREATMENTS.
    05 PATIENT-NAME PIC X(30).
    05 PATIENT-SS-NUMBER PIC 9(9).
    05 NUMBER-OF-TREATMENTS PIC 99 COMP-3.
    05 TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
            DEPENDING ON NUMBER-OF-TREATMENTS
            INDEXED BY TREATMENT-POINTER.
        10 TREATMENT-DATE.
            15 TREATMENT-DAY PIC 99.
            15 TREATMENT-MONTH PIC 99.
            15 TREATMENT-YEAR PIC 9(4).
        10 TREATING-PHYSICIAN PIC X(30).
        10 TREATMENT-CODE PIC 99.
Here are the significant points of this record:
· The name of the record is "PATIENT-TREATMENTS".
· The first three fields, "PATIENT-NAME", "PATIENT-SS-NUMBER" and "NUMBER-OF-TREATMENTS", occur in the fixed portion of every record. This fixed portion is the same for every record.
· The TREATMENT-HISTORY group is the variable portion of the record. It can occur from 0 to 50 times.
· "NUMBER-OF-TREATMENTS" is a number from 0 to 50 that tells us how many times the group TREATMENT-HISTORY occurs in this record.
· The value in NUMBER-OF-TREATMENTS is stored in COMP-3 (packed decimal) format. This is very common. Also very common is COMP or binary format. All of these are binary (non-text) data formats.
· TREATMENT-HISTORY is a group that comprises all the lower level fields beneath it (down to the next 05 level, or the end of the record).
· All the fields and groups within TREATMENT-HISTORY occur between 0 and 50 times.
· Because 0 is a valid number of occurrences, it is possible that the variable portion of the record is not present.
· The "INDEXED BY TREATMENT-POINTER" clause may or may not be present. If present, it tells the compiler the name of the variable (TREATMENT-POINTER) to use as the index into the array. If you don't understand this, you can safely ignore the "indexed by..." clause, unless you are programming in COBOL.
· TREATMENT-DATE is a group that comprises the day, month, and year fields beneath it.
· These records vary in size from 41 to 2041 bytes, and would be stored in some type of variable length file.
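The points above can be exercised with a small sketch that sets the number of treatments, walks the variable portion using the index, and works out the record size. The program name TreatmentList, the item RECORD-SIZE and the physician names are made up, and the record is declared in WORKING-STORAGE rather than read from a file to keep the example short. It assumes a compiler that accepts OCCURS 0 TO, as the record above requires.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TreatmentList.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  PATIENT-TREATMENTS.
           05  PATIENT-NAME          PIC X(30).
           05  PATIENT-SS-NUMBER     PIC 9(9).
           05  NUMBER-OF-TREATMENTS  PIC 99 COMP-3.
           05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
                   DEPENDING ON NUMBER-OF-TREATMENTS
                   INDEXED BY TREATMENT-POINTER.
               10  TREATMENT-DATE.
                   15  TREATMENT-DAY     PIC 99.
                   15  TREATMENT-MONTH   PIC 99.
                   15  TREATMENT-YEAR    PIC 9(4).
               10  TREATING-PHYSICIAN    PIC X(30).
               10  TREATMENT-CODE        PIC 99.
       01  RECORD-SIZE               PIC 9(4).
       PROCEDURE DIVISION.
       ListTreatments.
      *> Set the count first: it decides how many occurrences exist.
           MOVE 2 TO NUMBER-OF-TREATMENTS
           MOVE "Dr Hypothetical" TO TREATING-PHYSICIAN (1)
           MOVE "Dr Example"      TO TREATING-PHYSICIAN (2)
           PERFORM VARYING TREATMENT-POINTER FROM 1 BY 1
                   UNTIL TREATMENT-POINTER > NUMBER-OF-TREATMENTS
               DISPLAY TREATING-PHYSICIAN (TREATMENT-POINTER)
           END-PERFORM
      *> 41 fixed bytes plus 40 bytes per treatment present.
           COMPUTE RECORD-SIZE = 41 + 40 * NUMBER-OF-TREATMENTS
           DISPLAY "Record size in bytes: " RECORD-SIZE
           STOP RUN.
```

The fixed portion is 30 + 9 + 2 = 41 bytes and each TREATMENT-HISTORY occurrence is 8 + 30 + 2 = 40 bytes, so two treatments give 121 bytes and the maximum of 50 gives the 2041 bytes quoted above.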