Monday, 23 May 2011

cobol


COBOL

COBOL is an acronym for COmmon Business Oriented Language.

COBOL History

1952
Grace Hopper, "the mother of COBOL", begins developing computer languages.
1959
The US Department of Defense (DoD) asks a group of specialists to develop a business language that meets its requirements.
1960
COBOL-60 (Common Business Oriented Language) is launched.
1961
First COBOL compilers are available.
1965
The momentum of COBOL's success accelerates.
1968
The first COBOL standard, COBOL-68, is released.
1970
The COBOL-68 standard is accepted by the International Organization for Standardization (ISO).
1974
The COBOL-74 standard is released.
1985
The COBOL-85 standard is released.
1989
Intrinsic functions are added to the standard.
2002
The COBOL 2002 standard is released with object oriented capabilities.

Introduction

COBOL is a high-level programming language first developed by the CODASYL Committee (Conference on Data Systems Languages) in 1960. Since then, responsibility for developing new COBOL standards has been assumed by the American National Standards Institute (ANSI).
Three ANSI standards for COBOL were produced, in 1968, 1974 and 1985. A further standard, COBOL 2002, introduced object-oriented programming to COBOL.
The word COBOL is an acronym that stands for COmmon Business Oriented Language. As the expanded acronym indicates, COBOL is designed for developing business applications, which are typically file-oriented. It is not designed for writing systems programs. For instance, you would not develop an operating system or a compiler using COBOL.

How widely used is COBOL?

For over four decades COBOL has been the dominant programming language in the business computing domain. In that time it has seen off the challenges of a number of other languages, such as PL/I, Algol 68, Pascal, Modula, Ada, C and C++. All these languages have found a niche, but none has yet displaced COBOL. Two more recent challengers, Java and Visual Basic, are proving to be serious contenders.
COBOL's dominance is underlined by reports from the Gartner Group:
  • In 1997 they estimated that there were about 300 billion lines of computer code in use in the world. Of that they estimated that about 80% (240 billion lines) were in COBOL and 20% (60 billion lines) were written in all the other computer languages combined.
  • In 1999 they reported that over 50% of all new mission-critical applications were still being done in COBOL and their recent estimates indicate that through 2004-2005 15% of all new applications (5 billion lines) will be developed in COBOL while 80% of all deployed applications will include extensions to existing legacy (usually COBOL) programs.
Gartner estimates for 2002 are that there were about two million COBOL programmers worldwide, compared to about one million Java programmers and one million C++ programmers.

Surprised by COBOL's success?

People are often surprised when presented with the evidence for COBOL's dominance in the market place. The hype that surrounds some computer languages would persuade you to believe that most of the production business applications in the world are written in Java, C, C++ or Visual Basic and that only a small percentage are written in COBOL. In fact, the reverse is actually the case.
One reason for this misconception lies in the difference between the vertical and the horizontal software markets.
In the vertical software market (sometimes called "bespoke" software), applications cost many millions of dollars to produce, are tailored to a specific company, encapsulate the business rules of that company, and only a limited number of copies of the software may be in use. A good example of this kind of application is the DoD MRP II system. This system is "used to manage almost 550,000 spare and repair parts and equipment items with an inventory value of $28 billion. The system runs on Amdahl mainframes at multiple locations throughout the U.S. and contains over 4,000,000 lines of COBOL code."
In the horizontal software market, applications may still cost millions of dollars to produce, but thousands, and in some cases millions, of copies of the software are in use. As a result, these applications often have a very high profile, a short life span, and a relatively low per-copy replacement cost. The Microsoft Office suite (Word, Excel, Access) is an example of an application in the horizontal software market. Because of the highly competitive nature of this marketplace, considerations of speed, size and efficiency often make languages like C or C++ the language of choice for creating these applications.
Applications written for the vertical market, on the other hand, often have a low profile (because they are usually written for use in one particular company), a very high per-copy replacement cost, and consequently a very long life span. For example, the cost of replacing COBOL code has been estimated at approximately twenty-five dollars ($25) per line of code. At this rate, replacing the DoD MRP II system mentioned above (over 4,000,000 lines at $25 per line) with a system written in some other language would cost some one hundred million dollars ($100,000,000). The importance of ease of maintenance often makes COBOL the language of choice for these applications.
The high visibility of horizontal applications like Microsoft Word or Excel persuades people that the languages used to write these applications are the market leaders. But however many copies of Excel are sold, it is just a single application produced by a limited number of programmers. Many more programmers are involved in coding or maintaining one-off, "bespoke" applications, and these programmers generally write their programs in COBOL.

Characteristics of COBOL.

COBOL is a simple language (no pointers, no user-defined functions, no user-defined types) with a limited scope of function. It encourages a simple, straightforward programming style. Curiously enough, despite its limitations, COBOL has proven itself to be well suited to its targeted problem domain (business computing). Most COBOL programs operate in a domain where the program complexity lies in the business rules that have to be encoded rather than in the sophistication of the data structures or algorithms required. And in cases where sophisticated algorithms are required, COBOL usually meets the need with an appropriate verb such as SORT or SEARCH.
We noted above that COBOL is a simple language with a limited scope of function. That used to be true, but the introduction of OO-COBOL has changed all that. OO-COBOL retains all the advantages of previous versions but now includes:
      • User Defined Functions
      • Object Orientation
      • National Characters - Unicode
      • Multiple Currency Symbols
      • Cultural Adaptability (Locales)
      • Dynamic Memory Allocation (pointers)
      • Data Validation Using New VALIDATE Verb
      • Binary and Floating Point Data Types
      • User Defined Data Types

COBOL is non-proprietary (portable)

The COBOL standard does not belong to any particular vendor. The vendor-independent ANSI COBOL committee legislates formal, non-vendor-specific syntax and semantic language standards. COBOL has been ported to virtually every hardware platform, from every flavour of Windows and every flavour of Unix to AS/400, VSE, OS/2, DOS, VMS, Unisys, DG, VM and MVS.


COBOL is Maintainable

COBOL has a 30-year proven track record for application maintenance, enhancement and production support at the enterprise level. Early indications from the year 2000 problem are that COBOL applications were actually cheaper to fix than applications written in more recent languages.
COBOL Programming Basics:

Introduction

This section presents the fundamentals of constructing COBOL programs. It explains the notation used in COBOL syntax diagrams and enumerates the COBOL coding rules. It shows how user-defined names are constructed and examines the structure of COBOL programs.

COBOL syntax

COBOL syntax is defined using a particular notation, sometimes called the COBOL MetaLanguage.
In this notation, words in uppercase are reserved words. When underlined they are mandatory. When not underlined they are "noise" words, used for readability only, and are optional. Because COBOL statements are supposed to read like English sentences there are a lot of these "noise" words.
Words in mixed case represent names that must be devised by the programmer (like data item names).
When material is enclosed in curly braces { }, a choice must be made from the options within the braces. If there is only one option, then that item is mandatory.
Material enclosed in square brackets [ ], indicates that the material is optional, and may be included or omitted as required.
The ellipsis symbol ... (three dots), indicates that the preceding syntax element may be repeated at the programmer's discretion.

COBOL coding rules

Traditionally, COBOL programs were written on coding forms and then punched onto punch cards. Although nowadays most programs are entered directly into a computer, some COBOL formatting conventions remain that derive from its ancient punch-card history.
On coding forms, the first six character positions are reserved for sequence numbers. The seventh character position is reserved for the continuation character, or for an asterisk that denotes a comment line.
The actual program text starts in column 8. The four positions from 8 to 11 are known as Area A, and positions from 12 to 72 are Area B.
Although many COBOL compilers ignore some of these formatting restrictions, most still retain the distinction between Area A and Area B.
When a COBOL compiler recognizes the two areas, all division names, section names, paragraph names, FD entries and 01 level numbers must start in Area A. All other sentences must start in Area B.
In our example programs we use the compiler directive (available with the NetExpress COBOL compiler) - $ SET SOURCEFORMAT"FREE" - to free us from these formatting restrictions.
Ancient COBOL coding form
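To see the column rules in practice, here is a minimal sketch of a program laid out in the traditional fixed format, so it does not rely on the SOURCEFORMAT directive. It assumes a compiler whose default source format is fixed (for example GnuCOBOL with cobc -x); the program and paragraph names are made up for illustration.

```cobol
      * Column 7 holds "*", so this whole line is a comment.
      * Division, section and paragraph names start in Area A
      * (column 8); ordinary statements start in Area B (column 12).
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ColumnDemo.
       PROCEDURE DIVISION.
       ShowLayout.
           DISPLAY "Area A holds names; Area B holds statements".
           STOP RUN.
```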

Name construction

• All user-defined names, such as data names, paragraph names, section names, condition names and mnemonic names, must adhere to the following rules:
• They must contain at least one character, but not more than 30 characters. They must contain at least one alphabetic character. They must not begin or end with a hyphen.
• They must be constructed from the characters A to Z, the numbers 0 to 9, and the hyphen.
• They must not contain spaces.
• Names are not case-sensitive: TotalPay is the same as totalpay, Totalpay or TOTALPAY.
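The naming rules above can be checked with a small sketch. All the data names are hypothetical. The names that break the rules are kept only as comment lines, so the program still compiles. The program is written in fixed format and assumes a COBOL 85 or later compiler.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NameDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Valid names: letters, digits and hyphens, at most 30
      * characters, at least one letter, no leading/trailing hyphen.
       01 TotalPay        PIC 9(4) VALUE 1500.
       01 Bonus-2011      PIC 9(3) VALUE 250.
       01 Pay-Display     PIC Z(4)9.
      * Data names that would be rejected:
      *   -TotalPay    (begins with a hyphen)
      *   Total Pay    (contains a space)
      *   2011         (no alphabetic character)
       PROCEDURE DIVISION.
       ShowNames.
      * TOTALPAY and totalpay refer to the same item as TotalPay.
           ADD Bonus-2011 TO TOTALPAY.
           MOVE totalpay TO Pay-Display.
           DISPLAY "Total pay: " Pay-Display.
           STOP RUN.
```

The program should print "Total pay:  1750". The extra space comes from the zero-suppressed leading position of Pay-Display.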

The structure of COBOL programs

COBOL programs are hierarchical in structure. Each element of the hierarchy consists of one or more subordinate elements.
The hierarchy consists of Divisions, Sections, Paragraphs, Sentences and Statements.
A Division may contain one or more Sections, a Section one or more Paragraphs, a Paragraph one or more Sentences and a Sentence one or more Statements.
We can represent the COBOL hierarchy using the COBOL metalanguage as follows:
Divisions
A division is a block of code, usually containing one or more sections, that starts where the division name is encountered and ends with the beginning of the next division or with the end of the program text.

Sections
A section is a block of code usually containing one or more paragraphs. A section begins with the section name and ends where the next section name is encountered or where the program text ends.
Section names are devised by the programmer, or defined by the language. A section name is followed by the word SECTION and a period.
See the two example names below:
SelectUnpaidBills SECTION.
FILE SECTION.
Paragraphs
A paragraph is a block of code made up of one or more sentences. A paragraph begins with the paragraph name and ends with the next paragraph or section name or the end of the program text.
A paragraph name is devised by the programmer or defined by the language, and is followed by a period.
See the two example names below:
PrintFinalTotals.
PROGRAM-ID.
Sentences and statements
A sentence consists of one or more statements and is terminated by a period.
For example:
MOVE .21 TO VatRate
MOVE 1235.76 TO ProductCost
COMPUTE VatAmount = ProductCost * VatRate.

A statement consists of a COBOL verb and an operand or operands.
For example:
SUBTRACT Tax FROM GrossPay GIVING NetPay
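The following sketch turns the VAT sentence and the SUBTRACT statement above into a complete program. The PIC clauses, the pay figures and the paragraph name are assumptions made for illustration. COMPUTE is used without ROUNDED, so the exact product 259.5096 is truncated to 259.50 when it is stored in VatAmount.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SentenceDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 VatRate       PIC V99.
       01 ProductCost   PIC 9(4)V99.
       01 VatAmount     PIC 9(4)V99.
       01 GrossPay      PIC 9(4)V99 VALUE 1000.00.
       01 Tax           PIC 9(4)V99 VALUE 210.00.
       01 NetPay        PIC 9(4)V99.
       01 ShowAmount    PIC Z(3)9.99.
       PROCEDURE DIVISION.
       CalculateVat.
      * One sentence: three statements, one terminating period.
           MOVE .21 TO VatRate
           MOVE 1235.76 TO ProductCost
           COMPUTE VatAmount = ProductCost * VatRate.
      * One sentence containing a single statement.
           SUBTRACT Tax FROM GrossPay GIVING NetPay.
      * Move results to an edited item so the decimal point shows.
           MOVE VatAmount TO ShowAmount
           DISPLAY "VAT amount: " ShowAmount
           MOVE NetPay TO ShowAmount
           DISPLAY "Net pay:    " ShowAmount
           STOP RUN.
```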
The Four Divisions
At the top of the COBOL hierarchy are the four divisions. These divide the program into distinct structural elements. Although some of the divisions may be omitted, the sequence in which they are specified is fixed, and must follow the order below.
General Layout
IDENTIFICATION DIVISION.
Contains program information
ENVIRONMENT DIVISION.
Contains environment information
DATA DIVISION.
Contains data descriptions
PROCEDURE DIVISION.
Contains the program algorithms
The IDENTIFICATION DIVISION
The IDENTIFICATION DIVISION supplies information about the program to the programmer and the compiler.
Most entries in the IDENTIFICATION DIVISION are directed at the programmer. The compiler treats them as comments.
The PROGRAM-ID clause is an exception to this rule. Every COBOL program must have a PROGRAM-ID because the name specified after this clause is used by the linker when linking a number of subprograms into one run unit, and by the CALL statement when transferring control to a subprogram.
The IDENTIFICATION DIVISION has the following structure:
IDENTIFICATION DIVISION.
PROGRAM-ID. ProgramName.
[AUTHOR. ProgrammerName.]
other entries here
The keywords - IDENTIFICATION DIVISION - represent the division header, and signal the commencement of the program text.
PROGRAM-ID is a paragraph name that must be specified immediately after the division header.
ProgramName is a name devised by the programmer, and must satisfy the rules for user-defined names.
Here's a typical program fragment:
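A minimal sketch of such a fragment, extended just far enough to compile and run, might look like this. PayrollReport and the AUTHOR entry are hypothetical, and the layout is fixed format.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PayrollReport.
      * AUTHOR is documentation only: the compiler treats the
      * entry as a comment, and some compilers flag it as obsolete.
       AUTHOR. A-Programmer.
       PROCEDURE DIVISION.
       MainLogic.
           DISPLAY "PayrollReport started".
           STOP RUN.
```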
The ENVIRONMENT DIVISION
The ENVIRONMENT DIVISION is used to describe the environment in which the program will run.
The purpose of the ENVIRONMENT DIVISION is to isolate in one place all aspects of the program that are dependent upon a specific computer, device or encoding sequence.
The idea behind this is to make it easy to change the program when it has to run on a different computer or one with different peripheral devices.
In the ENVIRONMENT DIVISION, aliases are assigned to external devices, files or command sequences. Other environment details, such as the collating sequence, the currency symbol and the decimal point symbol may also be defined here.
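As an illustration of one environment detail, this sketch uses the SPECIAL-NAMES paragraph to switch to the continental convention in which the comma is the decimal point. The data names are hypothetical. The sketch assumes a compiler that supports the standard DECIMAL-POINT IS COMMA clause. Once the clause is in force, the comma acts as the decimal point and the period acts as the insertion character, both in numeric literals and in PICTURE strings.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EnvironmentDemo.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      * Swap the roles of "." and "," in literals and PICTUREs.
           DECIMAL-POINT IS COMMA.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 Price          PIC 9(4)V99 VALUE 1234,56.
       01 ShowPrice      PIC Z.ZZ9,99.
       PROCEDURE DIVISION.
       ShowIt.
           MOVE Price TO ShowPrice
           DISPLAY "Price: " ShowPrice
           STOP RUN.
```

Only the SPECIAL-NAMES entry would need to change to move this program back to the usual decimal point. That is exactly the kind of isolation the ENVIRONMENT DIVISION is meant to provide.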
The DATA DIVISION
As the name suggests, the DATA DIVISION provides descriptions of the data-items processed by the program.
The DATA DIVISION has two main sections: the FILE SECTION and the WORKING-STORAGE SECTION. Additional sections, such as the LINKAGE SECTION (used in subprograms) and the REPORT SECTION (used in Report Writer based programs) may also be required.
The FILE SECTION is used to describe most of the data that is sent to, or comes from, the computer's peripherals.
The WORKING-STORAGE SECTION is used to describe the general variables used in the program.
The DATA DIVISION has the following structure and syntax:
IDENTIFICATION DIVISION.
PROGRAM-ID. SequenceProgram.
AUTHOR. XXXXX.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 Num1 PIC 9 VALUE ZEROS.
01 Num2 PIC 9 VALUE ZEROS.
01 Result PIC 99 VALUE ZEROS.

The PROCEDURE DIVISION

The PROCEDURE DIVISION contains the code used to manipulate the data described in the DATA DIVISION. It is here that the programmer describes his algorithm.
The PROCEDURE DIVISION is hierarchical in structure and consists of sections, paragraphs, sentences and statements.
Only the section is optional. There must be at least one paragraph, sentence and statement in the PROCEDURE DIVISION.
Paragraph and section names in the PROCEDURE DIVISION are chosen by the programmer and must conform to the rules for user-defined names.
SAMPLE COBOL PROGRAM
IDENTIFICATION DIVISION.
PROGRAM-ID. SequenceProgram.
AUTHOR. Michael Coughlan.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 Num1 PIC 9 VALUE ZEROS.
01 Num2 PIC 9 VALUE ZEROS.
01 Result PIC 99 VALUE ZEROS.

PROCEDURE DIVISION.
CalculateResult.
    ACCEPT Num1.
    ACCEPT Num2.
    MULTIPLY Num1 BY Num2 GIVING Result.
    DISPLAY "Result is = ", Result.
    STOP RUN.
CONDITIONAL STATEMENTS:

Conditional Processing

This is where a lot of inexperienced programmers come unstuck! COBOL allows an IF statement to be written on a single line:
IF <condition> [THEN] <statement-1> ELSE <statement-2> [END-IF].
However, there are some basic guidelines that can be applied to make the code more readable and easier to maintain. These are:
• Each portion (condition, ELSE, statement-1, statement-2, END-IF) should be on a separate line. This allows for future additions or deletions without having to modify more lines than necessary.
• The word ELSE should be aligned in exactly the same column as the IF with which it is associated. This makes the association more obvious in the listing, especially with multiple or nested IFs.
• COBOL'85 allows each IF to be terminated with an END-IF. Its use should be encouraged, as it makes it absolutely clear where each condition is supposed to end, thus avoiding the possibility of confusion and mistakes. Like the ELSE, the END-IF should be aligned in exactly the same column as the IF with which it is associated.
• Statement-1 and statement-2 should be indented, usually by four character positions. This makes the IF, ELSE and END-IF more distinctive in the listing.
This now gives us the following construction:
IF <condition>
    <statement-1>
ELSE
    <statement-2>
END-IF.
Here are some extra guidelines for nested IFs:
• For each level of nested IF, indent all associated lines by four characters. This gives the following:
IF <condition-1>
    IF <condition-2>
        <statement-1>
    ELSE
        <statement-2>
    END-IF
ELSE
    <statement-3>
END-IF.
• Don't ever use more than three levels of nested IF; they are extremely difficult to debug and maintain.
• Remember that each ELSE is paired with the IF that immediately precedes it in the code, not necessarily the one under which it is aligned. Take the following example:
IF <condition-1>
    IF <condition-2>
        <statement-2>
ELSE
    <statement-1>.
According to the indentation, <statement-1> is supposed to be executed if <condition-1> is false, but COBOL follows its own rules and executes <statement-1> if <condition-1> is true and <condition-2> is false. This type of error is easily avoided by using END-IF, as in the following example:
IF <condition-1>
    IF <condition-2>
        <statement-2>
    END-IF
ELSE
    <statement-1>
END-IF.
or...
IF <condition-1>
    IF <condition-2>
        <statement-2>
    ELSE
        <statement-1>
    END-IF
END-IF.
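The dangling-ELSE trap can be demonstrated with a small sketch that runs both layouts with <condition-1> true and <condition-2> false. The flags Cond1 and Cond2 are hypothetical stand-ins for the two conditions. Only the first version prints anything ("A: statement-1"), even though its indentation suggests that statement-1 belongs to the outer IF.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DanglingElse.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 Cond1          PIC X VALUE "Y".
       01 Cond2          PIC X VALUE "N".
       PROCEDURE DIVISION.
       TestElse.
      * Without END-IF the ELSE pairs with the nearest IF
      * (Cond2), whatever the indentation suggests.
           IF Cond1 = "Y"
               IF Cond2 = "Y"
                   DISPLAY "A: statement-2"
           ELSE
               DISPLAY "A: statement-1".
      * The inner END-IF closes the inner IF, so this ELSE
      * really belongs to the outer IF.
           IF Cond1 = "Y"
               IF Cond2 = "Y"
                   DISPLAY "B: statement-2"
               END-IF
           ELSE
               DISPLAY "B: statement-1"
           END-IF.
           STOP RUN.
```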
• In the case where an ELSE is immediately followed by an IF without any intervening statements (i.e. where only one out of a series of conditions will be TRUE), it is not necessary to indent at each new IF; otherwise you will quickly fall off the page. Consider the following example:
IF X-VALUE = 1
    <statement-1>
ELSE
    IF X-VALUE = 2
        <statement-2>
    ELSE
        IF X-VALUE = 3
            <statement-3>
        ELSE
            IF X-VALUE = 4
                <statement-4>
            ELSE
                IF X-VALUE = 5
                    <statement-5>
                etc.
This can instead be laid out as:
IF X-VALUE = 1
    <statement-1>
ELSE
IF X-VALUE = 2
    <statement-2>
ELSE
IF X-VALUE = 3
    <statement-3>
ELSE
IF X-VALUE = 4
    <statement-4>
ELSE
IF X-VALUE = 5
    <statement-5>
etc.
• With the arrival of COBOL'85, this should be written as follows:
EVALUATE X-VALUE
    WHEN 1
        <statement-1>
    WHEN 2
        <statement-2>
    WHEN 3
        <statement-3>
    WHEN 4
        <statement-4>
    WHEN 5
        <statement-5>
    WHEN OTHER
        .....
END-EVALUATE.
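Here is a runnable sketch of the EVALUATE form. It assumes X-VALUE is a one-digit numeric item (the PIC clause is an assumption), and it uses DISPLAY statements in place of the <statement-n> placeholders. The inline PERFORM VARYING loop is COBOL 85 syntax and simply tries the values 1 to 4, so the WHEN OTHER branch is also exercised.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EvaluateDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 X-VALUE        PIC 9.
       PROCEDURE DIVISION.
       TryValues.
           PERFORM VARYING X-VALUE FROM 1 BY 1 UNTIL X-VALUE > 4
      * Exactly one WHEN branch runs for each value of X-VALUE.
               EVALUATE X-VALUE
                   WHEN 1
                       DISPLAY "one"
                   WHEN 2
                       DISPLAY "two"
                   WHEN 3
                       DISPLAY "three"
                   WHEN OTHER
                       DISPLAY "other"
               END-EVALUATE
           END-PERFORM
           STOP RUN.
```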
• Here are even more guidelines for complex conditions:
• Enclose each individual condition in parentheses.
• If several conditions combine to form a group condition (i.e. all conditions have to be true in order to make the group condition true), then enclose the whole group in parentheses as well.
• By placing each condition on a separate line, and by carefully aligning the ANDs and ORs, it is possible to make it absolutely clear whether conditions are linked or are alternatives.
These guidelines should produce something like this:
IF ((A = 1 OR 2 OR 3)
        AND
    (B NOT = 4))
OR ((C = "A" OR "Z")
        OR
    (D < E))
    <statement>
END-IF.
This example, however, is rapidly approaching the stage at which it becomes too unwieldy to be maintainable. Don't be afraid to split a complex condition into its component parts, even if it involves the use of the GO TO statement. Don't try to prove how clever you can be - keep it simple and straightforward.
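To check how the example condition evaluates, the sketch below gives A to E hypothetical values and PIC clauses. A = 1 OR 2 OR 3 is COBOL's abbreviated combined condition, shorthand for A = 1 OR A = 2 OR A = 3. With these values the first group is false (B is 4), but C = "Z" makes the second group, and therefore the whole condition, true.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ConditionDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 A              PIC 9 VALUE 2.
       01 B              PIC 9 VALUE 4.
       01 C              PIC X VALUE "Z".
       01 D              PIC 9 VALUE 5.
       01 E              PIC 9 VALUE 3.
       PROCEDURE DIVISION.
       CheckCondition.
           IF ((A = 1 OR 2 OR 3)
                   AND
               (B NOT = 4))
           OR ((C = "A" OR "Z")
                   OR
               (D < E))
               DISPLAY "condition is true"
           ELSE
               DISPLAY "condition is false"
           END-IF
           STOP RUN.
```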
Data Types & Variables

COBOL Data Types

Introduction

There are three categories of data item used in COBOL programs:
• Variables.
• Literals.
• Figurative Constants.
A data-name or identifier is the name used to identify the area of memory reserved for a variable. A variable is a named location in memory into which a program can put data, and from which it can retrieve data.

Variables

Every variable used in a COBOL program must be described in the DATA DIVISION.
In addition to the data-name, a variable declaration also defines the type of data to be stored in the variable. This is known as the variable's data type.

Variable Data types

Some languages, like Modula-2, Pascal or Ada, are described as being strongly typed. In these languages there are a large number of different data types, and the distinction between them is rigorously enforced by the compiler. For instance, the compiler will reject a statement that attempts to assign a character value to an integer data item.
In COBOL, there are really only three data types -
• Numeric
• Alphanumeric (text/string)
• Alphabetic
The distinction between these data types is a little blurred and only weakly enforced by the compiler. For instance, it is perfectly possible to assign a non-numeric value to a data item that has been declared to be numeric.
The problem with this lax approach to data typing is that, since COBOL programs crash (halt unexpectedly) if they attempt to do computations on items that contain non-numeric data, it is up to the programmer to make sure this never happens.
COBOL programmers must make sure that non-numeric data is never assigned to numeric items intended for use in calculations. Programmers who use strongly typed languages don't need this level of discipline, because the compiler ensures that a variable of a particular type can only be assigned appropriate values.
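One common way to impose this discipline is to test data with the NUMERIC class condition before moving it into a numeric item used in calculations. This sketch simulates two input values instead of reading real input, and its data and paragraph names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NumericCheck.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 InputField     PIC X(3).
       01 Quantity       PIC 999.
       01 RunningTotal   PIC 9(4) VALUE ZEROS.
       PROCEDURE DIVISION.
       MainProcess.
      * Simulate two pieces of input, one good and one bad.
           MOVE "042" TO InputField
           PERFORM CheckAndAdd
           MOVE "4X2" TO InputField
           PERFORM CheckAndAdd
           DISPLAY "Total = " RunningTotal
           STOP RUN.
       CheckAndAdd.
      * The class test rejects data that would corrupt arithmetic.
           IF InputField IS NUMERIC
               MOVE InputField TO Quantity
               ADD Quantity TO RunningTotal
           ELSE
               DISPLAY "Rejected non-numeric input: " InputField
           END-IF.
```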

Literals

A literal is a data-item that consists only of the data-item value itself. It cannot be referred to by a name. By definition, literals are constant data-items.
There are two types of literal -
• String/Alphanumeric Literals
• Numeric Literals
String Literals
String/Alphanumeric literals are enclosed in quotes and consist of alphanumeric characters.
For example: "Michael Ryan", "-123", "123.45"
Numeric Literals
Numeric literals may consist of numerals, the decimal point, and the plus or minus sign. Numeric literals are not enclosed in quotes.
For example: 123, 123.45, -256, +2987.
Figurative Constants
Unlike most other programming languages COBOL does not provide a mechanism for creating user-defined constants but it does provide a set of special constants called Figurative Constants.
A Figurative Constant may be used wherever it is legal to use a literal but unlike literals, when a Figurative Constant is assigned to a data-item it fills the whole item overwriting everything in it.
The Figurative Constants are:
• SPACE or SPACES: acts like one or more spaces
• ZERO or ZEROS or ZEROES: acts like one or more zeros
• QUOTE or QUOTES: used instead of a quotation mark
• HIGH-VALUE or HIGH-VALUES: uses the maximum value possible
• LOW-VALUE or LOW-VALUES: uses the minimum value possible
• ALL literal: allows an ordinary literal to act as a Figurative Constant
Figurative Constant Notes
• When ALL is used as a Figurative Constant, it must be followed by a literal, typically a single character such as ALL "-". The designated literal then acts like the standard Figurative Constants and is repeated to fill the receiving item.
• ZERO, ZEROS and ZEROES are synonyms, not separate Figurative Constants. The same applies to SPACE and SPACES, QUOTE and QUOTES, HIGH-VALUE and HIGH-VALUES, and LOW-VALUE and LOW-VALUES.
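The way a Figurative Constant fills the whole receiving item can be seen in this sketch. The data names and sizes are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FigurativeDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 DashLine       PIC X(20).
       01 RecordCount    PIC 9(5) VALUE 12345.
       01 HeadingText    PIC X(10) VALUE "REPORT".
       PROCEDURE DIVISION.
       ShowConstants.
      * ALL "-" fills every character of the item with hyphens.
           MOVE ALL "-" TO DashLine
      * ZEROS and SPACES likewise fill the whole receiving item.
           MOVE ZEROS TO RecordCount
           MOVE SPACES TO HeadingText
           DISPLAY DashLine
           DISPLAY "Count: " RecordCount
           DISPLAY "Heading: [" HeadingText "]"
           STOP RUN.
```

The program should print a line of 20 hyphens, then "Count: 00000", then "Heading: [" followed by ten spaces and "]".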
File Handling:

Sequential files

COBOL is generally used in situations where the volume of data to be processed is large. These systems are sometimes referred to as “data intensive” systems. Generally, large volumes arise not because the data is inherently voluminous but because the same items of information have been recorded about a great many instances of the same object. Record-based files are used to record this information.
Files, Records, Fields
In record-based files:
• We use the term file to describe a collection of one or more occurrences (instances) of a record type (template).
• We use the term record to describe a collection of fields that record information about an object.
• We use the term field to describe an item of information recorded about an object (e.g. StudentName, DateOfBirth).

Record instance vs Record type

It is important to distinguish between a record occurrence (i.e. the values of a record) and the record type or template (i.e. the structure of the record).
Each record occurrence in a file will have a different value but every record in the file will have the same structure.
For instance, in the student details file, illustrated below, the occurrences of the student records are actual values in the file. The record type/template describes the structure of each record occurrence.

The record buffer

Before a computer can do any processing on a piece of data, the data must be loaded into main memory (RAM). The CPU can only address data that is in RAM.
A record-based file may consist of hundreds of thousands, millions or even tens of millions of records, and may require gigabytes of storage. Files of this size cannot be processed by loading the whole file into memory in one go. Instead, files are processed by reading the records into memory, one at a time.
To store the record read into memory and to allow access to the individual fields of the record, a programmer must declare the record structure (see the diagram above) in his program. The computer uses the programmer's description of the record (the record template) to set aside sufficient memory to store one instance of the record. The memory allocated for storing a record is usually called a "record buffer".
A record buffer is capable of storing the data recorded for only one instance of the record. To process a file a program must read the records one at a time into the record buffer. The record buffer is the only connection between the program and the records in the file.
If a program processes more than one file, a record buffer must be defined for each file.
To process all the records in an INPUT file, we must ensure that each record instance is copied (read) from the file, into the record buffer, when required.
To create an OUTPUT file containing data records, we must ensure that each record is placed in the record buffer and then transferred (written) to the file.
To transfer a record from an input file to an output file we must read the record into the input record buffer, transfer it to the output record buffer and then write the data to the output file from the output record buffer. This type of data transfer between ‘buffers’ is quite common in COBOL programs.
Declaring Records and Files:
Creating a record
To create a record buffer large enough to store one instance of a record, containing the information described above, we must decide on the type and size of each of the fields.
·The student identity number is 7 digits in size so we need to declare the data-item to hold it as PIC 9(7).
·To store the student name, we will assume that we require only 10 characters. So we can declare a data-item to hold it as PIC X(10).
·The date of birth is 8 digits long so we declare it as PIC 9(8).
·The course code is 4 characters long so we declare it as PIC X(4).
·Finally, the gender is only one character so we declare it as PIC X.
The fields described above are individual data items, but we must collect them together into a record structure as follows:
01 StudentRec.
    02 StudentId PIC 9(7).
    02 StudentName PIC X(10).
    02 DateOfBirth PIC 9(8).
    02 CourseCode PIC X(4).
    02 Gender PIC X.
The record description above is correct as far as it goes. It reserves the correct amount of storage for the record buffer. But it does not allow us to access all the individual parts of the record that we might require.
For instance, the name is actually made up of the student's surname and initials, while the date consists of 4 digits for the year, 2 digits for the month and 2 digits for the day.
To allow us to access these fields individually, we need to declare the record as follows:
01 StudentRec.
    02 StudentId PIC 9(7).
    02 StudentName.
        03 Surname PIC X(8).
        03 Initials PIC XX.
    02 DateOfBirth.
        03 YOBirth PIC 9(4).
        03 MOBirth PIC 99.
        03 DOBirth PIC 99.
    02 CourseCode PIC X(4).
    02 Gender PIC X.
In this description, StudentName is a group item consisting of Surname and Initials, and DateOfBirth consists of YOBirth, MOBirth and DOBirth.
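A quick way to check the group and elementary items is to move a single 30-character test record into StudentRec and display some of the subordinate fields. This sketch declares the record in WORKING-STORAGE rather than under an FD, and the student data in the literal is invented for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RecordDemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 StudentRec.
           02 StudentId       PIC 9(7).
           02 StudentName.
               03 Surname     PIC X(8).
               03 Initials    PIC XX.
           02 DateOfBirth.
               03 YOBirth     PIC 9(4).
               03 MOBirth     PIC 99.
               03 DOBirth     PIC 99.
           02 CourseCode      PIC X(4).
           02 Gender          PIC X.
       PROCEDURE DIVISION.
       ShowFields.
      * Moving 30 characters to the group item fills every field.
           MOVE "9723456SMITH   JB19901231LM51F" TO StudentRec
           DISPLAY "Name:   " Surname " " Initials
           DISPLAY "Born:   " DOBirth "/" MOBirth "/" YOBirth
           DISPLAY "Course: " CourseCode
           STOP RUN.
```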
Declaring a record buffer in your program
The record type/template/buffer of every file used in a program must be described in the FILE SECTION by means of an FD (file description) entry. The FD entry consists of the letters FD and an internal name that the programmer assigns to the file.
So the full file description for the students file might be:
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
    02 StudentId PIC 9(7).
    02 StudentName.
        03 Surname PIC X(8).
        03 Initials PIC XX.
    02 DateOfBirth.
        03 YOBirth PIC 9(4).
        03 MOBirth PIC 99.
        03 DOBirth PIC 99.
    02 CourseCode PIC X(4).
    02 Gender PIC X.
Note that we have assigned the name StudentFile as the internal file name. The actual name of the file on disk is Students.Dat.

The SELECT and ASSIGN clause
Although the name of the students file on disk is Students.Dat, we are going to refer to it in our program as StudentFile. How can we connect the name we are going to use internally with the actual name of the file on disk?
The internal file name used in a file's FD entry is connected to an external file (on disk, tape or CD-ROM) by means of the SELECT and ASSIGN clause. The SELECT and ASSIGN clause is an entry in the FILE-CONTROL paragraph in the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "STUDENTS.DAT".
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
    02 StudentId PIC 9(7).
    02 StudentName.
        03 Surname PIC X(8).
        03 Initials PIC XX.
    02 DateOfBirth.
        03 YOBirth PIC 9(4).
        03 MOBirth PIC 99.
        03 DOBirth PIC 99.
    02 CourseCode PIC X(4).
    02 Gender PIC X.
SELECT and ASSIGN syntax for Sequential files
The Micro Focus COBOL compiler recognizes two kinds of Sequential File organization:
LINE SEQUENTIAL and RECORD SEQUENTIAL.
LINE SEQUENTIAL files are files in which each record is followed by the carriage return and line feed characters. These are the kind of files produced by a text editor such as Notepad.
RECORD SEQUENTIAL files are files where the file consists of a stream of bytes. Only the fact that we know the size of each record allows us to retrieve them. Files that are not record-based can be processed by defining them as RECORD SEQUENTIAL.
The ExternalFileReference can be a simple file name, or a full, or a partial, file specification. If a simple file name is used, the drive and directory where the program is running is assumed but we may choose to include the full path to the file. For instance, we could associate the StudentFile with an actual file using statements like:
   SELECT StudentFile
       ASSIGN TO "D:\Cobol\ExampleProgs\Students.Dat".

   SELECT StudentFile
       ASSIGN TO "A:\Students.Dat".
File Handling Verbs:

Introduction

Sequential files are uncomplicated. To write programs that process Sequential files, you only need to know four new verbs: OPEN, CLOSE, READ and WRITE.
You must ensure that (before terminating) your program closes all the files it has opened. Failure to do so may result in data not being written to the file or users being prevented from accessing the file.
The OPEN verb
Before your program can access the data in an input file or place data in an output file, you must make the file available to the program by OPENing it.
When you open a file, you have to indicate how you intend to use it (e.g. INPUT, OUTPUT, EXTEND) so that the system can manage the file correctly. Opening a file does not transfer any data to the record buffer; it simply provides access.
OPEN notes
When a file is opened for INPUT or EXTEND, the file must exist or the OPEN will fail.
When a file is opened for INPUT, the Next Record Pointer is positioned at the beginning of the file.
When the file is opened for EXTEND, the Next Record Pointer is positioned after the last record in the file. This allows records to be appended to the file.
When a file is opened for OUTPUT, it is created if it does not exist and overwritten if it already exists.
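The three OPEN modes described in these notes can be seen together in one small program. This is a sketch, assuming a compiler that accepts ORGANIZATION IS LINE SEQUENTIAL (such as Micro Focus or GnuCOBOL); the file DEMO.DAT and the data names are hypothetical. OUTPUT creates a file with two records, EXTEND appends a third, and INPUT reads them back from the start.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OPENMODES.
      * Sketch: OPEN OUTPUT creates (or overwrites) the file,
      * OPEN EXTEND appends, OPEN INPUT reads from the start.
      * DEMO.DAT and all data names are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT DemoFile ASSIGN TO "DEMO.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  DemoFile.
       01  DemoRec             PIC X(10).
       WORKING-STORAGE SECTION.
       01  RecCount            PIC 99 VALUE ZERO.
       01  EofFlag             PIC X VALUE "N".
           88  EndOfDemoFile   VALUE "Y".
       PROCEDURE DIVISION.
       MainPara.
      *    OUTPUT: start a fresh file holding two records.
           OPEN OUTPUT DemoFile
           MOVE "FIRST" TO DemoRec
           WRITE DemoRec
           MOVE "SECOND" TO DemoRec
           WRITE DemoRec
           CLOSE DemoFile
      *    EXTEND: the Next Record Pointer sits after the last
      *    record, so this WRITE appends a third record.
           OPEN EXTEND DemoFile
           MOVE "THIRD" TO DemoRec
           WRITE DemoRec
           CLOSE DemoFile
      *    INPUT: read from the beginning and count the records.
           OPEN INPUT DemoFile
           PERFORM UNTIL EndOfDemoFile
               READ DemoFile
                   AT END SET EndOfDemoFile TO TRUE
                   NOT AT END ADD 1 TO RecCount
               END-READ
           END-PERFORM
           CLOSE DemoFile
           DISPLAY "Records in file: " RecCount
           STOP RUN.
```

Running it a second time gives the same count, because OPEN OUTPUT overwrites the existing DEMO.DAT before EXTEND adds to it.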

The CLOSE verb
CLOSE InternalFileName...
You must ensure that, before terminating, your program closes all the files it has opened. Failure to do so may result in some data not being written to the file or users being prevented from accessing the file.

The READ verb

Once the system has opened a file and made it available to the program, it is the programmer's responsibility to process it correctly. To process all the records in the file, we have to transfer them, one record at a time, from the file to the file's record buffer. The READ verb is provided for this purpose.
The READ copies a record occurrence/instance from the file and places it in the record buffer.
READ notes
When the READ attempts to read a record from the file and encounters the end-of-file marker, the AT END clause is triggered and the StatementBlock following AT END is executed.
Using the INTO Identifier clause causes the data to be read into the record buffer and then copied from there to the Identifier, in one operation. When this option is used, there will be two copies of the data: one in the record buffer and one in the Identifier. Using this clause is the equivalent of executing a READ and then moving the contents of the record buffer to the Identifier.
How the READ works
When a record is read, it is copied from the backing storage file into the record buffer in RAM. When an attempt to READ detects the end of file, the AT END clause is triggered and the program sets the condition name EndOfFile to true (with SET EndOfFile TO TRUE). Because the condition name is set up as shown below, setting it to true fills the whole record with HIGH-VALUES.
   FD  StudentFile.
   01  StudentRec.
       88  EndOfFile   VALUE HIGH-VALUES.
       02  StudentId   PIC 9(7).
       ...
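A complete read loop built on this EndOfFile condition name might look like the sketch below. To keep it self-contained it first writes two sample student records (made-up data matching the 30-byte StudentRec layout); the LINE SEQUENTIAL organization is an assumption. It uses a priming READ, then processes and reads again until AT END fills the record with HIGH-VALUES.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. READSTUD.
      * Sketch of a READ loop driven by the EndOfFile condition
      * name. The sample records are hypothetical data.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT StudentFile ASSIGN TO "STUDENTS.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentFile.
       01  StudentRec.
           88  EndOfFile       VALUE HIGH-VALUES.
           02  StudentId       PIC 9(7).
           02  StudentName.
               03  Surname     PIC X(8).
               03  Initials    PIC XX.
           02  DateOfBirth.
               03  YOBirth     PIC 9(4).
               03  MOBirth     PIC 99.
               03  DOBirth     PIC 99.
           02  CourseCode      PIC X(4).
           02  Gender          PIC X.
       PROCEDURE DIVISION.
       MainPara.
      *    Create two sample records so the sketch runs alone.
           OPEN OUTPUT StudentFile
           MOVE "9312345HENDERSNJH19751012LM51M" TO StudentRec
           WRITE StudentRec
           MOVE "9383715SMITH   AM19780203LM60F" TO StudentRec
           WRITE StudentRec
           CLOSE StudentFile
      *    Priming READ; AT END sets EndOfFile, which moves
      *    HIGH-VALUES into the whole of StudentRec.
           OPEN INPUT StudentFile
           READ StudentFile
               AT END SET EndOfFile TO TRUE
           END-READ
           PERFORM UNTIL EndOfFile
               DISPLAY Surname " " StudentId
               READ StudentFile
                   AT END SET EndOfFile TO TRUE
               END-READ
           END-PERFORM
           CLOSE StudentFile
           STOP RUN.
```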
The WRITE verb
WRITE RecordName [FROM Identifier]
The WRITE verb is used to copy data from the record buffer (RAM) to the file on backing storage (disk or tape).
To WRITE data to a file, we must move the data to the record buffer (declared in the FD entry) and then WRITE the contents of the record buffer to the file.
When the WRITE..FROM is used the data contained in the Identifier is copied into the record buffer and is then written to the file. The WRITE..FROM is the equivalent of a MOVE Identifier TO RecordBuffer statement followed by a WRITE RecordBuffer statement.
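The sketch below puts the two ways of writing side by side: a MOVE followed by a plain WRITE, and a single WRITE ... FROM. It then reads the file back with READ ... INTO, so each record ends up both in the record buffer and in the Identifier. NAMES.DAT, WsName and the other data names are hypothetical, and LINE SEQUENTIAL organization is assumed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. WRITEFROM.
      * Sketch: MOVE + WRITE versus WRITE ... FROM, then
      * READ ... INTO. NAMES.DAT and data names are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT NameFile ASSIGN TO "NAMES.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  NameFile.
       01  NameRec             PIC X(8).
       WORKING-STORAGE SECTION.
       01  WsName              PIC X(8).
       01  EofFlag             PIC X VALUE "N".
           88  EndOfNames      VALUE "Y".
       PROCEDURE DIVISION.
       MainPara.
           OPEN OUTPUT NameFile
      *    Two steps: fill the record buffer, then write it.
           MOVE "ANDERSON" TO NameRec
           WRITE NameRec
      *    One step: WRITE ... FROM copies WsName into the
      *    record buffer and then writes the buffer.
           MOVE "MCKENZIE" TO WsName
           WRITE NameRec FROM WsName
           CLOSE NameFile
      *    READ ... INTO leaves a copy of each record in WsName
      *    as well as in the record buffer NameRec.
           OPEN INPUT NameFile
           PERFORM UNTIL EndOfNames
               READ NameFile INTO WsName
                   AT END SET EndOfNames TO TRUE
                   NOT AT END DISPLAY WsName
               END-READ
           END-PERFORM
           CLOSE NameFile
           STOP RUN.
```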
Read a file, Write a record
If you were paying close attention to the syntax diagrams above you probably noticed that while we READ a file, we must WRITE a record.
The reason we read a file but write a record is that a file can contain a number of different types of record. For instance, if we want to update the students file, we might have a file of transaction records that contains Insertion records and Deletion records. While the Insertion records would contain all the student record fields, the Deletion records need only the StudentId.
When we read a record from the transaction file, we don't know which of the types will be supplied, so we must READ Filename. It is the programmer's responsibility to discover what type of record has been supplied.
When we write a record to a file, we have to specify which of the record types we want to write, so we must WRITE RecordName.
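A transaction file like the one described here can be sketched with two 01-level records under one FD; both describe the same record buffer. The layouts, the type codes "I" and "D", and the file TRANS.DAT are hypothetical. Note that WRITE names a record while READ names the file, after which the program tests the type code.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRANSDEMO.
      * Sketch of "READ a file, WRITE a record". The layouts,
      * type codes and TRANS.DAT are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT TransFile ASSIGN TO "TRANS.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
      * Both 01 levels describe the same record buffer.
       FD  TransFile.
       01  InsertionRec.
           02  TypeCode        PIC X.
               88  IsInsertion VALUE "I".
               88  IsDeletion  VALUE "D".
           02  InsStudentId    PIC 9(7).
           02  InsSurname      PIC X(8).
       01  DeletionRec.
           02  FILLER          PIC X.
           02  DelStudentId    PIC 9(7).
       WORKING-STORAGE SECTION.
       01  EofFlag             PIC X VALUE "N".
           88  EndOfTrans      VALUE "Y".
       PROCEDURE DIVISION.
       MainPara.
           OPEN OUTPUT TransFile
      *    WRITE names the record type we want to write.
           MOVE "I9312345HENDERSN" TO InsertionRec
           WRITE InsertionRec
           MOVE "D9383715" TO DeletionRec
           WRITE DeletionRec
           CLOSE TransFile
      *    READ names only the file; the program inspects
      *    TypeCode to find out which record type arrived.
           OPEN INPUT TransFile
           PERFORM UNTIL EndOfTrans
               READ TransFile
                   AT END SET EndOfTrans TO TRUE
                   NOT AT END PERFORM ShowTrans
               END-READ
           END-PERFORM
           CLOSE TransFile
           STOP RUN.

       ShowTrans.
           EVALUATE TRUE
               WHEN IsInsertion
                   DISPLAY "Insert " InsStudentId " " InsSurname
               WHEN IsDeletion
                   DISPLAY "Delete " DelStudentId
           END-EVALUATE.
```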
Tables or Arrays:
Tables and Occurs
A powerful feature of COBOL is the use of tables, via the "OCCURS" and "OCCURS DEPENDING ON" clauses. This section describes COBOL tables and the OCCURS and OCCURS DEPENDING ON clauses, both of which cause fields or groups to repeat some number of times.

Tables and the OCCURS clause

Suppose you wanted to store your monthly sales figures for the year. You could define 12 fields, one for each month, like this:
   05  MONTHLY-SALES-1    PIC S9(5)V99.
   05  MONTHLY-SALES-2    PIC S9(5)V99.
   05  MONTHLY-SALES-3    PIC S9(5)V99.
   ...
   05  MONTHLY-SALES-11   PIC S9(5)V99.
   05  MONTHLY-SALES-12   PIC S9(5)V99.
But there's an easier way in COBOL. You can specify the field once and declare that it repeats 12 times.
You do this with the OCCURS clause, like this:
   05  MONTHLY-SALES  OCCURS 12 TIMES  PIC S9(5)V99.
(By now you should also know this can be written on two lines like this):
   05  MONTHLY-SALES  OCCURS 12 TIMES
                                   PIC S9(5)V99.
This specifies 12 fields, all of which have the same PIC, and is called a table (also called an array). The individual fields are referenced in COBOL by using subscripts, such as "MONTHLY-SALES(1)". This table occupies 84 bytes in the record (12 * (5+2)). (The sign is embedded, not separate, and the decimal is implied.)
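Subscripted references are usually driven by a PERFORM VARYING loop. This sketch fills the 12-element MONTHLY-SALES table with made-up figures, totals them through a subscript, and reports the table's size with FUNCTION LENGTH, which should confirm the 84 bytes worked out above. SALES-RECORD, MONTH-SUB, YEAR-TOTAL and the sample data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SALESSUM.
      * Sketch: total a 12-element table with a subscript.
      * SALES-RECORD, MONTH-SUB, YEAR-TOTAL and the sample
      * figures are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SALES-RECORD.
           05  MONTHLY-SALES  OCCURS 12 TIMES  PIC S9(5)V99.
       01  MONTH-SUB           PIC 99.
       01  YEAR-TOTAL          PIC S9(7)V99 VALUE ZERO.
       01  YEAR-TOTAL-OUT      PIC 9(5).99.
       01  TABLE-BYTES         PIC 999.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Month n gets sales of n * 100.50 (sample data only).
           PERFORM VARYING MONTH-SUB FROM 1 BY 1
                   UNTIL MONTH-SUB > 12
               COMPUTE MONTHLY-SALES(MONTH-SUB) = MONTH-SUB * 100.50
               ADD MONTHLY-SALES(MONTH-SUB) TO YEAR-TOTAL
           END-PERFORM
           MOVE YEAR-TOTAL TO YEAR-TOTAL-OUT
           DISPLAY "Year total: " YEAR-TOTAL-OUT
      *    12 occurrences * 7 bytes each.
           MOVE FUNCTION LENGTH(SALES-RECORD) TO TABLE-BYTES
           DISPLAY "Table size: " TABLE-BYTES " bytes"
           STOP RUN.
```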
The OCCURS can also be at the group level, and this is the most useful application of OCCURS. For example, all 25 line items on an invoice (75 fields) could be held in this group:
   05  LINE-ITEMS OCCURS 25 TIMES.
       10  QUANTITY            PIC 9999.
       10  DESCRIPTION         PIC X(30).
       10  UNIT-PRICE          PIC S9(5)V99.
Notice the OCCURS is listed at the group level, so the entire group occurs 25 times. The order of the data in the file is as if you had specified 25 separate groups, like this:
   05  LINE-ITEMS-1.
       10  QUANTITY            PIC 9999.
       10  DESCRIPTION         PIC X(30).
       10  UNIT-PRICE          PIC S9(5)V99.

   05  LINE-ITEMS-2.
       10  QUANTITY            PIC 9999.
       10  DESCRIPTION         PIC X(30).
       10  UNIT-PRICE          PIC S9(5)V99.

      ...

   05  LINE-ITEMS-25.
       10  QUANTITY            PIC 9999.
       10  DESCRIPTION         PIC X(30).
       10  UNIT-PRICE          PIC S9(5)V99.
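With the OCCURS on the group, one subscript selects an occurrence and the elementary fields are referenced with that subscript, e.g. QUANTITY(2). This sketch fills two of the 25 line items with sample data, totals the invoice, and reports the record size (25 occurrences of 4 + 30 + 7 bytes). INVOICE-RECORD, ITEM-SUB, INVOICE-TOTAL and the data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INVOICE.
      * Sketch: the LINE-ITEMS group with two sample items.
      * INVOICE-RECORD, ITEM-SUB and the totals are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INVOICE-RECORD.
           05  LINE-ITEMS OCCURS 25 TIMES.
               10  QUANTITY            PIC 9999.
               10  DESCRIPTION         PIC X(30).
               10  UNIT-PRICE          PIC S9(5)V99.
       01  ITEM-SUB                PIC 99.
       01  INVOICE-TOTAL           PIC S9(9)V99 VALUE ZERO.
       01  TOTAL-OUT               PIC 9(7).99.
       01  RECORD-BYTES            PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    INITIALIZE zeroes QUANTITY and UNIT-PRICE and blanks
      *    DESCRIPTION in all 25 occurrences.
           INITIALIZE INVOICE-RECORD
      *    The subscript picks the occurrence of the group.
           MOVE 3        TO QUANTITY(1)
           MOVE "WIDGET" TO DESCRIPTION(1)
           MOVE 12.50    TO UNIT-PRICE(1)
           MOVE 2        TO QUANTITY(2)
           MOVE "GADGET" TO DESCRIPTION(2)
           MOVE 99.99    TO UNIT-PRICE(2)
           PERFORM VARYING ITEM-SUB FROM 1 BY 1
                   UNTIL ITEM-SUB > 25
               COMPUTE INVOICE-TOTAL = INVOICE-TOTAL
                   + QUANTITY(ITEM-SUB) * UNIT-PRICE(ITEM-SUB)
           END-PERFORM
           MOVE INVOICE-TOTAL TO TOTAL-OUT
           DISPLAY "Invoice total: " TOTAL-OUT
           MOVE FUNCTION LENGTH(INVOICE-RECORD) TO RECORD-BYTES
           DISPLAY "Record size: " RECORD-BYTES
           STOP RUN.
```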
There can be nested OCCURS: an OCCURS within an OCCURS. In the next example, suppose we stock ten products and want to keep a record of the monthly sales of each product for the past 12 months. We could do just that with this table:
   01  INVENTORY-RECORD.
       05  INVENTORY-ITEM OCCURS 10 TIMES.
           10  MONTHLY-SALES OCCURS 12 TIMES  PIC 999.
In this case, "INVENTORY-ITEM" is a group composed only of "MONTHLY-SALES", which occurs 12 times for each occurrence of an inventory item. This gives an array (table) of 10 * 12 fields. The only information in this record is the 120 monthly sales figures: 12 months for each of 10 items.
We could also have a description for each item. The description would go under the 05 level INVENTORY-ITEM group, at the 10 level, the same as the monthly sales. Further, we could track, say, the sale price of each item for each month. A record which will do these things is:
   01  INVENTORY-RECORD.
       05  INVENTORY-ITEM OCCURS 10 TIMES.
           10  ITEM-DESCRIPTION               PIC X(30).
           10  MONTHLY-SALES OCCURS 12 TIMES.
               15  QUANTITY-SOLD              PIC 999.
               15  UNIT-PRICE                 PIC 9(5)V99.
Notice we have made MONTHLY-SALES a group, which now contains two fields, and the whole group repeats 12 times for each instance of INVENTORY-ITEM. This short layout has 250 fields: two fields (QUANTITY-SOLD and UNIT-PRICE) that repeat 12 times for each inventory item, times 10 items, plus the ITEM-DESCRIPTION field for each of the 10 items. Fields and groups can be nested several levels deep, and it's possible to have thousands of fields in a layout only a couple of pages long.
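Fields inside a nested OCCURS need one subscript per level: QUANTITY-SOLD(item, month). This sketch uses the second INVENTORY-RECORD layout, fills item 4 with sample figures for all 12 months, sums that item's quantities, and reports the record size (10 * (30 + 12 * 10) bytes). ITEM-SUB, MONTH-SUB, ITEM-QTY and the data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INVENTORY.
      * Sketch: nested OCCURS with two subscripts. ITEM-SUB,
      * MONTH-SUB, ITEM-QTY and the sample data are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INVENTORY-RECORD.
           05  INVENTORY-ITEM OCCURS 10 TIMES.
               10  ITEM-DESCRIPTION               PIC X(30).
               10  MONTHLY-SALES OCCURS 12 TIMES.
                   15  QUANTITY-SOLD              PIC 999.
                   15  UNIT-PRICE                 PIC 9(5)V99.
       01  ITEM-SUB                PIC 99.
       01  MONTH-SUB               PIC 99.
       01  ITEM-QTY                PIC 9(5).
       01  RECORD-BYTES            PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           INITIALIZE INVENTORY-RECORD
      *    ITEM-DESCRIPTION sits at the item level: one subscript.
           MOVE "BLUE PAINT" TO ITEM-DESCRIPTION(4)
      *    MONTHLY-SALES fields: item subscript, then month.
           PERFORM VARYING MONTH-SUB FROM 1 BY 1
                   UNTIL MONTH-SUB > 12
               MOVE 10   TO QUANTITY-SOLD(4, MONTH-SUB)
               MOVE 4.99 TO UNIT-PRICE(4, MONTH-SUB)
           END-PERFORM
      *    Sum one item's quantities across the twelve months.
           MOVE ZERO TO ITEM-QTY
           MOVE 4 TO ITEM-SUB
           PERFORM VARYING MONTH-SUB FROM 1 BY 1
                   UNTIL MONTH-SUB > 12
               ADD QUANTITY-SOLD(ITEM-SUB, MONTH-SUB) TO ITEM-QTY
           END-PERFORM
           DISPLAY "Item " ITEM-SUB " sold " ITEM-QTY
           MOVE FUNCTION LENGTH(INVENTORY-RECORD) TO RECORD-BYTES
           DISPLAY "Record size: " RECORD-BYTES
           STOP RUN.
```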

Occurs Depending On

One really great feature of COBOL tables, and a really nasty one to convert to other languages, is the "OCCURS DEPENDING ON". This is an OCCURS, like above, but the number of times it occurs in a particular record can vary (between some limits). The number of times it actually occurs in any particular record will be given by a value in another field of that record. This creates records that vary in size from record to record.
The OCCURS DEPENDING ON can include many subordinate fields and groups, all of which occur multiple times. Further, most compilers allow one or more (fixed) OCCURS to be nested within an OCCURS DEPENDING ON, and some compilers allow multiple OCCURS DEPENDING ON clauses to be nested or to occur in succession. This can get pretty involved, so we will give only one simple example: a patient's medical treatment-history record.
   01  PATIENT-TREATMENTS.
       05  PATIENT-NAME                PIC X(30).
       05  PATIENT-SS-NUMBER           PIC 9(9).
       05  NUMBER-OF-TREATMENTS        PIC 99 COMP-3.
       05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
              DEPENDING ON NUMBER-OF-TREATMENTS
              INDEXED BY TREATMENT-POINTER.
           10  TREATMENT-DATE.
               15  TREATMENT-DAY        PIC 99.
               15  TREATMENT-MONTH      PIC 99.
               15  TREATMENT-YEAR       PIC 9(4).
           10  TREATING-PHYSICIAN       PIC X(30).
           10  TREATMENT-CODE           PIC 99.
Here are the significant points of this record:
- The name of the record is "PATIENT-TREATMENTS".
- The first three fields, "PATIENT-NAME", "PATIENT-SS-NUMBER", and "NUMBER-OF-TREATMENTS", make up the fixed portion of every record. This fixed portion is the same for every record.
- The TREATMENT-HISTORY group is the variable portion of the record. It can occur from 0 to 50 times.
- "NUMBER-OF-TREATMENTS" is a number from 0 to 50 that tells us how many times the group TREATMENT-HISTORY occurs in this record.
- The value in NUMBER-OF-TREATMENTS is stored in COMP-3 (packed decimal) format. This is very common; so is COMP (binary) format. Neither is stored as readable characters: COMP-3 packs two decimal digits into each byte, with the sign in the last half-byte, so PIC 99 COMP-3 takes 2 bytes, while COMP holds a true binary number.
- TREATMENT-HISTORY is a group comprising all the lower-level fields beneath it (down to the next 05 level or the end of the record).
- All the fields and groups within TREATMENT-HISTORY occur between 0 and 50 times.
- Because 0 is a valid number of occurrences, the variable portion of the record may be absent altogether.
- The "INDEXED BY TREATMENT-POINTER" clause may or may not be present. If present, it tells the compiler the name of the index (TREATMENT-POINTER) to use with the table; the index itself is not stored in the record. If you don't understand this, you can safely ignore the "INDEXED BY ..." clause unless you are programming in COBOL.
- TREATMENT-DATE is a group comprising the day, month, and year fields beneath it.
- These records vary in size from 41 to 2041 bytes: the fixed portion is 30 + 9 + 2 = 41 bytes, and each TREATMENT-HISTORY occurrence is 8 + 30 + 2 = 40 bytes, so 50 occurrences add 2000 bytes. Such records would be stored in some type of variable-length file.
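The 41-to-2041-byte range can be checked with a small program. This sketch places the PATIENT-TREATMENTS layout in WORKING-STORAGE, changes NUMBER-OF-TREATMENTS, and asks FUNCTION LENGTH for the current size; it assumes a compiler that, as the standard intends, sizes an OCCURS DEPENDING ON group from the current value of its DEPENDING ON field. It also steps through the table with the TREATMENT-POINTER index. REC-BYTES and the sample data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PATIENTS.
      * Sketch: the record length follows NUMBER-OF-TREATMENTS.
      * REC-BYTES and the sample data are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  PATIENT-TREATMENTS.
           05  PATIENT-NAME                PIC X(30).
           05  PATIENT-SS-NUMBER           PIC 9(9).
           05  NUMBER-OF-TREATMENTS        PIC 99 COMP-3.
           05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
                  DEPENDING ON NUMBER-OF-TREATMENTS
                  INDEXED BY TREATMENT-POINTER.
               10  TREATMENT-DATE.
                   15  TREATMENT-DAY        PIC 99.
                   15  TREATMENT-MONTH      PIC 99.
                   15  TREATMENT-YEAR       PIC 9(4).
               10  TREATING-PHYSICIAN       PIC X(30).
               10  TREATMENT-CODE           PIC 99.
       01  REC-BYTES                       PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    No occurrences: only the 41-byte fixed portion.
           MOVE 0 TO NUMBER-OF-TREATMENTS
           MOVE FUNCTION LENGTH(PATIENT-TREATMENTS) TO REC-BYTES
           DISPLAY "0 treatments:  " REC-BYTES " bytes"
      *    Two occurrences, filled by stepping the index.
           MOVE 2 TO NUMBER-OF-TREATMENTS
           PERFORM VARYING TREATMENT-POINTER FROM 1 BY 1
                   UNTIL TREATMENT-POINTER > NUMBER-OF-TREATMENTS
               MOVE "DR SMITH" TO
                   TREATING-PHYSICIAN(TREATMENT-POINTER)
               MOVE 17 TO TREATMENT-CODE(TREATMENT-POINTER)
           END-PERFORM
           MOVE FUNCTION LENGTH(PATIENT-TREATMENTS) TO REC-BYTES
           DISPLAY "2 treatments:  " REC-BYTES " bytes"
      *    The maximum: 41 + 50 * 40 bytes.
           MOVE 50 TO NUMBER-OF-TREATMENTS
           MOVE FUNCTION LENGTH(PATIENT-TREATMENTS) TO REC-BYTES
           DISPLAY "50 treatments: " REC-BYTES " bytes"
           STOP RUN.
```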
