Data are the principal resources of an organization. Data stored in computer systems form a hierarchy extending from a single bit to a database, the major record-keeping entity of a firm. Each higher rung of this hierarchy is organized from the components below it.
Data are logically organized into:
1. Bits (characters)
2. Fields
3. Records
4. Files
5. Databases
Bit (Character) – a bit is the smallest unit of data representation (the value of a bit may be 0 or 1). Eight bits make a byte, which can represent a character or a special symbol in a character code.
Field – a field consists of a grouping of characters. A data field represents an attribute (a characteristic or quality) of some entity (object, person, place, or event).
Record – a record represents a collection of attributes that describe a real-world entity. A record consists of fields, with each field describing an attribute of the entity.
File – a group of related records. Files are frequently classified by the application for which they are primarily used (for example, an employee file). A primary key in a file is the field (or fields) whose value identifies a record among others in a data file.
Database – an integrated collection of logically related records or files. A database consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. The data is managed by systems software called database management systems (DBMS). The data stored in a database is independent of the application programs using it and of the types of secondary storage devices on which it is stored.
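The hierarchy above can be sketched in a few lines of Python. This is only an illustration under assumed names: the `EmployeeRecord` layout, the sample values, and the `find_by_primary_key` helper are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass

# A bit is 0 or 1; eight bits form a byte that encodes one character.
bits = "01000001"
byte_value = int(bits, 2)      # the byte read as an integer
character = chr(byte_value)    # the character that this byte encodes in ASCII


@dataclass
class EmployeeRecord:
    """A record: a collection of fields describing one entity (hypothetical layout)."""
    employee_id: int   # primary key field
    name: str          # a field is a grouping of characters
    department: str


# A file: a group of related records.
employee_file = [
    EmployeeRecord(101, "Ada", "Engineering"),
    EmployeeRecord(102, "Grace", "Research"),
]


def find_by_primary_key(file, key):
    """Return the record whose primary key equals `key`, or None if absent."""
    for record in file:
        if record.employee_id == key:
            return record
    return None
```

The primary key is what lets `find_by_primary_key` single out one record among the others in the file.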
6.2 File Environment and its Limitations
There are three principal methods of organizing files, of which only two provide the direct access necessary in on-line systems.
Data files are organized so as to facilitate access to records and to ensure their efficient storage. A tradeoff between these two requirements generally exists: if rapid access is required, more storage is required to make it possible.
Access to a record for reading it is the essential operation on data. There are two types of access:
1. Sequential access – is performed when records are accessed in the order they are stored. Sequential access is the main access mode only in batch systems, where files are used and updated at regular intervals.
2. Direct access – on-line processing requires direct access, whereby a record can be accessed without accessing the records between it and the beginning of the file. The primary key serves to identify the needed record.
There are three methods of file organization:
1. Sequential organization
2. Indexed-sequential organization
3. Direct organization
In sequential organization, records are physically stored in a specified order according to a key field in each record.
Advantages of sequential access:
1. It is fast and efficient when dealing with large volumes of data that need to be processed periodically (batch system).
Disadvantages of sequential access:
1. Requires that all new transactions be sorted into the proper sequence for sequential access processing.
2. Locating, storing, modifying, deleting, or adding records in the file requires rearranging the file.
3. This method is too slow to handle applications requiring immediate updating or responses.
In the indexed-sequential files method, records are physically stored in sequential order on a magnetic disk or other direct access storage device based on the key field of each record. Each file contains an index that references one or more key fields of each data record to its storage location address.
Direct file organization provides the fastest direct access to records. When using direct access methods, records do not have to be arranged in any particular sequence on storage media. Characteristics of the direct access method include:
1. Computers must keep track of the storage location of each record using a variety of direct organization methods so that data can be retrieved when needed.
2. New transactions' data do not have to be sorted.
3. Processing that requires immediate responses or updating is easily performed.
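The three file organizations can be contrasted with a small Python sketch. It is an illustration under stated assumptions, not a description of how a real file system works: the `records` list and the function names are hypothetical, a sorted key list stands in for the index of an indexed-sequential file, and a Python dict (a hash table) stands in for whichever direct organization method is used to turn a key into a storage location.

```python
from bisect import bisect_left

# Hypothetical records: (primary key, payload), stored in key order.
records = [(10, "bolt"), (20, "nut"), (30, "washer"), (40, "gear")]


def sequential_read(file, key):
    """Sequential access: scan from the start; also report how many records were read."""
    for steps, (k, value) in enumerate(file, start=1):
        if k == key:
            return value, steps
    return None, len(file)


# Indexed-sequential: a sorted index of keys maps each key to a storage position.
index_keys = [k for k, _ in records]


def indexed_read(file, keys, key):
    """Look the key up in the index, then go straight to the record."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return file[pos][1]
    return None


# Direct organization: the key itself determines the storage location.
# Assumption: a dict plays the role of the key-to-address transformation.
direct_file = {k: value for k, value in records}


def direct_read(file, key):
    """Direct access: no scan and no required record ordering."""
    return file.get(key)
```

Reading key 30 sequentially touches three records, while the indexed and direct reads go to the record without passing through the ones before it. The extra index or hash table is the additional storage that pays for the faster access.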
6.3 Database Environment
A database is an organized collection of interrelated data that serves a number of applications in an enterprise. The database stores not only the values of the attributes of various entities but also the relationships between these entities. A database is managed by a database management system (DBMS), systems software that provides assistance in managing databases shared by many users.
A DBMS:
1. Helps organize data for effective access by a variety of users with different access needs and for efficient storage.
2. It makes it possible to create, access, maintain, and control databases.
3. Through a DBMS, data can be integrated and presented on demand.
Advantages of a database management approach:
1. Avoiding uncontrolled data redundancy and preventing inconsistency
2. Program-data independence
3. Flexible access to shared data
4. Centralized control of data
6.4 Levels of Data Definition in Databases
The user view of a DBMS becomes the basis for the data modelling steps where the relationships between data elements are identified. These data models define the logical relationships among the data elements needed to support a basic business process. A DBMS serves as a logical framework (schema, subschema, and physical) on which to base the physical design of databases and the development of application programs to support the business processes of the organization. A DBMS enables us to define a database on three levels:
1. Schema – an overall logical view of the relationships between data in a database.
2. Subschema – a logical view of data relationships needed to support specific end user application programs that will access the database.
3. Physical – looks at how data is physically arranged, stored, and accessed on the magnetic disks and other secondary storage devices of a computer system.
A DBMS provides the language, called data definition language (DDL), for defining the database objects on the three levels. It also provides a language for manipulating the data, called the data manipulation language (DML), which makes it possible to access records, change values of attributes, and delete or insert records.
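As a rough illustration of the DDL/DML distinction, the sketch below uses SQLite through Python's standard `sqlite3` module. The `employee` table, the `employee_directory` view, and the sample rows are hypothetical. Treating a view as a subschema is a common simplification, and the physical level is left entirely to SQLite here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the schema (the overall logical view) ...
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL)"
)
# ... and a view that acts as a subschema for one application.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT employee_id, name FROM employee"
)

# DML: insert records, change attribute values, delete records, access records.
conn.executemany(
    "INSERT INTO employee (employee_id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ada", 5000.0), (2, "Grace", 6000.0), (3, "Linus", 5500.0)],
)
conn.execute("UPDATE employee SET salary = 6500.0 WHERE employee_id = 2")
conn.execute("DELETE FROM employee WHERE employee_id = 3")

directory = conn.execute(
    "SELECT employee_id, name FROM employee_directory ORDER BY employee_id"
).fetchall()
salary_of_grace = conn.execute(
    "SELECT salary FROM employee WHERE employee_id = 2"
).fetchone()[0]
```

An application that reads only `employee_directory` never sees the `salary` column, which is the point of giving each application its own subschema.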
6.5 Data Models or How to Represent Relationships between Data
A data model is a method for organizing databases on the logical level, the level of the schema and subschemas. The main concern in such a model is how to represent relationships among database records. The relationships among the many individual records in databases are based on one of several logical data structures or models. DBMS are designed to provide end users with quick, easy access to information stored in databases. Three principal models include:
1. Hierarchical Structure
2. Network Structure
3. Relational Structure
Early mainframe DBMS packages used the hierarchical structure, in which:
1. Relationships between records form a hierarchy or tree-like structure.
2. Records are dependent and arranged in multilevel structures, consisting of one root record and any number of subordinate levels.
3. Relationships among the records are one-to-many, since each data element is related to only one element above it.
4. The data element or record at the highest level of the hierarchy is called the root element. Any data element can be accessed by moving progressively downward from the root and along the branches of the tree until the desired record is located.
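Access in a hierarchical structure can be sketched as a walk down a tree. The `Node` class, the `find_path` function, and the department/project/employee names below are hypothetical, chosen only to show that every record is reached by one path starting at the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One record in the hierarchy; each node has exactly one parent."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a subordinate record and return it."""
        self.children.append(child)
        return child


def find_path(node, target, path=()):
    """Move downward from `node`; return the branch leading to `target`, or None."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None


root = Node("Department")
project = root.add(Node("Project A"))
project.add(Node("Employee 7"))
root.add(Node("Project B"))
```

Because each record has a single parent, `find_path(root, "Employee 7")` can return only one branch. Placing the same employee under Project B as well would mean storing a second copy of the record.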
The network structure:
1. Can represent more complex logical relationships, and is still used by many mainframe DBMS packages.
2. Allows many-to-many relationships among records. That is, the network model can access a data element by following one of several paths, because any data element or record can be related to any number of other data elements.
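The difference from a strict tree can be seen in a small sketch in which one record has more than one owner. The `links` dictionary and the `all_paths` helper are hypothetical; they show only that a record in a network structure may be reachable along several predefined paths.

```python
# Hypothetical many-to-many links: an employee may work on several projects,
# and a project may have several employees.
links = {
    "Department": ["Project A", "Project B"],
    "Project A": ["Employee 7"],
    "Project B": ["Employee 7", "Employee 9"],
    "Employee 7": [],
    "Employee 9": [],
}


def all_paths(graph, start, target, path=()):
    """Return every path of predefined links leading from `start` to `target`."""
    path = path + (start,)
    if start == target:
        return [path]
    paths = []
    for neighbour in graph[start]:
        if neighbour not in path:  # guard against cycles
            paths.extend(all_paths(graph, neighbour, target, path))
    return paths
```

Here "Employee 7" is reachable through either project, whereas "Employee 9" has a single access path. In both cases retrieval is limited to the links that were defined in advance.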
The relational structure:
1. Most popular of the three database structures.
2. Used by most microcomputer DBMS packages, as well as many minicomputer and mainframe systems.
3. Data elements within the database are stored in the form of simple tables. Tables are related if they contain common fields.
4. DBMS packages based on the relational model can link data elements from various tables to provide information to users.
Evaluation of Database Structures
| Structure | Advantages | Disadvantages |
|---|---|---|
| Hierarchical Data Structure | Data can be stored and retrieved easily in structured, routine types of transactions. Data can be extracted easily for reporting purposes. Routine transaction processing is fast and efficient. | Hierarchical one-to-many relationships must be specified in advance and are not flexible. Cannot easily handle ad hoc requests for information. Modifying a hierarchical database structure is complex. Great deal of redundancy. Requires knowledge of a programming language. |
| Network Structure | More flexible than the hierarchical model. Ability to provide sophisticated logical relationships among the records. | Network many-to-many relationships must be specified in advance. User is limited to retrieving data that can be accessed using the established links between records. Cannot easily handle ad hoc requests for information. Requires knowledge of a programming language. |
| Relational Structure | Flexible in that it can handle ad hoc information requests. Easy for programmers to work with. End users can use this model with little effort or training. Easier to maintain than the hierarchical and network models. | Cannot process large amounts of business transactions as quickly and efficiently as the hierarchical and network models. |
6.6 Relational Databases
A relational database is a collection of tables. Such a database is relatively easy for end users to understand and modify, and it affords flexibility across the data.
Three basic operations are performed on the tables of a relational database:
1. Select, which selects from a specified table the rows that satisfy a given condition.
2. Project, which selects from a given table the specified attribute values.
3. Join, which builds a new table from two specified tables.
The power of the relational model derives from the join operation. It is precisely because records are related to one another through a join operation, rather than through links, that we do not need a predefined access path. However, the join operation is also highly time-consuming, requiring access to many records stored on disk in order to find the needed records.
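The three operations can be sketched over tables held as plain Python lists of rows. The tables and the function names `select`, `project`, and `join` are hypothetical illustrations. A real DBMS implements these operations far more efficiently, and the join shown here is the simplest nested-loop form.

```python
# Hypothetical tables, each a list of rows (dicts keyed by attribute name).
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10},
    {"emp_id": 2, "name": "Grace", "dept_id": 20},
    {"emp_id": 3, "name": "Linus", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Engineering"},
    {"dept_id": 20, "dept_name": "Research"},
]


def select(table, condition):
    """Select: keep the rows that satisfy the condition."""
    return [row for row in table if condition(row)]


def project(table, attributes):
    """Project: keep only the named attributes of each row."""
    return [{a: row[a] for a in attributes} for row in table]


def join(left, right, common):
    """Join: build a new table from rows of two tables that agree on a common field."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[common] == r[common]
    ]


joined = join(employees, departments, "dept_id")
engineers = select(joined, lambda row: row["dept_name"] == "Engineering")
names = project(engineers, ["name"])
```

No link between an employee and a department is stored anywhere; the relationship is discovered at query time by comparing `dept_id` values. The nested loop compares every pair of rows, which gives a feel for why joins over large tables are costly.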
6.7 SQL – A Relational Query Language
Structured Query Language (SQL) has become an international standard access language for defining and manipulating data in databases. It is the data-definition-and-management language of most well-known DBMS, including some nonrelational ones. SQL may be used as an independent query language to define the objects in a database, enter the data into the database, and access the data. The so-called embedded SQL is also provided for programming in procedural languages (A host