Internship Report


Laslo Bokor

Summer 2001


What is a bar code? 

Bar codes are machine-readable symbols made of patterns of black and white bars and spaces, or in some cases checkerboard-like grids, which represent numbers, letters or characters. These bars and spaces together are named "elements".

There is a mystique surrounding bar codes that intimidates many people, but there isn't much mystery to it. The data in a bar code is just a reference number, which the computer uses to look up the associated disk record(s) containing descriptive data and other pertinent information. The bar code itself usually doesn't contain descriptive data (such as social security numbers, license plate numbers, addresses, or names). Bar codes found on food items at grocery stores don't contain the price or description of the food item; instead the bar code holds a 12-digit "product number". When the bar code is read by a bar code reader and transmitted to the computer, the computer finds the disk file item record(s) associated with that item number. The disk file holds the price, vendor name, quantity on hand, description, etc. The computer does a "price lookup" using the number read from the bar code, then adds the item to a register of purchases and adds its price to the subtotal of the groceries purchased. (It also subtracts the quantity from the "on-hand" total.)
So, bar codes typically have only ID data in them; the ID data is used by the computer to look up all the pertinent detailed data associated with the ID data.
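The lookup described above can be sketched in a few lines of Python. This is only an illustration, not how any particular store system works: the ItemRecord fields, the ring_up function, the product number, and the sample item are all hypothetical, and prices are kept in whole cents to avoid rounding issues.

```python
from dataclasses import dataclass


@dataclass
class ItemRecord:
    """One hypothetical record of the item file kept on disk."""
    description: str
    price_cents: int
    on_hand: int


def ring_up(scanned_codes, item_file):
    """Look up each scanned product number and build the register."""
    register = []
    subtotal_cents = 0
    for code in scanned_codes:
        record = item_file[code]          # the "price lookup"
        register.append((record.description, record.price_cents))
        subtotal_cents += record.price_cents
        record.on_hand -= 1               # adjust the "on-hand" total
    return register, subtotal_cents


# The bar code carries only the 12-digit key; everything else is in the file.
item_file = {"036000291452": ItemRecord("Example item", 199, 24)}
register, subtotal = ring_up(["036000291452", "036000291452"], item_file)
```

Note that the price and description never appear in the scanned data; changing a price only means changing the record, not reprinting the bar code.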

Bar code history

Long before bar codes and scanners were actually invented, people knew they desperately needed something like them. Punch cards, first developed for the 1890 U.S. Census, seemed to offer some early hope. In 1932 a business student named Wallace Flint wrote a master's thesis in which he envisioned a supermarket where customers would pierce cards to mark their selections. At the checkout counter they would insert them into a reader, which would activate machinery to bring the purchases to them on conveyor belts. Store management would also have a record of what was being bought. The problem was that the card reading equipment of the day was very expensive and inaccurate.

The first step toward today's bar codes came in 1948, when Woodland, a graduate student, overheard a conversation in the halls of Philadelphia's Drexel Institute of Technology. The president of a food chain was pleading with one of the deans to undertake research on capturing product information automatically at checkout. Woodland's first idea was to use patterns of ink that would glow under ultraviolet light, and he built a device to test the concept. It worked, but he encountered problems ranging from ink instability to printing costs. Nonetheless, Woodland was convinced he had a workable idea. After several months of work he came up with the linear bar code, using elements from two established technologies: movie soundtracks and Morse code. He then decided to replace his wide and narrow vertical lines with concentric circles, so that they could be scanned from any direction. This became known as the bull's eye code.

In 1951 Woodland got a job at IBM, where he hoped his scheme would flourish. The following year he built the first actual bar code reader in the living room of his home in Binghamton, New York. The device was the size of a desk and had to be wrapped in black oilcloth to keep out ambient light. It relied on two key elements: a five-hundred-watt incandescent bulb as the light source and an RCA 935 photo-multiplier tube, designed for movie sound systems, as the reader. Woodland hooked the 935 tube up to an oscilloscope. Then he moved a piece of paper marked with lines across a thin beam emanating from the light source. The reflected beam was aimed at the tube. At one point the heat from the powerful bulb set the paper smoldering. Nonetheless, Woodland got what he wanted. As the paper moved, the signal on the oscilloscope jumped. He had created a device that could electronically read printed material.

The use of bar coding has been growing dramatically over the last 15 years. With the adoption of UPC as the standard for retail grocery stores in the late 70's, bar codes have become an everyday experience for most people. Bar codes are a fast, easy, and accurate data entry method. The correct use of bar codes can decrease employee time required and increase an organization's efficiency.

One thing to remember with bar codes: the application software that accepts the bar code data accounts for about 95% of the success or failure of an application. The bar codes are just another data input method; what you do with the data is most important. With the introduction of the IBM PC in the early 80's, bar coding applications expanded along with the PC explosion. Worth Data was and is a pioneer in providing bar code hardware and printing software to the PC (and Macintosh) user.


Why use a barcode?

Bar codes are accurate. They eliminate manual data entry errors. Research has shown that the error rate due to bar code misreads is less than one thousandth of one percent. Tests have shown that bar coded information had a throughput accuracy rate of 1 error in 10,000,000 characters. Compare that to keyboard entry error rates of 1 error in 100 characters.

Bar codes speed data entry. Even with a simple wand, a bar code can be scanned in a fraction of the time it takes to enter the information manually. CCD and laser scanners are also available for even faster data entry.

Bar codes can be produced easily and cheaply. Bar codes can be printed on most computer printers, for the cost of ink and paper. Even a low cost dot matrix printer can produce bar codes of adequate quality.


How does the scanner work?

Scanners are the devices that read bar codes. A scanner shoots pulses of light. If it falls on a light area, a zero (0) is read. If it falls on a dark area, it reads a one (1). Scanning the bar code generates a string of zeros and ones. This pattern of zeros and ones represents the characters encoded. The scanner software, or firmware, translates or decodes the strings into characters.
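As a rough sketch of this first decoding step, the Python below turns a row of reflectance samples into zeros and ones and then groups them into bars and spaces. The function names, the 0-to-1 reflectance scale, and the fixed 0.5 threshold are assumptions made for illustration; a real scanner's firmware is considerably more involved.

```python
from itertools import groupby


def samples_to_bits(reflectance, threshold=0.5):
    """Light areas (high reflectance) read as '0', dark areas as '1'."""
    return "".join("0" if level >= threshold else "1" for level in reflectance)


def bits_to_elements(bits):
    """Group the bit string into (value, width) runs: the bars and spaces."""
    return [(value, len(list(run))) for value, run in groupby(bits)]


bits = samples_to_bits([0.9, 0.9, 0.1, 0.1, 0.1, 0.8, 0.2])
elements = bits_to_elements(bits)
```

The widths of the resulting runs are what a symbology's decoding table would then translate into characters.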

The scanner must be able to shoot a straight line across the bars and spaces. The taller the bars the greater the angle and the greater the chances of getting a good reading. The shorter the bars the less likely the scanner will be able to shoot a straight line through the bars and spaces.

What Does the Bar Code Represent?

No matter which bar code is used, the information encoded in the bars and spaces may be displayed above or below the bars. Since this is the aspect understandable to us, the characters are referred to as human readables. The bars and spaces are readable by machine.

UPC (A) is just one of several bar code symbologies. In the typical format, each of the elements of the bar code symbol represents predefined information.

The system digit and the manufacturer number are assigned by the Uniform Code Council, Inc. for UPC (Universal Product Code) in the United States and Canada. UPC is a subset of EAN (European Article Number), the international product code standard throughout the rest of the world. The product identification number is assigned by the manufacturer. The check digit is used to check the data that is read.

Bar Code Structure, Elements of the Bar code

No matter which symbology we are using, all bar codes share elements that make up the symbol. These are the bars and spaces, the human readables, and the quiet zone. In addition, a symbology may be either Discrete or Continuous.

Bars and Spaces

The bars and spaces determine the pattern of the encoded data. Each symbology represents a different strategy behind the creation of these patterns, such as being as condensed as possible, printing as easily as possible, or being as quick and easy to decipher as possible. Each bar code has slightly different quiet zone requirements. For example, the quiet zone of Code 39 is ten times the width of the thinnest bar/space or 0.25 inches, whichever is greater.


Human Readables

The human-readable is the data represented by the bars and spaces, printed as text for people to read. For example, the actual encoded data can be 3*35353*2.

The Quiet Zone

The quiet zone is the clear area (free from marks) before and after the bars and spaces. Having a quiet zone is as important to readability as the bars and spaces. Scanners need to establish values for the quiet zone before they can evaluate the bars and spaces. Reading the color and reflectance of the quiet zone establishes how the spaces will read and determines the difference between the spaces and the bars. A bar code cannot be read without a quiet zone.

What is a "Checksum"?

Checksums are additional characters appended to bar codes to guarantee good reads. Checksums are necessary on some bar codes that are prone to errors. For example, Interleaved 2 of 5 is a very dense, numeric-only bar code, but it is prone to substitution errors. We should always use a checksum on this code. Other codes, such as Code 128 and Code 39, are self-checking and seldom require a checksum.

The USPS adds a "checksum" digit at the end of the ZIP+4 barcode. The checksum is calculated the following way: add up the 9 digits of the ZIP+4 code and subtract this sum from the next multiple of 10 (if the sum is already a multiple of 10, the checksum is 0). This is done so that in case there is a mistake in reading one of the digits, the checksum will help us recover that digit.

Here is an example: our ZIP+4 code is 14228-2583. The POSTNET code corresponding to this would have 52 bars:

2 "framing" bars (one tall bar on each end)
45 bars for the 9 digits of the ZIP+4 code 14228-2583 (each digit takes 5 bars)
5 bars for the checksum digit, which is 5

1+4+2+2+8+2+5+8+3 = 35; adding 5 to this makes the total 40, a multiple of 10.
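The checksum rule and the bar count can be checked with a short Python sketch. The function names are made up for this example. The tall/short table is not given in this report; it is the usual POSTNET pattern in which each digit has two tall bars out of five (written here as 1 for tall and 0 for short), and should be treated as an assumption to verify against USPS documentation.

```python
# Assumed POSTNET patterns: 1 = tall bar, 0 = short bar, two tall bars per digit.
POSTNET_PATTERNS = {
    "0": "11000", "1": "00011", "2": "00101", "3": "00110", "4": "01001",
    "5": "01010", "6": "01100", "7": "10001", "8": "10010", "9": "10100",
}


def postnet_check_digit(zip_code):
    """Digit that brings the sum of all digits up to a multiple of 10."""
    total = sum(int(c) for c in zip_code if c.isdigit())
    return (10 - total % 10) % 10


def postnet_bars(zip_code):
    """Framing bar + 5 bars per digit (including the checksum) + framing bar."""
    digits = [c for c in zip_code if c.isdigit()]
    digits.append(str(postnet_check_digit(zip_code)))
    return "1" + "".join(POSTNET_PATTERNS[d] for d in digits) + "1"


check = postnet_check_digit("14228-2583")
bars = postnet_bars("14228-2583")
```

For the example ZIP+4 code, check comes out as 5 and bars has 52 entries, matching the count worked out by hand.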

Bar code technology is millions of times more accurate than typing when it comes to entering information into the computer. Check digits make the systems even more accurate. Tests have shown that operators may do 10,000,000 entries between errors when using check digits.

What's the most popular Bar Code Format?

The most popular bar code format is the UPC (Universal Product Code) format, which we find on all supermarket products. Available since the early 1970's, this format is known worldwide and is universally recognized.

For automatic identification applications, however, the Code 39 format is the de facto standard for government, manufacturing, bar code industry, education, and business applications. The popularity of the Code 39 format is based on several factors, which include: ease of use, ability to code numbers and letters, flexible word length capability (it can generate bar codes with any number of characters), and universal reading capability (bar code equipment from any manufacturer can read this code).

UPC, EAN, Bookland, & ISSN

What is a UPC Bar Code Format? 

The UPC bar code format is the standard bar code format for items that are for sale to the public. Probably the largest user of the UPC code is your local supermarket. The UPC format is used to encode a 12-digit number. The first digit is the number system character, the next five are the manufacturer number, the next five are the product number, and the last digit is the checksum character. This format encodes only numeric information and must be exactly 12 characters long. EAN and JAN symbols are used in Europe and Japan respectively. Bookland symbols, based on ISBN numbers, are used on books. ISSN bar codes are used on non-U.S. periodicals. All of these symbologies are numeric-only, have a fixed length, and have one or more check digits.
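The 12-digit layout can be illustrated with a small Python sketch. The function and field names are made up for this example. The checksum rule used here (three times the sum of the digits in odd positions, plus the digits in even positions, brought up to the next multiple of 10) is the commonly documented UPC-A rule; it is not described in this report, so treat it as an assumption to check against the UPC specification.

```python
def upc_check_digit(first_eleven):
    """Checksum for the first 11 digits of a UPC-A number."""
    digits = [int(c) for c in first_eleven]
    odd = sum(digits[0::2])    # positions 1, 3, 5, ... counting from 1
    even = sum(digits[1::2])   # positions 2, 4, 6, ...
    return (10 - (3 * odd + even) % 10) % 10


def parse_upc(code):
    """Split a 12-digit UPC into its fields and verify the checksum."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("UPC needs exactly 12 digits")
    return {
        "number_system": code[0],
        "manufacturer": code[1:6],
        "product": code[6:11],
        "check_digit": code[11],
        "valid": upc_check_digit(code[:11]) == int(code[11]),
    }


fields = parse_upc("036000291452")
```

Changing any single digit of the example number makes the "valid" field come out False, which is how a misread is caught.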

Code 39 

The Code 39 bar code format (also known as 3 of 9) is the most commonly used bar code format because it allows numbers, upper case letters, and some punctuation marks to be bar coded: the capital letters A-Z, the numbers 0-9, the "space" character, and the symbols - + / $ . %. Code 39 is a variable length format, allowing for encoding any number of characters. This format has become the standard for government, manufacturing, bar code industry, education, and business applications.
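Before printing a Code 39 symbol it is useful to check that the data uses only the characters listed above. The Python sketch below does just that; the names CODE39_CHARS and code39_unsupported are hypothetical, and the sketch does not generate any bars.

```python
import string

# Character set as listed in the text: A-Z, 0-9, space, and - + / $ . %
CODE39_CHARS = set(string.ascii_uppercase + string.digits + " -+/$.%")


def code39_unsupported(text):
    """Return the characters in text that Code 39 cannot encode directly."""
    return sorted({c for c in text if c not in CODE39_CHARS})


ok = code39_unsupported("PART-42 A/B")
bad = code39_unsupported("abc#1")
```

An empty result means the text can be encoded as it stands; lower case letters, for instance, would have to be converted to upper case first.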


POSTNET

POSTNET bar codes are used to encode ZIP codes on U.S. mail. Unlike other bar codes, POSTNET symbols consist of bars that vary in height, not width. A check digit is appended to the bar code, which can be used for 5-digit ZIP codes, 9-digit ZIP+4 codes, or the newer 11-digit delivery point barcodes.

Code 128 

The Code 128 bar code format is a very compact bar code for all-numeric information. Alphanumeric information can also be encoded, but at the expense of losing the "very compact" benefit. The compact size of a Code 128 bar code containing only numeric digits is achieved by using "double density" (two digits are included in one character width). When alphanumeric data is encoded, however, Code 128 uses "single density", and the bar codes are twice as long. This is not a simple bar code format to use, as there are several Code 128 subsets, each with its own specifications and limitations.
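The effect of double density on symbol length can be estimated with a few lines of Python. This is a deliberately simplified sketch: the function name is made up, it counts data characters only, and it assumes an encoder that uses double density only when the whole message is numeric. Real encoders also add start, check, and stop characters and may switch subsets in the middle of a message.

```python
import math


def code128_data_characters(data):
    """Rough count of symbol characters needed for the data portion."""
    if data.isascii() and data.isdigit():
        return math.ceil(len(data) / 2)   # double density: two digits each
    return len(data)                      # single density: one per character


numeric = code128_data_characters("1234567890")
mixed = code128_data_characters("ABC1234567")
```

Both messages are ten characters long, but the all-numeric one needs half as many symbol characters.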

Interleaved 2 of 5 & 2 of 5 

The Interleaved 2 of 5 bar code format (aka: Code 25) is a numeric-only code that prints out a little larger than the UPC bar code when ten digits are encoded. Interleaved 2 of 5 is an excellent choice for numeric-only applications, because it has the flexibility of having from 2 to 30 digits.

PDF417 and other 2D bar codes 

2D (two dimensional) symbologies are extremely dense bar codes that look like a crossword puzzle or a honeycomb-like matrix.  PDF417 has emerged as the 2D bar code of choice.  The other popular 2D symbology is Maxicode used by United Parcel Service.  Because PDF417 encodes up to 1108 bytes of information, it is really a portable data file (PDF), as opposed to simply being a pointer into an external database.

What's the benefit of using Bar Codes?

The benefits of using bar codes for automated data collection are very simple: speed and accuracy. Time after time, it has been proven that entering bar code data is at least 100 times faster and more accurate than traditional manual keyboard entry, which translates into a dramatic increase in efficiency and productivity for any operation.

2 Dimensional Bar Code

For years one-dimensional bar codes have been promoted as machine-readable license plates. The car license plate by itself doesn't mean much, but when entered into a motor vehicle database, it can bring back all sorts of information regarding the car and its owner. One-dimensional bar codes seldom represent more than a dozen characters. Each label contains a unique serial number coded in black and white bars that is a key into a database containing detailed information. But many end users wanted to code more information. They wanted the bar code to be a portable database rather than just a database key. That's when 2-D bar code came along. 2-D bar code technology should be thought of as a complement to the traditional 1-D scanning technology, not its replacement.

The first truly two-dimensional bar code was introduced by Intermec Corporation in 1988 when they announced Code 49. Since Code 49's introduction, six other codes have either been invented or have been redesigned to meet the need to place a portable database in as little space as possible.

An ordinary one-dimensional bar code is "vertically redundant", meaning that the same information is repeated vertically. The heights of the bars can be truncated without any loss of information. However, the vertical redundancy allows a symbol with printing defects, such as spots or voids, to still be read. The taller the bars, the greater the probability that at least one path along the bar code will be readable.

A two-dimensional code stores information along the height as well as the length of the symbol. In fact, all human alphabets are two-dimensional codes. Over a thousand alphanumeric characters can be placed in a single symbol the size of a large postage stamp. Since both dimensions contain information, at least some of the vertical redundancy is gone. Other techniques must be used to prevent misreads and to produce an acceptable read rate. Misread prevention is relatively easy. Most two-dimensional codes use check words to ensure accurate reading.

Two-dimensional code systems have become more feasible with the increased use of moving laser beam scanners and Charge Coupled Device (CCD) scanners. The 2-D symbol can be read with hand held moving beam scanners by sweeping the horizontal beam down the symbol. This way of reading such a symbol is similar to the way a 1D bar code is read. The speed of sweep, resolution of the scanner, and symbol/reader distance are equally important in both technologies.

Initially, two-dimensional bar codes were developed for applications where only a small amount of space was available for an automatic ID symbol. The first application for such symbols was unit-dose packages in the healthcare industry. These packages were small and had little room to place a bar code. The electronics industry also showed an early interest in very high-density bar codes, since free space on electronics assemblies was scarce.

More recently, the ability to encode a portable database has made two-dimensional bar codes attractive in applications where space is not at a premium. One example is storing name, address, and demographic information on direct mail business reply cards. A good direct mail response is often less than two percent. If the return card is only coded with a 1D bar code, the few replies must be checked against a very large database, perhaps millions of names. This can be quite expensive in computer time. If all the important information is printed in two-dimensional code at the time the mailing label is printed, there is very little additional cost, and a potential for great savings when the cards are returned. Similar savings can occur in field service applications where servicing data is stored in a 2-D symbol on equipment. The field engineer uses a portable reader to get the information rather than dialing up the home office's computer.

One of the amazing and beneficial aspects of two-dimensional symbols is their potential durability. To destroy the readability of a 1D symbol, one only has to add another bar to the beginning or end of the symbol or draw a line through the symbol, parallel to the stripes. This throws off the checks and balances built into the decoding algorithms of a 1D decoder and makes the symbol unreadable. By comparison, many degrees of redundancy can be built into a 2D symbol. While this makes the symbol somewhat larger, the resulting symbol is remarkably secure. The symbol can take a lot of abuse, such as being punched with holes, torn, or marked with a black marker, and it will remain readable.

Reading/Scanning of 2-D Bar Codes

The reading (scanning) of 2-D codes is accomplished using different scanners than those made to scan 1-D symbols. Two strategies are currently utilized. The first, and most common, utilizes a moving laser beam scanner that not only sweeps back and forth across the symbol, but also up and down in what is termed a "raster" pattern. Alternatively, CCD (charge coupled device) scanners are utilized. CCD scanners use a two-dimensional array of photo sensors to scan the image in its entirety.

Two-dimensional scanners were far more expensive than 1-D scanners when introduced in 1994. Recent microprocessor developments have brought the cost of 2-D scanners down to about 125% of the cost of a comparable 1-D scanner. Also, advancing decoding algorithms have made scanning quicker and easier and provided even greater readability of excessively damaged symbols.

2-D Applications

Here are some example applications for 2D scanning:

Packing List - Trading partners agree on a standard methodology for encoding shipping information in a 2D symbol, attached to a shipped order. Order data (PO number, shipping date, product codes, quantities, etc.) can automatically be entered into the receiver's receiving computer terminal in a couple of seconds.

Driver's License - The driver's name, address, license number, expiration date and driving restriction codes are encoded in a 2D symbol that is printed on the operator's license. Police officers, car rental agencies, hotels, etc. can easily enter the information regarding the license holder, with virtually no possibility of mis-keyed characters.

Patient Record - On a hospital patient's chart record is a 2D symbol, encoding their name, health care number, doctor's name, date of admission, allergies, etc. When direct care is given to the patient, the caregiver or doctor records the action by scanning the bar code. Also, the bar code is scanned when medication is administered and the possibility of giving a patient the wrong medicine is virtually eliminated.

2-D Bar Code Symbologies:

There are well over 20 different 2-D bar code symbologies available today. They fall into two categories: matrix and stacked. Here are some examples:

Array Tag

ArrayTag was invented by Dr. Warren D. Little of the University of Victoria and is a proprietary code. The symbol is made up of elemental hexagonal symbols with a patented complementing border, which are printed either alone or in sequenced groups. ArrayTags can encode hundreds of characters, can be read at distances up to 50 meters, and are optimized for reading at a distance or in variable lighting situations. The principal application of the code is to track logs and lumber.

Aztec Code

Aztec Code was invented by Andy Longacre in 1995. Aztec Code was designed for ease of printing and ease of decoding. The symbols are square overall on a square grid with a square central bull's-eye finder. The smallest Aztec Code symbol is 15x15 modules square, and the largest is 151x151. The smallest Aztec Code symbol encodes 13 numeric or 12 alphabetic characters, while the largest Aztec Code symbol encodes 3832 numeric or 3067 alphabetic characters or 1914 bytes of data. No quiet zone is required outside the bounds of the symbol.

Small Aztec Code

Small Aztec Code is a special space-saving version of Aztec Code for encoding shorter messages (up to 95 characters). Space is saved by removing one set of rings from the finder pattern, eliminating the reference grid, and using a shorter mode message which limits the symbols to four data layers; otherwise, the encoding rules are generally the same as for standard Aztec Code.


Codablock

Codablock is a stacked symbology. It was invented by Heinrich Oehlmann and was originally a stack of Code 39 symbols. Each Codablock symbol contains from 1 to 22 rows. The number of characters per row is a function of the x-dimension of the symbol. In other words, each row can contain a variable number of characters. Each symbol has a start and stop bar group that extends the height of the symbol. Each row has a two-character row indicator, and the last row of the symbol has an optional check digit. The advantage of this code is that it can be read by moving beam laser scanners with very little modification. Codablock was adopted by German blood banks for the identification of blood.

Code 1

Code 1 was invented by Ted Williams in 1992 and is the earliest public domain matrix symbology. It uses a finder pattern of horizontal and vertical bars crossing the middle of the symbol. The symbol can encode ASCII data, error correction data, function characters, and binary encoded data. There are 8 sizes ranging from code 1A to code 1H. Code 1A can hold 13 alphanumeric characters or 22 digits while code 1H can hold 2218 alphanumeric characters or 3550 digits. The largest symbol version measures 134x wide by 148x high. The code itself can be made into many shapes such as an L, U, or T form. Code 1 is currently used in the health care industry for medicine labels and the recycling industry to encode container content for sorting.

Code 16K 

The code is a continuous, variable-length symbology that can encode the complete ASCII 128-character set. The minimum value of the x-dimension is 7.5 mils for a symbol to be read by an unknown reader. Minimum bar height is 8 times the x-dimension. The maximum data density is 208 alphanumeric characters per square inch. In the health care industry, for example, a Code 16K symbol printed with a 7.5 mil x-dimension and including a flag character, a 10-digit NDC number, a 5-digit expiration date, and a 10-character alphanumeric lot code would fit in a symbol measuring only .35 inches by .61 inches. Code 16K symbols can be read by modified moving beam laser or CCD scanners. Rows can be scanned in any order. After the last row has been scanned, the bar code reader automatically puts the information in proper sequence. Labels can be printed by standard printing technologies.

Code 49

Code 49 was developed by David Allais in 1987 to fill a need to pack a lot of information into a very small symbol. Code 49 accomplishes this by using a series of bar code symbols stacked one on top of another. Each symbol can have between two and eight rows. Each row consists of a leading quiet zone; a starting pattern; four data words encoding eight characters, with the last character a row check character; a stop pattern; and a trailing quiet zone. Every row encodes the data in exactly 18 bars and 17 spaces, and each row is separated by a one-module high separator bar. Scanning Code 49 can be done with modified moving beam laser scanners or CCD scanners. Intermec makes a CCD scanner which will decode Code 49 symbols along with standard bar code symbologies. Labels can be printed by standard printing technologies.

CP Code 

CP Code is a proprietary code developed by CP Tron, Inc. It is made up of square matrix symbols with an L-shaped peripheral finder and adjacent timing marks. It is visually similar to Data Matrix.

Data Matrix

Data Matrix is a 2-D matrix code designed to pack a lot of information in a very small space. A Data Matrix symbol can store between one and 500 characters. The symbol is also scalable between a 1-mil square and a 14-inch square. That means that a Data Matrix symbol has a maximum theoretical density of 500 million characters to the square inch! The practical density will, of course, be limited by the resolution of the printing and reading technology used. The most popular application for Data Matrix is the marking of small items such as integrated circuits and printed circuit boards. These applications make use of the code's ability to encode approximately fifty characters of data in a symbol 2 or 3 mm square and the fact that the code can be read with only a 20 percent contrast ratio.
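The density figure follows directly from the two extremes quoted above, as this short Python calculation shows. The function name is made up, and the calculation assumes the density is measured per square inch, with 1 mil equal to 0.001 inch.

```python
def chars_per_square_inch(characters, side_mils):
    """Characters per square inch for a square symbol with the given side."""
    # 1 inch = 1000 mils, so 1 square inch = 1,000,000 square mils.
    return characters * 1_000_000 / side_mils ** 2


smallest = chars_per_square_inch(500, 1)        # 500 characters, 1-mil square
largest = chars_per_square_inch(500, 14_000)    # 500 characters, 14-inch square
```

The 1-mil case gives 500,000,000 characters per square inch, the theoretical maximum quoted in the text, while the same 500 characters spread over a 14-inch square come to only about 2.6 characters per square inch.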

Datastrip Code

Datastrip Code was originally called Softstrip and was developed by Softstrip Systems. It is the oldest of the two-dimensional symbologies. It is a patented encoding and scanning system that allows data, graphics and even digitized sound to be printed on plain paper in a highly condensed format and read error-free into a computer. A Datastrip Code consists of a matrix pattern, comprising very small, rectangular black and white areas (or DiBits). Markers down the side and across the top of the strip (start line, checkerboard and rack) contain alignment information for the Datastrip Code readers and ensure data integrity. Header information contains details about the data stored on the strip: file name, number of bytes, density of the data strip, etc. The Datastrip encoding method, which includes parity bits on each encoded line, offers excellent reliability and error correction capabilities.

Dot Code A

Dot Code A is one of a limited number of dot code symbologies. This symbology was designed for unique identification of objects in a relatively small area, or for direct marking by low precision marking technologies. The symbol consists of a square array of dots ranging from 6 x 6, to 12 x 12, the latter enabling over 42 billion, billion, billion, billion individual items to be distinguished. Applications include the identification of laboratory glassware and the marking of laundry.


MaxiCode

MaxiCode is a matrix code developed by United Parcel Service in 1992. However, rather than being made up of a series of square dots, MaxiCode is made up of a 1-inch by 1-inch array of 866 interlocking hexagons. This allows the code to be at least 15 percent denser than a square dot code, but requires higher resolution printers like thermal transfer or laser to print the symbol. There is a central bull's-eye to allow a scanner to locate the label regardless of orientation. Approximately 100 ASCII characters can be held in the 1-inch square symbol. The symbol can still be read even when up to 25 percent of it has been destroyed, and it can be read by CCD camera or scanner.

PDF 417

PDF417 is a stacked symbology and was invented by Ynjiun Wang in 1991. PDF stands for Portable Data File, and each codeword in the symbology is 17 modules wide and contains 4 bars and 4 spaces (thus the number "417"). The code is in the public domain. The structure of the code allows for between 1000 and 2000 characters per symbol with an information density of between 100 and 340 characters. Each symbol has a start and stop bar group that extends the height of the symbol.

3D Barcode (bumpy barcode)

3D barcode really is any linear (1D) barcode (like Code 39 or Code 128) that is embossed on a surface. The code is read by using differences in height, rather than contrast, to distinguish between bars and spaces using a special reader. The code can be used where printed labels will not adhere, or will be otherwise destroyed by a hostile or abrasive environment. They can be painted or coated and still be read. They can be made a permanent feature of a part, making mislabeling impossible.


3-DI

3-DI was developed by Lynn Ltd and is a proprietary code. 3-DI uses small circular symbols. It is most suited for identification marks on shiny, curved metal surfaces such as surgical instruments.



