A few scanning tips

www.scantips.com

Understanding File Types, Bit Depth, & Memory Cost of Images

The RGB Color topic was moved to its own page.

Calculators below:

Image Size Goal for Desired Print Size
The Four Sizes of a Digital Image - How many bytes?
Convert B, KB, MB, GB, and TB size numbers
Scanned Image Size

Large photo images consume much memory and can make our computers struggle. Uploads can be very slow. Memory cost for an image is computed from the image size. Our common 24-bit RGB image size is three bytes per pixel when uncompressed in memory (so 24 megapixels is x3 or 72,000,000 bytes, which is 68.7 MB uncompressed in memory, but can be smaller in a compressed file). Digital camera images today are typically much larger than most viewing or printing purposes can use (but the plentiful pixels offer advantages for largest prints or more extreme cropping, etc).

One basic that is needed is this very simple calculation, which shows the image size necessary to have sufficient pixels to properly print a photo:

Image Size Goal for Desired Print Size (calculator: enter the print width x height in inches or mm, and the dpi resolution)

This will show the required image size (pixels) to print this paper size at the desired dpi resolution.
Scanning Size is the same calculation, more detail at end below.
File size is shown at The Four Sizes of a Digital Image below

Printing photos at 250 or 300 dpi is considered very desirable and optimum. But this dpi number does NOT need to be exact, 10% or 15% variation won't have great effect. Planning image size to have sufficient pixels to be somewhere around 240 to 300 pixels per inch is a very good thing for printing, called "photo quality". More than 300 dpi really can't help photo prints, but less than 200 dpi can suffer reduced image quality. It's generally about what our eye is capable of seeing, but varies with the media. See a printing guideline for the resolution needed for several common purposes.

That's a pretty simple calculation. More pixels will work too (but they are slow to upload, and are essentially wasted effort). The printer or the print lab will simply discard the excess, but too few pixels can seriously limit the resolution and sharpness of the printed copy.
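
As a rough sketch of that calculation (Python is used here only for illustration, and the function names are made up for this example), the required pixels are simply the print dimensions in inches multiplied by the dpi:

```python
MM_PER_INCH = 25.4

def pixels_for_print(width_in, height_in, dpi=300):
    """Return (width_px, height_px) needed to print this size at this dpi."""
    return round(width_in * dpi), round(height_in * dpi)

def pixels_for_print_mm(width_mm, height_mm, dpi=300):
    """Same calculation, but with the paper size given in millimeters."""
    return pixels_for_print(width_mm / MM_PER_INCH, height_mm / MM_PER_INCH, dpi)

print(pixels_for_print(10, 8, 300))   # 10x8 inches at 300 dpi -> (3000, 2400)
print(pixels_for_print(5, 7, 300))    # 5x7 inches at 300 dpi  -> (1500, 2100)
print(pixels_for_print(10, 8, 250))   # same paper at 250 dpi  -> (2500, 2000)
```

The dpi value is a goal rather than an exact requirement, so results somewhere near these numbers should print well.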

Cropping Aspect Ratio to fit the paper size is an important concern too.

And there is a larger dpi calculator that knows about scanning, printing, and enlargement.

The memory cost for the initial default 8x10 inch color image is:

  3000 x 2400 pixels x 3 = 21.6 million bytes = 20.6 megabytes.

The last "× 3" is for 3 bytes of RGB color information per pixel for 24-bit color (3 RGB values per pixel, which is one 8-bit byte for each RGB value, which totals 24-bit color).
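
The same arithmetic can be sketched in a few lines of Python (an illustration only, with hypothetical function names), assuming 24-bit RGB at 3 bytes per pixel and binary megabytes of 1024 x 1024 bytes:

```python
def data_size_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size in memory: one value per pixel times bytes per pixel."""
    return width_px * height_px * bytes_per_pixel

def bytes_to_mb(n_bytes):
    """Convert bytes to binary megabytes (divide by 1024 twice)."""
    return n_bytes / (1024 * 1024)

size = data_size_bytes(3000, 2400)
print(size, round(bytes_to_mb(size), 1))        # 21600000 20.6

size_24mp = data_size_bytes(6000, 4000)
print(size_24mp, round(bytes_to_mb(size_24mp), 2))  # 72000000 68.66
```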

But the compressed file will be smaller (maybe 10% of that size for JPG), selected by our choice for JPG Quality. But the smaller it is, the worse the image quality. The larger it is, the better the image quality. If uncompressed, the data is three bytes per pixel.

Data Compression and File Sizes

Image size is always dimensioned in pixels, for example 6000x4000 pixels, or 24 megapixels.
Data and File size is dimensioned in bytes, for example, 12 megabytes (often compressed for storage).
24-bit RGB photo data is always 3 bytes per pixel (when uncompressed for use).

Data is often compressed to smaller size for storage in the file (very radically smaller for JPG). It must be uncompressed for use.

Lossy compression can make small changes in the data values. Lossy compression cannot be used in programs like Quicken, Excel or Word, or in backup software, because we insist that every byte compressed come back out exactly as it went into the file. Anything else is corruption. However, image tonal values can be more forgiving in casual use, until it becomes excessive.

Image Data Compression is of two types, lossless or lossy.

Data compression while in the file varies the data size too much for bytes to have specific meaning about image size. Say the size of our 24 megapixel image is 6000x4000 pixels. This "dimension in pixels" is the important parameter that tells us how we can use that image. The data size may be 72 million bytes (uncompressed, or maybe 12 MB or other numbers if compressed in a JPG file), but that file size doesn't tell us anything about the image size, only about storage space or internet speed. For example, normally we have a 24-bit color photo image which is 3 bytes of data per pixel when uncompressed (one byte each of RGB data). That means any 24 megapixel camera takes RGB images of data size 72 million bytes (the calculator below converts this to 68.7 MB, the data size before compression). However, data compression techniques can make this data smaller while stored in the file. In some cases it is drastically smaller, and the 68.7 MB might go into perhaps a 4 to 16 MB file with JPG compression. We can't state any exact size numbers, because when creating the JPG file (in camera or in editor), we can select different JPG Quality settings. For this example 24 megapixel image, the 13 MB and 6 MB JPG results discussed further below show the typical range.

Of course, we do prefer higher quality. We do our photos no favor by choosing lower JPG quality. However, emailing grandma a picture of the kids doesn't need to be 24 megapixels. Maximum dimension of maybe 1000 pixels is reasonable for email, still large on the screen. Or even less if to a cell phone. Even printing 5x7 inches only needs 1500x2100 pixels. But this resample should be a COPY. Never overwrite your original image.

JPG files made too small are certainly not a plus, larger is better image quality. Surely we want our camera images to be the best they can be. Also this compressed file size naturally varies some with image content too. Images containing much fine detail everywhere (a tree full of small leaves) will be a little larger, and images with much blank featureless content (walls or blue sky, etc.) will be noticeably smaller (better compressed). File sizes might vary over a 2:1 range due to extreme scene detail differences. But JPG files are typically 1/5 to 1/12 of the image data size (but other extremes do exist). Both larger and smaller are possible (an optional choice set by JPG Quality setting).

Then when the file is opened, the image data comes back out of the file uncompressed and at original size, with the original number of bytes and pixels when open in computer memory. The pixel counts are still the same, but JPG Quality differences affect the color accuracy of some of the pixels (image detail is shown by the pixel colors). Bad compression effects can add visible JPG artifacts, which we can learn to see.

Best Safe Plan to Use JPG Images

The wise choice is to ALWAYS archive and preserve the original pristine JPG image from the camera. When edit or resize is needed, edit the image as desired, but then only make another high quality JPG file COPY to use (with a different file name). Never overwrite your original file, it may be important to have later. The more important the image, the more important it is to retain a copy of the pristine original image intact. There is no other way to go back.

And a second reason: Don't re-edit any JPG copy additional times. If subsequent plans require yet another edit or resized image, NEVER start from that previously edited JPG file. JPG lossy compression means it already has two sets of JPG artifacts in it (from the camera and then the first edit), and a third or fourth won't help that. Treat a JPG copy as expendable (discard it when done with it), and START OVER from the archived unmodified original file, because each SAVE operation to a JPG file does the JPG compression again, on top of any previous Saves as JPG. Or, if the first edit was extensive work (more than just simple), you could think ahead then to also save that work into a lossless file (TIF LZW or 24-bit PNG, which are lossless and will not add additional JPG artifacts), and also save that file as an archive, and then use it as a master version, and make any subsequent JPG copy from it. This TIF save will not remove any existing JPG artifacts in the image data, but it will not add more.

JPG artifacts are something we all need to know about, but it takes more to show this, so it was placed on its own page.

You might reason that these first edits are important, and any use would want them, so overwriting the original file might seem an acceptable plan. Which might even sometimes be true, but I've been there and done that, but won't do it again (certainly not on any image even a bit important), because each save (as JPG) adds additional JPG artifacts. My future plans could change, which does happen. Any important image certainly should first save the original (certainly without cropping or resampling, since even just printing a different size print needs different dimensions). Always retaining the original file is one important benefit that Raw images provide (called lossless editing). A High Quality JPG save does not seem to hurt much, but eventually (after repeated SAVES as JPG) you may discover that your most important image has suffered damage, and then it is too late. Each Save as JPG would be one more cumulative SAVE AS JPG, which adds additional JPG losses each time saved, and the only way to prevent that is to not do it, and to instead go back to the unmodified original file, if you still have it. Plan to keep it safe. The best insurance is to preserve your original image (and keep a backup too, on a different disk drive).

An alternative plan for important images is to always save your archived edits as TIF LZW or as 24-bit PNG for photos (NOT 8-bit PNG, which is for graphics), which are larger files but are lossless compression, so no concern about image quality. Edit and save TIF LZW or 24-bit PNG at will, all you please (but still preserving your original file of course). However, realize that saving an existing JPG as TIF or PNG simply retains all of the original JPG artifacts too. Then at the end, archive it, but make one final high quality JPG copy of it to be used out in the world. When and if you need additional change, discard that JPG as expendable, and start from your archived lossless file, and finally make a replacement JPG. The idea is that the image only suffers two JPG compressions, the original one in the camera, and this final one after the edits. Both should of course use HIGH JPG QUALITY.

So do plan ahead, there is no going back. The more important the image, the more you need to think this out. Don't begin by messing up your only original image. After you have "been there, done that", this idea will become very important to you. One advantage of using Raw files is that it makes this step mandatory and easy (lossless edits, but Raw also has other bigger advantages).

Photo programs differ in how they describe JPG Quality. The software has options about how it is done, and Quality 100 is arbitrary (Not a percentage of anything), and it NEVER means 100% Quality. It is always JPG. But Maximum JPG Quality at 100, and even Quality of 90 (or 9 on a ten scale) should be pretty decent. I usually use Adobe Quality 9 for JPG pictures to be printed, as "good enough". Web pictures usually are less quality, because file size is so important on the web, and they are only glanced at one time.

13 MB JPG from 68.7 MB data would be 19% original size (~1/5), and we'd expect fine quality (not exactly perfect, but extremely adequate, hard to fault).

6 MB JPG from 68.7 MB would be compression to about 9% size (~1/11), and we would Not expect best quality. Perhaps acceptable for some casual uses, like for the internet, but anything smaller would likely be bad news.

Compromising small, down towards 1/10 size (10%), might be a typical and reasonable file size for JPG, except when we might prefer better results. We should realize too that images with many blank featureless areas like sky or blank smooth walls can compress exceptionally well, to less than 10%, which is Not an issue in itself, but a number like 10% is just a very vague specification. File size is not the final criterion, we have to judge how the picture looks. We can learn to see and judge JPG artifacts. We would prefer not to see any of them in our images.

But there are downsides with JPG, because it is lossy compression, and image quality can be lost (not recoverable). The only way to recover is to discard the bad JPG copy and start over again from the pristine original camera image. Selecting higher JPG Quality is better image quality but a larger file size. Lower JPG Quality is a smaller file, but lower image quality. Don't cut off your nose to spite your face. Large is Good regarding JPG, the large one is still small. File size may matter when the file is stored, but image quality is important when we look at the image. Lower JPG quality causes JPG artifacts (lossy compression) which means the pixels may not all still be the same original color (image quality suffers from visible artifacts). There are the same original number of bytes and pixels when opened, but the original image quality may not be retained if JPG compression was too great. Most other types of file compression (including PNG and GIF and TIF LZW) are lossless, never any issue, but while impressive, they are not as dramatically effective (both vary greatly, perhaps 70% size instead of 10% size).

How many bytes? There are four sizes of a digital image.

Image Size is dimensioned in pixels, which is what determines how the image might be suitably used. The FIRST numbers you need to know about using a digital image is its dimensions in pixels (and the image size viewed on the monitor screen is still dimensioned in pixels).

Data Size is its uncompressed size in bytes when the file is opened into computer memory.

File Size is its size in bytes in a disk file (which is Not a meaningful number regarding how the image might be used. Image size is in pixels). Data compression (such as JPG) can reduce the file size drastically, but image size and data size remain the same as the original.

Print Size is its size when printed on paper (inches or mm). The size of film is also inches or mm. Sensor size or film size must be enlarged to the print or viewing size.

Again, image size on a monitor screen is still dimensioned in pixels (print paper is dimensioned in inches or mm, but screens are dimensioned in pixels). If the image size is larger than the screen size, we normally are shown a temporary resampled smaller copy of more suitable smaller size.

The usual and most common type of color image (such as any JPG file) is the 24-bit RGB choice.

Calculate the Four Sizes of an Image

Calculator inputs: the image size (either width x height in pixels, or megapixels and aspect ratio), the data type, an optional estimated Exif size (in bytes or KB), and the pixels per inch if printed. It reports the Image Size, Data Size, File Size, and Print Size.

Disclaimer: Image Size is the actual size of binary image, in pixels. Data Size is the uncompressed data bytes for the image pixels when the file is opened into computer memory. These parts are known and simple, but there are also other factors.

Note that uncompressed 24-bit RGB data is always three bytes per pixel, regardless of image size. Color data in JPG files is 24-bit RGB. For example, an uncompressed 24 megapixel 6000x4000 pixel image is 6000x4000 x 3 = 72 million bytes, also 24 x 3, every time. That is its actual size in computer memory bytes when the file is opened. Fill in your own numbers, but converting to MB units is bytes divided by 1048576 (or just divide by 1024 twice) which converts units to 68.66 megabytes. The JPG files will vary in size, because JPG compression degree varies with scene detail level, and with the proper JPG Quality factor specified when writing the JPG.

Speaking of scene size variations, if you have several dozen JPG images from widely assorted random scenes, in one folder (but specifically, all written from one source at the same image size with same JPG settings), and then sorted by size, the largest and smallest file might often vary by 2:1 file size (possibly much more for extremes). Smooth areas of featureless detail (cloudless sky, smooth walls, etc) compress significantly smaller than a scene full of highly detailed areas (like many trees or many tree leaves for example). If a JPG in this 24 megapixel example is say 12.7 MB size, then (ignoring small Exif) it is 12.7 MB / 68.66 MB = 18.5% size of uncompressed, which is 1/0.185 = 5.4 : 1 size reduction. That would be a high quality JPG. But JPG file size does also vary with the degree of scene detail, so file size is not a hard answer of quality. See a sample of this JPG size variation. See more detail about pixels.
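
The 12.7 MB example above can be checked with a short sketch (illustrative Python, with a made-up function name), assuming 24-bit RGB data and binary megabytes:

```python
def jpg_compression(file_mb, width_px, height_px, bytes_per_pixel=3):
    """Return (fraction of uncompressed size, size reduction ratio) for a JPG file."""
    data_mb = width_px * height_px * bytes_per_pixel / (1024 * 1024)
    fraction = file_mb / data_mb
    return fraction, 1 / fraction

fraction, reduction = jpg_compression(12.7, 6000, 4000)
print(f"{fraction:.1%} of uncompressed size, about {reduction:.1f}:1 reduction")
```

This only measures the file, of course. As noted, scene detail also changes the file size, so the ratio is a hint about quality and not a hard answer.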

Compatible File Types

Different color modes have different size data values, as shown.

| Image Type | Bytes per pixel | Possible color combinations | Compatible File Types |
|---|---|---|---|
| 1-bit Line art | 1/8 byte per pixel | 2 colors, 1 bit per pixel. One ink on white paper | TIF, PNG, GIF |
| 8-bit Indexed Color | Up to 1 byte per pixel if 256 colors | 256 colors maximum. For graphics use today | TIF, PNG, GIF |
| 8-bit Grayscale | 1 byte per pixel | 256 shades of gray | Lossy: JPG. Lossless: TIF, PNG |
| 16-bit Grayscale | 2 bytes per pixel | 65536 shades of gray | TIF, PNG |
| 24-bit RGB (8-bit mode) | 3 bytes per pixel (one byte each for R, G, B) | Computes 16.77 million colors max. 24-bits is the "Norm" for photo images, e.g., JPG | Lossy: JPG. Lossless: TIF, PNG |
| 32-bit CMYK | 4 bytes per pixel, for Prepress | Cyan, Magenta, Yellow and Black ink, typically in halftones | TIF |
| 48-bit RGB (16-bit mode) | 6 bytes per pixel | 281 trillion colors max. Except we don't have 16-bit display devices | TIF, PNG |

The number of color combinations are the "maximum possible" computed. The human eye is limited, and might be able to distinguish 1 to 3 million of the 16.77 million possible in 24-bit color. A typical real photo image might have about 100K to 400K unique colors used.
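
The "maximum possible" numbers are just 2 raised to the number of bits per pixel. A small illustrative sketch in Python (the function name is made up for this example):

```python
def max_colors(bits_per_pixel):
    """Maximum possible number of distinct values for a pixel of this bit depth."""
    return 2 ** bits_per_pixel

for name, bits in [("1-bit line art", 1), ("8-bit indexed or grayscale", 8),
                   ("16-bit grayscale", 16), ("24-bit RGB", 24), ("48-bit RGB", 48)]:
    print(f"{name}: {max_colors(bits):,}")

# 24-bit RGB is three 8-bit channels: 256 x 256 x 256
print(256 ** 3 == max_colors(24))
```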

A few notes:

A few features of common file types
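| File Property | JPG | TIF | PNG | GIF |
|---|---|---|---|---|
| Web pages can show it | Yes | | Yes | Yes |
| Uncompressed option | | Yes | | |
| Lossy compression | Yes | | | |
| Lossless compression | | Yes | Yes | Yes |
| Grayscale | Yes | Yes | Yes | Yes |
| RGB color | Yes | Yes | Yes | |
| 8-bit color (24-bit data) | Yes | Yes | Yes | |
| 16-bit color (48-bits) | | Yes | Yes | |
| CMYK or LAB color | | Yes | | |
| Indexed color option | | Yes | Yes | Yes |
| Transparency option | | | Yes | Yes |
| Animation option | | | | Yes |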

8-bits: As is common practice, there are often multiple definitions used for the same words, with different meanings: 8-bits is one of those.

In RGB images - 8-bit "mode" means three 8-bit channels of RGB data, also called 24-bit "color depth" data. This is three 8-bit channels, one byte for each of the R or G or B components, which is 3 bytes per pixel, 24-bit color, and up to 16.7 million possible color combinations (256 x 256 x 256). Our monitors or printers are 8-bit devices, meaning 24-bit color. 24-bits is very good for photos.

In Grayscale images (B&W photos), the pixel values are one channel of 8-bit data, of single numbers representing a shade of gray from black (0) to white (255).

Indexed color: Typically used for graphics containing relatively few colors (like only 4 or 8 colors). All GIF and PNG8 files are indexed color, and indexed is an option in TIF. These indexed files include a color palette (which is just a list of the actual RGB colors). An 8-bit index is 2^8 = 256 values of 0..255, which indexes into a 256 color palette. Or a 3-bit index is 2^3 = 8 values of 0..7, which indexes into an 8 color palette. The actual pixel data is this index number into that limited palette of colors. For example, the pixel data might say "use color number 3", so the pixel color comes from palette color number 3, which could be any 24-bit RGB color stored there. The editor creating the indexed file rounds all image colors to the closest of this limited number of palette values. The indexed pixel data is most commonly still one byte per pixel before compression, but if the bytes only contain these small index numbers for say 4-bit 16 colors, compression (lossless) can do awesome size reductions in the file. Being limited to only 256 colors is not good for photo images, which normally contain 100K to 400K colors, but 8 or 16 colors makes a very small file that is very suitable for graphics of only a few colors. More on Indexed color.
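
A minimal sketch of the indexed idea, in Python for illustration only. The 4-color palette and the sample pixels are hypothetical, and "closest" is assumed here to mean the smallest squared RGB distance (real editors may choose differently, and may also dither):

```python
# Hypothetical 4-color palette: a list of actual 24-bit RGB colors.
PALETTE = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 0, 255)]

def nearest_index(rgb, palette):
    """Index of the palette color closest to rgb (squared RGB distance, an assumption)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(rgb, palette[i])))

pixels = [(250, 250, 250), (10, 5, 0), (200, 30, 40)]

# The stored pixel data is just the small index numbers...
indexes = [nearest_index(p, PALETTE) for p in pixels]
print(indexes)                      # [0, 1, 2]

# ...and showing the image looks each index up in the palette again.
shown = [PALETTE[i] for i in indexes]
print(shown)                        # [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
```

Note that the colors shown are the palette colors, not the original pixel colors, which is why a small palette suits graphics but not photos.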

8-bit color was in common use before our current 24-bit color hardware became available. A note from history, we might still see old mentions of "web safe colors". This wasn't about security, this standard was back in the day when our 8-bit monitors could only show the few indexed colors. The "web safe" palette was six shades of each R,G,B (216), plus 40 system colors the OS might use. These colors would be rendered correctly, any others were just nearest match. "Web-safe" is obsolete now, every RGB color is "safe" for 24-bit color systems today.

Line Art (also called Bilevel) is two colors, normally black ink dots on white paper (the printing press could use a different color of ink or paper, but your home printer will only use black ink). Line art is packed bits and is not indexed (and is not the same as 2 color Indexed, which can be any two colors from a palette, and indexed uncompressed is still one byte per pixel, but compression is very efficient on the smaller values). Scanners have three standard scan modes: Line art, Grayscale, or Color mode (they may call them these names, or some (HP) may call them B&W mode, B&W Photo mode, and Color, which is the same thing). Line art is the smallest, simplest, oldest image type, 1 bit per pixel, where each pixel is simply either a 0 or a 1. Examples are that fax is line art, sheet music would be best as line art, and printed text pages are normally best scanned in line art mode (except for any photo images on the same page). The name comes from line drawings such as newspaper cartoons, which are normally line art (possibly color is added today inside the black lines, like a kid's coloring book).

We routinely scan color work at 300 dpi, but line art has sharper lines if created at 600 dpi, or possibly even 1200 dpi if you have some way to print that (that works because it's only one ink, there are no color dots that have to be dithered). Even so, line art makes very small files (especially if compressed). Line art is great stuff when applicable, the obvious first choice for these special cases.

Line art mode in Photoshop is cleverly reached at Image - Mode - BitMap, where it won't say line art, but line art is created by selecting 50% Threshold there in BitMap (the image has to already be grayscale to reach BitMap). BitMap there is actually for halftones, except selecting 50% Threshold there means all tones darker than middle will simply be black, and all tones lighter than middle will be white, which is line art. Two colors, black and white.
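
The 50% threshold and the "packed bits" storage can be sketched like this (illustrative Python with made-up function names). Treating 1 as white, using 128 as the middle tone, and packing the most significant bit first are assumptions of this sketch, not rules of any particular file format:

```python
def to_line_art(gray_row, threshold=128):
    """Grayscale values 0..255 become 1 (white) at or above the threshold, else 0 (black)."""
    return [1 if value >= threshold else 0 for value in gray_row]

def pack_bits(bits):
    """Pack 8 pixels per byte, first pixel in the highest bit, zero-padding the last byte."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

gray_row = [0, 50, 127, 128, 200, 255, 10, 240]   # 8 grayscale pixels, 8 bytes
bits = to_line_art(gray_row)
print(bits)                   # [0, 0, 0, 1, 1, 1, 0, 1]
print(pack_bits(bits).hex())  # 1d, the same 8 pixels in 1 byte
```

Eight pixels in one byte is the 1/8 byte per pixel of line art, before any file compression.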

One MB is a little more than One Million Bytes

The memory size of images is often shown in megabytes. You may notice a little discrepancy from the number you calculate from pixels with WxHx3 bytes. This is because (as regarding memory sizes) "megabytes" and "millions of bytes" are not quite the same units.

Memory sizes in terms like KB, MB, GB, and TB count in units of 1024 bytes for one K, whereas humans count thousands in units of 1000.

A million is 1000x1000 = 1,000,000, powers of 10, or 10^6. But binary units are used for memory sizes, powers of 2, where one kilobyte is 1024 bytes, and one megabyte is 1024x1024 = 1,048,576 bytes, or 2^20. So a number like 10 million bytes is 10,000,000 / (1024x1024) = 9.54 megabytes. One binary megabyte holds 4.86% more bytes than one million (1024×1024/1000000 = 1.0486), so the same byte count shows about 4.6% fewer megabytes than millions.

Understanding KB, MB, GB, TB Size Units of Memory Chips

Convert: Bytes, KB, MB, GB, TB (calculator: enter a value in Bytes, KB, MB, GB, or TB to convert it to the other units, either as memory sizes in units of 1024, or as megapixel and disk sizes in units of 1000)

If changing mode between 1024 (2^10) and 1000 (10^3) units, it will retain and use the previous K, KB, MB, GB or TB choice.

If you see a format like "e-7" in a result, it just means to move the decimal point 7 places to the left (or e+7, move to the right). Example: 9.53e-7 is 0.000000953

Any computed fractional bytes are rounded to whole bytes. In binary mode, each line in the calculator is 1024 times the line below it (powers of 2), which is how memory computes byte addresses. However humans normally use units of 1000 for their stuff (powers of 10). To be very clear:

Binary powers of 2 are 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ... which is 2 to the power of 0, 1, 2, 3, 4, 5, etc.

Decimal powers of 10 are 1, 10, 100, 1000, 10000, 100000 ... which is 10 to the power of 0, 1, 2, 3, 4, 5, etc.

Specifically, megapixels and the GB or TB hard disk drive we buy are correctly dimensioned in 1000 units, and a 500 GB drive is correctly 500,000,000,000 bytes. However, after we format the drive, Windows shows 1024 units, calling it 465.7 GB, which is exactly the same number of bytes either way. Memory chips (including SSD and camera cards and USB sticks) necessarily use 1024 units. File sizes do not need 1024 units, however it has been common practice anyway. Windows may show file size either way, depending on location (Windows File Explorer normally shows binary KB, but Cmd DIR shows actual decimal bytes).

Convert with direct math
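| From\To | B | KB | MB | GB | TB |
|---|---|---|---|---|---|
| B | - | /1024 | /1024 2 times | /1024 3 times | /1024 4 times |
| KB | x1024 | - | /1024 | /1024 2 times | /1024 3 times |
| MB | x1024 2 times | x1024 | - | /1024 | /1024 2 times |
| GB | x1024 3 times | x1024 2 times | x1024 | - | /1024 |
| TB | x1024 4 times | x1024 3 times | x1024 2 times | x1024 | - |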

The math is easy to do directly. Conversion goes from left to right in the table. If you want to convert bytes to MB, Bytes to MB is two steps right in the list (B, KB, MB, GB, TB), so just divide bytes by 1024 twice to get MB. Or divide three times for GB.

Example:
3 GB = 3×1024×1024 = 3,145,728 KB
( x 1024 two times for GB to KB)
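
The "count the steps in the list" rule can be written as a short sketch (Python for illustration, with a hypothetical function name). The base parameter allows either units of 1024 or units of 1000:

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]

def convert(value, from_unit, to_unit, base=1024):
    """Divide by base for each step right in UNITS, multiply for each step left."""
    steps = UNITS.index(to_unit) - UNITS.index(from_unit)
    if steps >= 0:
        return value / base ** steps
    return value * base ** (-steps)

print(convert(3, "GB", "KB"))                    # 3 x 1024 x 1024 = 3145728
print(round(convert(10_000_000, "B", "MB"), 2))  # 9.54 (binary megabytes)
print(convert(10_000_000, "B", "MB", base=1000)) # 10.0 (decimal millions)
```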

We also see units of Mb as a rate of bandwidth. Small b is bits, as in Mb/second of bandwidth. Capital B is bytes of data, as in MB size. Bandwidth uses decimal units, powers of 10. There are eight bits per byte, so Mb = MB x 8.
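
As a sketch of why the bits and bytes difference matters for uploads (illustrative Python; the 10 Mb/s connection and the 12,000,000 byte file are hypothetical numbers, and real transfers add overhead that is ignored here):

```python
def megabits_to_megabytes(megabits):
    """Eight bits per byte, so Mb / 8 = MB (both in decimal units)."""
    return megabits / 8

def transfer_seconds(file_bytes, megabits_per_second):
    """Ideal transfer time: total bits divided by bits per second."""
    return file_bytes * 8 / (megabits_per_second * 1_000_000)

print(megabits_to_megabytes(10))          # 10 Mb/s moves 1.25 MB each second
print(transfer_seconds(12_000_000, 10))   # a 12,000,000 byte file needs about 9.6 seconds
```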

About Megabyte and Megapixel numbers

Humans count in decimal units of 10 or 1000 (which is 10^3), but binary units are of 2 or 1024 (which is 2^10). Binary units are necessarily used for memory chips, including SSD and flash drives. These are different numbers.

Because every memory chip address line to select a byte can have two values, 0 and 1, the chip hardware's total byte count must be a power of 2 (for example 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, etc). But then computer operating systems arbitrarily got the notion to also use 1024 units for file sizes, which is not necessary for file sizes, and it just confuses most humans. :)   All other human counting uses normal decimal 1000 units (powers of 10 instead of binary 2).

Specifically, specifications for megapixels in digital images, and hard disk drive size in gigabytes, are both correctly advertised as multiples of decimal thousands... millions are 1000x1000. Or giga is 1000x1000x1000. Same way as humans count. The calculator offers a mode for units of 1000 to make the point about the difference. 1000 is a smaller unit than 1024, so counting in binary gives fewer units of KB, MB and GB, each of which holds more bytes. The same amount of bytes just has different counting units. Thousands is just how humans count (in powers of 10), and million IS THE DEFINITION of Mega.

However, after formatting the disk, the computer operating system has notions to count it in binary GB. There's no good reason for that on hard drives, it is merely a complication. The disk manufacturer did advertise size correctly, and formatting does NOT make the disk smaller, the units just change (in computer lingo, 1K became counted as 1024 bytes instead of 1000 bytes). This is why we buy a 500 GB hard disk drive (sold as 1000's, the actual real count, the decimal way humans count), and it does mean 500,000,000,000 bytes, and we do get them all. But then we format it, and then we see it said to be 465 gigabytes of binary file space (using 1024). Both numbering systems are precisely numerically correct in their way. An actual 2 TB disk is 2,000,000,000,000 bytes / (1024 x 1024 x 1024 x 1024) = 1.819 TB in the computer operating system. Still the same exact size in bytes either way. But users who don't understand this numbering system switch might assume the disk manufacturer cheated them somehow. Instead, no, not at all, you got the honest count. The disk was just counted in decimal, the same way as we humans do. No crime in that, mega does in fact mean million (10^6), and we do count in decimal (powers of 10 instead of 2). It is the operating system that confuses us, calling mega something different, as powers of 2 (2^20 = 1,048,576).

So again, note that a 2 TB disk drive does actually have 2,000,000,000,000 bytes (decimal count). But instead, the operating system converts it to specify it as 1.819 TB (binary, but it really does have 2 TB of bytes, in the way humans count in powers of 10). Powers of 10 is also true of camera megapixels, which also have no need to use the binary counting system (megapixels are NOT binary powers of 2).
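
The disk size switch can be reproduced with a small sketch (illustrative Python, hypothetical function name), which only changes the counting unit and never the number of bytes:

```python
DECIMAL = {"GB": 1000 ** 3, "TB": 1000 ** 4}   # how the drive is advertised
BINARY = {"GB": 1024 ** 3, "TB": 1024 ** 4}    # how the operating system counts it

def advertised_to_binary(size, unit):
    """Return (actual bytes, size in the same-named binary unit)."""
    n_bytes = size * DECIMAL[unit]
    return n_bytes, n_bytes / BINARY[unit]

for size, unit in [(500, "GB"), (2, "TB")]:
    n_bytes, shown = advertised_to_binary(size, unit)
    print(f"{size} {unit} drive = {n_bytes:,} bytes, shown as {shown:.3f} {unit}")
```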

So kilo, mega, giga and tera terms were defined as powers of 10, but were corrupted to have two meanings. Computers used those existing terms with different meanings for memory sizes. Memory chips necessarily must use the binary counting system, but it is not necessary for hard disks or disk files (even if the operating system insists on it anyway). The Mega, Kilo, Giga and Tera prefixes mean, and always have meant, decimal units of 1000. And with the goal to preserve their actual decimal meanings, new international IEC binary prefixes Ki, Mi and Gi (kibi, mebi and gibi) were defined in 1998 for the binary power units, but they have not caught on. So, this is still a complication today. Memory chips are binary, but there is absolutely no reason why our computer operating system still does this regarding file sizes. Humans count in decimal powers of 10, including megapixels, and also the hard disk manufacturers counting bytes.

However, Memory chips (also including SSD and camera memory cards and USB flash sticks, which are all memory chips) are different, and their construction requires using binary kilobytes (counting in 1024 units) or megabytes (1024x1024) or gigabytes (1024x1024x1024). This is because each added address line exactly doubles size. Example, four address lines is a 4-bit number counting up to 1111 binary, which is 15 decimal, which therefore can address 16 bytes of memory (0 to 15). Or 8-bits counts 256 values, or 16-bits addresses 65536 bytes. So if the memory chip has N address lines, it necessarily provides 2^N bytes of memory. That's why memory size is dimensioned in units of 1024 bytes for what we call a 1K step. When two of these 1K chips are connected together, the plan is that they count up to 2x or 2048 bytes. But if each implemented only 1000 bytes, that leaves a missing 24 byte gap between them, where memory addressing would fail.
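
The address line counting can be sketched as follows (illustrative Python, with made-up function names):

```python
def addressable_bytes(address_lines):
    """N address lines select 2^N different byte addresses."""
    return 2 ** address_lines

def highest_address(address_lines):
    """Addresses count from 0, so the highest one is 2^N - 1."""
    return 2 ** address_lines - 1

for n in (4, 8, 10, 16):
    print(f"{n} lines: {addressable_bytes(n)} bytes, "
          f"addresses 0..{highest_address(n)} (binary {highest_address(n):b})")

# A 10-line chip that implemented only 1000 bytes would leave addresses 1000..1023 empty.
print(addressable_bytes(10) - 1000)   # 24
```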

So there are good necessary technical reasons for memory chips to use binary numbers, because each address bit is a power of two. The sequence 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 makes it extremely impractical (simply unthinkable) to build a 1000 byte memory chip in a chip that counts to 1024 multiples. It simply would not come out even. The binary address lines count 0 to 1023, so it is necessary to add the other 24 bytes to fill it up. By totally filling the memory chip address lines, we can connect a few chips in series and have continuous larger memory. However, leaving any gaps in the addressing would totally ruin it (simply unusable bad byte values there), so it is never done (unthinkable).

In the early days, memory chips were very small, and it was a concern whether they could hold one specific file. Describing these files in binary terms to match the memory chip was useful then, to know if the file would fit. However, there is no good reason for file sizes in binary today. Files are just a sequential string of bytes, which can be any total number. But memory chip size must be a binary power of 2, to match the address lines. The memory chip arrays today likely hold gigabytes and thousands of files. So it is now unimportant to know an exact binary count in a file, and counting them in binary is a useless complication. Nevertheless, the operating system commonly still counts files in binary 1024 units too. If we did have a file of actual size exactly 200,000 bytes (base 10), the computer operating system will call it 195.3 KB (base 2).

In base 10, we know the largest numeric value we can represent in 3 digits is 999. That's 9 + 90 + 900 = 999. When we count by tens, 1000 requires 4 digits, 10^3 = 1000, which is one more than three digits can hold. Binary base 2 works the same way: the largest number possible in 8 bits is 255, because 2^8 = 256 (which needs 9 bits). So 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 = 255. And 16 bits can contain addresses 0..65535. 2^16 = 65536 is one more than 16 bits can address.

Units of 1000 are extremely handy for humans. We can convert KB and MB and GB in our head, by simply moving the decimal point. Units of 1024 are not so easy, but they came about in the early computer days, back when 1024 bytes was a pretty large chip. Historically, we had to count bytes precisely to ensure the data would fit into the chip, and the 1024 number was quite important for programmers. That is no longer true today. Chips are huge, and exact counts are unimportant now. Hard drives dimension size in units of 1000, but our operating systems still like to convert file sizes to 1024 units. There is no valid reason why today...

But as a computer programmer, back in the day decades ago, I had the job of modifying a computer's boot loader in a 256 byte PROM. It was used in 8080 chips in factory test stations that booted from a console cassette tape, and I had to add booting from a central computer disk if it was present. I added the code, but it was too large. After my best tries, the two methods were still 257 bytes, simply one byte too large to fit in the PROM chip. It took some dirty tricks to make it fit and work. So memory size was very important in the earliest days (of tiny memory chips), but today, our computers have several GB of memory and maybe terabytes of disk storage, and the exact precise file sizes really matter little. Interesting color, at least for me. :)

The definition of the unit prefix "Mega" absolutely has always meant millions (decimal factors of 1000x1000), and it still does. It does NOT mean 1024x1024. However, memory chips are necessarily dimensioned in binary units (factors of 1024), and they simply incorrectly appropriated the terms kilo and mega, years ago... so that's special, but we do use it that way. In the early days, when memory chips were tiny, it was useful to think of file sizes in binary, when they had to fit. Since then though, chips have become huge, and files can be relatively huge too, and we don't sweat a few bytes now.

Note that you may see different numbers in different units for the same file size dimension:
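Here is a minimal Python sketch of that idea, showing one byte count expressed in both decimal (1000) and binary (1024) units. The function name `describe_size` and the dictionary labels are just my illustration, and the two example sizes are the 200,000 byte file and the 24 megapixel RGB image (72,000,000 bytes) mentioned earlier on this page.

```python
def describe_size(num_bytes: int) -> dict:
    """Return the same byte count expressed in decimal and binary units."""
    return {
        "bytes": num_bytes,
        "KB (1000)": num_bytes / 1000,
        "KB (1024)": num_bytes / 1024,
        "MB (1000x1000)": num_bytes / 1000**2,
        "MB (1024x1024)": num_bytes / 1024**2,
    }

for size in (200_000, 72_000_000):
    print(f"{size:,} bytes")
    for unit, value in describe_size(size).items():
        if unit != "bytes":
            print(f"  {value:>12,.1f} {unit}")
```

The 200,000 byte file comes out as 200.0 KB in decimal units but 195.3 KB in binary units, and the 72,000,000 byte image is 72.0 MB decimal but 68.7 MB binary. Same file, same bytes, different numbers.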


Scanning Size calculator

There is a larger dpi calculator that knows about scanning, printing, and enlargement.

Scanning any 6x4 inch photo will occupy the amounts of memory shown in the table below. I hope you realize that extreme resolution rapidly becomes impossible.
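The memory cost of a 6x4 inch scan can be roughly reproduced with a short Python sketch. It assumes 24-bit RGB (3 bytes per pixel, uncompressed in memory), and the list of resolutions is my own choice for illustration, not necessarily the rows of the original chart. The function name `scan_bytes` is hypothetical. MB here is the binary 1024x1024 unit.

```python
def scan_bytes(width_in: float, height_in: float, dpi: int,
               bytes_per_pixel: int = 3) -> tuple:
    """Return (width_px, height_px, uncompressed bytes) for a scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel

# Assumed set of scan resolutions, for a 6x4 inch photo.
for dpi in (75, 150, 300, 600, 1200, 2400, 4800, 9600):
    w, h, total = scan_bytes(6, 4, dpi)
    print(f"{dpi:>5} dpi  {w:>6} x {h:<6} pixels  {total / 1024**2:>9.1f} MB")
```

At 300 dpi this is 1800x1200 pixels and 6,480,000 bytes. At 9600 dpi it is 6,635,520,000 bytes, which is more than 6 GB for one small print.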


When people ask how to fix memory errors when scanning photos or documents at 9600 dpi, the answer is "don't do that" unless you have 8 gigabytes of memory, a 9600 dpi scanner, and a special reason. It is normally correct to scan at 300 dpi to reprint at original size (600 dpi can help line art scans, but normally not color or grayscale photos).

Saying that again (this is about a common first-time error):

Scanning a 35 mm slide to print at 8x10 inches is very roughly 9x enlargement (approximate, allowing for very slight cropping).
The goal is 2400x3000 pixels, which is what printing 8x10 inches at 300 dpi needs.
Two scanning methods work. Both examples here will scan at 2700 dpi:

The scan Input size is the 35 mm film size. The Output size is the print 8x10 inches.
You mark the Input size on the scanner preview with the mouse.

The pixels are the same either way (A or B), about 2400 x 3000 pixels. If sending it out with instruction to print 8x10, it will be 8x10 either way. You do need sufficient pixels (reasonably close), but it need not be precisely 300 dpi. Most shops will probably print at 250 dpi anyway.
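The 35 mm numbers can be checked with a small Python sketch. It assumes the nominal full 35 mm frame of 36x24 mm (before any cropping to the 8x10 shape), and the function names are hypothetical, only for this illustration.

```python
MM_PER_INCH = 25.4

# Assumption: nominal full 35 mm film frame, 36 x 24 mm.
FILM_W_IN = 36 / MM_PER_INCH
FILM_H_IN = 24 / MM_PER_INCH

def scan_pixels(width_in: float, height_in: float, scan_dpi: int) -> tuple:
    """Pixel dimensions created by scanning this area at scan_dpi."""
    return round(width_in * scan_dpi), round(height_in * scan_dpi)

def print_size_inches(pixels: tuple, print_dpi: int) -> tuple:
    """Paper size in inches that these pixels cover at print_dpi."""
    return tuple(p / print_dpi for p in pixels)

scan_dpi, print_dpi = 2700, 300
enlargement = scan_dpi / print_dpi            # 9x
pixels = scan_pixels(FILM_W_IN, FILM_H_IN, scan_dpi)
paper = print_size_inches(pixels, print_dpi)

print(f"enlargement: {enlargement:.0f}x")
print(f"full frame scan: {pixels[0]} x {pixels[1]} pixels")
print(f"prints {paper[0]:.1f} x {paper[1]:.1f} inches at {print_dpi} dpi")
```

The full frame scans to 3827 x 2551 pixels, so it more than covers the 3000 x 2400 pixels needed, with some left over to crop the 3:2 film shape to the 8x10 paper shape.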

There are two points here:

Notice that when you increase resolution, the size formula above multiplies the memory cost by that resolution number twice, in both width and height. The memory cost for an image increases as the square of the resolution. The square of say 300 dpi is a pretty large number (more than double the square of 200).

Scan resolution and print resolution are two very different things. The idea is that we might scan about 1x1 inch of film at say 2400 dpi, and then print it 8x size at 300 dpi at 8x8 inches. We always want to print photos at about 300 dpi, greater scan resolution is only for enlargement purposes.
The enlargement factor is Scanning resolution / printing resolution. A scan at 600 dpi will print 2x size at 300 dpi.
Emphasizing, unless it is small film to be enlarged, you do not want a high resolution scan of letter size paper. You may want a 300 dpi scan to reprint it at original size.

When we double the scan resolution, memory cost goes up 4 times. Multiply resolution by 3 and the memory cost increases 9 times, etc. So this seems a very clear argument to use only the amount of resolution we actually need to improve the image results for the job purpose. More than that is waste. It's often even painful. Well, virtual pain.  <grin>
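That square relationship is easy to verify. A minimal sketch, assuming the same scan area and the same bytes per pixel at both resolutions (the name `memory_ratio` is just for illustration):

```python
def memory_ratio(old_dpi: float, new_dpi: float) -> float:
    """How many times more memory the same area costs at new_dpi vs old_dpi."""
    return (new_dpi / old_dpi) ** 2

print(memory_ratio(300, 600))   # double the resolution
print(memory_ratio(300, 900))   # triple the resolution
print(memory_ratio(200, 300))   # 200 dpi up to 300 dpi
```

Doubling gives 4 times the memory, tripling gives 9 times, and going from 200 to 300 dpi is 2.25 times.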


Copyright © 1997-2021 by Wayne Fulton - All rights are reserved.
