WHY IS A BYTE 8 BITS? OR IS IT?

Original link: http://www.bobbemer.com/BYTE.HTM

I recently received an e-mail from one Zeno Luiz Iensen Nadal, a worker for Siemens in Brazil. He asked "My Algorythms teacher asked me and my colleagues 'Why a byte has eight bits?' Is there a technical answer for that?"
Of course I could not resist a reply to someone named Zeno, after that teacher of ancient times. Some people copied on the reply thought it a useful document, so (having done the hard work already) I add it to my site as a further bite of history.

I am way behind in my work, but I just cannot resist trying to answer your question on why a "byte" has eight bits.
The answer is that some do, and some don't. But that takes explaining, as follows:
If computers worked entirely in binary (and some did a long time ago), and did nothing but calculations with binary numbers, there would be no bytes.
But to use and manipulate character information we must have encodings for those symbols. And much of this was already known from punch card days.
The punch card of IBM (others existed) had 12 rows and 80 columns. Each column was assigned to a symbol, a term I use here although they have fancier names nowadays because computers have been used in so many new ways.
The rows, going down from the top, were 12-11-0-1-2-3-4-5-6-7-8-9. A punch in the 0 to 9 rows signified the digits 0-9. A group of columns could be called a "field", and a number in such a field could carry a plus sign (an additional punch in top row 12 of the units position of the number), or a minus sign (an additional punch in row 11 just under that).
Then they started to need alphabets. This was accomplished by adding the 12 punch to the digits 1-9 to make letters A through I, the 11 punch to make letters J through R. For S through Z they added the 0 punch to the digits 2 through 9 (the 0-1 combination was skipped -- 3x9=27, but the English alphabet has only 26 letters). The 12, 11, and 0 punches were called "zones", and you'll notice them today lurking in the high-order 4 bits. Remember that this was much prior to binary representations of those same characters.
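The zone-plus-digit scheme just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated above; the function name card_punches and the tuple representation of a column are assumptions made for the example, not anything taken from IBM documentation.

```python
import string

def card_punches(ch):
    """Return the rows punched in one card column for a digit or an
    uppercase letter, following the zone + digit rule described above.
    (Hypothetical helper; the tuple layout is an illustrative choice.)"""
    if ch in string.digits:
        return (int(ch),)                 # digits: a single punch in rows 0-9
    idx = string.ascii_uppercase.index(ch)  # raises ValueError for other symbols
    if idx < 9:
        return (12, idx + 1)              # A-I: zone 12 + digits 1-9
    if idx < 18:
        return (11, idx - 9 + 1)          # J-R: zone 11 + digits 1-9
    return (0, idx - 18 + 2)              # S-Z: zone 0 + digits 2-9 (0-1 skipped)

for symbol in "AIJRSZ7":
    print(symbol, card_punches(symbol))
```

Because the 0-1 combination is skipped, the three zones yield 9 + 9 + 8 = 26 letters rather than 27.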
The first bonus was that the 12 and 11 punches without any 0-9 punch gave us the characters + and -. But no other punctuation was represented then, not even a period (dot, full stop) in IBM or telecommunication equipment. One can see this in early telegrams, where one said "I MISS YOU STOP COME HOME STOP". "STOP" stood for the period the machine did not have.
Then punctuation and other marks had combinations of punches assigned, but there had to be 3 punches in a column to do this. In most cases the third punch was an extra "8".
In this way, with 10 digits, 26 alphabetic, and 11 others, IBM got to 47 characters. UNIVAC, with different punch cards (round holes, not rectangles, and 90 columns, not 80) got to about 54. But most of these were commercial characters. When FORTRAN came along, they needed, for example, a "divide" symbol, and an "=" symbol, and others not in the commercial set. So they had to use an alternate set of rules for scientific and mathematical work. A set of FORTRAN cards would cause havoc in payroll!
With many early computers these punch cards were used as input and output, and inasmuch as the total number of characters representable did not exceed 64, why not use just 6 bits each to represent them? The same applied to 6-track punched tape for teletypes.
In this period I came to work for IBM, and saw all the confusion caused by the 64-character limitation. Especially when we started to think about word processing, which would require both upper and lower case. Add 26 lower case letters to 47 existing, and one got 73 -- 9 more than 6 bits could represent.
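The arithmetic behind that limit is easy to check. The following is a small sketch, with bits_needed as a made-up helper name, that computes how many bits a fixed-width code needs for a given number of symbols.

```python
def bits_needed(n_symbols):
    """Smallest number of bits b such that 2**b >= n_symbols.
    (Hypothetical helper written for this illustration.)"""
    return max(1, (n_symbols - 1).bit_length())

existing = 47                  # digits, upper-case letters, and 11 other marks
with_lower = existing + 26     # add a lower-case alphabet

print(bits_needed(existing))       # 6 bits suffice for 47 symbols
print(with_lower, with_lower - 2**6)  # 73 symbols, 9 beyond the 6-bit limit
print(bits_needed(with_lower))     # 7 bits are the minimum for 73 symbols
print(2**8)                        # an 8-bit code offers 256 code points
```

Seven bits would have been enough for 73 symbols; the case for eight rests on the other arguments in the text, not on this count alone.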
I even made a proposal (in view of STRETCH, the very first computer I know of with an 8-bit byte) that would extend the number of punch card character codes to 256 [1]. Some folks took it seriously. I thought of it as a spoof.
So some folks started thinking about 7-bit characters, but this was ridiculous. With IBM's STRETCH computer as background, handling 64-bit words divisible into groups of 8 (I designed the character set for it, under the guidance of Dr. Werner Buchholz, the man who DID coin the term "byte" for an 8-bit grouping) [2], it seemed reasonable to make a universal 8-bit character set, handling up to 256. In those days my mantra was "powers of 2 are magic". And so the group I headed developed and justified such a proposal [3].
That was a little too much progress when presented to the standards group that was to formalize ASCII, so they stopped short for the moment with a 7-bit set, or else an 8-bit set with the upper half left for future work.
The IBM 360 used 8-bit characters, although not ASCII directly. Thus Buchholz's "byte" caught on everywhere. I myself did not like the name for many reasons. The design had 8 bits moving around in parallel. But then came a new IBM part, with 9 bits for self-checking, both inside the CPU and in the tape drives. I exposed this 9-bit byte to the press in 1973. But long before that, when I headed software operations for Cie. Bull in France in 1965-66, I insisted that "byte" be deprecated in favor of "octet".
You can notice that my preference then is now the preferred term. It is justified by new communications methods that can carry 16, 32, 64, and even 128 bits in parallel. But some foolish people now refer to a "16-bit byte" because of this parallel transfer, which is visible in the UNICODE set. I'm not sure, but maybe this should be called a "hextet".
But you will notice that I am still correct. Powers of 2 are still magic!
REFERENCES
[1] R.W.Bemer, "A proposal for a generalized card code of 256 characters",
    Commun. ACM 2, No. 9, 19-23, 1959 Sep
    -- Computing Reviews 00025
    Early public hint of 8-bit bytes to come.
[2] R.W.Bemer, W.Buchholz, "An extended character set standard",
    IBM Tech. Pub. TR00.18000.705, 1960 Jan, rev. TR00.721, 1960 Jun
    -- Computing Reviews 00813
[3] R.W.Bemer, H.J.Smith, Jr., F.A.Williams,
    "Design of an improved transmission/data processing code",
    Commun. ACM 4, No. 5, 212-217, 225, 1961 May
    -- Computer Abstracts 61-1920
    ASCII in its original form.