[go: nahoru, domu]

Jump to content

SQUOZE: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
+table
→‎References: improved refs
 
(31 intermediate revisions by 9 users not shown)
Line 1: Line 1:
{{Short description|Compression scheme}}
{{About|a compression scheme|the similarly named transponder code in aeronautics|Squawk code}}
{{About|a compression scheme|the similarly named transponder code in aeronautics|Squawk code}}
{{Use dmy dates|date=June 2020|cs1-dates=y}}
{{Use dmy dates|date=June 2020|cs1-dates=y}}
{{Use list-defined references|date=December 2021}}
'''SQUOZE''' (abbreviated as '''SQZ''') is a memory-efficient representation of a combined [[Source code|source]] and [[relocation (computer science)|relocatable]] [[object file|object]] program file with a [[symbol table]] on [[punched card]]s which was introduced in 1958 with the SCAT assembler<ref name="IBM_1961_SOS"/> on the [[SHARE Operating System]] (SOS) for the [[IBM 709]].<ref name="Jacob_2008"/><ref name="Jones_2018"/> A program in this format was called a ''SQUOZE [[card deck (computing)|deck]]''.<ref name="Boehm-Steel_1958"/><ref name="Boehm-Steel_1959"/><ref name="Shell_1959"/> It was also used on later machines including the [[IBM 7090]] and [[IBM 7094|7094]].<!-- any more? -->
'''SQUOZE''' (abbreviated as '''SQZ''') is a memory-efficient representation of a combined [[Source code|source]] and [[relocation (computer science)|relocatable]] [[object file|object]] program file with a [[symbol table]] on [[punched card]]s which was introduced in 1958 with the [[SCAT assembler]]<ref name="IBM_1961_SOS"/><ref name="Salomon_1993"/> on the [[SHARE Operating System]] (SOS) for the [[IBM 709]].<ref name="Jacob_2008"/><ref name="Jones_2018"/> A program in this format was called a ''SQUOZE [[card deck (computing)|deck]]''.<ref name="Boehm-Steel_1958"/><ref name="Boehm-Steel_1959"/><ref name="Shell_1959"/> It was also used on later machines including the [[IBM 7090]] and [[IBM 7094|7094]].<!-- any more? -->


== Encoding ==
In the ''SQUOZE encoding'', identifiers in the symbol table were represented in a 50-character [[alphabet (formal languages)|alphabet]], allowing a 36-bit [[machine word]] to represent six [[alphanumeric]] characters plus two flag bits, thus saving two bits per six characters,<ref name="Boehm-Steel_1959"/><ref name="IBM_1961_SOS"/> because the six bits normally allocated for each character could store up to 64 states rather than only the 50 states needed to represent the 50 letters of the alphabet, and 50<sup>6</sup>&nbsp;&lt;&nbsp;2<sup>34</sup>.
In the ''SQUOZE encoding'', identifiers in the symbol table were represented in a 50-character [[alphabet (formal languages)|alphabet]], allowing a 36-bit [[machine word]] to represent six [[alphanumeric]] characters plus two flag bits, thus saving two bits per six characters,<ref name="Boehm-Steel_1959"/><ref name="IBM_1961_SOS"/> because the six bits normally allocated for each character could store up to 64 states rather than only the 50 states needed to represent the 50 letters of the alphabet, and 50<sup>6</sup>&nbsp;&lt;&nbsp;2<sup>34</sup>.


{| class="wikitable"
{| class="wikitable" style="text-align:center;"
|+ '''SQUOZE character codes'''<ref name="IBM_1961_SOS"/>
|+ SQUOZE character codes<ref name="IBM_1961_SOS"/>
!style="background:#ccffff" rowspan=2|Most<br>significant<br>bits
! style="background:#ccffff" colspan="3"|Most<br>significant<br>digits
!colspan="8"|Least significant bits
! colspan="8"|Least significant digits
|-
|-
!
!width="10%"|000
!
!width="10%"|001
! Dec
!width="10%"|010
! +0
!width="10%"|011
! +1
!width="10%"|100
! +2
!width="10%"|101
! +3
!width="10%"|110
! +4
!width="10%"|111
! +5
! +6
! +7
|-
|-
!
!style="background:#ccffff"| 000
!
|style="background:white"| space ||style="background:#eeeeff"| 0 ||style="background:#eeeeff"| 1 ||style="background:#eeeeff"| 2 ||style="background:#eeeeff"| 3 ||style="background:#eeeeff"| 4 ||style="background:#eeeeff"| 5 ||style="background:#eeeeff"| 6
! Oct
! 0
! 1
! 2
! 3
! 4
! 5
! 6
! 7
|-
|-
! [[Decimal|Dec]]
!style="background:#ccffff"| 001
! [[Octal|Oct]]
|style="background:#eeeeff"| 7 ||style="background:#eeeeff"| 8 ||style="background:#eeeeff"| 9 ||style="background:#eeffee"| A ||style="background:#eeffee"| B ||style="background:#eeffee"| C ||style="background:#eeffee"| D ||style="background:#eeffee"| E
! [[Binary number|Bin]]
! width="10%"|000
! width="10%"|001
! width="10%"|010
! width="10%"|011
! width="10%"|100
! width="10%"|101
! width="10%"|110
! width="10%"|111
|-
|-
!style="background:#ccffff"| 010
! style="background:#ccffff"|+0
! style="background:#ccffff"|0
|style="background:#eeffee"| F ||style="background:#eeffee"| G ||style="background:#eeffee"| H ||style="background:#eeffee"| I ||style="background:#eeffee"| J ||style="background:#eeffee"| K ||style="background:#eeffee"| L ||style="background:#eeffee"| M
! style="background:#ccffff"|000
| style="background:white"|space || style="background:#eeeeff"|0 || style="background:#eeeeff"|1 || style="background:#eeeeff"|2 || style="background:#eeeeff"|3 || style="background:#eeeeff"|4 || style="background:#eeeeff"|5 || style="background:#eeeeff"|6
|-
|-
!style="background:#ccffff"| 011
! style="background:#ccffff"|+8
! style="background:#ccffff"|1
|style="background:#eeffee"| N ||style="background:#eeffee"| O ||style="background:#eeffee"| P ||style="background:#eeffee"| Q ||style="background:#eeffee"| R ||style="background:#eeffee"| S ||style="background:#eeffee"| T ||style="background:#eeffee"| U
! style="background:#ccffff"|001
| style="background:#eeeeff"|7 || style="background:#eeeeff"|8 || style="background:#eeeeff"|9 || style="background:#eeffee"|A || style="background:#eeffee"|B || style="background:#eeffee"|C || style="background:#eeffee"|D || style="background:#eeffee"|E
|-
|-
!style="background:#ccffff"| 100
! style="background:#ccffff"|+16
! style="background:#ccffff"|2
|style="background:#eeffee"| V ||style="background:#eeffee"| W ||style="background:#eeffee"| X ||style="background:#eeffee"| Y ||style="background:#eeffee"| Z ||style="background:#ffeeee"| = # ||style="background:#ffeeee"| / % ||style="background:#ffeeee"| ) ⌑
! style="background:#ccffff"|010
| style="background:#eeffee"|F || style="background:#eeffee"|G || style="background:#eeffee"|H || style="background:#eeffee"|I || style="background:#eeffee"|J || style="background:#eeffee"|K || style="background:#eeffee"|L || style="background:#eeffee"|M
|-
|-
!style="background:#ccffff"| 101
! style="background:#ccffff"|+24
! style="background:#ccffff"|3
|style="background:#ffeeee"| + & ||style="background:#ffeeee"| - ||style="background:#ffeeee"| - @ ||style="background:#ffeeee"| + & ||style="background:#ffeeee"| - ||style="background:#ffeeee"| * ||style="background:#ffeeee"| / ||style="background:#ffeeee"| $
! style="background:#ccffff"|011
| style="background:#eeffee"|N || style="background:#eeffee"|O || style="background:#eeffee"|P || style="background:#eeffee"|Q || style="background:#eeffee"|R || style="background:#eeffee"|S || style="background:#eeffee"|T || style="background:#eeffee"|U
|-
|-
!style="background:#ccffff"| 110
! style="background:#ccffff"|+32
! style="background:#ccffff"|4
|style="background:#ffeeee"| , ||style="background:#ffeeee"| . ||style="background:lightgray"| ||style="background:#lightgray"| ||style="background:lightgray"| ||style="background:#lightgray"| ||style="background:lightgray"| ||style="background:#lightgray"|
! style="background:#ccffff"|100
| style="background:#eeffee"|V || style="background:#eeffee"|W || style="background:#eeffee"|X || style="background:#eeffee"|Y || style="background:#eeffee"|Z || style="background:#ffeeee"|= # || style="background:#ffeeee"|/ % || style="background:#ffeeee"|) ⌑
|-
! style="background:#ccffff"|+40
! style="background:#ccffff"|5
! style="background:#ccffff"|101
| style="background:#ffeeee"|+ & || style="background:#ffeeee"|- || style="background:#ffeeee"|- @ || style="background:#ffeeee"|+ & || style="background:#ffeeee"|- || style="background:#ffeeee"|* || style="background:#ffeeee"|/ || style="background:#ffeeee"|$
|-
! style="background:#ccffff"|+48
! style="background:#ccffff"|6
! style="background:#ccffff"|110
| style="background:#ffeeee"|, || style="background:#ffeeee"|. || {{N/A|style=background:lightgray}} || {{N/A|style=background:lightgray}} || {{N/A|style=background:lightgray}} || {{N/A|style=background:lightgray}} || {{N/A|style=background:lightgray}} || {{N/A|style=background:lightgray}}
|}
|}
Using base 50 already saves a single bit every three characters, so it was used in two three-character chunks. The manual<ref name="IBM_1961_SOS"/> has a formula for encoding six characters ABCDEF: <math>(A*50^2 + B*50 + C) * 2^{17} + (D*50^2 + E*50 + F)</math>


For example "SQUOZE", normally 36 bits: <code>35 33 37 31 44 17</code><sub>(base 8)</sub> would be encoded in two 17-bit pieces to fit in the 34 bits as <code>( 0o220231 << 17 ) | 0o175473 == 0o110114575473</code>.
The name SQUOZE was later borrowed for similar schemes used on [[Digital Equipment Corporation|DEC]] machines;<ref name="Jones_2018"/> they had a 40-character alphabet (50 in [[octal]]) and were called [[DEC RADIX-50]], but often nicknamed [[DEC Squoze]] or [[DEC MOD40|MOD40]].<ref name="DEC_1971_PAL-11R"/>

A simpler example of the same logic would be how a three-digit [[Binary-coded decimal|BCD number]] would take up 12 bits, such as 987: <code>9 8 7</code><sub>(base 16)</sub> <code>1001 1000 0111</code><sub>(base 2)</sub>, but any such value could be stored in 10 bits directly, saving two bits, such as 987: <code>3db</code><sub>(base 16)</sub> <code>11 1101 1011</code><sub>(base 2)</sub>.


==Etymology==
==Etymology==
"Squoze" is a facetious past participle of the verb 'to squeeze'.<ref name="Boehm-Steel_1958"/><ref name="Boehm-Steel_1959"/>
"Squoze" is a facetious [[past participle]] of the verb 'to squeeze'.<ref name="Boehm-Steel_1958"/><ref name="Boehm-Steel_1959"/>

The name SQUOZE was later borrowed for similar schemes used on [[Digital Equipment Corporation|DEC]] machines;<ref name="Jones_2018"/> they had a 40-character alphabet (50 in [[octal]]) and were called [[DEC RADIX 50]] and [[DEC MOD40|MOD40]],<ref name="DEC_1971_PAL-11R"/> but sometimes nicknamed [[DEC Squoze]].


==See also==
==See also==
Line 52: Line 101:
* [[Densely packed decimal]] (DPD)
* [[Densely packed decimal]] (DPD)
* [[BCD (character encoding)]]
* [[BCD (character encoding)]]
* [[Base 50 (numeral system)]]
* [[Base conversion]]
* [[Base conversion]]


==References==
==References==
{{Reflist|refs=
{{Reflist|refs=
<ref name="IBM_1961_SOS">{{cite book |title=SOS Reference Manual - SHARE System for the IBM 709 |chapter=Section 02: SCAT Language; Appendix 1: Table of Permissible Characters; Appendix 3: SQUOZE Deck Format - Chapter 8: Dictionary |editor=SHARE 709 System Committee |date=June 1961 |orig-year=1959 |publisher=SOS Group, [[International Business Machines Corporation]] |id=X28-1213. Distribution No. 1–5 |publication-place=New York, USA |pages=02.00.01 – 02.00.11, 12.03.08.01 – 12.03.08.02, 12.01.00.01 |url=http://bitsavers.org/pdf/ibm/share/SOS_Reference_Manual_Jun61.pdf |access-date=2020-06-18 |url-status=live |archive-url=https://web.archive.org/web/20200618175636/http://bitsavers.org/pdf/ibm/share/SOS_Reference_Manual_Jun61.pdf |archive-date=2020-06-18 |quote=[…] Bit Positions Used […] Bit 0 […] Bit 1 […] Bits 2–35 […] Base 50 representation of the symbol with heading character. […] The base 50 representation of a symbol is obtained as follows: […] a. If the symbol has fewer than five characters, it is headed (by blank if it is in an unheaded region). […] b. The symbol with it heading character is left-justified and any unused low-order positions are filled with blanks. […] c. Each character in the symbol is replaced by it base 50 equivalent. […] d. The result is then converted by the following: if the symbol, after each character is repaced by its base 50 equivalent, is ABCDEF, its base 50 representation is (A*50<sup>2</sup>+B*50+C)*2<sup>17</sup>+(D*50<sup>2</sup>+E*50+F). […]}} [https://archive.org/details/bitsavers_ibmshareSO61_18030152][https://archive.org/download/bitsavers_ibmshareSO61_18030152/SOS_Reference_Manual_Jun61.pdf]</ref>
<ref name="Salomon_1993">{{cite book |author-first=David |author-last=Salomon |editor-first=Ian D. |editor-last=Chivers |title=Assemblers and Loaders |date=February 1993 |orig-date=1992 |edition=1 |series=Ellis Horwood Series In Computers And Their Applications |publisher=[[Ellis Horwood Limited]] / [[Simon & Schuster International Group]] |location=California State University, Northridge, California, USA |publication-place=Chicester, West Sussex, UK |isbn=0-13-052564-2 |url=http://www.davidsalomon.name/assem.advertis/asl.pdf |access-date=2008-10-01 |url-status=live |archive-url=https://web.archive.org/web/20200323010358/http://www.davidsalomon.name/assem.advertis/asl.pdf |archive-date=2020-03-23}} (xiv+294+4 pages)</ref>
<ref name="IBM_1961_SOS">{{cite book |title=SOS Reference Manual - SHARE System for the IBM 709 |chapter=Section 02: SCAT Language; Appendix 1: Table of Permissible Characters; Appendix 3: SQUOZE Deck Format - Chapter 8: Dictionary |editor=((SHARE 709 System Committee)) |date=June 1961 |orig-date=1959 |publisher=SOS Group, [[International Business Machines Corporation]] |id=X28-1213. Distribution No. 1–5 |location=New York, USA |pages=02.00.01 – 02.00.11, 12.03.08.01 – 12.03.08.02, 12.01.00.01 |url=http://bitsavers.org/pdf/ibm/share/SOS_Reference_Manual_Jun61.pdf |access-date=2020-06-18 |url-status=live |archive-url=https://web.archive.org/web/20200618175636/http://bitsavers.org/pdf/ibm/share/SOS_Reference_Manual_Jun61.pdf |archive-date=2020-06-18 |quote-pages=12.03.08.01 – 12.03.08.02 |quote=[…] Bit Positions Used […] Bit 0 […] Bit 1 […] Bits 2–35 […] [[Base 50 (numeral system)|Base 50]] representation of the symbol with heading character. […] The base 50 representation of a symbol is obtained as follows: […] a. If the symbol has fewer than five characters, it is headed (by blank if it is in an unheaded region). […] b. The symbol with it[s] heading character is left-justified and any unused low-order positions are filled with blanks. […] c. Each character in the symbol is replaced by it[s] base 50 equivalent. […] d. The result is then converted by the following: if the symbol, after each character is rep[l]aced by its base 50 equivalent, is ABCDEF, its base 50 representation is (A*50<sup>2</sup>+B*50+C)*2<sup>17</sup>+(D*50<sup>2</sup>+E*50+F). […]}} [https://archive.org/details/bitsavers_ibmshareSO61_18030152][https://archive.org/download/bitsavers_ibmshareSO61_18030152/SOS_Reference_Manual_Jun61.pdf]</ref>
<ref name="Jacob_2008">{{cite book |title=Memory Systems: Cache, DRAM, Disk |author-first1=Bruce |author-last1=Jacob |author-first2=Spencer W. |author-last2=Ng |author-first3=David T. |author-last3=Wang |author-first4=Samuel |author-last4=Rodrigez |chapter=Part I Chapter 3.1.3 On-Line Locality Optimizations: Dynamic Compression of Instructions and Data |date=2008 |series=The Morgan Kaufmann Series in Computer Architecture and Design |publisher=[[Morgan Kaufmann Publishers]] / Elsevier |isbn=978-0-12-379751-3 |page=147 |url=https://books.google.com/books?id=SrP3aWed-esC&pg=PA147}} (900 pages)</ref>
<ref name="Jacob_2008">{{cite book |title=Memory Systems: Cache, DRAM, Disk |author-first1=Bruce |author-last1=Jacob |author-first2=Spencer W. |author-last2=Ng |author-first3=David T. |author-last3=Wang |author-first4=Samuel |author-last4=Rodrigez |chapter=Part I Chapter 3.1.3 On-Line Locality Optimizations: Dynamic Compression of Instructions and Data |date=2008 |series=The Morgan Kaufmann Series in Computer Architecture and Design |publisher=[[Morgan Kaufmann Publishers]] / Elsevier |isbn=978-0-12-379751-3 |page=147 |chapter-url=https://books.google.com/books?id=SrP3aWed-esC&pg=PA147}} (900 pages)</ref>
<ref name="Jones_2018">{{cite web |work=Operating Systems, Spring 2018 |title=Lecture 7, Object Codes, Loaders and Linkers - Final steps on the road to machine code |author-first=Douglas W. |author-last=Jones |author-link=Douglas W. Jones |date=2018 |publisher=[[The University of Iowa]], Department of Computer Science |series=Part of the CS:3620 Operating Systems Collection |url=http://homepage.divms.uiowa.edu/~jones/opsys/notes/07.shtml |access-date=2020-06-06 |url-status=live |archive-url=https://web.archive.org/web/20200606164231/http://homepage.divms.uiowa.edu/~jones/opsys/notes/07.shtml |archive-date=2020-06-06}}</ref>
<ref name="Jones_2018">{{cite web |work=Operating Systems, Spring 2018 |title=Lecture 7, Object Codes, Loaders and Linkers - Final steps on the road to machine code |author-first=Douglas W. |author-last=Jones |author-link=Douglas W. Jones |date=2018 |publisher=[[The University of Iowa]], Department of Computer Science |series=Part of the CS:3620 Operating Systems Collection |url=http://homepage.divms.uiowa.edu/~jones/opsys/notes/07.shtml |access-date=2020-06-06 |url-status=live |archive-url=https://web.archive.org/web/20200606164231/http://homepage.divms.uiowa.edu/~jones/opsys/notes/07.shtml |archive-date=2020-06-06}}</ref>
<ref name="Boehm-Steel_1958">{{cite conference |author-first1=Elaine M. |author-last1=Boehm<!-- of IBM --> |author-first2=Thomas B. |author-last2=Steel, Jr.<!-- of SDC --> |title=Machine Implementation of Symbolic Programming - Summary of a Paper to be Presented at the Summer 1958 Meeting of the ACM |conference=ACM '58: Preprints of papers presented at the 13th national meeting of the Association for Computing Machinery |pages=17-1 – 17-3 |date=June 1958 |doi=10.1145/610937.610953 |url=https://dl.acm.org/doi/pdf/10.1145/610937.610953 |access-date=2020-06-06 |url-status=live |archive-url=https://web.archive.org/web/20200606075930/https://dl.acm.org/doi/pdf/10.1145/610937.610953 |archive-date=2020-06-06}} (3 pages)</ref>
<ref name="Boehm-Steel_1958">{{cite conference |author-first1=Elaine M. |author-last1=Boehm<!-- of IBM --> |author-first2=Thomas B. |author-last2=Steel, Jr.<!-- of SDC --> |title=Machine Implementation of Symbolic Programming - Summary of a Paper to be Presented at the Summer 1958 Meeting of the ACM |conference=ACM '58: Preprints of papers presented at the 13th national meeting of the Association for Computing Machinery |pages=17-1 – 17-3 |date=June 1958 |doi=10.1145/610937.610953 |url=https://dl.acm.org/doi/pdf/10.1145/610937.610953 |access-date=2020-06-06 |url-status=live |archive-url=https://web.archive.org/web/20200606075930/https://dl.acm.org/doi/pdf/10.1145/610937.610953 |archive-date=2020-06-06}} (3 pages)</ref>
<ref name="Shell_1959">{{cite journal |author-first1=Donald L. |author-last1=Shell |title=The SHARE 709 System: A Cooperative Effort |journal=[[Journal of the ACM]] |volume=6 |issue=2 |pages=123–127 |date=April 1959 |orig-year=October 1958 |doi=10.1145/320964.320966 |url=https://dl.acm.org/doi/pdf/10.1145/320964.320966 |access-date=2020-06-16 |url-status=live |archive-url=https://web.archive.org/web/20200617052333/https://dl.acm.org/doi/pdf/10.1145/320964.320966 |archive-date=2020-06-16}} (5 pages)</ref>
<ref name="Shell_1959">{{cite journal |author-first1=Donald L. |author-last1=Shell |title=The SHARE 709 System: A Cooperative Effort |journal=[[Journal of the ACM]] |volume=6 |issue=2 |pages=123–127 |date=April 1959 |orig-date=October 1958 |doi=10.1145/320964.320966 |s2cid=16476514 |url=https://dl.acm.org/doi/pdf/10.1145/320964.320966 |access-date=2020-06-16 |url-status=live |archive-url=https://web.archive.org/web/20200617052333/https://dl.acm.org/doi/pdf/10.1145/320964.320966 |archive-date=2020-06-17|doi-access=free }} (5 pages)</ref>
<ref name="Boehm-Steel_1959">{{cite journal |author-first1=Elaine M. |author-last1=Boehm |author-first2=Thomas B. |author-last2=Steel, Jr. |title=The SHARE 709 System: Machine Implementation of Symbolic Programming |journal=[[Journal of the ACM]] |volume=6 |issue=2 |pages=134–140 |date=April 1959 |doi=10.1145/320964.320968 |url=https://dl.acm.org/doi/pdf/10.1145/320964.320968 |access-date=2020-06-04 |url-status=live |archive-url=https://web.archive.org/web/20200604124209/https://dl.acm.org/doi/pdf/10.1145/320964.320968 |archive-date=2020-06-04 |quote=[…] There is an interesting feature related to the encoding of symbols for inclusion in the dictionary. In the usual mode of expression, symbols may be constructed from a set of 50 characters. If encoding were character by character, six bits would be required for the representation of each such character. As a symbol may contain as many as six characters, a total of 36 bits would be required for the representation of each symbol. This might seem convenient, as the length of a 709 word is exactly 36 bits, but a moment's consideration shows that it is unfortunate as it would be desirable to have a bit or two available in the same word as the symbol representation, giving a clue to the nature of the symbol. These flagging bits can be obtained. Let each character possible represent a digit in a number system having a base of fifty. Now six character symbols may be read as natural numbers in a base fifty system. If these numbers are converted to the usual base two system, only 34 bits are required for the maximum number and a gain of two flag bits has been made. This has the incidental feature of decreasing the requisite number of bits for representing the entire code, but conversion time would outweigh the saving by a significant margin were it not for the peculiar length of the 709 word. Here is a clear illustration of the critical effect the precise specifications of the machine concerned hold over the details of an encoding schema. […]}} (7 pages)</ref>
<ref name="Boehm-Steel_1959">{{cite journal |author-first1=Elaine M. |author-last1=Boehm |author-first2=Thomas B. |author-last2=Steel, Jr. |title=The SHARE 709 System: Machine Implementation of Symbolic Programming |journal=[[Journal of the ACM]] |volume=6 |issue=2 |pages=134–140 |date=April 1959 |doi=10.1145/320964.320968 |s2cid=16545134 |url=https://dl.acm.org/doi/pdf/10.1145/320964.320968 |access-date=2020-06-04 |url-status=live |archive-url=https://web.archive.org/web/20200604124209/https://dl.acm.org/doi/pdf/10.1145/320964.320968 |archive-date=2020-06-04 |quote-pages=137–138 |quote=[…] There is an interesting feature related to the encoding of symbols for inclusion in the dictionary. In the usual mode of expression, symbols may be constructed from a set of 50 characters. If encoding were character by character, six bits would be required for the representation of each such character. As a symbol may contain as many as six characters, a total of 36 bits would be required for the representation of each symbol. This might seem convenient, as the length of a 709 word is exactly 36 bits, but a moment's consideration shows that it is unfortunate as it would be desirable to have a bit or two available in the same word as the symbol representation, giving a clue to the nature of the symbol. These flagging bits can be obtained. Let each character possible represent a digit in a number system having a [[Base 50 (numeral system)|base of fifty]]. Now six character symbols may be read as natural numbers in a base fifty system. If these numbers are converted to the usual base two system, only 34 bits are required for the maximum number and a gain of two flag bits has been made. This has the incidental feature of decreasing the requisite number of bits for representing the entire code, but conversion time would outweigh the saving by a significant margin were it not for the peculiar length of the 709 word. Here is a clear illustration of the critical effect the precise specifications of the machine concerned hold over the details of an encoding schema. […]|doi-access=free }} (7 pages)</ref>
<ref name="DEC_1971_PAL-11R">{{cite book |title=PAL-11R Assemler - Programmer's Manual - Program Assembly Language and Relocatable Assembler for the Disk Operating System |date=May 1971 |orig-year=February 1971 |edition=2nd revised printing |chapter=8.10 .RAD50 |id=DEC-11-ASDB-D |publisher=[[Digital Equipment Corporation]] |publication-place=Maynard, Massachusetts, USA |page=8-8 |url=https://archive.org/details/bitsavers_decpdp11do11RAssemblerProgrammersManualMay71_2572677 |access-date=2020-06-18 |quote=[…] [[PDP-11]] systems programs often handle symbols in a specially coded form called [[DEC RADIX 50|RADIX 50]] (this form is sometimes referred to as [[DEC MOD40|MOD40]]). This form allows 3 characters to be packed into 16 bits […]}} [https://ia800602.us.archive.org/9/items/bitsavers_decpdp11do11RAssemblerProgrammersManualMay71_2572677/DEC-11-ASDB-D_PAL-11R_Assembler_Programmers_Manual_May71.pdf]</ref>
<ref name="DEC_1971_PAL-11R">{{cite book |title=PAL-11R Assembler - Programmer's Manual - Program Assembly Language and Relocatable Assembler for the Disk Operating System |date=May 1971 |orig-date=February 1971 |edition=2nd revised printing |chapter=8.10 .RAD50 |id=DEC-11-ASDB-D |publisher=[[Digital Equipment Corporation]] |location=Maynard, Massachusetts, USA |page=8-8 |url=https://archive.org/details/bitsavers_decpdp11do11RAssemblerProgrammersManualMay71_2572677 |access-date=2020-06-18 |quote-page=8-8 |quote=[…] [[PDP-11]] systems programs often handle symbols in a specially coded form called [[DEC RADIX 50|RADIX 50]] (this form is sometimes referred to as [[DEC MOD40|MOD40]]). This form allows 3 characters to be packed into 16 bits […]}} [https://archive.org/details/bitsavers_decpdp11do11RAssemblerProgrammersManualMay71_2572677]</ref>
}}
}}


== Further reading ==
== Further reading ==
* {{cite web |title=Squoze your data |author-first=Al |author-last=Williams <!-- |author-link=Al Williams (author)??? --> |date=2016-11-22 |work=Hackaday |url=https://hackaday.com/2016/11/22/squoze-your-data/ |access-date=2020-06-06 |url-status=live |archive-url=https://web.archive.org/web/20200606101541/https://hackaday.com/2016/11/22/squoze-your-data/ |archive-date=2020-06-06}}
* {{cite web |title=Squoze your data |author-first=Al |author-last=Williams <!-- |author-link=Al Williams (author)??? --> |date=2016-11-22 |work=Hackaday |url=https://hackaday.com/2016/11/22/squoze-your-data/ |access-date=2020-06-06 |url-status=live |archive-url=https://web.archive.org/web/20200606101541/https://hackaday.com/2016/11/22/squoze-your-data/ |archive-date=2020-06-06}}
* {{cite book |author-first1=J. |author-last1=Ehrman |author-first2=J. N. |author-last2=Snyder |publisher=[[University of Illinois]], Graduate College Digital Computer Laboratory |title=The PORTHOS Executive System for the IBM 7094 - User's Manual |chapter=3.3.2.1 SCAT |date=1964-04-15 |url=https://core.ac.uk/download/pdf/4834584.pdf |access-date=2020-06-07 |url-status=live |archive-url=https://web.archive.org/web/20200607163131/https://core.ac.uk/download/pdf/4834584.pdf |archive-date=2020-06-07 |quote=[…] SCAT is a two part assembler which in brief operates as follows: Programs written symbolicaally as one order per card are ingested during the first phase by the "compiler" which scans the program for symbols and outputs a condensed deck of cards (SQUOZE deck) containing tables of these symbols and the program condensed and efficiently coded. During the second phase this SQUOZE deck is ingested by the "modify and load" program which converts the object program to binary machine language which by option can either be loaded ready to run or output on absolute binary cards (23 orders per card) for loading and running at a later time. The "lister" can produce a printed version of the program at either of these stages. Symbolic corrections to a program can be inserted into the second phase along with the SQUOZE deck. […]}}
* {{cite book |author-first1=John Robert |author-last1=Ehrman<!-- 1935-2018 https://web.archive.org/web/20221015170120/https://oac.cdlib.org/findaid/ark:/13030/c8xk8mnr/ https://web.archive.org/web/20221015172646/https://www.forevermissed.com/john-ehrman/about https://www.linkedin.com/in/john-ehrman-8794b26b --> |author-first2=James N. |author-last2=Snyder<!-- 1923-1985 https://web.archive.org/web/20221015172216/https://ws.engr.illinois.edu/sitemanager/getfile.asp?id=559 https://web.archive.org/web/20221015172329/https://cs.illinois.edu/about/awards/undergraduate-scholarships-awards/james-n-snyder-memorial-award --> |publisher=[[University of Illinois]], Graduate College Digital Computer Laboratory |title=The PORTHOS Executive System for the IBM 7094 - User's Manual |chapter=3.3.2.1 SCAT |date=1964-04-15 |pages=<!-- no page numbers --> |url=https://core.ac.uk/download/pdf/4834584.pdf |access-date=2020-06-07 |url-status=live |archive-url=https://web.archive.org/web/20200607163131/https://core.ac.uk/download/pdf/4834584.pdf |archive-date=2020-06-07 |quote-page=<!-- no page numbers --> |quote=[…] SCAT is a two part assembler which in brief operates as follows: Programs written symbolically as one order per card are ingested during the first phase by the "compiler" which scans the program for symbols and outputs a condensed deck of cards (SQUOZE deck) containing tables of these symbols and the program condensed and efficiently coded. During the second phase this SQUOZE deck is ingested by the "modify and load" program which converts the object program to binary machine language which by option can either be loaded ready to run or output on absolute binary cards (23 orders per card) for loading and running at a later time. The "lister" can produce a printed version of the program at either of these stages. Symbolic corrections to a program can be inserted into the second phase along with the SQUOZE deck. […]}} (1 page)


[[Category:Executable file formats]]
[[Category:Executable file formats]]

Latest revision as of 10:22, 28 December 2023

SQUOZE (abbreviated as SQZ) is a memory-efficient representation of a combined source and relocatable object program file with a symbol table on punched cards which was introduced in 1958 with the SCAT assembler[1][2] on the SHARE Operating System (SOS) for the IBM 709.[3][4] A program in this format was called a SQUOZE deck.[5][6][7] It was also used on later machines including the IBM 7090 and 7094.

Encoding[edit]

In the SQUOZE encoding, identifiers in the symbol table were represented in a 50-character alphabet, allowing a 36-bit machine word to represent six alphanumeric characters plus two flag bits, thus saving two bits per six characters,[6][1] because the six bits normally allocated for each character could store up to 64 states rather than only the 50 states needed to represent the 50 letters of the alphabet, and 506 < 234.

SQUOZE character codes[1]
Most
significant
digits
Least significant digits
Dec +0 +1 +2 +3 +4 +5 +6 +7
Oct 0 1 2 3 4 5 6 7
Dec Oct Bin 000 001 010 011 100 101 110 111
+0 0 000 space 0 1 2 3 4 5 6
+8 1 001 7 8 9 A B C D E
+16 2 010 F G H I J K L M
+24 3 011 N O P Q R S T U
+32 4 100 V W X Y Z = # / % ) ⌑
+40 5 101 + & - - @ + & - * / $
+48 6 110 , .

Using base 50 already saves a single bit every three characters, so it was used in two three-character chunks. The manual[1] has a formula for encoding six characters ABCDEF:

For example "SQUOZE", normally 36 bits: 35 33 37 31 44 17(base 8) would be encoded in two 17-bit pieces to fit in the 34 bits as ( 0o220231 << 17 ) | 0o175473 == 0o110114575473.

A simpler example of the same logic would be how a three-digit BCD number would take up 12 bits, such as 987: 9 8 7(base 16) 1001 1000 0111(base 2), but any such value could be stored in 10 bits directly, saving two bits, such as 987: 3db(base 16) 11 1101 1011(base 2).

Etymology[edit]

"Squoze" is a facetious past participle of the verb 'to squeeze'.[5][6]

The name SQUOZE was later borrowed for similar schemes used on DEC machines;[4] they had a 40-character alphabet (50 in octal) and were called DEC RADIX 50 and MOD40,[8] but sometimes nicknamed DEC Squoze.

See also[edit]

References[edit]

  1. ^ a b c d SHARE 709 System Committee, ed. (June 1961) [1959]. "Section 02: SCAT Language; Appendix 1: Table of Permissible Characters; Appendix 3: SQUOZE Deck Format - Chapter 8: Dictionary". SOS Reference Manual - SHARE System for the IBM 709 (PDF). New York, USA: SOS Group, International Business Machines Corporation. pp. 02.00.01 – 02.00.11, 12.03.08.01 – 12.03.08.02, 12.01.00.01. X28-1213. Distribution No. 1–5. Archived (PDF) from the original on 2020-06-18. Retrieved 2020-06-18. pp. 12.03.08.01 – 12.03.08.02: […] Bit Positions Used […] Bit 0 […] Bit 1 […] Bits 2–35 […] Base 50 representation of the symbol with heading character. […] The base 50 representation of a symbol is obtained as follows: […] a. If the symbol has fewer than five characters, it is headed (by blank if it is in an unheaded region). […] b. The symbol with it[s] heading character is left-justified and any unused low-order positions are filled with blanks. […] c. Each character in the symbol is replaced by it[s] base 50 equivalent. […] d. The result is then converted by the following: if the symbol, after each character is rep[l]aced by its base 50 equivalent, is ABCDEF, its base 50 representation is (A*502+B*50+C)*217+(D*502+E*50+F). […] [1][2]
  2. ^ Salomon, David (February 1993) [1992]. Written at California State University, Northridge, California, USA. Chivers, Ian D. (ed.). Assemblers and Loaders (PDF). Ellis Horwood Series In Computers And Their Applications (1 ed.). Chicester, West Sussex, UK: Ellis Horwood Limited / Simon & Schuster International Group. ISBN 0-13-052564-2. Archived (PDF) from the original on 2020-03-23. Retrieved 2008-10-01. (xiv+294+4 pages)
  3. ^ Jacob, Bruce; Ng, Spencer W.; Wang, David T.; Rodrigez, Samuel (2008). "Part I Chapter 3.1.3 On-Line Locality Optimizations: Dynamic Compression of Instructions and Data". Memory Systems: Cache, DRAM, Disk. The Morgan Kaufmann Series in Computer Architecture and Design. Morgan Kaufmann Publishers / Elsevier. p. 147. ISBN 978-0-12-379751-3. (900 pages)
  4. ^ a b Jones, Douglas W. (2018). "Lecture 7, Object Codes, Loaders and Linkers - Final steps on the road to machine code". Operating Systems, Spring 2018. Part of the CS:3620 Operating Systems Collection. The University of Iowa, Department of Computer Science. Archived from the original on 2020-06-06. Retrieved 2020-06-06.
  5. ^ a b Boehm, Elaine M.; Steel, Jr., Thomas B. (June 1958). Machine Implementation of Symbolic Programming - Summary of a Paper to be Presented at the Summer 1958 Meeting of the ACM. ACM '58: Preprints of papers presented at the 13th national meeting of the Association for Computing Machinery. pp. 17-1–17-3. doi:10.1145/610937.610953. Archived from the original on 2020-06-06. Retrieved 2020-06-06. (3 pages)
  6. ^ a b c Boehm, Elaine M.; Steel, Jr., Thomas B. (April 1959). "The SHARE 709 System: Machine Implementation of Symbolic Programming". Journal of the ACM. 6 (2): 134–140. doi:10.1145/320964.320968. S2CID 16545134. Archived from the original on 2020-06-04. Retrieved 2020-06-04. pp. 137–138: […] There is an interesting feature related to the encoding of symbols for inclusion in the dictionary. In the usual mode of expression, symbols may be constructed from a set of 50 characters. If encoding were character by character, six bits would be required for the representation of each such character. As a symbol may contain as many as six characters, a total of 36 bits would be required for the representation of each symbol. This might seem convenient, as the length of a 709 word is exactly 36 bits, but a moment's consideration shows that it is unfortunate as it would be desirable to have a bit or two available in the same word as the symbol representation, giving a clue to the nature of the symbol. These flagging bits can be obtained. Let each character possible represent a digit in a number system having a base of fifty. Now six character symbols may be read as natural numbers in a base fifty system. If these numbers are converted to the usual base two system, only 34 bits are required for the maximum number and a gain of two flag bits has been made. This has the incidental feature of decreasing the requisite number of bits for representing the entire code, but conversion time would outweigh the saving by a significant margin were it not for the peculiar length of the 709 word. Here is a clear illustration of the critical effect the precise specifications of the machine concerned hold over the details of an encoding schema. […] (7 pages)
  7. ^ Shell, Donald L. (April 1959) [October 1958]. "The SHARE 709 System: A Cooperative Effort". Journal of the ACM. 6 (2): 123–127. doi:10.1145/320964.320966. S2CID 16476514. Archived from the original on 2020-06-17. Retrieved 2020-06-16. (5 pages)
  8. ^ "8.10 .RAD50". PAL-11R Assembler - Programmer's Manual - Program Assembly Language and Relocatable Assembler for the Disk Operating System (2nd revised printing ed.). Maynard, Massachusetts, USA: Digital Equipment Corporation. May 1971 [February 1971]. p. 8-8. DEC-11-ASDB-D. Retrieved 2020-06-18. p. 8-8: […] PDP-11 systems programs often handle symbols in a specially coded form called RADIX 50 (this form is sometimes referred to as MOD40). This form allows 3 characters to be packed into 16 bits […] [3]

Further reading[edit]

  • Williams, Al (2016-11-22). "Squoze your data". Hackaday. Archived from the original on 2020-06-06. Retrieved 2020-06-06.
  • Ehrman, John Robert; Snyder, James N. (1964-04-15). "3.3.2.1 SCAT". The PORTHOS Executive System for the IBM 7094 - User's Manual (PDF). University of Illinois, Graduate College Digital Computer Laboratory. Archived (PDF) from the original on 2020-06-07. Retrieved 2020-06-07. […] SCAT is a two part assembler which in brief operates as follows: Programs written symbolically as one order per card are ingested during the first phase by the "compiler" which scans the program for symbols and outputs a condensed deck of cards (SQUOZE deck) containing tables of these symbols and the program condensed and efficiently coded. During the second phase this SQUOZE deck is ingested by the "modify and load" program which converts the object program to binary machine language which by option can either be loaded ready to run or output on absolute binary cards (23 orders per card) for loading and running at a later time. The "lister" can produce a printed version of the program at either of these stages. Symbolic corrections to a program can be inserted into the second phase along with the SQUOZE deck. […] (1 page)