Text compression for transmission and storage

Ong, Ghim Hwee

thesis

posted on 2013-12-09, 15:06 authored by Ghim Hwee Ong

The increasing use of computers for document preparation and publishing, coupled with a growth in the general information management facilities available on computers, has meant that most documents exist in computer-processable form during their lifetime. This has led to a substantial increase in the demand for data storage facilities, which frequently seems to exceed the storage provided, despite advances in storage technology. Furthermore, there is a growing demand to transmit these textual documents from one user to another, rather than to transfer a printed form between sites, which must then be re-entered into a computer at the receiving site. Transmission facilities are, however, limited, and large documents can be difficult and expensive to transmit. Problems of storage and transmission capacity can be alleviated by compacting the textual information beforehand, provided that no information is lost in the process. Conventional compaction techniques have been designed to compact all forms of data (binary as well as text) and have predominantly used the byte as the unit of compression. This thesis investigates the alternative of designing a compaction procedure specifically for natural-language texts, using the textual word as the unit of compression. Four related alternative techniques are developed and analysed in the thesis. They are designed for different circumstances: where either maximum compression or maximum point-to-point transmission speed is of greatest importance, and where the transmission or storage medium may be oriented to a seven- or eight-bit data unit. The effectiveness of the four techniques is investigated both theoretically and by practical comparison with a widely used conventional alternative. It is shown that for a wide range of textual material the word-based techniques yield greater compression and require substantially less processing time.
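The abstract does not describe the four techniques in detail, so the following is only a minimal sketch of the general idea of using the word, rather than the byte, as the unit of compression. It is not the thesis's method. It assumes a hypothetical scheme in which the text is split losslessly into word and non-word tokens, the tokens are ranked by frequency into a dictionary, and each token is replaced by its dictionary index written as a base-128 variable-length integer (7 data bits per byte plus a continuation bit), so the 128 most frequent tokens cost one byte each. All function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical tokenizer: alternating runs of word and non-word characters.
# \w and \W are complements, so joining the tokens reproduces the input exactly.
TOKEN_RE = re.compile(r"\w+|\W+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def encode_varint(n: int, out: bytearray) -> None:
    # Emit 7 data bits per byte; the high bit marks that another byte follows.
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def decode_varints(data: bytes) -> list[int]:
    values, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7  # continuation bit set: more bytes belong to this value
        else:
            values.append(n)
            n, shift = 0, 0
    return values


def compress(text: str) -> tuple[list[str], bytes]:
    tokens = tokenize(text)
    # Frequency-ranked dictionary: the commonest tokens get the smallest
    # indices, so the 128 most frequent tokens each cost a single byte.
    dictionary = [tok for tok, _ in Counter(tokens).most_common()]
    index = {tok: i for i, tok in enumerate(dictionary)}
    out = bytearray()
    for tok in tokens:
        encode_varint(index[tok], out)
    return dictionary, bytes(out)


def decompress(dictionary: list[str], data: bytes) -> str:
    # Map each decoded index back to its token and rejoin them losslessly.
    return "".join(dictionary[i] for i in decode_varints(data))


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat " * 20
    dictionary, payload = compress(sample)
    assert decompress(dictionary, payload) == sample
    print(len(sample.encode("utf-8")), "bytes of text ->", len(payload), "bytes of index codes")
```

The dictionary must also be stored or transmitted alongside the codes, so a scheme like this only pays off when the dictionary is shared in advance or amortised over enough text. With the sample above, 1000 bytes of text become 520 single-byte codes plus an 8-entry dictionary.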

History

School

Science

Department

Computer Science

Publication date

1989

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.

EThOS Persistent ID

uk.bl.ethos.329691

Language

en

Categories

Information and Computing Sciences not elsewhere classified

Licence

CC BY-NC-ND 4.0
