In this program, you will implement an algorithm to compress and decompress text art (sometimes called ASCII art) images using a technique called Run-Length Encoding (RLE). RLE is a simple form of data compression that works particularly well for data with lots of consecutive repeated values - exactly what we find in many text art images.
Your program will work with text art that uses only two characters: spaces and the asterisk character *. This creates a simple binary image. Here’s an example of what such an image looks like:
******
** **
* *
* ** ** *
* *
* * * *
* ****** *
** **
******
Run-length encoding works by replacing sequences of the same character with a count and the character. For example, the sequence “AAABBBCCC” would be encoded as “3A3B3C”. This is especially efficient for data with many repeated characters in a row.
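As a quick illustration of the idea, here is a minimal sketch that reproduces the “AAABBBCCC” example. It uses itertools.groupby from the standard library to find the runs, which is a different approach from the counter-based one recommended later in this handout, and the function name rle is made up for this example.

```python
from itertools import groupby


def rle(text):
    """Encode text as count+character pairs, one pair per run."""
    # groupby yields (character, iterator over that run of characters).
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(rle("AAABBBCCC"))  # 3A3B3C
```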
You will implement the following functions:
encode_rle(input_file, output_file) reads an ASCII art image from input_file, compresses it using RLE, and writes the compressed data to output_file. This function also measures and returns the compression ratio - the ratio of the encoded file’s length to the original file’s length.
decode_rle(input_file, output_file) reads a run-length encoded file from input_file, decompresses it, and writes the original ASCII art to output_file.
We will use the following format for RLE encoding:
1. Each sequence of identical characters is represented by a number followed by the character, where the number indicates how many times the character is repeated.
3. Encode each line independently, with newline characters separating lines in the encoded file. Each line in the input should correspond to one line in the encoded output and vice versa.
For example, if a line in the image contains two asterisks, three blank spaces, then four asterisks, like this:
**   ****
then the RLE encoding would be:
2*3 4*
Where 2* means “2 asterisks”, 3 followed by a space means “3 spaces”, and 4* means “4 more asterisks”.
The encode_rle function returns the compression ratio, which is the ratio of the length of the encoded file to the length of the original file. For example, a compression ratio of 0.5 would mean that the encoded file is half the size of the original.
To keep things simple, since we are encoding line by line, do not count newlines in the length of either the input file or the output file - the length of a file should be calculated as the sum of the lengths of its lines (without the newline character).
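The length rule above can be sketched as follows. This assumes the lines have already had their newline characters removed and that the original image is not empty; the function name compression_ratio is hypothetical and is not one of the required functions.

```python
def compression_ratio(original_lines, encoded_lines):
    """Ratio of encoded length to original length, ignoring newlines.

    Both arguments are lists of strings with newlines already stripped.
    Assumes the original contains at least one character.
    """
    original_length = sum(len(line) for line in original_lines)
    encoded_length = sum(len(line) for line in encoded_lines)
    return encoded_length / original_length


# Two asterisks, three spaces, four asterisks: 9 characters become 6.
print(compression_ratio(["**   ****"], ["2*3 4*"]))
```

For that single line the ratio is 6/9, or roughly 0.67.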
The tests operate on the encode_rle and decode_rle functions only. However, I have included optional headers for two recommended helper functions, encode_line and decode_line. You may want to implement and manually test these functions first; this will make encode_rle and decode_rle simpler to write.
The encode function needs to find the length of a sequence of repeated characters, while the decoding function needs to read a sequence of pairs of (number, character) from the encoded file. If you need a hint about how to think about this, here’s my recommendation:
Encoding: look at the first character and start a counter at 1. While the next character matches whatever that first character was, keep increasing the counter by 1. When done, the counter stores the number of repeated characters; added to the index where the run began, it also gives the index of the beginning of the next sequence to process.
Decoding: to read the number, we can do something similar to the encoding approach, except instead of each character matching the first, we need each character to be a digit. Python strings have an isdigit() method (e.g., '9'.isdigit() returns True) that’s useful here. Once you’ve found where the digits stop, you know how much of the string should be converted to an integer.
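The two hints above could be sketched roughly like this. The signatures are assumptions: each helper here takes one line as a string, without its newline, and returns a string, which may not match the provided headers. The decoder also assumes well-formed input, meaning every count is followed by exactly one non-digit character.

```python
def encode_line(line):
    """Run-length encode one line (no trailing newline)."""
    encoded = ""
    i = 0
    while i < len(line):
        count = 1
        # Extend the run while the next character matches the run's first.
        while i + count < len(line) and line[i + count] == line[i]:
            count += 1
        encoded += str(count) + line[i]
        i += count  # jump to the start of the next run
    return encoded


def decode_line(encoded):
    """Invert encode_line: expand each (number, character) pair."""
    decoded = ""
    i = 0
    while i < len(encoded):
        j = i
        # Scan past the digits that make up the count.
        while encoded[j].isdigit():
            j += 1
        decoded += encoded[j] * int(encoded[i:j])
        i = j + 1  # skip the character that followed the count
    return decoded


print(encode_line("**   ****"))  # 2*3 4*
```

A quick manual check is to confirm that decode_line(encode_line(s)) gives back s for a few lines of asterisks and spaces, including runs of 10 or more characters so that multi-digit counts are exercised.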
The first time you run the test file, it will create a folder called P16_img, and populate it with a number of test images. There are six example input images, named P16_img?_raw.txt, and their corresponding encoded files are P16_img?_rle.txt, where ? is the image number from 0 through 5. The test program also writes additional output files that it uses for testing to this directory.
Implement the following function:
def split_address(addr_line):
    """ Split the postal address in addr_line into its
    component pieces. Return a tuple of strings containing:
    (number, street, city, state, zip).
    Precondition: the address matches the following format:
    "<number> <street>, <city> <state> <zip>"
    Example: split_address("516 High St, Bellingham WA 98225")
    => ("516", "High St", "Bellingham", "WA", "98225") """
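One possible way to approach this, sketched under the stated precondition plus two extra assumptions of mine: the street contains no comma, and the state and zip are each a single word. Splitting the second half from the right lets the city contain spaces.

```python
def split_address(addr_line):
    """Split "<number> <street>, <city> <state> <zip>" into five strings."""
    # Everything before the first ", " is the number and street.
    street_part, city_part = addr_line.split(", ", 1)
    # The number is the first word; the rest is the street name.
    number, street = street_part.split(" ", 1)
    # State and zip are the last two words; the city is whatever precedes them.
    city, state, zip_code = city_part.rsplit(" ", 2)
    return (number, street, city, state, zip_code)


print(split_address("516 High St, Bellingham WA 98225"))
```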
Download lyrics.txt. Write a program that counts and prints the number of unique lines in the file. Be sure that the text file and your Python program are saved in the same directory.
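A minimal sketch of one approach: collect the lines into a set, which discards duplicates, and report its size. The function name count_unique_lines is hypothetical, and this version treats two lines as the same only if they match exactly once the trailing newline is removed (so differences in capitalization or spacing count as different lines).

```python
def count_unique_lines(filename):
    """Return the number of distinct lines in the file."""
    with open(filename) as f:
        # A set keeps only one copy of each line; strip the newline so the
        # last line compares equal to earlier copies even if it lacks one.
        return len({line.rstrip("\n") for line in f})
```

With lyrics.txt saved next to the program, calling print(count_unique_lines("lyrics.txt")) would print the count.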
Implement the following function:
def grep(string, filename):
    """ Print all lines of the file filename that contain the given string.
    Precondition: the file exists. """
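A rough sketch of how this might look, assuming a plain case-sensitive substring match (no patterns or regular expressions) and that each matching line is printed once, without its trailing newline being doubled.

```python
def grep(string, filename):
    """Print all lines of the file filename that contain the given string."""
    with open(filename) as f:
        for line in f:
            if string in line:
                # Each line read from the file ends in "\n"; strip it so
                # print does not add a second blank line.
                print(line.rstrip("\n"))
```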
Implement the following function:
def spellcheck(in_filename, out_filename, wordlist):
    """ Write a spellchecked version of in_filename to
    out_filename. For each word in the input file, write
    it as-is to the output file if it is in the wordlist;
    otherwise, write it to the output file in ALLCAPS to
    indicate that it's not in the wordlist. """
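Here is one way this could be sketched. Several details are assumptions, since the docstring does not settle them: wordlist is taken to be a list (or other collection) of strings, words are whatever str.split() produces (so attached punctuation is part of the word), matching is exact and case-sensitive, and each output line is the input line's words rejoined with single spaces.

```python
def spellcheck(in_filename, out_filename, wordlist):
    """Copy in_filename to out_filename, upper-casing unknown words."""
    known = set(wordlist)  # set membership tests are fast
    with open(in_filename) as infile, open(out_filename, "w") as outfile:
        for line in infile:
            # Keep known words as-is; shout the ones not in the wordlist.
            checked = [w if w in known else w.upper() for w in line.split()]
            outfile.write(" ".join(checked) + "\n")
```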