Contents

bastie/be42

Compression with permutations

idea

Nearly all (or all?) compression algorithms treat a file as a byte stream with explicit information: byte offset and byte value. Most of them replace repeating sequences with references, and yes, this works great.

In contrast, this compression algorithm takes a different view of the file and uses the file's structure as the basis for compression. Every file can be described as a sequence of permutations of values. Each permutation ends at the first repeated value. You can also see this value as the part that joins two permutations, because one ends with this value and the next starts with it. At the end there may be a rest.

For example (based on Nibbles instead of Bytes):

01 0A 07 04 0A 03 03 05 0A 07 04 0A 0F
 |           |     |              |
Start        |     |              |
       Repeated    |              |
      also Start   |              |
             Repeated       Repeated
            also Start      also Rest
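
The split shown above can be sketched in a few lines of Swift. This is only an illustration of the segmentation idea, not the be42 encoder; the function name `splitIntoPermutations` is hypothetical, and the sketch assumes that the shared value is stored once, as the first element of the following permutation.

```swift
/// Splits a nibble sequence into runs of distinct values.
/// A run is closed by the first value that already occurs in it;
/// that value is kept as the first element of the next run.
/// (Hypothetical helper for illustration, not part of the be42 API.)
func splitIntoPermutations(_ nibbles: [UInt8]) -> [[UInt8]] {
    var runs: [[UInt8]] = []
    var current: [UInt8] = []
    var seen: UInt16 = 0  // bit i is set when nibble i is in the current run

    for nibble in nibbles {
        precondition(nibble < 16, "expected 4-bit values")
        let bit: UInt16 = 1 << UInt16(nibble)
        if (seen & bit) != 0 {
            // Repeated value: close the current run and start a new one.
            runs.append(current)
            current = []
            seen = 0
        }
        current.append(nibble)
        seen |= bit
    }
    if !current.isEmpty {
        runs.append(current)  // the rest, not closed by a repeated value
    }
    return runs
}

let example: [UInt8] = [0x1, 0xA, 0x7, 0x4, 0xA, 0x3, 0x3, 0x5, 0xA, 0x7, 0x4, 0xA, 0xF]
let runs = splitIntoPermutations(example)
for run in runs {
    print(run.map { String($0, radix: 16, uppercase: true) }.joined(separator: " "))
}
// 1 A 7 4
// A 3
// 3 5 A 7 4
// A F
```

The four printed runs start at the same positions that are marked in the diagram, and the last run is the rest.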

Some properties of nibble permutations, by the way: the maximum size is 16 (0123456789ABCDEF), but for random data the birthday paradox tells us that most of them are probably only 5 to 6 elements long. The probability of a repeated value also increases with each further value: for uniformly random nibbles it is 1/16 after the first value, then 2/16, 3/16 ... 16/16. It's like a Markov chain.
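
To get a feeling for these lengths, here is a small Swift sketch that computes the length distribution of a permutation. It assumes uniformly random, independent nibbles, so that after k distinct values the next value is a repeat with probability k/16; real files will behave differently. The function name `runLengthDistribution` is hypothetical.

```swift
/// Probability that a run of distinct values has exactly k elements
/// (index k - 1 of the result), assuming every nibble is drawn
/// uniformly and independently. Hypothetical helper for illustration.
func runLengthDistribution(alphabetSize n: Int = 16) -> [Double] {
    var distribution: [Double] = []
    var stillDistinct = 1.0  // probability that the first k values are all distinct
    for k in 1...n {
        let repeatNext = Double(k) / Double(n)  // next value hits one of the k seen values
        distribution.append(stillDistinct * repeatNext)
        stillDistinct *= 1.0 - repeatNext
    }
    return distribution
}

let distribution = runLengthDistribution()
// Expected number of distinct values per run.
let expectedLength = distribution.enumerated().reduce(0.0) { sum, entry in
    sum + Double(entry.offset + 1) * entry.element
}
print(expectedLength)
```

Under this assumption the expected length comes out at roughly 4.7 distinct values, or roughly 5.7 when the closing repeated value is counted as well.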

Note: The algorithm is based on human intelligence, but part of the code was first written by artificial intelligence (vibe coding) and then modified by a human, because the result is the priority.

compare

enwik8

| compressor | version | parameter | size in bytes | % | time (m:ss) | bytes per second | comment |
| ---------- | ------- | --------- | ------------- | ----- | -------- | ---------------- | ------------------ |
| ben | 0.42 | | 32,280,526 | 32.28 | 2:58.30 | 181,046 | non optimized code |
| ben+xz | | xz -9ekf | 31,300,064 | 31.30 | 3:06.06 | 168,225 | |
| bzip2 | 1.0.8 | -9zkf | 29,008,758 | 29.01 | 0:04.75 | 6,107,106 | |
| gzip | 479 | -9kf | 36,475,811 | 36.48 | 0:03.52 | 10,362,446 | fast |
| xz | 5.8.2 | -9ekf | 24,831,656 | 24.83 | 0:56.00 | 443,422 | |
| zopfli | 1.0.3 | --i100 | 34,955,165 | 34.96 | 10:10.79 | 57,229 | |
| zpaq | 7.15 | -m5 | 19,625,015 | 19.63 | 4:32.29 | 71,907 | best |
| zstd | 1.5.7 | -k19f | 26,944,227 | 26.94 | 0:41.83 | 664,136 | |

Dependencies

The be42 library has no dependencies.

The CLI tool ben needs swift-argument-parser and be42.

License

Apache License Version 2.0

Version

0.42.0

First public version, perhaps the 42nd attempt at a functioning implementation. On enwik8 it compresses better than gzip.

Package Metadata

Repository: bastie/be42

Stars: 1

Forks: 0

Open issues: 0

Default branch: main

Primary language: swift

License: Apache-2.0

Topics: bastie, compression, compression-algorithm, nibble, nibbles, permutation, transformer

README: README.md