πFS

jamwise · 2026-06-10T20:40:24 1781124024

Reminds me of when I tried to use the Library of Babel as a data compression tool. It led me down a fun rabbit hole and was my first introduction to information theory.

The conclusion being that you basically need the same amount of data to represent the address of your data as the data itself, so it's not really effective at compression, just a fun thought experiment.

The cool part of this in modern times is that LLMs are basically a form of lossy compression that actually achieves the gist of what these tools fail at. Although it is lossy, and requires a massive substrate. This is related to the idea of AI/LLMs being a form of language compression.

quirino · 2026-06-10T22:39:05 1781131145

3Blue1Brown just released a video about this Intelligence-Compression connection.

https://youtu.be/l6DKRf-fAAM

jamwise · 2026-06-10T22:43:21 1781131401

The idea was fresh in my mind because I watched this yesterday. Great video, the illustrations and intuition-building around the compressibility of information were so good! I'm so grateful for 3Blue1Brown.

janalsncm · 2026-06-10T23:36:05 1781134565

The level of compression is pretty impressive when you think about it. I wrote a comment a while back which is still true (although bytes should be bits, so in that sense it’s still wrong): https://news.ycombinator.com/item?id=39559969

Back of the envelope calculation for storing valid 4-grams (sequences of four words) is around 10 billion x 14 bits per word = 17 gb for all 10 billion. There are LLMs 100x smaller which can write coherent prose.

ithkuil · 2026-06-10T23:33:49 1781134429

You'll find this an interesting watch:

Reinventing Entropy Compression is Intelligence Part 1

3blue1brown https://youtu.be/l6DKRf-fAAM?is=ne73FCJ7ErXhzZ-v

ainch · 2026-06-10T23:06:37 1781132797

In some sense, science is the most extreme form of compression - Newtonian mechanics explains an incredible number of phenomena in a few lines of text.

emptyroads · 2026-06-10T22:01:53 1781128913

Reminds me of nsafs, the National Security Agency Filesystem ("free" because the government pays for it) - https://github.com/freedomtools/nsafs

dekhn · 2026-06-10T22:46:21 1781131581

I once interviewed for a company and the interviewer was telling me how he (a vc) funded a project to generate large streams of random numbers; you would select an index at random, share that private key with somebody, and then the subsequent text could be used as a one-time-pad. NSA would be forced to buffer/save the entire stream, which could be generated at GB/sec, if they wanted to decrypt.

It didn't seem very practical.

helterskelter · 2026-06-10T23:16:19 1781133379

I wonder if we could mess with NSA-style surveillance by having a good chunk of the population streaming lots of random data over the internet. Essentially, Alice piping her /dev/random to Bob's /dev/null over netcat or something. Make a slick looking app that does it 24/7 in the background using excess bandwidth and tell people it sticks it to the NSA.

Spy agencies would not only have to store it all in case it was something valuable, but at some point they may try to crack it because it's indistinguishable from encrypted data and waste resources on it. If enough people did it, total web surveillance could become impractical.

danielmeskin · 2026-06-10T23:31:07 1781134267

I suspect this would have an effect similar to early internet worms that caused significant strain

https://en.wikipedia.org/wiki/Melissa_(computer_virus)

adzm · 2026-06-10T20:30:27 1781123427

It is worth noting that as the length of data increases it becomes extremely unlikely that the index and length of the sequence within pi would actually be smaller than the data.

Aloisius · 2026-06-10T20:38:44 1781123924

That seems easy enough to solve. Simply record the index and length in pi of the index and length in pi.

awesome_dude · 2026-06-10T21:02:18 1781125338

See also: Recursion

cyanydeez · 2026-06-10T22:19:42 1781129982

See also: https://news.ycombinator.com/item?id=48480978#48482218

jastr · 2026-06-10T23:25:46 1781133946

Back in college, I thought I could compress my phone number by telling people its index in pi, but my 7 digit phone number is at an 8 digit index.

I didn’t have the compute to find my 10 digit number with the area code.

hatthew · 2026-06-10T23:04:38 1781132678

TFA addresses this

> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

ithkuil · 2026-06-10T23:47:08 1781135228

Why stop at bytes? Let's split it in individual bits and then look up the bits in pi!

But Pi's binary expansion is not very practical for this purpose, since it's 11.0010...

OTOH. e is 10.1011...

Let's stick to fractional digits (the ones right of the binary point) at index 0 we have 1 and at index 1 we have 0.

So, to encode a stream of bytes so that each bit is encoded as the index of that bit in e, all you need to do is XOR each byte with 0xFF.
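A minimal sketch of that claim, assuming the fractional bits of e start 1011... as stated above (bit 1 first appears at index 0, bit 0 at index 1). The names `FIRST_INDEX_IN_E`, `encode_byte`, and `encode` are made up for illustration.

```python
# Assumption taken from the comment above: in the fractional binary
# expansion of e (0.1011...), bit value 1 first appears at index 0
# and bit value 0 first appears at index 1.
FIRST_INDEX_IN_E = {1: 0, 0: 1}


def encode_byte(byte: int) -> int:
    """Replace each bit of the byte with the index where that bit first occurs in e."""
    out = 0
    for shift in range(7, -1, -1):  # most significant bit first
        bit = (byte >> shift) & 1
        out = (out << 1) | FIRST_INDEX_IN_E[bit]
    return out


def encode(data: bytes) -> bytes:
    """Apply the per-bit lookup to every byte."""
    return bytes(encode_byte(b) for b in data)


# The per-bit "lookup in e" collapses to flipping every bit.
sample = b"pifs"
assert encode(sample) == bytes(b ^ 0xFF for b in sample)
```

Since the mapping just swaps 0 and 1, applying `encode` twice returns the original bytes, and the output is exactly as long as the input.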

mondrian · 2026-06-10T21:21:43 1781126503

The index of your 20 line file is <20TB number>

russfink · 2026-06-10T22:48:20 1781131700

Unless, in turn, you locate the index itself in pi at a much smaller index. And so on...

Find k candidate indices for your data, then locate each of them. If the smallest one is a significantly smaller index space, repeat.

akoboldfrying · 2026-06-10T23:27:40 1781134060

Can't tell if you're in on the joke or not, but for anyone who is genuinely wondering whether this might work: Consider that there are at most 256 different indexes that could be represented by a 1-byte index value, but if you're trying to store 9 bits of data, there are already 512 different possible things it could be that each need to be represented by a different index value, otherwise you won't be able to tell them apart. Those pigeons aren't gonna fit.
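The counting argument can be checked directly. This is only an illustrative sketch: the modulo mapping below is one arbitrary (hypothetical) way of assigning payloads to indexes, but any other assignment runs into the same shortfall.

```python
from itertools import product

INDEX_BITS = 8  # size of the index we want to store instead of the data
DATA_BITS = 9   # size of the data we are trying to represent

indexes = 2 ** INDEX_BITS   # distinct values a 1-byte index can take
payloads = 2 ** DATA_BITS   # distinct 9-bit payloads that need a unique index

assignment = {}  # index -> first payload that claimed it
collisions = 0   # payloads that had to share an already-used index
for n, payload in enumerate(product("01", repeat=DATA_BITS)):
    index = n % indexes  # arbitrary mapping, chosen only for illustration
    if index in assignment:
        collisions += 1
    else:
        assignment[index] = payload

print(indexes, payloads, collisions)
```

However the mapping is chosen, at least `payloads - indexes` payloads end up sharing an index with another payload, so they cannot be told apart on decode.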

12_throw_away · 2026-06-10T20:43:46 1781124226

yes I believe that's the joke

jwpapi · 2026-06-10T21:17:42 1781126262

He’s aware, he just added some curious information.

liamYC · 2026-06-10T22:36:07 1781130967

Point taken about the index potentially being really long. Why would the length be longer than the data? Don’t you need to find the right sequence?

dang · 2026-06-10T21:08:49 1781125729

Related. Others?

πfs – A data-free filesystem - https://news.ycombinator.com/item?id=36357466 - June 2023 (107 comments)

πfs – A data-free filesystem - https://news.ycombinator.com/item?id=28699499 - Sept 2021 (30 comments)

PiFS – The Data-Free Filesystem - https://news.ycombinator.com/item?id=26208704 - Feb 2021 (1 comment)

Πfs: Never worry about data again - https://news.ycombinator.com/item?id=21359338 - Oct 2019 (1 comment)

The π Filesystem for FUSE: Store Your Data in π - https://news.ycombinator.com/item?id=19223032 - Feb 2019 (1 comment)

pifs - Avoid disk space usage by saving your files in the digits of Pi - https://news.ycombinator.com/item?id=18687275 - Dec 2018 (1 comment)

πfs – A data-free filesystem - https://news.ycombinator.com/item?id=13869691 - March 2017 (105 comments)

Πfs: Stores your data in π - https://news.ycombinator.com/item?id=10856108 - Jan 2016 (1 comment)

Πfs: Never worry about data again - https://news.ycombinator.com/item?id=10847693 - Jan 2016 (1 comment)

File system that stores location of file in Pi - https://news.ycombinator.com/item?id=8018818 - July 2014 (98 comments)

100% Compression Using Pi - https://news.ycombinator.com/item?id=6698852 - Nov 2013 (32 comments)

(Reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)

Levitating · 2026-06-10T21:15:14 1781126114

How are you generating these lists

programjames · 2026-06-10T21:17:31 1781126251

If you click the website's name to the right of the title, it pulls up all the submissions from the same site:

https://news.ycombinator.com/from?site=github.com/philipl

Levitating · 2026-06-10T21:23:29 1781126609

Even then I don't see a direct way to extract a list like this.

ChrisMarshallNY · 2026-06-10T21:36:00 1781127360

I think it's safe to assume that dang has access to tools that we mortals are unable to comprehend, without being driven to madness.

lukan · 2026-06-10T22:02:53 1781128973

For this use case of finding related threads, I thought he didn't write special tools but rather just uses

https://hn.algolia.com/

Levitating · 2026-06-10T22:17:18 1781129838

Even using algolia, I don't see a way to generate a list in this exact format.

I think ChrisMarshallNY is right, dang has access to eldritch powers.

jwpapi · 2026-06-10T21:16:34 1781126194

He’s the mod hero from HN

gnaritas99 · 2026-06-10T21:41:53 1781127713

[flagged]

LoganDark · 2026-06-10T21:53:40 1781128420

Citation needed

dang · 2026-06-10T22:20:06 1781130006

See https://news.ycombinator.com/item?id=44861185 and the links back from there.

layer8 · 2026-06-10T23:29:54 1781134194

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

Considering each individual bit separately would be even more performant: you only need the indexes 2 and 33, and there is an efficient mapping of those to the bits in storage.
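A quick sketch of that mapping, assuming indexes are 1-based over the decimal digits of π with the leading 3 counted as index 1 (that convention is an inference from the numbers 2 and 33, not something stated in the comment). The 50 decimals are hard-coded; the helper names are hypothetical.

```python
# "3" followed by the first 50 decimal digits of pi.
PI_DIGITS = "3" + "1415926535" "8979323846" "2643383279" "5028841971" "6939937510"


def first_index(digit: str) -> int:
    """1-based position of the first occurrence of `digit`, counting the leading 3 as index 1."""
    return PI_DIGITS.index(digit) + 1


INDEX_OF_BIT = {0: first_index("0"), 1: first_index("1")}
BIT_AT_INDEX = {index: bit for bit, index in INDEX_OF_BIT.items()}


def encode_bits(bits):
    """Store each bit as the index of its first occurrence in pi."""
    return [INDEX_OF_BIT[b] for b in bits]


def decode_bits(indexes):
    """Recover the bits from their stored indexes."""
    return [BIT_AT_INDEX[i] for i in indexes]


print(INDEX_OF_BIT)
```

Every stored index takes more than one bit to write down, which is the joke: the "efficient mapping" is just the bit itself with extra steps.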

MisterTea · 2026-06-10T20:26:19 1781123179

Reminds me of: https://www.spronck.net/sloot.html

Further reading: https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System

ndiddy · 2026-06-10T22:48:36 1781131716

I looked into this a bit a while ago, what Sloot did was at least a bit novel. Basically the way his encoding scheme actually worked was that it would store each line of video into a database, encode each video frame as a series of line lookups, and then store that encoded frame into another database. Then each video is a series of frame lookups. When you hear accounts of him being able to demo smooth playback of 16 videos at once on late 90s hardware, this is how he did it. Because each frame is a series of line lookups, splitting the screen horizontally 16 times and playing 16 videos at once is not any more taxing than playing a single video fullscreen. Similarly, he was able to fast-forward and rewind smoothly because each frame is individually decoded, it's not like traditional video compression where you have to calculate differences from each keyframe. Playing at 2x speed was not any more taxing than 1x speed. Of course he never would have been able to store a video file in 8KB or whatever, but this meant that (for example) if you had a whole season of a TV show in your database, the opening and ending credits would only be stored once.
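A rough sketch of the two-level lookup described above, based only on this description and not on Sloot's actual system. The class `LineFrameStore` and its methods are hypothetical names.

```python
class LineFrameStore:
    """Deduplicating store: frames are tuples of line ids, videos are lists of frame ids."""

    def __init__(self):
        self.line_ids = {}   # line bytes -> line id
        self.lines = []      # line id -> line bytes
        self.frame_ids = {}  # tuple of line ids -> frame id
        self.frames = []     # frame id -> tuple of line ids

    @staticmethod
    def _intern(table, items, key):
        """Return the id of `key`, storing it only if it has not been seen before."""
        if key not in table:
            table[key] = len(items)
            items.append(key)
        return table[key]

    def add_frame(self, frame_lines):
        """Encode a frame as a series of line lookups and return its frame id."""
        encoded = tuple(self._intern(self.line_ids, self.lines, line) for line in frame_lines)
        return self._intern(self.frame_ids, self.frames, encoded)

    def add_video(self, frames):
        """A video is just a list of frame ids."""
        return [self.add_frame(frame) for frame in frames]

    def decode_frame(self, frame_id):
        """Each frame decodes on its own, with no dependence on neighbouring frames."""
        return [self.lines[i] for i in self.frames[frame_id]]


store = LineFrameStore()
credits = [b"TITLE", b"-----"]
episode1 = store.add_video([credits, [b"scene 1", b"-----"]])
episode2 = store.add_video([credits, [b"scene 2", b"-----"]])
```

In this toy example both episodes point at the same stored credits frame, and seeking to any frame is a direct lookup, which matches the fast-forward and shared-credits behaviour described above.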

Levitating · 2026-06-10T21:36:29 1781127389

> The SDCS is only possible if keys are allowed to become infinite, or the data store is allowed to become infinite (...) This would, of course, make the idea useless.

But Pi is infinite. And thus this genius contraption will work as long as we have Moore's law on our side :)

beng-nl · 2026-06-10T21:42:39 1781127759

I have very fond memories of reading that book.

giancarlostoro · 2026-06-10T20:35:38 1781123738

Never heard of that one, that's amazing! Love it.

windward · 2026-06-10T22:31:27 1781130687

>One of the properties that π is conjectured to have is that it is normal

conjectured

Glad to see one of my pet points of pedantry come up. No non-constructed irrational number has never been proven to be normal or disjunctive.

oofbey · 2026-06-10T22:40:39 1781131239

That’s a lot of negatives!

mikestew · 2026-06-10T22:49:07 1781131747

One of which probably needs to go away.

windward · 2026-06-10T23:46:00 1781135160

bobim · 2026-06-10T20:23:18 1781122998

This is disturbing to realize that pi then contains all the past and future knowledge, including when I'll pass away.

mike_hock · 2026-06-10T20:32:56 1781123576

So does every other random infinite sequence of bits. The unintuitive part comes from infinity, not pi.

It also doesn't contain all past and future knowledge because it also contains all possible falsehoods about the past and future in a way that's indiscernible from the truth.

Encoding information as an offset into a pseudorandom sequence is no more storage efficient than storing the information directly.

smaudet · 2026-06-10T23:45:16 1781135116

Keyword is conjectured.

Infinities of random sequences exist that can be shown not to contain all data, 0-8 (base 10) is one such random sequence that is trivially proven to never contain 9...

There are no known patterns to pi, but, (I am legitimately curious about this), are there any known sequences e.g. of 1 million 0s and a single other digit within the decimal sequence of pi?

Given how it (pi) looks, my strong suspicion is that the answer is "no". But of course, proving that requires that some property of the randomness is provable. It does feel as if, given there are different infinities, there are also different randomnesses, hence the conjecture is ill-formed and probably incorrect...

sph · 2026-06-10T21:08:09 1781125689

You are aware this is meant as a joke, right?

LoganDark · 2026-06-10T21:54:25 1781128465

Jokes can be educational too.

nosioptar · 2026-06-10T20:34:06 1781123646

The worst part is that it contains Star Wars 4-6 from an alternate timeline where Disney did a reboot casting Chris Pratt as Han Solo.

(Fun fact: "Chrispratt" is an ancient Californian word that means "Joel McHale didn't want the role.")

Yokohiii · 2026-06-10T21:38:35 1781127515

Around here it just means chrisp ratt.

1attice · 2026-06-10T20:35:51 1781123751

Thank you for this Prattfall

layer8 · 2026-06-10T23:32:12 1781134332

It also contains all past and future fake news, and you don’t know which is which.

arialdomartini · 2026-06-10T22:13:09 1781129589

You will love reading Jorge Luis Borges's The Library of Babel.

https://dn760100.eu.archive.org/0/items/TheLibraryOfBabel/ba...

Yokohiii · 2026-06-10T21:40:32 1781127632

The person who starts reading ahead into pi will always get the freshest numbers.

Perfect crypto!

xp84 · 2026-06-10T21:27:05 1781126825

If it makes you feel better, consider that it also contains all plausible and implausible falsehoods about your demise as well.

skulk · 2026-06-10T20:26:54 1781123214

this statement is equivalent to "pi is a normal number." While most real numbers are normal and pi is suspected to be so, it isn't known.

https://en.wikipedia.org/wiki/Normal_number

cadamsdotcom · 2026-06-10T20:34:19 1781123659

Fear not! It’s probably so deep in pi that you’d pass away listening to someone tell you where!

OkayPhysicist · 2026-06-10T20:43:46 1781124226

So does a calendar, if you buy them enough years in advance.

nighthawk454 · 2026-06-10T20:46:32 1781124392

And also all the days you don’t, so, by itself not very meaningful. Especially since you can’t tell which one is right in advance. In some sense, so does a calendar

thih9 · 2026-06-10T21:17:37 1781126257

It also contains all possible falsehoods and comes with no way to distinguish what's true from what isn't.

vadansky · 2026-06-10T21:44:42 1781127882

But enough about LLMs

koolala · 2026-06-10T20:42:21 1781124141

It isn't actually proven true.

anthonj · 2026-06-10T20:54:33 1781124873

So does a random number generator

keithnz · 2026-06-10T23:31:41 1781134301

isn't this relying on properties that aren't proven about pi? it needs to be disjunctive or normal, and neither of those are proven

aidenn0 · 2026-06-10T20:59:14 1781125154

I vaguely remember an entry to a compression-benchmark that gamed the benchmark by treating the filename as part of the input to the decompression-algorithm, thus beating the metric that only measured the size of the file.

nyc_pizzadev · 2026-06-10T22:21:12 1781130072

Just a heads up, this is writing 16 bits for every 8 bits of input:

https://github.com/philipl/pifs/blob/fded8bf7b8f4fc64233e37b...

partsch · 2026-06-10T20:14:57 1781122497

Finally, someone is doing something about the rising prices of storage!

bilsbie · 2026-06-10T22:41:45 1781131305

I’d guess even the index in pi for my phone number would be more digits than the phone number.

So not really a compression scheme.

Lalabadie · 2026-06-10T20:04:23 1781121863

Love it! This feels very much in the spirit of Tom7's Harder Drive [1]

[1] https://www.youtube.com/watch?v=JcJSW7Rprio

thangalin · 2026-06-10T20:37:38 1781123858

https://cs.stackexchange.com/a/53737/1704

> Matches that occur early enough in π to attain significant compression will not be varied. That is, it isn't possible to use π to compress interesting, real-world data because real-word strings are unlikely to arise early.

Levitating · 2026-06-10T21:21:59 1781126519

> Since the file is 128 bits long, one would expect this place to be around the 2^128th bit.

> Calculate the number of bits to encode that value using log2(938933556), which is ~29.8

Can someone explain these two statements to me?

csunoser · 2026-06-10T21:43:47 1781127827

for > Calculate the number of bits to encode that value using log2(938933556), which is ~29.8

This is roughly same as saying: "If you rewrite 938933556 as a binary number / usize, it will need 30 bits".

Sanity check: 1101111111|0110111111|0100110100 (| delimits every 10 bigits).
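The same sanity check in Python, as a small sketch; it only uses the value 938933556 quoted above.

```python
import math

value = 938933556

bits_exact = math.log2(value)      # fractional number of bits, just under 30
bits_needed = value.bit_length()   # whole bits needed to write the value in binary

# Group the binary digits in tens, matching the layout used above.
binary = format(value, "b")
groups = "|".join(binary[i:i + 10] for i in range(0, len(binary), 10))

print(round(bits_exact, 1), bits_needed, groups)
```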

> Since the file is 128 bits long, one would expect this place to be around the 2^128th bit.

This statement is a bit more subtle. As a first-order approximation, we can see pi sort of as an RNG.

If we write pi (ignore the decimal point), as a binary number, we get: 11011001111111011110010101011110001010101111101101110001001100001...

You can... kind of squint and pretend this is a random sequence of 1s and 0s.

Now, suppose you have a file that is 128 bits long (so lots of intermingled 0s and 1s), and each next digit of pi is effectively a coin flip. Pretend 1s are heads and 0s are tails. You basically have to get exactly the same 128 consecutive coin flips as your file to get your file back.

Imagine now, PI not as a number, but a sequence of experiments of flipping the coin 128 times.

  - (11011..01000)(10000...00100)....
  - ^attempt 1     ^attempt 2

You have to try, on expectation, quite a few times to win this game! Now, you could easily get lucky for sure. But on average, your chance of winning per attempt is roughly 0.5^128! So, how many times do you have to try to win this game? Something like 2^128 times - and you have to consider that each attempt uses 128 bits as well. So more like 2^135. But you don't have to start fresh in each attempt, you can see it as like this:

  - 11011................00100...
  - (       128 flips     )
  -  (  another 128        )
  -   (                     )
  -     ... so on and so on

That's where the 2^128 number came from.
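The sliding-window version of the game can be tried at a small scale. This is a sketch under a loud assumption: Python's pseudorandom generator stands in for the bits of pi (the same squint as above), and 8-bit patterns stand in for the 128-bit file so the search finishes quickly. The function names are made up.

```python
import random


def first_match_index(pattern: str, rng: random.Random) -> int:
    """Index at which `pattern` first starts in a fresh stream of random bits."""
    window = ""
    position = 0  # number of bits generated so far
    while True:
        # Keep only the most recent len(pattern) bits: the sliding window.
        window = (window + rng.choice("01"))[-len(pattern):]
        position += 1
        if window == pattern:
            return position - len(pattern)


def mean_first_index(n_bits: int, trials: int = 2000, seed: int = 0) -> float:
    """Average first-match index over many random patterns of n_bits bits."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pattern = "".join(rng.choice("01") for _ in range(n_bits))
        total += first_match_index(pattern, rng)
    return total / trials


mean_index = mean_first_index(8)
print(mean_index, 2 ** 8)
```

The average index should come out on the order of 2^8, so writing it down takes about 8 bits, the same as the pattern itself. Scaling the pattern to 128 bits pushes the expected index to around 2^128 in the same way.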

Levitating · 2026-06-10T22:13:35 1781129615

Thank you!!!

csunoser · 2026-06-10T22:16:45 1781129805

np :-)

giancarlostoro · 2026-06-10T20:26:05 1781123165

I... I can't tell if this is an elaborate troll or pure genius. I love it.

pokstad · 2026-06-10T20:48:16 1781124496

Both.

z3t4 · 2026-06-10T22:26:07 1781130367

Someone should make a service "where in the pi am I" then you could use it as a short link. Then there will be hardware accelerated pi chips. All computers will come with pi preinstalled.

tptacek · 2026-06-10T20:03:46 1781121826

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

koolala · 2026-06-10T20:37:05 1781123825

Short Storage Number - SSN

0x123456789ABCDEF0

use this number as a shorter nibble storage alternative...

chris_sn · 2026-06-10T22:28:04 1781130484

Funnily enough I’m reading Service Model and just got to the bit in the Library Archive, which has a very similar vibe to this project. Love it

yassi_dev · 2026-06-10T22:22:51 1781130171

I built something with a similar spirit for Pi day: https://pi.yassi.dev/

woah · 2026-06-10T22:27:03 1781130423

I've simplified it and made it more flexible

3._1_415926535897932384626433832795_0_288419716939

actusual · 2026-06-10T21:37:40 1781127460

This is why I got pi tattooed. It's a tattoo of all tattoos.

dekhn · 2026-06-10T22:58:46 1781132326

yes, but can you get a tattoo of all tattoos that do not contain themselves?

hnlmorg · 2026-06-10T20:30:21 1781123421

This is probably a dumb question, but do we actually know that pi has an infinite number of decimal digits or are we assuming that it does because we haven’t developed a sufficiently powerful computer to calculate the last digit of pi?

I’m guessing this is something that could be formally proven?

hasteg · 2026-06-10T20:32:08 1781123528

Here is a one page proof that pi is irrational - https://heuklyd.github.io/papers/pdf/Niven-1947.pdf

simonreiff · 2026-06-10T22:33:53 1781130833

For a superb explanation of Niven's proof (which leaves more questions than answers when you first read it), I like Michael Penn's video: https://youtu.be/dFKbVTHK4tU?is=d2DbV5HDP0IpP9tA ....notwithstanding the length of the proof, this is quite a hard problem.

partsch · 2026-06-10T20:37:39 1781123859

Thanks for the PDF. I feel like I understand even less now than I did before.

hnlmorg · 2026-06-10T20:38:33 1781123913

Thanks for sharing. That’s a nice read. I’m glad I asked :)

stackghost · 2026-06-10T20:36:08 1781123768

It's amazing how inscrutable calculus can be when you return to reading it after not doing so for a period of time, much like lisp or forth. I don't think I've actually done an integral or taken a derivative in years. I can see the elegance of that proof but I'll be damned if I can actually follow the mathematics from one step to the next.

mike_hock · 2026-06-10T20:37:58 1781123878

We definitely know that Pi is irrational, we just don't know if it's normal (i.e. if the PiFS joke even works).

pixel_popping · 2026-06-10T20:31:40 1781123500

Well, that should get GPT-5.5 extended thinking going for a few weeks.

adzm · 2026-06-10T20:27:26 1781123246

I'm intrigued that π was capitalized to Π presumably automatically in the HN headline.

cbm-vic-20 · 2026-06-10T20:36:05 1781123765

    jshell> "πfs".toUpperCase()
    $1 ==> "ΠFS"

    Welcome to Node.js v26.3.0.
    Type ".help" for more information.
    > "πfs".toUpperCase()
    'ΠFS'

    Python 3.14.5 (main, May 10 2026, 10:21:34) [Clang 21.0.0 (clang-2100.0.123.102)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "πfs".upper()
    'ΠFS'

    echo 'πfs' | awk '{print toupper($0)}'
    ΠFS

noman-land · 2026-06-10T21:13:13 1781125993

Why does your Python terminal report May 10th? Today is June 10th.

atvrager · 2026-06-10T21:28:42 1781126922

It's the build date of their Python binary

Yokohiii · 2026-06-10T21:35:17 1781127317

He prepared the comment a month ago.

danlitt · 2026-06-10T21:27:59 1781126879

Probably daylight savings

amluto · 2026-06-10T21:23:06 1781126586

> Why is this thing so slow? It took me five minutes to store a 400 line text file!

> Well, this is just an initial prototype, and don't worry, there's always Moore's law!

Seriously? They're only storing individual bytes in pi:

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

So the whole transformation should be trivially reducible to a 256-element lookup table from source byte to location in pi and a similar table used to convert back the other way. Maybe a fancy formula could be used for the (never actually encountered) case in which a byte is encoded by one of the infinite available noncanonical encodings.
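A sketch of that lookup table, under stated assumptions: each byte is looked up as two consecutive hexadecimal digits of π at any digit offset (the project's actual lookup may differ), and the digits are generated with Machin's formula in integer arithmetic, which is swapped in here for convenience and is not taken from the project. All function names are hypothetical.

```python
def arctan_inv(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, summed from its Taylor series in integer math."""
    total = term = unity // x
    x_squared = x * x
    n, sign = 3, -1
    while term:
        term //= x_squared
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_hex_fraction(n_digits: int, guard: int = 10) -> str:
    """First n_digits hex digits of pi after the point, using Machin's formula."""
    unity = 16 ** (n_digits + guard)  # extra guard digits absorb rounding error
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    digits = format(pi_scaled // 16 ** guard, "x")  # "3" followed by the fraction
    return digits[1:]


def build_tables(start_digits: int = 1024):
    """Map every byte to the earliest offset where it appears, and back again."""
    n = start_digits
    while True:
        digits = pi_hex_fraction(n)
        encode = {}
        for offset in range(len(digits) - 1):
            byte = int(digits[offset:offset + 2], 16)
            encode.setdefault(byte, offset)  # keep only the earliest offset
        if len(encode) == 256:
            decode = {offset: byte for byte, offset in encode.items()}
            return encode, decode
        n *= 2  # not all bytes found yet: look further into pi


ENCODE, DECODE = build_tables()
message = b"hello"
stored = [ENCODE[b] for b in message]
assert bytes(DECODE[i] for i in stored) == message
```

Once the two tables are built, storing or reading a file is one dictionary lookup per byte, so the slowness being joked about is not inherent to the scheme.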

glitchc · 2026-06-10T20:32:46 1781123566

At what point is the metadata larger than the actual file?

wavemode · 2026-06-10T20:57:15 1781125035

Part of the joke is that, in this implementation, the metadata is guaranteed to be larger than the file:

> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

mike_hock · 2026-06-10T20:40:51 1781124051

Half the time it should be larger, right?

charles_f · 2026-06-10T20:49:58 1781124598

Posted many times before: https://news.ycombinator.com/from?site=github.com/philipl

My favourite issue being about GDPR compliance https://github.com/philipl/pifs/issues/56

anon291 · 2026-06-10T22:04:01 1781129041

It is actually not proven that the decimal expansion (or any rational base expansion) of pi contains all possible sequences of numbers. It sounds like it intuitively would be since the expansion is infinite, but it is not necessarily true. For example, the number 0.101001... (i.e., decimal formed by concatenating N zeros and then 1 for all N 0 to infinity) is infinite, never-ending, and irrational but does not contain every sequence of numbers.
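The counterexample is easy to generate and inspect. A small sketch (the helper name `digits_prefix` is made up) that builds a finite prefix of the digits after the decimal point:

```python
def digits_prefix(blocks: int) -> str:
    """Concatenate N zeros followed by a 1, for N = 0, 1, ..., blocks - 1."""
    return "".join("0" * n + "1" for n in range(blocks))


prefix = digits_prefix(200)

print(prefix[:21])      # begins 101001000100001...
print("11" in prefix)   # two adjacent 1s never occur
print("2" in prefix)    # digits other than 0 and 1 never occur
```

Every 1 after the first is preceded by at least one 0, so the sequence "11" can never appear no matter how far the expansion is extended, even though the digits go on forever without repeating.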

j3th9n · 2026-06-10T21:19:24 1781126364

Why would anyone need πfs when you can already build such a system yourself quite trivially on Linux?

Levitating · 2026-06-10T20:04:42 1781121882

absolutely genius

leephillips · 2026-06-10T20:34:29 1781123669

What a brilliant idea! Of course, of course, it’s not in the repository so I can’t apt-get install it. Debian...always so far behind.

mzelling · 2026-06-10T21:38:03 1781127483

Looked at the repo but it says NOTHING about what value this project offers.

I mean, I get that it's "fun" to store information within the digits of pi. But is this just amusement, or is there a value prop for production use here?

(Speaking as a math major, by the way. I'm sympathetic to the cause.)

windward · 2026-06-10T22:34:39 1781130879

It's a(n IMO weak) argument raised when discussing illegal files/numbers.

This project makes clear the counter-argument: the input that gets you the file out of π is a badly compressed version of the file.

tcoff91 · 2026-06-10T21:39:20 1781127560

I think it's pretty clearly for amusement. And it would kind of spoil the amusement if it were to explicitly mention that it's a joke...

mherkender · 2026-06-10T21:39:08 1781127548

It's a joke.

spchampion2 · 2026-06-10T21:37:20 1781127440

This is interesting, but I feel like my use cases would better align with a different irrational number. Could I get an option to do this with e instead? /s