
Communications of the ACM

Research Highlights

Indistinguishability Obfuscation from Well-Founded Assumptions



At least since the initial public proposal of public-key cryptography based on computational hardness conjectures, cryptographers have contemplated the possibility of a "one-way compiler" that translates computer programs into "incomprehensible" but equivalent forms. And yet, the search for such a "one-way compiler" remained elusive for decades. We examine a formalization of this concept with the notion of indistinguishability obfuscation (iO). Roughly speaking, iO requires that the compiled versions of any two equivalent programs (with the same size and running time) be indistinguishable to any efficient adversary. Finally, we show how to construct iO in such a way that we can prove the security of our iO scheme based on well-studied computational hardness conjectures in cryptography.


1. Introduction

Consider the polynomial f1(x, y) ∈ ℤ[x, y] that is computed as follows:

ueq01.gif

Alternatively, contemplate the polynomial f2(x, y) ∈ ℤ[x, y] that is computed via:

ueq02.gif

A calculation shows that f1 and f2 are in fact the same polynomial, computed in two different ways. Indeed, the expressions f1 and f2 above are special cases of arithmetic circuits, which precisely represent "ways to compute a polynomial."

What if we wanted to hide all implementation choices made when creating such an arithmetic circuit for a particular polynomial? An easy way to do that would be to first convert our polynomial into a canonical form and then implement the canonical form as an arithmetic circuit. Indeed, the description of f2 above can be seen as a canonical representation of the polynomial as a sum of monomials with regard to a natural monomial ordering. However, as this example illustrates, canonical forms can be substantially more complex than other implementations of the same polynomial. For polynomials in n variables, the loss in efficiency can be exponential in n. This would often make computing the canonical form—or indeed, even writing it down—infeasible.
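The article's own f1 and f2 are rendered as images and not reproduced here, so the following sketch uses a made-up pair in the same spirit: two arithmetic circuits that compute one polynomial in two different ways, checked to agree by random evaluation (a Schwartz–Zippel-style identity test). The specific polynomial is our illustrative choice, not the article's example.

```python
import random

# Two different "arithmetic circuits" for the same polynomial in Z[x, y].
def f1(x, y):
    # Computed as a product of two factors.
    return (x + 2 * y) * (x - y)

def f2(x, y):
    # The same polynomial written as a sum of monomials (a canonical form).
    return x * x + x * y - 2 * y * y

# Distinct-looking programs, identical polynomial: random evaluations agree.
for _ in range(1000):
    x, y = random.randrange(10**6), random.randrange(10**6)
    assert f1(x, y) == f2(x, y)
```

Expanding into a sum of monomials is exactly the canonicalization step the text describes; for multivariate polynomials this expansion can blow up exponentially.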

* 1.1. A pseudo-canonical form

Given that computing canonical forms can be infeasible, what is there to do? Here, following Barak et al.,7 we draw an analogy to the notion of pseudo-randomness. When truly random values are not available, we can instead aim to produce values that "look random" by means of a pseudo-random generator. That is, we require that no efficient algorithm can distinguish between truly random values and the output of our pseudo-random generator.

Now, for two arithmetic circuits g1 and g2 that compute the same underlying polynomial, a true canonical form Canonical(g1) would be identical to the canonical form of Canonical(g2). Instead, we would ask that a pseudo-canonical form PseudoCanonical(g1) would simply be indistinguishable from the pseudo-canonical form PseudoCanonical(g2), to all efficient algorithms that were given g1 and g2 as well. Observe that unless there are actual efficiently computable canonical forms for all arithmetic circuits—which we do not believe to be true—it must be that such a PseudoCanonical operator is randomized, and outputs a probability distribution over arithmetic circuits computing the same polynomial.

* 1.2. The computing lens

The classic theory of computation tells us that general computer programs can be converted into equivalent polynomials (albeit over finite fields, which we will focus on implicitly in the sequel). So the pseudo-canonicalization question posed above is equivalent to the pseudo-canonicalization question for general computer programs. Indeed, the question of hiding implementation details within a computer program has a long history, dating at least as far back as the groundbreaking 1976 work of Diffie and Hellman13 introducing the concept of public-key cryptography. Historically, this problem has been called "program obfuscation," albeit it was typically discussed in an ill-defined form. Discussed in these vague terms, it was folklore that truly secure program obfuscation would have revolutionary applications to computing, especially for securing intellectual property. The work of Barak et al.7 gave a formal treatment of this problem and proved the impossibility of strong forms of general-purpose program obfuscation. This work also formalized the pseudo-canonicalization problem discussed above via the notion of indistinguishability obfuscation (iO). Writing now in the language of Boolean circuits, we define the problem below:

Definition 1. (INDISTINGUISHABILITY OBFUSCATOR (IO) FOR CIRCUITS.7) A probabilistic polynomial-time algorithm iO is called a secure indistinguishability obfuscator for polynomial-sized circuits if the following holds:

  • Completeness: For every λ ∈ ℕ, every circuit C with input length n, and every input x ∈ {0, 1}^n, we have that

Pr[C′(x) = C(x) : C′ ← iO(1^λ, C)] = 1.

  • Indistinguishability: For every two ensembles {C_{0,λ}}_{λ∈ℤ+} and {C_{1,λ}}_{λ∈ℤ+} of polynomial-sized circuits that have the same size, input length, and output length, and are functionally equivalent, that is, ∀λ ∈ ℤ+, C_{0,λ}(x) = C_{1,λ}(x) for every input x, the distributions iO(1^λ, C_{0,λ}) and iO(1^λ, C_{1,λ}) are computationally indistinguishable, denoted as:

iO(1^λ, C_{0,λ}) ≈c iO(1^λ, C_{1,λ}).

This means that for every efficient polynomial-time algorithm D and every constant c > 0, there exists a constant λ0 ∈ ℤ+ such that for all λ > λ0, we have:

|Pr[D(iO(1^λ, C_{0,λ})) = 1] − Pr[D(iO(1^λ, C_{1,λ})) = 1]| ≤ 1/λ^c.

As we discuss in Section 1.4, indeed iO as a formalization of pseudo-canonicalization lived up to the folklore promise of software obfuscation: there was, and still is, a large research community studying novel applications of iO.

In contrast, demonstrating the feasibility of constructing iO proved far more challenging. Often one expects that theory will lag behind practice, and given the folklore promise of software obfuscation, one might expect that over the years clever programmers had come up with heuristic approaches to software obfuscation that resisted attack. The reality is the opposite. Indeed, in 2021 the third annual White Box Cryptography contest was held to evaluate heuristic methods for software obfuscation, and every one of the 97 submitted obfuscations was broken before the contest ended.1

A large body of theoretical work, starting with the pioneering work of Garg et al.,14 has attempted to construct iO using mathematical tools. However, before the result18 by the authors of this article, all previous mathematical approaches to constructing iO relied on new, unproven mathematical assumptions, many of which turned out to be false (see Jain et al.18 for a list of prior constructions).

We would like to build iO whose security rests upon cryptographic hardness assumptions that have stood the test of time, have a long history of study, and are widely believed to be true. The main result of our works18,19 is the construction of an iO scheme from three well-studied assumptions. We discuss this in more detail next.

THEOREM 1. Under the following assumptions18,19,a:

  • the Learning Parity with Noise (LPN) assumption over general prime fields ℤp with polynomially many LPN samples and error rate 1/k^δ, where k is the dimension of the LPN secret, and δ > 0 is any constant;
  • the existence of a Boolean Pseudo-Random Generator (PRG) in NC0 with stretch n^{1+t}, where n is the length of the PRG seed, and t > 0 is any constant;
  • the Decision Linear (DLIN) assumption on symmetric bilinear groups of prime order.

Indistinguishability obfuscation for all polynomial-size circuits exists.

The three assumptions above (discussed further in Section 1.3) are based on computational problems with a long history of study, rooted in complexity, coding, and number theory. Further, they were introduced for building basic cryptographic primitives (such as public key encryption), and have been used for realizing a variety of cryptographic goals that have nothing to do with iO.

* 1.3. Assumptions in more detail

We now describe each of the assumptions we need in more detail and briefly survey their history.

The DLIN Assumption: The Decisional Linear assumption (DLIN) is stated as follows: For an appropriate ℓ-bit prime p, two groups G and GT are chosen of order p such that there exists an efficiently computable nontrivial symmetric bilinear map e : G × G → GT. A canonical generator g for G is also computed. Following the tradition of cryptography, we describe the groups above using multiplicative notation, even though, being cyclic, they could equally be written additively. The DLIN assumption requires that the following computational indistinguishability holds:

(g, g^x, g^y, g^{xr}, g^{ys}, g^{r+s}) ≈c (g, g^x, g^y, g^{xr}, g^{ys}, g^z), where x, y, r, s, z ← ℤp are uniformly random.

This assumption was first introduced in the 2004 work of Boneh et al.10 and instantiated using appropriate elliptic curves. Since then DLIN and assumptions implied by DLIN have seen extensive use in a wide variety of applications throughout cryptography, such as Identity-Based Encryption, Attribute-Based Encryption, Functional Encryption for degree 2 polynomials, Non-Interactive Zero-Knowledge, etc.

The Existence of PRGs in NC0: The assumption of the existence of a Boolean Pseudo-Random Generator PRG in NC0 states that there exists a Boolean function G : {0, 1}^n → {0, 1}^m where m = n^{1+t} for some constant t > 0, and where each output bit computed by G depends on a constant number of input bits, such that the following computational indistinguishability holds:

{G(sd) : sd ← {0, 1}^n} ≈c {r : r ← {0, 1}^m}.

Pseudorandom generators are a fundamental primitive in their own right and have vast applications throughout cryptography. PRGs in NC0 are tightly connected to the fundamental topic of Constraint Satisfaction Problems (CSPs) in complexity theory and were first proposed for cryptographic use by Goldreich16 over 20 years ago. The complexity theory and cryptography communities have jointly developed a rich body of literature on the cryptanalysis and theory of constant-locality Boolean PRGs.
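To make constant locality concrete, here is a toy Goldreich-style generator in Python: each output bit applies the 5-local predicate x1 ⊕ x2 ⊕ x3 ⊕ (x4 ∧ x5) to five seed positions chosen by a public index structure. Both the predicate and the uniformly random index choice are illustrative; real proposals require carefully chosen predicates and hypergraphs to be plausibly secure.

```python
import random

def goldreich_prg(seed, m, rng):
    """Sketch of a 5-local PRG: every output bit reads only 5 seed bits.
    The index structure (which 5 positions feed each output bit) is public;
    only the seed is secret."""
    n = len(seed)
    out = []
    for _ in range(m):
        i1, i2, i3, i4, i5 = (rng.randrange(n) for _ in range(5))
        out.append(seed[i1] ^ seed[i2] ^ seed[i3] ^ (seed[i4] & seed[i5]))
    return out

n = 64
m = int(n ** 1.25)            # polynomial stretch m = n^{1+t} with t = 0.25
rng = random.Random(0)        # public index structure, fixed for the function
seed = [random.randrange(2) for _ in range(n)]
output = goldreich_prg(seed, m, rng)
assert len(output) == m
```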

LPN over large fields: The Learning Parity with Noise (LPN) assumption over finite fields ℤp is a decoding problem. The standard LPN assumption with respect to a subexponential-size modulus p, dimension ℓ, sample complexity n, and noise rate r = 1/ℓ^δ for some δ ∈ (0, 1), states that the following computational indistinguishability holds:

(A, sA + e) ≈c (A, u), where A ← ℤp^{ℓ×n}, s ← ℤp^ℓ, and u ← ℤp^n are uniformly random, and e ← D_r^n.

Above, e ← D_r is a generalized Bernoulli distribution: each coordinate of e is sampled uniformly from ℤp with probability 1/ℓ^δ and set to 0 with probability 1 − 1/ℓ^δ. We consider polynomial sample complexity n(ℓ), and the modulus p is an arbitrary subexponential function in ℓ.
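A minimal sketch of sampling from the "real" LPN distribution above, with arbitrary toy parameters (the field size, dimension, and sample count here are far too small to be secure):

```python
import random

def lpn_samples(p, ell, n, delta, rng):
    """Sample (A, s*A + e mod p): n noisy random linear equations in a
    secret s over Z_p. Each error coordinate is uniform in Z_p with
    probability 1/ell**delta and 0 otherwise (the distribution D_r above)."""
    s = [rng.randrange(p) for _ in range(ell)]
    rate = 1.0 / (ell ** delta)
    A, b = [], []
    for _ in range(n):
        col = [rng.randrange(p) for _ in range(ell)]      # one sample/column
        e = rng.randrange(p) if rng.random() < rate else 0
        A.append(col)
        b.append((sum(a * x for a, x in zip(col, s)) + e) % p)
    return A, b, s

A, b, s = lpn_samples(p=101, ell=32, n=64, delta=0.5, rng=random.Random(1))
# With noise rate 1/sqrt(32), most of the 64 equations hold exactly:
exact = sum((sum(a * x for a, x in zip(col, s)) % 101) == bi
            for col, bi in zip(A, b))
assert exact > 40
```

The decoding problem is to recover s (or distinguish b from uniform) given only A and b.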

The origins of the LPN assumption date all the way back to the 1950s (e.g. Gilbert15): it has long been known that random linear codes possess remarkably strong minimum distance properties. However, since then, very little progress has been made in efficiently decoding random linear codes under random errors. The LPN over fields assumption above formalizes this, and was introduced over ℤ2 for cryptographic uses in 1994,9 and formally defined for general finite fields and parameters in 2009,17 under the name "Assumption 2."

While in Ishai et al.17 the assumption was used with a constant error rate, polynomially low error (in fact, δ = 1/2) has an even longer history in the LPN literature: it was used by Alekhnovitch2 to construct public-key encryption over the field ℤ2, and used to build public-key encryption over ℤp in 2015.5 The exact parameter settings that we describe above, with both general fields and an inverse polynomial error rate corresponding to an arbitrarily small constant δ > 0, were explicitly posed by Boyle et al.11 in the context of building efficient secure two-party and multi-party protocols for arithmetic computations.

Recently, the LPN assumption has led to a wide variety of applications. A comprehensive review of known attacks on LPN over large fields, for the parameter settings we are interested in, was given in Boyle et al.11,12 For our parameter setting, the running time of all known attacks is 2^{Ω(ℓ^{1−δ})}, for any choice of the constant δ ∈ (0, 1) and for any polynomial number of samples n(ℓ).

Finally, we mention that except for the DLIN assumption, the other two assumptions, LPN and PRG in NC0, have search-to-decision reductions.

* 1.4. Applications of iO

The notion of iO occupies an intriguing and influential position in complexity theory and cryptography. Interestingly, if NP ⊆ BPP, then iO exists for the class of all polynomial size circuits, because if NP ⊆ BPP, then it is possible to efficiently compute a canonical form for any function computable by a polynomial-size circuit. This is in contrast to most powerful cryptographic objects, which typically imply the existence of one-way functions. A large body of work has shown that iO plus one-way functions imply a vast array of cryptographic objects, so much so that iO has been conjectured to be a "central hub"24 for cryptography. In addition, an impressive list of fascinating new cryptographic objects is only known under iO or related objects.

We highlight a small subset of these implications (see Jain et al.18 and the references therein): multiparty noninteractive key exchange in the plain model, zero-knowledge succinct noninteractive arguments (ZK-SNARGs) for any NP language, multilinear maps with bounded polynomial multilinear degrees, witness encryption for any NP language, functional encryption supporting all polynomial-sized circuits, and fully homomorphic encryption schemes (FHE) for unbounded-depth polynomial-size circuits.

* 1.5. Open problems

Our work places iO on firm foundations with respect to the assumptions it is based on, thereby answering the main feasibility question for the primitive (until we resolve the P vs. NP question). However, there are many important open questions that remain to be answered. One of the main questions is whether there is a construction that is very simple to describe and efficient to implement in practice. Our construction is quite complex, relies on a number of steps, and is recursive in nature. As a consequence, the actual asymptotic complexity is some large polynomial. Simplifying and finding more efficient paths to building iO from well-studied assumptions is one of the most important problems that must be solved, toward the vision of using iO as a universal tool.


2. Overview: How to Construct iO?

In the remainder of the article, we give a very high-level overview of the main technical ideas implying iO. For simplicity of exposition, we choose the simplest path to iO that we are aware of. This overview is based on a combination of ideas from Jain et al.,18,19 which, however, requires one additional assumption beyond the three stated above (see Theorem 1), namely the Learning With Errors (LWE)23 assumption. Since the exact technical reasons for needing LWE are not important for this overview, and this assumption is actually unnecessary,19 we do not discuss it in detail here.

* 2.1. Preliminaries

Let's start by introducing some basic notation. Let size(X) denote the length of the binary description of an object X (e.g., a string, a circuit, or a truth table). Throughout, we consider Boolean functions, circuits, or algorithms mapping n-bit binary strings to m-bit binary strings, for some n, m ∈ ℤ+. Let time(A, x) denote the running time of an algorithm (or circuit) A on an input x (in the case of a circuit C, time(C, x) is the same as size(C)). We say that an algorithm or circuit is efficient if its running time is bounded by a (fixed) polynomial in the length of the input, that is, time(C, x) = size(x)^c for some positive integer c ∈ ℤ+. When we only care about the existence of a constant and the concrete value is not important, we write O(1) in place of that constant, e.g., time(C, x) = size(x)^{O(1)} (following the big-O notation in complexity theory).

Restating our goal, we want to design an efficient randomized algorithm, called the obfuscator O, that given a Boolean circuit C: {0, 1}^n → {0, 1} of size s = n^{O(1)}, referred to as the original circuit, outputs another Boolean circuit Ĉ: {0, 1}^n → {0, 1}, called the obfuscated circuit, such that the following three properties hold:

  • Correctness: the obfuscated circuit Ĉ is functionally equivalent to the original circuit C, denoted as Ĉ ≡ C, meaning that for every x ∈ {0, 1}^n, Ĉ(x) = C(x). Correctness must hold no matter what random coins the obfuscator O used.
  • Efficiency: the obfuscator is efficient, meaning O runs in time polynomial in the size of the original circuit, namely, time(O, C) = size(C)O(1).
  • Security: the obfuscated circuit Ĉ hides the implementation details in the original circuit C. That is, for every two equally sized and functionally equivalent circuits C0 and C1 (i.e., size(C0) = size(C1) and C0 ≡ C1), the obfuscated circuits Ĉ0 and Ĉ1 are computationally hard to distinguish (as defined in Definition 1; see also Jain et al.18).

Next, we give an informal overview of how to construct iO from well-studied assumptions.

* 2.2. Simplifying the task of iO

Perhaps the simplest starting point is the following: If there is no restriction on the time the obfuscator O can take, then there is an extremely intuitive obfuscator: the obfuscated circuit is the truth table of the original circuit. The truth table TTC of a Boolean circuit C is an array indexed by inputs, where TTC[x] = C(x). It can be computed in time 2^n · s if the input length is n and the circuit size is s. Perfect security comes from the fact that for any two functionally equivalent circuits C0 ≡ C1, their truth tables are identical, TTC0 = TTC1, and hence impossible to distinguish.

Put simply, a truth table is a canonical form of all circuits producing it. While outputting the truth table satisfies the correctness and security requirements of iO, the obvious flaw is that the obfuscator is far from efficient: The running time of an iO scheme should be s^{O(1)}, rather than 2^n · s. The input length n of a Boolean circuit can be close to its size s, and hence the time to compute a truth table is exponentially large! This inefficiency is likely inherent, as efficient methods of finding canonical forms of circuits imply the collapse of the polynomial hierarchy, which is widely conjectured to be false.
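The trivial truth-table obfuscator can be sketched in a few lines (the helper names here are ours): it is perfectly correct and perfectly canonical, but its cost is 2^n table entries.

```python
from itertools import product

def truth_table_obfuscate(circuit, n):
    """The trivial 'perfect' obfuscator: tabulate the circuit on all 2^n
    inputs and answer queries by lookup. Time and space are 2^n * s."""
    table = {x: circuit(*x) for x in product((0, 1), repeat=n)}
    return lambda *x: table[x]

# Two functionally equivalent "circuits" with different implementations:
c0 = lambda a, b, c: (a & b) | (a & c)
c1 = lambda a, b, c: a & (b | c)

ob0 = truth_table_obfuscate(c0, 3)
ob1 = truth_table_obfuscate(c1, 3)
for x in product((0, 1), repeat=3):
    assert ob0(*x) == c0(*x) == ob1(*x)   # correctness; the tables coincide
```

Since c0 ≡ c1, the two tables are byte-for-byte identical, so no adversary, however powerful, can tell which implementation was obfuscated.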

Therefore, to improve efficiency, we seek canonical forms that fool computationally limited adversaries. Naturally, we start with a more humble goal:

Can we relax the efficiency requirement on the obfuscator, while keeping it useful?

Simplification 1: Obfuscation with non-trivial exponential efficiency. iO with non-trivial efficiency was studied in Lin et al.21 Surprisingly, they showed that a very modest improvement in efficiency – captured in the notion of exponentially-efficient iO, or xiO for short – is actually enough to construct completely efficient (polynomial-time) iO.b

  • xiO is an obfuscator O whose running time is still "trivial", (2^n · s)^{O(1)}, but which outputs an obfuscated circuit Ĉ of "non-trivial" size 2^{n(1−ε)} · s^{O(1)} for some ε > 0.

We can think of xiO (as well as iO) as a special kind of encryption method, where the obfuscated circuit Ĉ is a "ciphertext" of the original circuit C, also denoted as spCT(C) = Ĉ, satisfying that

  • the ciphertext spCT(C) hides all information about C, except that it lets anyone with access to it learn the truth table of C. This is unlike normal encryption which reveals no information about the encrypted message.
  • The size of the ciphertext spCT(C) is 2^{(1−ε)n} for some 0 < ε < 1, so it can be viewed as a (slightly) compressed version of the truth table (that reveals no other information about C to computationally limited adversaries).

Such a special encryption scheme is known as functional encryption, which controls precisely which information about the encrypted message is revealed, and hides all other information. This notion is tightly connected with xiO and iO, and, in fact, the implication from xiO to iO goes via the notion of functional encryption.

Viewing spCT(C) as a compressed version of the truth table, it becomes clear why even slight compression is powerful enough to imply iO: the idea is to keep compressing iteratively until the size of the special ciphertext becomes polynomial. The works of Ananth and Jain,3 Bitansky and Vaikuntanathan,8 and Lin et al.21 turn this high-level idea into an actual proof that xiO implies iO,c allowing us to focus on constructing xiO, or equivalently the special encryption described above.
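The actual bootstrapping is a delicate recursion, but the arithmetic intuition for why a (1−ε) saving in the exponent compounds quickly can be seen in a toy calculation. This is a caricature of the iteration, not the real construction: if each round shrinks an object of size 2^k to size 2^{k(1−ε)}, the exponent decays geometrically and reaches polynomial scale after O(log n / ε) rounds.

```python
# Toy exponent-shrinking calculation: start from exponent k = 1000 and apply
# a (1 - eps) compression to the exponent per round, until it is "small".
eps, k, rounds = 0.2, 1000.0, 0
while k > 10:                 # stop once the exponent is polynomially small
    k *= (1 - eps)
    rounds += 1
assert rounds <= 25           # roughly log(1000/10) / log(1/(1-0.2)) rounds
```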

Simplification 2: It suffices to obfuscate simple circuits. Unfortunately, despite the efficiency relaxation, it is still unclear how to obfuscate general Boolean circuits, which can be complex. Naturally, we ask:

Does it suffice to obfuscate only very simple circuits?

It turns out that it suffices to focus on an extremely simple class of circuits C = NC0, where NC0 is the set of all circuits with constant output locality, meaning every output bit depends on a constant number of input bits. To do so, we will rely on two cryptographic tools, randomized encodings and Pseudo-Random Generators (PRGs).

Randomized encoding in NC0. A Randomized Encoding (RE) scheme consists of two efficient algorithms (RE, Decode). It gives a way to represent a complex circuit C(·), by a much simpler randomized circuit REC(·; ·) := RE(C, ·; ·), such that,

  • Correctness: for every input x, the output px of RE(C, x; r), produced using uniformly random coins r, encodes the correct output: there exists an efficient decoding algorithm Decode such that Decode(px) = C(x).
  • Security: px reveals no information about C beyond the output C(x) to efficient adversaries.
  • Simplicity: RE is a simple circuit by some measure of simplicity. Classic works6,25 showed that RE can be an NC0 circuit, assuming the existence of PRGs in NC0, as we do.

The correctness and security of the randomized encoding suggest that, instead of directly obfuscating a general circuit C, we can alternatively obfuscate a circuit D that on input x outputs an encoding px, which reveals only C(x). The potential benefit is that D, depending only on RE and C, can be an NC0 circuit. Hence, it would suffice to construct xiO for simple NC0 circuits!

To make the above idea go through, there are, however, a few wrinkles to be ironed out. The issue is that the security of randomized encoding only holds if an encoding px is generated using fresh random coins. There are concrete attacks that learn information about C beyond the outputs, when px is generated using non-uniform random coins, or when two encodings px and px′ are generated using correlated random coins. If the truth table of the circuit D contains an encoding px for every input x (i.e., TTD[x] = px), the random coins for generating these encodings must be embedded in D, that is,

D(x) := RE(C, x; rx), with independent random coins r1, r2, ···, r_{2^n} hardwired into D.

Such a circuit D has size at least 2^n. In particular, we cannot hope to "compress" the random coins r1, r2, ···, r_{2^n} into 2^{n(1−ε)} space, which is the target size of the obfuscated circuit. To resolve this problem, we will use a Pseudo-Random Generator.

Pseudo-random generator in NC0. Recall that a pseudo-random generator is a Boolean function PRG: {0, 1}^n → {0, 1}^m that takes as input a uniformly random seed sd ∈ {0, 1}^n, and produces a polynomially longer output r ∈ {0, 1}^m, where m = n^{1+t} for some constant t > 0, such that r is indistinguishable from a uniformly random m-bit string to computationally limited adversaries. There are several proposals of pseudo-random generators that can be computed by an NC0 function.

Equipped with a pseudo-random generator in NC0, we can replace the uniformly random coins r1, r2, ···, r_{2^n} with pseudorandom coins expanded from a much shorter seed sd of length roughly 2^{n/(1+t)} = 2^{n(1−ε′)} for some ε′ ∈ (0, 1).d This gives a variant of the circuit D above:

D′(x) := RE(C, x; PRGx(sd)),

where PRG(sd) expands to output r1, ···, r_{2^n} and rx = PRGx(sd) is the x'th chunk of the output. Thanks to the fact that both PRG and RE are in NC0, so is D′.

Moving forward, it suffices to devise a way to encrypt (C, sd) into a ciphertext spCT(C, sd), from which we can expand out the truth table of D′, while hiding all other information about C and sd.

spCT(C, sd) ⟶ TTD′, where TTD′[x] = D′(x) = RE(C, x; PRGx(sd)).

Note that the special encryption only needs to hide (C, sd), instead of the entire description of circuit D′ since RE and PRG are public algorithms. Moreover, the truth table of D′ can be computed from (C, sd) via a function GRE, PRG in NC0.

* 2.3. Special encryption for NC0 mappings

In our works,18,19 we constructed the needed special encryption for general NC0 mappings G: {0, 1}^ℓ → {0, 1}^m, where the ciphertext spCT(X) reveals the output of G on X while hiding all other information about X, such that size(spCT(X)) ≈ size(X) + m^{1−δ} for some δ > 0. In this subsection, we describe the first half of the ideas behind our construction, which connects the special encryption with bilinear pairing groups and a new object called a structured-seed pseudo-random generator. The second half of the ideas is explained in Section 2.5, describing the construction of a structured-seed pseudo-random generator.

Connection with bilinear pairing groups. As a starting point, suppose that the function G is really simple: simple enough that it can be computed by a degree-2 polynomial mapping Q over the finite field ℤp for some prime modulus p. Namely,

∀ X ∈ {0, 1}^ℓ: G(X) = Q(X), where every output of Q is a polynomial of degree 2 over ℤp.

Then, the special encryption can be implemented using bilinear pairing groups, as shown in Ananth and Sahai,4 and Lin.20

Bilinear pairing groups. At a high level, pairing groups allow for computing quadratic polynomials over secret encoded values, revealing only whether the output is zero or not. More specifically, they consist of cyclic groups G1, G2, and GT with generators g1, g2, and gT, respectively, and all of order p. G1 and G2 are referred to as the source groups and GT the target group. (In some instantiations, the two source groups are the same group, which is called symmetric pairing groups.) They support the following operations:

  • Encode: For every group Gi, one can compute gi^x for x ∈ ℤp. The group element gi^x is viewed as an encoding of x in the group Gi.
  • Group Operation: In each group Gi, one can perform the group operation to get gi^x · gi^y = gi^{x+y}, corresponding to "homomorphic" addition modulo p in the exponent. Following the tradition of cryptography, we write the group operation multiplicatively.
  • Bilinear Pairing: Given two source group elements g1^x and g2^y, one can efficiently compute a target group element gT^{xy}, using the so-called pairing operation e(g1^x, g2^y) = gT^{xy}. This corresponds to "homomorphic" multiplication modulo p in the exponent. However, after multiplication, we obtain an element in the target group that cannot be paired anymore.
  • Zero Testing: Given a group element gi^x, one can always tell if the encoded value is x = 0, by comparing the group element with the identity in Gi. Similarly, one can do equality testing to see if gi^x = gi^c for any constant c.

Combining the above abilities gives a "rudimentary" special encryption scheme that supports the evaluation of degree-2 polynomials. A ciphertext of X ∈ {0, 1}^ℓ includes encodings of every element Xl in both source groups, (g1^{Xl}, g2^{Xl}). Given these, one can "homomorphically" compute a quadratic mapping Q to obtain an encoding of the output y = Q(X) in the target group, gT^{y} (without knowing the encoded input X at all); finally, if the output y is a bit string, one can learn y in the clear via zero testing. In summary,

spCT(X) = (g1^{Xl}, g2^{Xl})_{l ∈ [ℓ]} ⟶ gT^{Q(X)} ⟶ y = Q(X) (via zero testing).

This fulfills the correctness of the special encryption. What about security? The ciphertext must hide all information about X beyond what is revealed by Q(X), for which we resort to the security of pairing groups. For simplicity of this overview, let's gain some security intuition by assuming the strongest hardness assumption pertaining to pairing groups, known as the generic group model: think of encodings as black boxes, where the only way of extracting information (about the encoded values) is by performing (a combination of) the above four operations. In this model, given g^x for a secret and random x ∈ ℤp, no efficient adversary can learn x. Extending further, if X were random (in ℤp) subject to Q(X) = y, then the encodings would reveal only y and hide all other information about X. The only issue is that X in our example is not random – it is binary. Nevertheless, the works of Ananth and Sahai,4 and Lin20 designed clever ways of randomizing an arbitrary input X into a random input X̄, subject to the only condition that an appropriate quadratic mapping Q̄ reveals the right output Q̄(X̄) = Q(X). Hence, security is attained. We refer the reader to Ananth and Sahai,4 and Lin20 for details about how such randomization works.
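To make the four interfaces concrete, here is an insecure, exponent-tracking simulation of a symmetric pairing group: encodings store their discrete logs in the clear, so it provides no hiding whatsoever and only illustrates the homomorphic evaluation of a quadratic polynomial on encodings.

```python
# Toy simulation of a symmetric bilinear group of prime order p.
# Src models a source-group element g^x; Tgt models a target element gT^y.
p = 101

class Src:                      # source-group element g^x
    def __init__(self, x): self.x = x % p
    def __mul__(self, o): return Src(self.x + o.x)      # group op: add exponents

class Tgt:                      # target-group element gT^y
    def __init__(self, y): self.y = y % p
    def __mul__(self, o): return Tgt(self.y + o.y)
    def is_zero(self): return self.y == 0               # zero testing

def pair(a, b):                 # e(g^x, g^y) = gT^{x*y}: one multiplication
    return Tgt(a.x * b.x)

# Homomorphically evaluate Q(X) = X1*X2 + 3*X3 on encodings of X = (2, 5, 7):
enc = [Src(2), Src(5), Src(7)]
out = pair(enc[0], enc[1]) * pair(enc[2], Src(3))
assert out.y == (2 * 5 + 3 * 7) % p
```

Each monomial uses at most one pairing (degree 2), and results are combined with the target-group operation; pairing twice is impossible, which is exactly why this machinery caps the degree at 2.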

Challenges beyond Degree 2. Unfortunately, the mapping GRE,PRG we care about can only be computed by polynomials of degree much larger than 2. It is known that a Boolean function with output locality l can be computed by a multilinear degree-l polynomial over ℤp. However, the locality of the Boolean PRGs we need is at least 5,22 and known randomized encodings have locality at least 4.6

Key idea: The preprocessing model. To overcome this challenge, our first idea is to use preprocessing of inputs to help reduce the degree of the computation. Instead of directly computing G(X) in one shot, we separate it into two steps: first, the input is preprocessed, X′ = pre(X; r), in a randomized way using fresh random coins r; then, the output is computed from the preprocessed input, y = Q(X′). The idea is that the preprocessing can perform complex transformations on the input in order to help the computation later. The only constraint is that the preprocessing should not increase the size of its input too much, that is, size(X′) ≈ size(X) + m^{1−δ}, for some δ > 0. As such, it suffices to encrypt the preprocessed input, spCT(X′), from which one can recover the desired output y = G(X) by evaluating a (hopefully) simpler function Q. Unfortunately, because of the restriction on the size of X′, it is unclear how preprocessing alone can help.

Our second idea is to further relax the preprocessing model to allow the preprocessed input to contain a public part and a secret part, X′ = (P, S). Importantly, the public part P should reveal no information about X to computationally limited adversaries. (In contrast, P and S together reveal X completely.) Moreover, we allow the second-stage computation Q(P, S) to have an arbitrary constant degree in P and only restrict its degree in S to 2, that is,

Q(P, S) = Σ_{j,k} a_{j,k}(P) Sj Sk + Σ_k b_k(P) Sk + g(P),

where a_{j,k}, b_k, and g are constant-degree polynomials over ℤp. It turns out that the techniques alluded to above for special encryption for degree-2 computations can be extended, so that given the ciphertext spCT(S) and P in the clear, one can homomorphically compute Q and thereby learn the output G(X).
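A toy evaluation of such a Q in Python, with made-up coefficient polynomials a_{j,k}, b_k, g that have constant degree in the public part P, while the total degree in the secret part S stays at 2:

```python
# Sketch of the relaxed second stage: Q(P, S) is quadratic in the secret S,
# with coefficients that are (here, invented) constant-degree polynomials in P.
p = 97

def Q(P, S):
    a = lambda j, k: (P[j] ** 3 * P[k]) % p      # degree 4 in P; public
    b = lambda k: (P[k] ** 2 + 1) % p            # degree 2 in P; public
    g = (P[0] * P[1]) % p
    quad = sum(a(j, k) * S[j] * S[k]
               for j in range(len(S)) for k in range(len(S)))
    lin = sum(b(k) * S[k] for k in range(len(S)))
    return (quad + lin + g) % p                  # degree in S is exactly 2

P, S = [3, 5], [2, 7]
y = Q(P, S)
assert 0 <= y < p
```

Since P is public, an evaluator can compute all the coefficients a_{j,k}(P), b_k(P), g(P) in the clear; only the degree-2 part in S must be evaluated homomorphically on encodings, which is exactly what the pairing machinery supports.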

In Jain et al.,18,19 we show how to compute any NC0 Boolean function G in such a preprocessing model, assuming the Learning Parity with Noise assumption over general fields, which completes the puzzle of iO. When applied to specific cryptographic tools, our techniques give interesting new objects. For instance, they convert any PRG in NC0 into what we call a structured-seed PRG. Given a preprocessed seed (P, S) = pre(sd; r′), the structured-seed PRG expands out a polynomially longer output r = sPRG(P, S), where the computation has only degree 2 in the private seed S, and the output r is pseudo-random given the public seed P. In the next section, we describe how to do preprocessing in the context of constructing a structured-seed PRG. The same ideas can be extended to handle general NC0 functions.

* 2.4. Definition of structured-seed PRG

Definition 2. (SYNTAX OF STRUCTURED-SEED PSEUDORANDOM GENERATORS (sPRG).) Let t > 1. A structured-seed Boolean PRG (sPRG) with polynomial stretch t is defined by the following PPT algorithms:

  • IdSamp(1^n, p): The index sampling algorithm takes as input the input length parameter n and a prime p. It samples a function index I.
  • SdSamp(I): jointly samples two strings, a public seed and a private seed, sd = (P, S), which are both vectors of dimension O(n) over ℤp.
  • Eval(I, sd): computes a string in {0, 1}^m, where m = n^t.

Looking ahead, the prime p that we choose is set to the order of the bilinear group, which has a bit length of n^{Θ(1)}.

Definition 3. (SECURITY OF sPRG.) A structured-seed Boolean PRG, sPRG, satisfies security if for any constant r > 0, any function p: ℕ → ℤ that takes as input a number k ∈ ℕ and outputs a k^r-bit prime p(k), and any n ∈ ℕ, with probability 1 – o(1) over IdSamp(1^n, p = p(n)) → I, the following distributions are computationally indistinguishable:

(I, P, Eval(I, sd))   and   (I, P, r),

where I is sampled by IdSamp(1^n, p), sd by SdSamp(I), and r is sampled uniformly from {0, 1}^{m(n)}.

Definition 4. (COMPLEXITY AND DEGREE OF sPRG.) Let r > 0 and d1, d2 ∈ ℕ be any constants, and let p : ℕ → ℤ be any function that maps an integer k to a k^r-bit prime p(k). An sPRG has degree d1 in the public seed P and degree d2 in S over ℤp, denoted sPRG ∈ (deg d1, deg d2), if for every I in the support of IdSamp(1^n, p = p(n)), there exist efficiently generatable polynomials gI,1, ···, gI,m over ℤp such that:

  • Eval(I, (P, S)) = (gI,1(P, S), ···, gI,m(P, S)), and
  • the maximum degree of each gI,j over P is d1, and the maximum degree of each gI,j over S is d2.

We remark that the above definition generalizes the standard notion of families of PRGs in two aspects: 1) the seed consists of a public part and a private part, jointly sampled and arbitrarily correlated, and 2) the seed may not be uniform. Therefore, we obtain the standard notion as a special case.

Definition 5. (PSEUDO-RANDOM GENERATORS, DEGREE, AND LOCALITY.) A (uniform-seed) Boolean PRG (PRG) is an sPRG with a seed sampling algorithm SdSamp(I) that outputs a public seed P that is an empty string and a uniformly random private seed S ← {0, 1}^n. Let d, c ∈ ℕ. The PRG has multilinear degree d if for every n ∈ ℕ and I in the support of IdSamp(1^n), Eval(I, sd) can be written as an m(n)-tuple of degree-d polynomials over ℤ in S. It has constant locality c if for every n ∈ ℕ and I in the support of IdSamp(1^n), every output bit of Eval(I, sd) depends on at most c bits of S.

Next, we give an overview of our construction of an sPRG from the LPN assumption and the existence of a PRG in NC0. For ease of exposition, we will use Goldreich's PRG candidate16, the most well-known conjectured PRG in NC0. See Jain et al.18 for the construction based on any PRG in NC0.

Definition 6. (GOLDREICH'S PRG.) A Goldreich PRG of locality c mapping n bits to m bits is described by a predicate f : {0, 1}^c → {0, 1} and a hypergraph H = (Q1, ···, Qm), where each Qi is a randomly chosen ordered subset of [n] of size c. The index I consists of f and H. On input x ∈ {0, 1}^n, PRG.Eval(I, x) = y, where y = (y1, ···, ym) and each yi = f(x_{i1}, ···, x_{ic}) for Qi = (i1, ···, ic).
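To make the definition concrete, here is a minimal Python sketch of Goldreich's PRG. The specific 5-local predicate below is only an illustrative assumption for the sketch, not the one mandated by the construction, and the helper names are hypothetical.

```python
import random

def id_samp(n, m, c, predicate):
    # Index I: the predicate f plus a random hypergraph of m ordered
    # size-c subsets Q_i of [n].
    hypergraph = [random.sample(range(n), c) for _ in range(m)]
    return predicate, hypergraph

def prg_eval(index, x):
    # Output bit y_i applies the predicate to the bits of x indexed by Q_i.
    predicate, hypergraph = index
    return [predicate(*(x[i] for i in q)) for q in hypergraph]

# Illustrative 5-local predicate: XOR of three bits plus an AND of two.
f = lambda a, b, c, d, e: a ^ b ^ c ^ (d & e)
n, m = 32, 64
I = id_samp(n, m, 5, f)
seed = [random.randint(0, 1) for _ in range(n)]
y = prg_eval(I, seed)
assert len(y) == m and set(y) <= {0, 1}
```

Each output bit reads only c = 5 input bits, which is exactly what makes Eval a constant-degree multilinear polynomial in the seed.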

Back to Top

* 2.5. sPRG overview

We start with a Goldreich PRG PRG = (IdSamp, Eval) with stretch t, associated with a d-local predicate f : {0, 1}^d → {0, 1} and a randomly sampled hypergraph H, that is, I = (f, H). Observe that on any input s ∈ {0, 1}^n, y = PRG.Eval(I, s) can be computed by degree-d multilinear polynomials over ℤ, as f is a predicate evaluated on d bits. Our high-level strategy is to somehow "preprocess" s into two vectors (P, S) of small dimension (preferably O(n), but anything sublinear in m works) such that y ∈ {0, 1}^m can be computed in (deg d, deg 2). This has the effect of transferring the complexity of the computation to the public input. To achieve this, we preprocess s into appropriate public and private seeds (P, S) and leverage the LPN assumption over ℤp (whose prime p is the input to sPRG.IdSamp) to show that the seed is hidden.

Our first idea toward this is that we can "encrypt" the seed s using LPN samples over ℤp as follows:

A ← ℤp^{n×ℓ},   w ← ℤp^ℓ,   e ← D^n,   b = A·w + e + s over ℤp,

where ℓ is set to be the dimension of the LPN secret w, and D is the noise distribution that samples a random element of ℤp with probability r, and 0 otherwise. Finally, the noise rate is set to r = ℓ^{−δ}.
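As a sanity check, the following toy Python snippet implements one concrete reading of this encoding, with b = A·w + e + s over ℤp, using small illustrative parameters. Anyone holding the LPN secret w can peel off A·w and recover the sparsely erroneous seed s + e.

```python
import random

p, n, l = 101, 30, 8     # toy field size, seed length, LPN dimension
rate = 0.1               # plays the role of the noise rate l^{-delta}
random.seed(0)

s = [random.randint(0, 1) for _ in range(n)]             # PRG seed
A = [[random.randrange(p) for _ in range(l)] for _ in range(n)]
w = [random.randrange(p) for _ in range(l)]              # LPN secret
e = [random.randrange(1, p) if random.random() < rate else 0
     for _ in range(n)]                                   # sparse noise
b = [(sum(A[i][j] * w[j] for j in range(l)) + e[i] + s[i]) % p
     for i in range(n)]

# b - A*w = s + e: a noisy copy of the seed that agrees with s
# on every coordinate where the noise is zero.
noisy_seed = [(b[i] - sum(A[i][j] * w[j] for j in range(l))) % p
              for i in range(n)]
assert all(noisy_seed[i] == s[i] for i in range(n) if e[i] == 0)
```

Under LPN, (A, b) alone is pseudorandom, so publishing it reveals nothing about s.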

It follows directly from the LPN assumption that (A, b) is pseudorandom and hides s. Furthermore, due to the sparsity of the LPN noise, the vector s + e differs from s only at an ℓ^{−δ} fraction of components – thus it is a sparsely erroneous version of the seed. For i ∈ [m], let fi(s) be the locality-d, degree-d polynomial over ℤ such that yi = fi(s). Then, for i ∈ [m], consider the following polynomial:

hi(b, A, w) = fi(b − A·w) over ℤp,

Above, we first interpret fi over ℤ into the field ℤp by simply reducing its coefficients mod p, and then compute as given. Observe that hi is a degree-d polynomial in b and w; therefore, its degree over the public (A, b) is d and, writing S for the vector of all monomials in w of degree at most ⌈d/2⌉, its degree over S is two. Thus hi is a (deg d, deg 2) polynomial. Observe also that hi(b, A, w) = fi(s + e). The main point of this is that if we set the polynomial map G^{(1)} by letting each of its components be hi, setting the public seed P = (A, b), and setting the private seed S as above:

G^{(1)}(P, S) = (h1(b, A, w), ···, hm(b, A, w)) = (f1(s + e), ···, fm(s + e)).

The reason G^{(1)} is interesting is that e is sparse. With probability 1 − 2^{−n^{Ω(1)}}, it is non-zero at only O(n·ℓ^{−δ}) locations. As a consequence, for any given i ∈ [m], fi(s) = fi(s + e) except with probability O(ℓ^{−δ}), as fi is a d-local function depending on d randomly chosen inputs. Since in the hypergraph H of the Goldreich PRG each set Qi is chosen independently, every output is independently erroneous with probability O(ℓ^{−δ}). Therefore, by Chernoff-style concentration bounds, out of the m outputs, with probability 1 − 2^{−n^{Ω(1)}} at most T = O(m·ℓ^{−δ}) outputs are erroneous.

This gives us a nice candidate for an sPRG that satisfies almost all properties! The dimension of S and P is O(n), which is sublinear in m, and the output can be computed by a (deg d, deg 2) polynomial map G^{(1)}. We would be done if we could somehow force the output to be correct on all m coordinates. For the rest of the overview, we refer to the indices i ∈ [m] such that fi(s) ≠ fi(s + e) as bad indices/outputs.
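The claim that sparse noise corrupts only a small fraction of the d-local outputs can be checked empirically. The toy Python experiment below (illustrative parameters, with `rate` playing the role of ℓ^{−δ}) samples a random hypergraph and sparse noise, then counts how many outputs touch a noisy coordinate.

```python
import random

n, m, d = 1000, 10000, 3   # toy seed length, output length, locality
p, rate = 101, 0.01        # toy field size and noise rate
random.seed(1)

# Sparse LPN-style noise over Z_p: each coordinate is noisy w.p. `rate`.
e = [random.randrange(1, p) if random.random() < rate else 0
     for _ in range(n)]
# Random hypergraph: each output depends on d random seed positions.
subsets = [random.sample(range(n), d) for _ in range(m)]

# Output i is "bad" iff one of its d inputs is noisy, so the expected
# bad fraction is roughly d * rate (here about 3%).
bad = [i for i, q in enumerate(subsets) if any(e[j] for j in q)]
assert len(bad) / m < 10 * d * rate   # loose bound; holds w.h.p.
```

This is exactly why the correction vector in the next step is sparse: only the outputs in `bad` need to be fixed.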

To correct errors, we further modify the polynomial and include more pre-processed information in the private seeds. We describe a sequence of ideas that lead to the final correction method, starting with two wrong ideas that illustrate the difficulties we will overcome.

  • The first wrong idea is correcting by adding the difference Corr = y − y′ between the correct and erroneous outputs, y = EvalI(s) and y′ = EvalI(s + e); we refer to Corr as the correction vector. To obtain the correct output, evaluation can compute the polynomial map G^{(2)}(P, S) = G^{(1)}(P, S) + Corr. The problem is that Corr must be included in the seed, but it is as long as the output and would destroy expansion. Thus, we have to make use of the fact that Corr is sparse.
  • To fix expansion, the second wrong idea is adding corrections only for bad outputs, so that the seed only stores the non-zero entries of Corr. Recall that Corr is sparse, with at most T non-zero elements. More precisely, the j-th output can be computed as G^{(2)}j(P, S) = G^{(1)}j(P, S) + Corrj if output j is bad, and without adding Corrj otherwise. This fixes expansion, but now the evaluation polynomial depends on the locations of the bad outputs. If these locations are included in the public information, this would leak information about the locations of the LPN noises and jeopardize security. If, on the other hand, the locations are included in the private seed, then it is unclear how to maintain the requirement that the polynomial map computes only a degree-two polynomial in the private seed.

These two wrong ideas illustrate the tension between the expansion and the security of our sPRG. Our construction takes care of both by compressing the correction vector Corr to be polynomially shorter than the output and storing it in the seed, and expanding it back during evaluation in a way that is oblivious to the locations of the bad output bits. This is possible thanks to the sparsity of the correction vector and the allowed degree-two computation on the private seed. We first illustrate our idea with the help of a simple case.

Simple Case 1: Far fewer than √m bad outputs. Suppose hypothetically that the number of bad outputs is bounded by some z much smaller than √m. Then, if we convert Corr into a √m × √m matrix,e it has rank at most z. We can then factorize Corr into two matrices U and V of dimensions √m × z and z × √m, respectively, such that Corr = UV, and compute the correct output as follows: ∀j ∈ [m],

G^{(2)}j(P, S) = G^{(1)}j(P, S) + (UV)kj,lj = G^{(1)}j(P, S) + Σt Ukj,t · Vt,lj,

where (kj, lj) is the index corresponding to output bit j in the √m × √m matrix. When z ≪ √m, the matrices U, V have O(z·√m) field elements, which is polynomially smaller than m = n^t. As such, G^{(2)} is expanding. Moreover, observe that G^{(2)} has only degree 2 in the private seed and is completely oblivious to where the bad outputs are.
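The factorization used in Case 1 is elementary: each of the z non-zero entries of Corr contributes one rank-1 term. A minimal Python sketch (hypothetical helper names, pure integer arithmetic) is:

```python
def factor_sparse(nonzeros, dim):
    # Factor a dim x dim matrix with z nonzero entries (k, l, v) into
    # U (dim x z) and V (z x dim) with U @ V == Corr: column t of U is
    # v_t * e_{k_t}, and row t of V is the indicator vector e_{l_t}.
    z = len(nonzeros)
    U = [[0] * z for _ in range(dim)]
    V = [[0] * dim for _ in range(z)]
    for t, (k, l, v) in enumerate(nonzeros):
        U[k][t] = v
        V[t][l] = 1
    return U, V

def matmul(U, V):
    return [[sum(U[i][t] * V[t][j] for t in range(len(V)))
             for j in range(len(V[0]))] for i in range(len(U))]

dim = 4
nonzeros = [(0, 2, 5), (3, 1, -7)]        # z = 2 bad entries
Corr = [[0] * dim for _ in range(dim)]
for k, l, v in nonzeros:
    Corr[k][l] = v
U, V = factor_sparse(nonzeros, dim)
assert matmul(U, V) == Corr
```

During evaluation, reading entry (kj, lj) of UV is the inner product Σt Ukj,t·Vt,lj, which is degree 2 in the stored seed entries, as required.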

While the idea above works for fewer than √m bad outputs, it does not work for the case we are in: we have T = Θ(m·ℓ^{−δ}) bad outputs. Nevertheless, we show that a similar idea works for this case.

T bad outputs. The above method cannot handle more than √m bad outputs, whereas the actual number of bad outputs can be as large as T = Θ(m·ℓ^{−δ}), much larger than √m since δ is an arbitrarily small constant. Consider another hypothetical case where the bad outputs are evenly spread in the following sense: suppose that if we divide the matrix Corr into m/ℓ^δ blocks, each of dimension ℓ^{δ/2} × ℓ^{δ/2}, there are at most ℓ^ρ bad outputs in each block, where ρ > 0 is a very small constant (say δ/10). In this case, we can "compress" each block of Corr separately using the idea from case 1. More specifically, for every block i ∈ [m/ℓ^δ], we factor it into UiVi, with dimensions ℓ^{δ/2} × ℓ^ρ and ℓ^ρ × ℓ^{δ/2}, respectively, and correct the bad outputs as follows: ∀j ∈ [m],

G^{(2)}j(P, S) = G^{(1)}j(P, S) + (Uij Vij)kj,lj,

where ij is the block that output j belongs to, and (kj, lj) ∈ [ℓ^{δ/2}] × [ℓ^{δ/2}] is its index within this block. We observe that G^{(2)} is expanding, since each matrix Ui or Vi has ℓ^{δ/2+ρ} field elements, so the total number of elements is 2·(m/ℓ^δ)·ℓ^{δ/2+ρ} = O(m·ℓ^{−δ/2+ρ}), which is polynomially smaller than m as long as ρ < δ/2 and m is polynomially related to ℓ. Moreover, G^{(2)} is oblivious to the locations of the bad outputs, just as in case 1.

This completely solves our problem, except that we need to ensure that the bad outputs are well spread out in the manner described above. Our main observation here is that this is ensured by the fact that in Goldreich's PRG candidate the input-dependence hypergraph Q1, ···, Qm is randomly chosen. Therefore, once we fix the locations of the non-zero errors inside e (where with high probability O(n·ℓ^{−δ}) locations are non-zero), in every block of ℓ^δ output bits, each output j is independently erroneous with probability O(ℓ^{−δ}). Thus, in expectation, each block has a constant number of bad output bits. Moreover, by the Chernoff bound, with probability 1 − 2^{−n^{Ω(1)}} every block has at most ℓ^ρ bad entries. Thus, our construction can be summarized as follows:

  • Step 1: Assign outputs. We partition the outputs into B buckets via a mapping fbkt : [m] → [B]. The number of buckets is set to B = m/ℓ^δ and the number of elements in each bucket to ℓ^δ, so that the buckets exactly partition [m]. The mapping fbkt simply divides j by B and outputs the remainder. Since the error e is chosen from the LPN error distribution and the hypergraph H of the PRG is randomly chosen, by a Chernoff-style argument we can show that, in each bucket of ℓ^δ output bits, at most t of them are bad, except with probability 2^{−n^{Ω(1)}}. We will set t = ℓ^ρ for a tiny constant ρ > 0.
  • Step 2: Compress the buckets. Next, we organize each bucket i into a matrix Mi of dimension ℓ^{δ/2} × ℓ^{δ/2} and then compute its factorization Mi = UiVi, where Ui, Vi are matrices of dimensions ℓ^{δ/2} × t and t × ℓ^{δ/2}, respectively. To form the matrix Mi, we use another mapping find : [m] → [ℓ^{δ/2}] × [ℓ^{δ/2}] to assign each output bit j an index (kj, lj) in the matrix of the bucket ij it is assigned to. This assignment must guarantee that no two output bits in the same bucket (assigned according to fbkt) have the same index. One such way to compute find(j) is to divide j ∈ [m] by B. The remainder is set as fbkt(j), and the quotient is divided further by ℓ^{δ/2}. The quotient and the remainder of this second division are set as the resulting indices (kj, lj). Once we have this, (Mi)k,l is set to Corrj if there is a j such that fbkt(j) = i and find(j) = (k, l). Since every matrix Mi has at most t non-zero entries, we can factor them and compute the correct output as: ∀j ∈ [m],

G^{(2)}j(P, S) = G^{(1)}j(P, S) + (Uij Vij)kj,lj.

  • G^{(2)} is expanding, because the number of field elements in the Ui's and Vi's is much smaller than m, namely 2tℓ^{δ/2}B = O(m·ℓ^{−δ/2+ρ}) ≪ m. We set I′ = (I, A, fbkt, find).
  • Step 3: Zeroize if uneven buckets. Finally, to deal with the low-probability event that some bucket contains more than t bad outputs, we introduce a new variable called flag. If this event occurs, our sPRG sets P and S to all-zero vectors, in which case the evaluation always outputs 0. This gives us our candidate sPRG. For security, observe that the polynomial map G^{(2)} is independent of the locations of the LPN noises. With probability 1 − 2^{−n^{Ω(1)}}, the evaluation results in the output y. Therefore, by the LPN over ℤp assumption, the seed s of PRG is hidden, and the security of PRG ensures that the output is pseudorandom whenever it is not all zero (which occurs with only sub-exponentially small probability).
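The index arithmetic of Steps 1 and 2 can be sketched in a few lines of Python. Here `side` plays the role of ℓ^{δ/2} and `B` plays m/ℓ^δ (toy values, not the asymptotic settings), and the assertions check the key property: no two outputs in the same bucket ever receive the same in-bucket index.

```python
def f_bkt(j, B):
    # Bucket assignment: the remainder of j modulo the number of buckets.
    return j % B

def f_ind(j, B, side):
    # Within-bucket index: split the quotient j // B into a (row, col)
    # pair inside a side x side matrix.
    q = j // B
    return q // side, q % side

m, B, side = 3600, 36, 10        # side * side * B == m: an exact partition
assert side * side * B == m

seen = set()
for j in range(m):
    key = (f_bkt(j, B), f_ind(j, B, side))
    assert key not in seen       # (bucket, index) pairs never collide
    seen.add(key)
```

Since (j mod B, j // B) determines j uniquely, the pair (fbkt(j), find(j)) is injective, which is exactly what Step 2 requires when filling the matrices Mi.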

This completes the overview of our construction of structured seed PRG, as well as iO; see Jain et al.18, 19 for the formal constructions and security proofs. In conclusion, iO is an extremely powerful cryptographic tool and our works have established its feasibility based on well-studied assumptions.

Back to Top

References

1. WhibOxContest21—CHES 2021 Challenge. 2021. https://contest2021.whibox.io/.

2. Alekhnovich, M. More on average case vs approximation complexity. In 44th FOCS (Oct. 2003), IEEE Computer Society Press, NY, 298–307.

3. Ananth, P., Jain, A. Indistinguishability obfuscation from compact functional encryption. In CRYPTO 2015, Part I. R. Gennaro and M. J. B. Robshaw, eds. Volume 9215 of LNCS (Aug. 2015). Springer, Heidelberg, 308–326.

4. Ananth, P., Sahai, A. Projective arithmetic functional encryption and indistinguishability obfuscation from degree-5 multilinear maps. In EUROCRYPT 2017, Part I. J.-S. Coron and J. B. Nielsen, eds. Volume 10210 of LNCS (Apr./May 2017). Springer, Heidelberg, 152–181.

5. Applebaum, B., Avron, J., Brzuska, C. Arithmetic cryptography: Extended abstract. In ITCS 2015. T. Roughgarden, ed. (Jan. 2015). ACM, NY, 143–151.

6. Applebaum, B., Ishai, Y., Kushilevitz, E. Cryptography in NC0. In FOCS (2004). IEEE Computer Society, NY, 166–175.

7. Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sahai, A., Vadhan, S.P., et al. On the (im)possibility of obfuscating programs. In CRYPTO 2001. J. Kilian, ed. Volume 2139 of LNCS (Aug. 2001). Springer, Heidelberg, 1–18.

8. Bitansky, N., Vaikuntanathan, V. Indistinguishability obfuscation from functional encryption. In 56th FOCS. V. Guruswami, ed. (Oct. 2015), IEEE Computer Society Press, NY, 171–190.

9. Blum, A., Furst, M.L., Kearns, M.J., Lipton, R.J. Cryptographic primitives based on hard learning problems. In CRYPTO'93. D. R. Stinson, ed. Volume 773 of LNCS (Aug. 1994). Springer, Heidelberg, 278–291.

10. Boneh, D., Boyen, X., Shacham, H. Short group signatures. In CRYPTO 2004. M. Franklin, ed. Volume 3152 of LNCS (Aug. 2004). Springer, Heidelberg, 41–55.

11. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y. Compressing vector OLE. In ACM CCS 2018. D. Lie, M. Mannan, M. Backes and X. Wang, eds. (Oct. 2018), ACM Press, NY, 896–912.

12. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y., Kohl, L., et al. Correlated pseudorandom functions from variable-density LPN. In 61st FOCS (Nov. 2020). IEEE Computer Society Press, NY, 1069–1080.

13. Diffie, W., Hellman, M.E. New directions in cryptography. IEEE Trans. Inform. Theory 22, 6 (1976), 644–654.

14. Garg, S., Gentry, C., Halevi, S., Raykova, M., Sahai, A., Waters, B. Candidate indistinguishability obfuscation and functional encryption for all circuits. In 54th FOCS (Oct. 2013). IEEE Computer Society Press, NY, 40–49.

15. Gilbert, E.N. A comparison of signalling alphabets. Bell Syst Tech. J. 31, 3 (1952), 504–522.

16. Goldreich, O. Candidate one-way functions based on expander graphs. Electron. Colloq. Comput Complexity (ECCC) 7, 90 (2000).

17. Ishai, Y., Prabhakaran, M., Sahai, A. Founding cryptography on oblivious transfer—efficiently. In CRYPTO 2008. D. Wagner, ed. Volume 5157 of LNCS (Aug. 2008). Springer, Heidelberg, 572–591.

18. Jain, A., Lin, H., Sahai, A. Indistinguishability obfuscation from well-founded assumptions. In STOC21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21–25, 2021 S. Khuller and V. V. Williams, eds (2021). ACM, NY, 60–73.

19. Jain, A., Lin, H., Sahai, A. Indistinguishability obfuscation from LPN over Fp, DLIN, and PRGs in NC0. In EUROCRYPT 2022, Part I. O. Dunkelman and S. Dziembowski, eds. Volume 13275 of LNCS (May/June 2022). Springer, Heidelberg, 670–699.

20. Lin, H. Indistinguishability obfuscation from SXDH on 5-linear maps and locality-5 PRGs. In CRYPTO 2017, Part I. J. Katz and H. Shacham, eds. Volume 10401 of LNCS (Aug. 2017). Springer, Heidelberg, 599–629.

21. Lin, H., Pass, R., Seth, K., Telang, S. Indistinguishability obfuscation with non-trivial efficiency. In PKC 2016, Part II. C.-M. Cheng, K.-M. Chung, G. Persiano and B.-Y. Yang, eds. Volume 9615 of LNCS (Mar. 2016). Springer, Heidelberg, 447–462.

22. Mossel, E., Shpilka, A., Trevisan, L. On e-biased generators in NC0. In 44th FOCS (Oct. 2003). IEEE Computer Society Press, NY, 136–145.

23. Regev, O. On lattices, learning with errors, random linear codes, and cryptography. In STOC (2005), 84–93.

24. Sahai, A., Waters, B. How to use indistinguishability obfuscation: deniable encryption, and more. In 46th ACM STOC. D. B. Shmoys, ed. (May/June 2014). ACM Press, NY, 475–484.

25. Yao, A.C.-C. How to generate and exchange secrets (extended abstract). In FOCS (1986), 162–167.

Back to Top

Authors

Aayush Jain (aayushja@andrew.cmu.edu), CMU, Pittsburgh, PA, USA.

Huijia Lin (rachel@cs.washington.edu), UW, Seattle, WA, USA.

Amit Sahai (sahai@cs.ucla.edu), UCLA, Los Angeles, CA, USA.

Back to Top

Footnotes

a. For technical reasons, we need the hardness of these assumptions to be such that no polynomial-time adversary has better than sub-exponentially small advantage in breaking the underlying problems.

b. This is the step where the additional assumption of Learning With Errors (LWE) is used. However, as we showed in Jain et al.19 this assumption is unnecessary for iO.

c. This implication, however, comes with a quantitative weakening in the security.

d. More precisely, the length of sd is (2^n · s^{O(1)})^{1/(1+r)}, since each r_x is an s^{O(1)}-bit string instead of a single bit. However, the dominant term is 2^{n/(1+r)}, as s is only polynomial in n.

e. Any injective mapping from a vector to a matrix that is efficient to compute and invert will do.

To view the accompanying Technical Perspective, visit doi.acm.org/10.1145/3632568

The original version of this paper was published in the Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (June 2021), 60–73.


© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Request permission to publish from permissions@acm.org

The Digital Library is published by the Association for Computing Machinery. Copyright © 2024 ACM, Inc.

