Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General-purpose safe unions #896

Closed
5 of 9 tasks
mihaibudiu opened this issue Nov 11, 2020 · 30 comments
Closed
5 of 9 tasks

General-purpose safe unions #896

mihaibudiu opened this issue Nov 11, 2020 · 30 comments

Comments

@mihaibudiu
Copy link
Contributor

mihaibudiu commented Nov 11, 2020

Personnel

Design

Implementation

Process

@mihaibudiu
Copy link
Contributor Author

I actually started working on the implementation, and I want to propose an alternative syntax:

union U {
   bit<32> x; // No separate tags; field names are tags too
   bit<16> y;
}

U u = U.x(0);  // Constructor that initializes the x field with value 0.
switch (u) {  // switch can be applied directly to any union
    x: { f(u.x); }  // the labels are the field names themselves
    y: { f(u.y); }
}

@jafingerhut : what do you think? This does not handle the case of using an explicit existing enum as tags, but I think that we can accommodate that case easily with a small extension.

What I like about this proposal is that is is very economic, and it can reuse many existing language mechanisms.

@mihaibudiu
Copy link
Contributor Author

In fact, we can make this even simpler: an assignment to U can be written as just u.x = 0. Then the syntax we already have is unchanged.

@jnfoster
Copy link
Contributor

jnfoster commented Nov 18, 2020

One issue is something @jafingerhut had noted earlier: if u can be assigned to in the cases of the switch, the type for u will not be right.

U u; 
u.x = 0;
switch (u) { 
    x: { u.y = 1; 
         f(u.x); // <-- type checks?
        }
    y: { }
}

@mihaibudiu
Copy link
Contributor Author

No, once assigned u should be immutable. Or, if we really want them mutable, we cannot allow changing the "tag", only the fields.

@jnfoster
Copy link
Contributor

So the expression u.x is type checked differently inside vs. outside of a switch?

@mihaibudiu
Copy link
Contributor Author

This is really orthogonal to the syntax. Also, these values cannot be uninitialized if we want to be safe. The most liberal implementation would make the "tag" immutable within a switch statement that switches on the tag, but would allow mutating tags otherwise.

@mihaibudiu
Copy link
Contributor Author

With my proposal the type is actually the same, but you will be forbidden from accessing u.x in the wrong branches of the switch.

@jnfoster
Copy link
Contributor

Oh, I misunderstood.

Let me try to restate to see if I understand:

  • u.x is an l-value that can be assigned to anywhere that u is in scope.
  • But u.x is only available as an expression if within the x case of a switch(u).

Is that right?

@mihaibudiu
Copy link
Contributor Author

With the exception that u.x cannot be assigned in a switch, but u.x.a can (if we decide that unions are not immutable).

@mihaibudiu
Copy link
Contributor Author

Of course, we are talking about a switch (u); you can potentially have nested switches.

@jnfoster
Copy link
Contributor

Right. A bit fiddly, but I get how this would work.

@mihaibudiu
Copy link
Contributor Author

This PR contains a working implementation p4lang/p4c#2615.

@mihaibudiu
Copy link
Contributor Author

Here is an example from the implementation with comments:

#include <core.p4>

struct S {
    bit<8> f;
}

union Safe {
    bit<32> b;
    bit<16> c;
    S f;
}

// Generic unions are supported
union Either<T, U> {
    T t;
    U u;
}

control c(out bool o) {
    apply {
        Either<bit<32>, bit<16>> e;

        Safe s;  // union is uninitialized now. the union maintains an internal tag that says "uninitialized"
        s.b = 4; // We can only assign to whole union fields. This sets the tag of the union.
        // bool b = s.b == 4;   // error: cannot read field except in a switch statement
        // bool b = s.f.f == 0;  // error: cannot access union sub-fields except in a switch statement
        Safe s1;
        s1.b = 2;
        switch (s) {
             // Union tags have the shape "UnionType.field". We could also go with just the field name, but I think this is cleaner.
            Safe.b: { o = (s.b == 0); } // Within a label all values in the s.b field become available.
            Safe.c: {
                // s.c = 2; // error: cannot change the "tag" within a switch statement.
                switch (s1) { // can have nested switch statements
                    Safe.b: { o = (s1.b == (bit<32>)s.c); }
                    // not all labels must exist
                    default: { o = false; } 
                }
            }
            Safe.f: { o = (s.f.f == 1); s.f.f = 2; }
            default: { 
               o = true; 
               s1.c = 2;
               // bool b = s.b == 0; // error: no union fields are available in the 'default' label 
            }
        }

        switch (e) {
            Either.t: { o = e.t == 0; }
            Either.u: { o = e.u == 0; }
        }
    }
}

control ctrl(out bool o);
package top(ctrl _c);

top(c()) main;

@liujed
Copy link
Member

liujed commented Dec 7, 2020

Does the design permit void-valued tags? (Seems like it could be useful.) If so, (1) do unions subsume enums, and (2) how do we assign a void-valued tag?

@mihaibudiu
Copy link
Contributor Author

This is an interesting idea.
These unions would subsume enums.
The only thing I can imagine is to allow an assignment of a don't care to a void field.

@mihaibudiu
Copy link
Contributor Author

However, this feature is unlikely to be valued by developers.

@liujed
Copy link
Member

liujed commented Dec 8, 2020

If we can unify this design with the existing enums, I think we'd end up with a (c)leaner language. Maybe something like the following. This optimistically assumes that we can turn uninitialized reads into compile-time errors. I've also written union declarations as enum, but if it's more natural, we can offer union as a synonym for (unserializable) enum declarations.

#include <core.p4>

struct S {
    bit<8> f;
}

enum Safe {
    B(bit<32>);
    C(bit<16>);
    F(S);
}

// Generic unions are supported
enum Either<T, U> {
    T(T);
    U(U);
}

control c(out bool o) {
    apply {
        Either<bit<32>, bit<16>> e;

        Safe s;  // Here, s is uninitialized. Reads of possibly uninitialized
                 // unions result in a compile-time error.
        s = B(4); // This sets the tag of the union. `Safe.B` would be better,
                  // but this is incompatible with existing syntax for enums.
        bool b1 = s == B(4);  // True
        bool b2 = s == B(5);  // False: s is a B, but 4 != 5
        bool b3 = s == C(4);  // False: s is not a C

        Safe s1;
        s1 = B(2);
        switch (s) {
            B: {
                // Within a label, the union has a `value` field that can be
                // read and modified, but whose type cannot change. Here,
                // `s.value` is a bit<32>, as declared by the `B` label of
                // `Safe`.
                o = (s.value == 0);
            }
            C: {
                switch (s1) { // can have nested switch statements
                    B: { o = (s1.value == (bit<32>)s.value); }
                }
            }
            F: { o = (s.value.f == 1); s.value.f = 2; }
        }

        switch (e) {
            T: { o = e.value == 0; }
            U: { o = e.value == 0; }
        }
    }
}

control ctrl(out bool o);
package top(ctrl _c);

top(c()) main;

@mihaibudiu
Copy link
Contributor Author

The language can be leaner, but it will look less familiar to C users. Moreover, we already have a bunch of code that deals with enums, including in P4Runtime. We would have to generalize that code.
Please note that this is how Rust works - enums are just a special case of unions; you can have in a union fields with no additional data ("void").

@liujed
Copy link
Member

liujed commented Dec 8, 2020

I'd like to highlight that there is no syntax for unsafe operations in the example I've written above, which makes for a cleaner design.

  • Outside of a switch, you can only assign into an enum/union, or compare its label and value. There is no syntax for an unsafe field access like s.b or s.f.
  • Within a switch, you can modify the enum/union's value, but not its label. There is no syntax for an unsafe assignment like Safe.b: { s.c = 0; }, as currently exists in the proposal.

@mihaibudiu
Copy link
Contributor Author

The original proposal should be equally safe. You can only compare labels using a switch, and you cannot change the tag within a switch. You can only access fields that depend on a tag only if the tag is present.

@mihaibudiu
Copy link
Contributor Author

BTW: the proposal is fully implemented, feel free to try it.

@mihaibudiu
Copy link
Contributor Author

unfortunately this implementation is flawed, as pointed out in p4lang/p4c#2615:

The problem is that one can surreptitiously modify the union that is being accessed, e.g.,
switch (u) { Option.some: action(); }, where the action will modify u

@mihaibudiu
Copy link
Contributor Author

The way this is solved in other programming languages is either to make the unions immutable once assigned, or to have the switch perform a "pattern matching" operation copying the safe union fields into a temporary storage area. This removes the requirement for the union type to change in the rest of the block.

@mihaibudiu
Copy link
Contributor Author

mihaibudiu commented Jan 11, 2021

Here is a possible way one could implement "pattern matching"; I am not particularly fond of this syntax, but we can use it as a basis for discussion:

struct S {
    bit<8> f;
}

union Safe {
    bit<32> b;
    bit<16> c;
    S f;
}

union Either<T, U> {
    T t;
    U u;
}

control c(out bool o) {
    apply {
        Either<bit<32>, bit<16>> e;

        Safe s;
        s.b = 4;
        Safe s1;
        s1.b = 2;
        switch (s) {
            b: { o = (b == 0); }
            c: {
                switch (s1) {
                    b: { o = (b == (bit<32>)sc); }
                    default: { o = false; }
                }
            }
            f: { o = (f.f == 1); f.f = 2; s.f = f; }
            default: { o = true; s1.c = 2; }
        }

        switch (e) {
            t: { o = t == 0; }
            u: { o = u == 0; }
        }
    }
}

@mihaibudiu
Copy link
Contributor Author

Note that the switch labels have now become assignment statements that are constrained to have the RHS expression be exactly one of the union fields.

@liujed-intel
Copy link

Possibility for renaming the variable for switch cases:

switch (e) {
  t as myT: { o = myT == 0; }
  u: { o = u == 0; }
}

@mihaibudiu
Copy link
Contributor Author

my <- t

@mihaibudiu
Copy link
Contributor Author

mihaibudiu commented Feb 1, 2021

Current proposal: the only assignments to e (in the previous example) can assign one of its fields.
You can only read a field of e within the label of a switch statement on e.

@mihaibudiu
Copy link
Contributor Author

Here is an example with the latest syntax:

#include <core.p4>

struct S {
    bit<8> f;
}

union Safe {
    bit<32> b;
    bit<16> c;
    S f;
}

union Either<T, U> {
    T t;
    U u;
}

control c(out bool o) {
    apply {
        Either<bit<32>, bit<16>> e;

        Safe s;
        s.b = 4;
        Safe s1;
        s1.b = 2;
        switch (s) {
            s.b as b: { o = (b == 0); }
            s.c as sc: {
                switch (s1) {
                    s1.b as s1b: { o = (s1b == (bit<32>)sc); }
                    default: { o = false; }
                }
            }
            s.f as f: { o = (f.f == 1); f.f = 2; s.f = f; }
            default: { o = true; s1.c = 2; }
        }

        switch (e) {
            e.t as et: { o = o && et == 0; }
            e.u as eu: { o = o && eu == 0; }
        }
    }
}

control ctrl(out bool o);
package top(ctrl _c);

top(c()) main;

@jnfoster
Copy link
Contributor

In the interest of tidying up the set of active issues on the P4 specification repository, I'm marking this as "stalled" and closing it. Of course, we can always re-open it in the future if there is interest in resurrecting it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants