[PATCH, BPF 5/5] BPF: Add 32-bit AND pattern


Richard Henderson <rth@...>

We can represent a 64-bit AND with an unsigned 32-bit immediate
with a 32-bit AND opcode.

Signed-off-by: Richard Henderson <rth@...>
---
lib/Target/BPF/BPFInstrInfo.td | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/lib/Target/BPF/BPFInstrInfo.td b/lib/Target/BPF/BPFInstrInfo.td
index 33481b9..62c2dd8 100644
--- a/lib/Target/BPF/BPFInstrInfo.td
+++ b/lib/Target/BPF/BPFInstrInfo.td
@@ -273,6 +273,21 @@ let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
} // isMoveImm
}

+let Constraints = "$dst = $srcd", isAsCheapAsAMove = 1, isCommutable = 1 in {
+ def AND_ru
+ : F_COF<4 /* BPF_ALU */, 0x5 /* BPF_AND */, 0 /* BPF_K */,
+ (outs GPR:$dst), (ins GPR:$srcd, i64imm:$imm),
+ "andwi\t$dst, $imm",
+ [(set GPR:$dst, (and GPR:$srcd, i64immZExt32:$imm))]> {
+ bits<4> dst;
+ bits<32> imm;
+ let BPFDst = dst;
+ let BPFSrc = 0;
+ let BPFOff = 0;
+ let BPFImm = imm;
+ }
+}
+
def FI_ri
: InstBPF<(outs GPR:$dst), (ins MEMri:$addr),
"lea\t$dst, $addr",
--
2.5.5


Alexei Starovoitov

On Wed, Jun 15, 2016 at 2:37 PM, Richard Henderson via iovisor-dev
<iovisor-dev@...> wrote:
> We can represent a 64-bit AND with an unsigned 32-bit immediate
> with a 32-bit AND opcode.
>
> Signed-off-by: Richard Henderson <rth@...>
> ---
> lib/Target/BPF/BPFInstrInfo.td | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/lib/Target/BPF/BPFInstrInfo.td b/lib/Target/BPF/BPFInstrInfo.td
> index 33481b9..62c2dd8 100644
> --- a/lib/Target/BPF/BPFInstrInfo.td
> +++ b/lib/Target/BPF/BPFInstrInfo.td
> @@ -273,6 +273,21 @@ let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
> } // isMoveImm
> }
>
> +let Constraints = "$dst = $srcd", isAsCheapAsAMove = 1, isCommutable = 1 in {
> + def AND_ru
> + : F_COF<4 /* BPF_ALU */, 0x5 /* BPF_AND */, 0 /* BPF_K */,
> + (outs GPR:$dst), (ins GPR:$srcd, i64imm:$imm),
> + "andwi\t$dst, $imm",
> + [(set GPR:$dst, (and GPR:$srcd, i64immZExt32:$imm))]> {
nice!
Do you have further optimizations that take advantage of 32-bit
subregisters and zero extension?
Should it be added in a more generic way instead of as a pattern match?


Richard Henderson <rth@...>

On 06/15/2016 10:41 PM, Alexei Starovoitov wrote:
> Do you have further optimizations that take advantage of 32-bit
> subregisters and zero extension?
> Should it be added in a more generic way instead of as a pattern match?
This is the last of the operations that can be implemented with just 64-bit
operands.

A full and proper implementation of 32-bit operations takes quite a bit more
effort. I started on that one evening last week before realizing quite how
much, and had to put it aside for now.

More important is probably to get signed division working instead of emitting
an error. I expect it ought to be similar to how Select is expanded, with
multiple blocks:


if (a < 0)
    a2 = -a
a3 = phi(a, a2)
if (b < 0)
    b2 = -b
b3 = phi(b, b2)
r = a3 / b3
if ((a ^ b) < 0)
    r2 = -r
r3 = phi(r, r2)


r~


Alexei Starovoitov

On Thu, Jun 16, 2016 at 10:42 AM, Richard Henderson <rth@...> wrote:
> On 06/15/2016 10:41 PM, Alexei Starovoitov wrote:
>> Do you have further optimizations that take advantage of 32-bit
>> subregisters and zero extension?
>> Should it be added in a more generic way instead of as a pattern match?
> This is the last of the operations that can be implemented with just 64-bit
> operands.
>
> A full and proper implementation of 32-bit operations takes quite a bit more
> effort. I started on that one evening last week before realizing quite how
> much, and had to put it aside for now.
Great, please share whenever it's ready.
In most cases native 32-bit arithmetic should boost performance.
Currently too many <<32 >>32 shift pairs are generated.

> More important is probably to get signed division working instead of emitting
> an error. I expect it ought to be similar to how Select is expanded, with
> multiple blocks:
Interesting idea. We decided not to introduce a signed div insn,
since classic BPF doesn't have it and there wasn't a single case
where sdiv couldn't be replaced with udiv in C code,
but such compiler support is certainly nice.
It probably makes sense to add a warn_once, so the user
is prompted to tweak the code manually, since these
extra branches are not going to help performance.
BTW, we've been talking about introducing signed/unsigned <, <= ops.
That should clean up the LLVM side a bit and improve performance,
but the verifier will need to work harder, since it pattern-matches >
for packet access.
Speaking of the verifier... there is a TODO item to add register liveness
tracking to improve search pruning.


