-
Notifications
You must be signed in to change notification settings - Fork 15
ARC-V RHX-100 upstream patch series #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
The Synopsys RMX-100 has a short, three-stage, in-order execution pipeline. The option -mmpy-option was added to control which version of the MPY unit the core has and what the latency of multiply instructions should be similar to ARCv2 cores (see gcc/config/arc/arc.opt:60). Authored-by: Artemiy Volkov <artemiy@synopsys.com> Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>
The Synopsys RHX-100 has a 10-stage, dual-issue, in-order execution pipeline. It has support for instruction fusion, which will be addressed by subsequent patches.
MichielDerhaeg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to split up the commits in something sensible. Didn't check whether they can be built individually though.
gcc/config/riscv/riscv.cc
Outdated
| return store_data_bypass_p (out_insn, in_insn); | ||
| } | ||
|
|
||
| /* Implement one boolean function for each of the values of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the declarations in the header be moved to arcv.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I originally placed the declarations in arcv.h, but looking at the existing code structure, it may be a better fit to keep them in riscv-protos.h
EDIT: See d479345
| case SIGN_EXTRACT: | ||
| if (TARGET_XTHEADBB && outer_code == SET | ||
| if ((TARGET_ARCV_RHX100 || TARGET_XTHEADBB) | ||
| && outer_code == SET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this was added for the bit-extract fusion.
| (define_insn "*zero_extract_fused" | ||
| [(set (match_operand:SI 0 "register_operand" "=r") | ||
| (zero_extract:SI (match_operand:SI 1 "register_operand" "r") | ||
| (match_operand 2 "const_int_operand") | ||
| (match_operand 3 "const_int_operand")))] | ||
| "TARGET_ARCV_RHX100 && !TARGET_64BIT | ||
| && (INTVAL (operands[2]) > 1 || !TARGET_ZBS)" | ||
| { | ||
| int amount = INTVAL (operands[2]); | ||
| int end = INTVAL (operands[3]) + amount; | ||
| operands[2] = GEN_INT (BITS_PER_WORD - end); | ||
| operands[3] = GEN_INT (BITS_PER_WORD - amount); | ||
| return "slli\t%0,%1,%2\n\tsrli\t%0,%0,%3"; | ||
| } | ||
| [(set_attr "type" "alu_fused")] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, this fusion was never implemented as a define_insn_and_split. Might not be trivial to force these exact instructions after a split.
gcc/config/riscv/riscv.md
Outdated
| (define_insn "madd_split_fused" | ||
| [(set (match_operand:SI 0 "register_operand" "=&r,r") | ||
| (plus:SI | ||
| (mult:SI (match_operand:SI 1 "register_operand" "r,r") | ||
| (match_operand:SI 2 "register_operand" "r,r")) | ||
| (match_operand:SI 3 "register_operand" "r,?0"))) | ||
| (clobber (match_scratch:SI 4 "=&r,&r"))] | ||
| "TARGET_ARCV_RHX100 | ||
| && !TARGET_64BIT && (TARGET_ZMMUL || TARGET_MUL)" | ||
| { | ||
| if (REGNO (operands[0]) == REGNO (operands[3])) | ||
| { | ||
| return "mul\t%4,%1,%2\n\tadd\t%4,%3,%4\n\tmv\t%0,%4"; | ||
| } | ||
| else | ||
| { | ||
| return "mul\t%0,%1,%2\n\tadd\t%0,%0,%3"; | ||
| } | ||
| } | ||
| [(set_attr "type" "imul_fused")] | ||
| ) | ||
|
|
||
| (define_insn "madd_split_fused_extended" | ||
| [(set (match_operand:DI 0 "register_operand" "=&r,r") | ||
| (sign_extend:DI | ||
| (plus:SI | ||
| (mult:SI (match_operand:SI 1 "register_operand" "r,r") | ||
| (match_operand:SI 2 "register_operand" "r,r")) | ||
| (match_operand:SI 3 "register_operand" "r,?0")))) | ||
| (clobber (match_scratch:SI 4 "=&r,&r"))] | ||
| "TARGET_ARCV_RHX100 | ||
| && (TARGET_ZMMUL || TARGET_MUL)" | ||
| { | ||
| if (REGNO (operands[0]) == REGNO (operands[3])) | ||
| { | ||
| return "mulw\t%4,%1,%2\n\taddw\t%4,%3,%4\n\tmv\t%0,%4"; | ||
| } | ||
| else | ||
| { | ||
| return "mulw\t%0,%1,%2\n\taddw\t%0,%0,%3"; | ||
| } | ||
| } | ||
| [(set_attr "type" "imul_fused")] | ||
| ) | ||
|
|
||
| (define_insn "*zero_extract_fused" | ||
| [(set (match_operand:SI 0 "register_operand" "=r") | ||
| (zero_extract:SI (match_operand:SI 1 "register_operand" "r") | ||
| (match_operand 2 "const_int_operand") | ||
| (match_operand 3 "const_int_operand")))] | ||
| "TARGET_ARCV_RHX100 && !TARGET_64BIT | ||
| && (INTVAL (operands[2]) > 1 || !TARGET_ZBS)" | ||
| { | ||
| int amount = INTVAL (operands[2]); | ||
| int end = INTVAL (operands[3]) + amount; | ||
| operands[2] = GEN_INT (BITS_PER_WORD - end); | ||
| operands[3] = GEN_INT (BITS_PER_WORD - amount); | ||
| return "slli\t%0,%1,%2\n\tsrli\t%0,%0,%3"; | ||
| } | ||
| [(set_attr "type" "alu_fused")] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these patterns used to be a defined_insn_and_split. In case we can't get rid of them, let's try to go back to define_insn_and_split (check the downstream commit history) and see what the performance looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we revert madd_split_fused1 back to a define_insn_and_split we get 0.898% improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran embench before and after the revert to define_insn_and_split.
There were some regressions and improvements but the numbers were mostly stable except for the matmul benchmark.
Biggest regression was edn with a 2% drop.
Largest improvement was matmult with a 12% improvements. So I think we can keep this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
madd_split_fused_extended also still needs to be converted to a define_insn_and_split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
madd_split_fused_extendedalso still needs to be converted to adefine_insn_and_split.
Reverted in: 191282c
Signed-off-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Luis Silva <luiss@synopsys.com>
f61d7aa to
d479345
Compare
This patch adds an improvement of 0.898%. Signed-off-by: Luis Silva <luiss@synopsys.com>
This reverts commit d479345.
Signed-off-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Luis Silva <luiss@synopsys.com>
…X-100 series Signed-off-by: Luis Silva <luiss@synopsys.com>
…00 series Signed-off-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Luis Silva <luiss@synopsys.com>
No description provided.