nvme-apple: serialize the controller-global command tag space on t8015#2
nvme-apple: serialize the controller-global command tag space on t8015#2yhavry wants to merge 12 commits into
Conversation
… absent Apple NVMe controllers require tags of pending commands to not be shared across admin and IO queues. However, on Apple A11 without linear SQ, it is not possible for the IO queue to skip over some tags and must go from 0 to the configured maximum before wrapping around. As a result, in order to prevent tag collision, dynamic tag reservation while a command is submitted becomes neccessary. In this context, there is no reason to limit the admin queue's tag space, as it is not helpful in preventing tag collision. Signed-off-by: Nick Chan <towinchenmi@gmail.com>
On t8015 (A11), the admin queue uses command tag 0 and the IO tagset is given no reserved tags, so the admin and IO queues can have tag 0 outstanding at the same time. When that happens the controller reports a duplicate-tag error for tag 0. We have no documentation for ANS2's tag handling on this generation, but the failure strongly suggests it treats the command tag as a single controller-wide resource rather than a per-queue one: making the tag unique across both queues makes the error go away. Enforce that with a controller-wide bitmap of active tags and: - reserve the tag before hardware submission; - if it is already active, return BLK_STS_RESOURCE so blk-mq retries rather than busy-waiting; - release it only after the CQ head has been acknowledged to ANS; - clear stale tag state on controller reset. Signed-off-by: Yuriy Havrylyuk <yhavry@gmail.com> Signed-off-by: yhavry <34289148+yhavry@users.noreply.github.com>
bc64db6 to
da64d58
Compare
| return BLK_STS_OK; | ||
|
|
||
| out_release_tag: | ||
| apple_nvme_t8015_release_cid(anv, cmnd->common.command_id); |
There was a problem hiding this comment.
No need for a separate label here since there was only one goto out_free_cmd, so just put this line after out_free_cmd:
There was a problem hiding this comment.
Fixed, removed the extra label. Map failure now releases the reserved t8015 tag inline, then jumps to out_free_cmd.
| "NVMMU TCB invalidation failed\n"); | ||
| } | ||
|
|
||
| static bool apple_nvme_t8015_reserve_tag(struct apple_nvme *anv, |
There was a problem hiding this comment.
For consistency with existing function put _t8015 at the end of the function name instead of in the middle.
There was a problem hiding this comment.
Fixed, renamed it with _t8015 at the end for consistency with the existing style.
| { | ||
| u16 tag; | ||
|
|
||
| if (anv->hw->has_lsq_nvmmu) |
There was a problem hiding this comment.
Do not check for anv->hw->has_lsq_nvmmu in functions marked as t8015. Instead check at the call site.
There was a problem hiding this comment.
Fixed, reserve_tag_t8015() no longer checks has_lsq_nvmmu; the caller handles that now.
| return !test_and_set_bit(tag, &anv->t8015_active_tags); | ||
| } | ||
|
|
||
| static void apple_nvme_t8015_release_cid(struct apple_nvme *anv, |
| { | ||
| u16 tag; | ||
|
|
||
| if (anv->hw->has_lsq_nvmmu) |
|
|
||
| if (!anv->hw->has_lsq_nvmmu) { | ||
| writel(q->cq_head, q->cq_db); | ||
| readl(q->cq_db); |
There was a problem hiding this comment.
Dropped it. I added it based on an ordering assumption, but testing didn’t show it was needed, so simpler is better here.
| } | ||
|
|
||
| if (found) | ||
| if (found && anv->hw->has_lsq_nvmmu) |
There was a problem hiding this comment.
This function apple_nvme_poll_cq() may complete multiple commands in one invocation, is it possible to set a bitfield in the apple_nvme_cqe_pending() while loop, and only release tags and write doorbell according to the bitfield after the loop is completed?
There was a problem hiding this comment.
Done, poll_cq() now drains CQEs first, collects completed t8015 tags locally, writes the CQ doorbell once after the loop, then releases the tags.
Signed-off-by: yhavry <34289148+yhavry@users.noreply.github.com>
|
Applied, thanks! |
On A11, ANS2 apparently treats the firmware-visible command tag as
controller-global rather than queue-local. The admin queue uses tag 0
and the IO queue is not given reserved tags, so both can submit tag 0
concurrently, which the controller reports as a duplicate-tag error.
Track active t8015 tags in a controller-wide bitmap:
rather than busy-waiting;
Signed-off-by: Yuriy Havrylyuk yhavry@gmail.com