Skip to content

Conversation

@ChALkeR
Copy link
Member

@ChALkeR ChALkeR commented Dec 18, 2025

Tracking: #61041

This significantly improves both utf8 TextDecoder and Buffer#toString() performance on ASCII

And removes the main reason why import { TextDecoder } from '@exodus/bytes/encoding.js' beats both Node.js TextDecoder and Node.js Buffer on Node.js on utf-8 🙃

See #61041 (comment)

Using https://github.com/lemire/jstextdecoderbench by @lemire:

Buffer#toString() - utf8

main:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 17.60 GiB/s 0.005 ms
Arabic lipsum 79.771 KiB 0.26 GiB/s 0.294 ms
Chinese lipsum 68.203 KiB 0.32 GiB/s 0.212 ms

PR:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.74 GiB/s 0.002 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.280 ms
Chinese lipsum 68.203 KiB 0.33 GiB/s 0.199 ms

TextDecoder, loose

main:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 17.90 GiB/s 0.005 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.286 ms
Chinese lipsum 68.203 KiB 0.33 GiB/s 0.200 ms

PR:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.61 GiB/s 0.002 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.279 ms
Chinese lipsum 68.203 KiB 0.33 GiB/s 0.208 ms

TextDecoder, fatal

main:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 15.17 GiB/s 0.006 ms
Arabic lipsum 79.771 KiB 0.26 GiB/s 0.292 ms
Chinese lipsum 68.203 KiB 0.32 GiB/s 0.206 ms

PR:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.57 GiB/s 0.002 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.286 ms
Chinese lipsum 68.203 KiB 0.31 GiB/s 0.207 ms

cc @nodejs/performance

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Dec 18, 2025
@ChALkeR ChALkeR force-pushed the chalker/decoder/ascii/0 branch 4 times, most recently from 0bfc0ec to 4f82c5a Compare December 18, 2025 21:54
@codecov
Copy link

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.53%. Comparing base (4f24aff) to head (42b4ff4).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
src/encoding_binding.cc 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61119      +/-   ##
==========================================
- Coverage   88.53%   88.53%   -0.01%     
==========================================
  Files         703      703              
  Lines      208546   208555       +9     
  Branches    40217    40215       -2     
==========================================
+ Hits       184634   184638       +4     
+ Misses      15926    15923       -3     
- Partials     7986     7994       +8     
Files with missing lines Coverage Δ
src/string_bytes.cc 69.74% <100.00%> (-0.77%) ⬇️
src/encoding_binding.cc 55.33% <92.30%> (+0.72%) ⬆️

... and 43 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. It is simple, likely quite safe.

@ChALkeR ChALkeR force-pushed the chalker/decoder/ascii/0 branch from 4f82c5a to 866781f Compare December 19, 2025 05:11
@ChALkeR
Copy link
Member Author

ChALkeR commented Dec 19, 2025

Upd: removed useless reinterpret_cast, otherwise unchanged

@ChALkeR
Copy link
Member Author

ChALkeR commented Dec 19, 2025

@lemire I notice that !simdutf::validate_ascii_with_errors().error is consistently faster than simdutf::validate_ascii()
(36 GiB/s vs 34-35 GiB/s on ASCII-utf8-to-text conversion)

Why is that and is there a reason to prefer the latter?
Code around seems to be using the former, should I just switch to validate_ascii_with_errors?

Upd, ah: #46271 (comment), updated PR

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Dec 19, 2025
@github-actions github-actions bot added request-ci-failed An error occurred while starting CI via request-ci label, and manual interventon is needed. and removed request-ci Add this label to start a Jenkins CI on a PR. labels Dec 19, 2025
@github-actions
Copy link
Contributor

Failed to start CI
   ⚠  Commits were pushed since the last approving review:
   ⚠  - src: improve StringBytes::Encode perf on ASCII
   ⚠  - src: use validate_ascii_with_errors
   ✘  Refusing to run CI on potentially unsafe PR
https://github.com/nodejs/node/actions/runs/20370786084

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Dec 19, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Dec 19, 2025
@nodejs-github-bot
Copy link
Collaborator

@RafaelGSS RafaelGSS added the performance Issues and PRs related to the performance of Node.js. label Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js. request-ci-failed An error occurred while starting CI via request-ci label, and manual interventon is needed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants