WIP: Demo using Simple Case Folding by iSazonov · Pull Request #8515 · PowerShell/PowerShell

iSazonov · 2018-12-21T19:54:32Z

PR Summary

Related #8120.

This PR is only a demo of how we could use Simple Case Folding to speed up the PowerShell engine.

The current code uses a 2-level translation table to fold Unicode strings. The table is compact: the full table is 128 Kb, while the compressed table is ~10Kb.

The new string comparison is 3-4 times faster than the standard ignore-case comparer.
Even with the new comparer used in only one place, you can see a marked improvement in startup time.
Perhaps we could use the comparer in other places too.
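
For readers unfamiliar with the approach, a comparison based on simple case folding can be sketched as below. This is a minimal illustration, not the code from this PR: the class and method names are hypothetical, and `char.ToLowerInvariant` is used only as a stand-in for a table-driven fold built from the Unicode case folding data. It works on UTF-16 code units, so surrogate pairs are not folded.

```csharp
using System;

// Hypothetical sketch; names are illustrative and not taken from the PR.
internal static class FoldedCompareSketch
{
    // ASSUMPTION: char.ToLowerInvariant stands in for a table-driven simple case fold.
    private static char Fold(char c) => char.ToLowerInvariant(c);

    // Ordinal comparison of the folded forms, without allocating folded copies.
    public static int CompareFolded(ReadOnlySpan<char> a, ReadOnlySpan<char> b)
    {
        int length = Math.Min(a.Length, b.Length);
        for (int i = 0; i < length; i++)
        {
            char ca = a[i];
            char cb = b[i];

            // Fast path: identical code units need no folding.
            if (ca == cb)
            {
                continue;
            }

            int diff = Fold(ca) - Fold(cb);
            if (diff != 0)
            {
                return diff;
            }
        }

        // All shared positions are equal after folding; the shorter string sorts first.
        return a.Length - b.Length;
    }
}
```

With this sketch, `FoldedCompareSketch.CompareFolded("CaseFolding1", "cASEfOLDING2")` returns a negative value, because the two strings differ only in the final digit after folding.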

I pushed alternative code too so you can experiment. Tools and temporary benchmarks are also pushed.

PR Checklist

iSazonov · 2018-12-21T19:58:53Z

src/System.Management.Automation/engine/MshMemberInfo.cs

This is the only place where the new code is injected.
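
As an illustration of what plugging a folding comparer into a member collection involves, here is a minimal sketch of an equality comparer whose `GetHashCode` is consistent with its `Equals`. The class name is hypothetical and the code is not taken from `MshMemberInfo.cs`; `char.ToLowerInvariant` again stands in for the table-driven fold, and the hash is an FNV-1a-style mix chosen only for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the SimpleFoldedStringComparer from this PR.
internal sealed class FoldedStringComparerSketch : IEqualityComparer<string>
{
    // ASSUMPTION: char.ToLowerInvariant stands in for a table-driven simple case fold.
    private static char Fold(char c) => char.ToLowerInvariant(c);

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y))
        {
            return true;
        }

        // A per-code-unit fold never changes the length, so lengths must match.
        if (x is null || y is null || x.Length != y.Length)
        {
            return false;
        }

        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i] && Fold(x[i]) != Fold(y[i]))
            {
                return false;
            }
        }

        return true;
    }

    public int GetHashCode(string s)
    {
        if (s is null)
        {
            return 0;
        }

        // Hash the folded code units so that strings equal under Equals hash alike.
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in s)
            {
                hash = (hash ^ Fold(c)) * 16777619;
            }

            return (int)hash;
        }
    }
}
```

A dictionary created with `new Dictionary<string, int>(new FoldedStringComparerSketch())` then treats `"Length"` and `"LENGTH"` as the same key.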

Use 32-bit sparse array with BinarySearch
Create init tests
Got fast SpanFold and improve perf tests
Improve CompareFolded
Move files to new folders by role
Enable perf test
Add simple folded string comparer
Use IgnoresAccessChecksToAttribute
Configure visibility internal API
Add comments.
Fix for surrogates
Update names and comments
Add SimpleFoldedStringComparer constructor
Use SimpleFoldedStringComparer in PSMemberInfoInternalCollection


TravisEz13 · 2019-01-04T00:07:55Z

src/System.Management.Automation/utils/unicode/UnicodeSimpleCaseFolding-g2.cs

+        /// <returns>
+        /// Returns folded string.
+        /// </returns>
+        //[MethodImpl(MethodImplOptions.AggressiveInlining)]


There is a lot of commented code and attributes in this file. Did you write this all yourself?

I started with zero experience in Unicode internals and in-depth C# optimizations. For the last two months I have been moving in small steps, so the file contains a lot of commented-out code that reflects my attempts to find the fastest version. I'll remove it once the code becomes stable.
All of this code was written by me, except for gen.cs, which comes from @tarekgh's gist.

TravisEz13 · 2019-01-04T00:09:06Z

src/System.Management.Automation/utils/unicode/CaseFolding-g2.cs

+    internal static partial class SimpleCaseFolding
+    {
+        private static readonly ushort[] L1 =
+        {


Can you add some comments about what L1 and L2 represent?

The idea is to use an array that maps a source char to its target char (simple case folding). The problem is that SimpleCase.txt contains only a very small amount (1.5Kb) of Unicode code points from Plane 0 (BMP) and Plane 1 (SMP), while a flat mapping array covering them is 128 Kb, which is a lot. In the related Dotnet CoreFXlab issue dotnet/corefxlab#2610, @tarekgh suggested using the 3-level (8-4-4) mapping technique (and published gen.cs in a gist). That table is ~6Kb. It turned out to be much slower than a single-level array, so I tried a two-level array, which is ~10Kb.
Yesterday I found that I can use 1-level mapping for chars < 0x5ff and 2-level mapping for other chars. That is also ~10Kb and is the fastest variant I have found. I'll push the commit today.
I guess the current code still has a lot of bugs, and I consider it "alpha". Next I plan to fix surrogates.
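
The hybrid layout described in this comment (a flat array for chars < 0x5ff, a 2-level lookup for everything else) might look roughly like the sketch below. It is an illustration under stated assumptions rather than the generated `L1`/`L2` tables from the PR: the names, the 256-entry block size, and the delta encoding are hypothetical, and the tables are filled from `char.ToLowerInvariant` instead of the Unicode case folding data. Storing deltas lets every block with no mappings share a single all-zero block, which is where the size saving comes from.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a hybrid 1-level / 2-level fold table; not the PR's generated code.
internal static class HybridFoldTableSketch
{
    private const int FlatLimit = 0x5ff; // chars below this use the flat table
    private const int BlockSize = 256;   // ASSUMPTION: 8-8 split of the 16-bit char

    // 1-level table: folded char for every char below FlatLimit.
    private static readonly char[] Flat = new char[FlatLimit];

    // Level 1: indexed by the high byte, holds the start offset of a block in Level2.
    private static readonly ushort[] Level1 = new ushort[256];

    // Level 2: deduplicated blocks of (folded - original) deltas, modulo 65536.
    private static readonly ushort[] Level2;

    // ASSUMPTION: stand-in for real simple case folding data.
    private static char ReferenceFold(char c) => char.ToLowerInvariant(c);

    static HybridFoldTableSketch()
    {
        for (int i = 0; i < FlatLimit; i++)
        {
            Flat[i] = ReferenceFold((char)i);
        }

        var level2 = new List<ushort>();
        var seen = new Dictionary<string, ushort>(StringComparer.Ordinal);
        var block = new char[BlockSize];

        for (int hi = 0; hi < 256; hi++)
        {
            for (int lo = 0; lo < BlockSize; lo++)
            {
                char c = (char)((hi << 8) | lo);
                block[lo] = unchecked((char)(ReferenceFold(c) - c));
            }

            // Identical blocks (typically all zeros) are stored only once.
            string key = new string(block);
            if (!seen.TryGetValue(key, out ushort offset))
            {
                offset = (ushort)level2.Count;
                foreach (char delta in block)
                {
                    level2.Add(delta);
                }

                seen.Add(key, offset);
            }

            Level1[hi] = offset;
        }

        Level2 = level2.ToArray();
    }

    public static int Level2Length => Level2.Length;

    public static char Fold(char c)
    {
        if (c < FlatLimit)
        {
            return Flat[c]; // single array access for the most common chars
        }

        ushort delta = Level2[Level1[c >> 8] + (c & 0xFF)];
        return unchecked((char)(c + delta));
    }
}
```

The fast path costs one bounds-checked array read, while the slow path costs two dependent reads, which is consistent with the 2-level path being slower than the 1-level one.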

TravisEz13 · 2019-01-04T00:11:50Z

What is the startup time difference you see?

iSazonov · 2019-01-04T09:21:09Z

@TravisEz13 My first measurement with a single-level array showed a speedup of ~8 percent. That's why I continued working on it. Of course this result is not reliable, although the standard PowerShell (Pester) tests pass successfully (only my xUnit tests currently fail).
I also believe that there are other places where we could inject this code to get a performance win, and I need help finding them.

TravisEz13 · 2019-01-04T18:52:07Z

Talking to @daxian-dbw his assumption is the actual folding code would go into DotNet.

tarekgh · 2019-01-04T18:58:05Z

@TravisEz13

Talking to @daxian-dbw his assumption is the actual folding code would go into DotNet.

There is no plan or decision that this will happen. It depends on the case, and the DotNet team would have to approve it. My thought so far is to create this in a separate NuGet package, and if we see demand for this functionality we can consider moving it to the core.

iSazonov · 2019-01-04T19:23:54Z

I agree that this should be in CoreFX. At best it will land in CoreFX 3.1, which means that PowerShell will get this advantage only a year and a half or two years from today. That is too long to wait, so I think it is best to work here and in CoreFX at the same time.
My latest perf test (sorry, I have not pushed the commit yet) shows that the new comparer is faster than CoreFX OrdinalIgnoreCase:

                Method |         StrA |         StrB |     Mean |     Error |    StdDev | Ratio |
---------------------- |------------- |------------- |---------:|----------:|----------:|------:|
         CoreFXCompare | CaseFolding1 | cASEfOLDING2 | 41.77 ns | 0.1741 ns | 0.1454 ns |  1.00 |
 SimpleCaseFoldCompare | CaseFolding1 | cASEfOLDING2 | 37.34 ns | 0.2286 ns | 0.1909 ns |  0.89 |
                       |              |              |          |           |           |       |
         CoreFXCompare | ЯяЯяЯяЯяЯяЯ1 | ЯяЯяЯяЯяЯяЯ2 | 87.32 ns | 0.4460 ns | 0.4172 ns |  1.00 |
 SimpleCaseFoldCompare | ЯяЯяЯяЯяЯяЯ1 | ЯяЯяЯяЯяЯяЯ2 | 37.68 ns | 0.5356 ns | 0.4748 ns |  0.43 |

I hope this works correctly.
In this case, CoreFX team may be more interested in speeding up this work.

TravisEz13 · 2019-01-04T22:04:46Z

What locale are those stats for?

iSazonov · 2019-01-05T08:34:59Z

@TravisEz13 In the test I used Russian chars. Really, the result holds for chars with code points < 0x5ff (where the 1-level mapping is used), which covers most Latin and European code points. For code points above that, the 2-level mapping is used and the result is ~68 ns (vs 37 ns), which is still faster than CoreFX OrdinalIgnoreCase.

stale · 2019-02-04T15:10:30Z

This PR has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs within 10 days.
Thank you for your contributions.
Community members are welcome to grab these works.

iSazonov · 2019-02-05T06:11:48Z

Now we have a PR in the corefxlab repo, and I hope we get the experimental package soon.

stale · 2019-02-15T07:10:49Z

This PR has been automatically closed because it is stale. If you wish to continue working on the PR, please first update the PR, then reopen it.
Thanks again for your contribution.
Community members are welcome to grab these works.

iSazonov requested review from SteveL-MSFT and daxian-dbw December 21, 2018 19:54

iSazonov requested review from BrucePay, PaulHigin, TravisEz13, adityapatwardhan and anmenaga as code owners December 21, 2018 19:54

iSazonov force-pushed the add-unicode4-2level-cache branch 4 times, most recently from 0727391 to c09b5cd Compare December 26, 2018 08:48

iSazonov mentioned this pull request Dec 28, 2018

Unicode Case Folding dotnet/corefxlab#2610

Closed

iSazonov force-pushed the add-unicode4-2level-cache branch from a98fe70 to aff75e4 Compare December 28, 2018 14:54

iSazonov added 16 commits December 29, 2018 08:32

Add SimpleFoldedStringComparerTests (come from CoreFX)

8ba8581

Add new test files from CoreFX and move files in Unicode folder

c6f23a9

Add AssertExtensions from CoreFX Use namespace System.Management.Automation.Unicode.Tests Add namespace in CharTests Remove duplicate tests from CharTests

Add tests for comparer

62e88df

Remove unneeded tests for comparer

Add GetHashCode() in comparer

7a64a26

Add GetHashCode NotEqual tests

1dd98a0

Make Fold() method private

2430fdd

Fix comment

661dcf0

Add UnicodeData.11.0.txt

c3c322f

Auto accept minor

fe9c1ec

Add IndexOfFolded tests

c5298ae

Add IndexOfFolded() for ReadOnlySpan<char>

f273aa3

Update IndexOfFolded() for string

7ff9ec2

Add tests for IndexOfFolded() span

d9ebeb8

Remove unneeded IndexOfFolded() tests

Load UnicodeData.11.0.txt and CaseFolding.txt

438a3d0

Add Fold_Char test

93c666f

iSazonov added 2 commits December 29, 2018 08:32

Fix SimpleCaseFold() method

94e2bee

Fix CompareUsingSimpleCaseFolding

a03ec72

iSazonov force-pushed the add-unicode4-2level-cache branch from 9939013 to a03ec72 Compare December 29, 2018 03:33

iSazonov added 6 commits December 29, 2018 10:19

Update tests

fc8fdad

Optimize on in csproj

0fc2d4e

Improve comparer

53088ce

Update benchmark test

e44d012

Fix style issues

a4cf807

Remove namespace prefixes

c56fe88

iSazonov force-pushed the add-unicode4-2level-cache branch from d1da272 to 311ccd7 Compare December 29, 2018 12:15

iSazonov added 3 commits December 29, 2018 17:25

Test on Unix expected loading 'System.Runtime.CompilerServices.Unsafe…

e470871

….dll'

Refactor csprojs of xUnit tests

621c99a

Step to run all xUnit tests

57873c6

iSazonov force-pushed the add-unicode4-2level-cache branch from 311ccd7 to 57873c6 Compare December 29, 2018 13:49

iSazonov mentioned this pull request Dec 31, 2018

Change hashtable to use OrdinalIgnoreCase to be case-insensitive in all Cultures #8566

Merged

11 tasks

TravisEz13 reviewed Jan 4, 2019

stale bot added the Stale label Feb 4, 2019

stale bot closed this Feb 15, 2019

iSazonov deleted the add-unicode4-2level-cache branch October 16, 2021 05:36
