-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed and optimized split methods #11
base: dev
Are you sure you want to change the base?
Fixed and optimized split methods #11
Conversation
Looks good at first glance. I'll review, slightly adapt the style to better match the project, and merge it. |
The methods look good to me, but I would still prefer not to merge the yet, since I'd like to conduct some unit tests to compare any any potential differences. Maybe waiting until the test suite is finished?? |
Sure. The only failing tests is with an empty delimiters span, since we haven't implemented that yet. Though, as you said, the tests are a bit lacking still. Also, fyi |
Also, I have looked a bit more to the tests you've written and I believe, we should probably merge them into a feature branch |
I have a local branch with uptodate commits from the |
I have a branch in this repo, called "feature-unit-testing", in which I have adapted your |
Also, unless I have gotten something dramatically wrong, we could refactor the monster that is public static StringSplitOptions[] GetAllStringSplitOptions()
{
return (StringSplitOptions[])Enum.GetValues(typeof(StringSplitOptions));
} |
I'll look into them, and probably just merge them into the As for the Try running this: var options = (StringSplitOptions[])Enum.GetValues(typeof(StringSplitOptions));
Console.WriteLine(string.Join(", ", options));
var options2 = GetAllStringSplitOptions().Select(o => o.ToString().Replace(",", " |"));
Console.WriteLine(string.Join(", ", options2)); The output should look like this:
While The one I wrote was the simplest I could find, since the alternative would be trying to generate all permutations of the enums returned by If you have an alternative, don't hesitate to suggest. |
How about just hard-coding the values into an array?? They are unlikely to soon be changed and the current method seems overkill for such a simple purpose for a test suite. |
An alternative would be to manually hardcode all possible values, but you'd have to remember to update them if a new one was ever added, and if a lot of values were added, the number of all possible combinations would explode in size. I wanted to make something that (hopefully) doesn't need to ever be modified again. The only reason why that code wouldn't work anymore (as stated in the comment next to the If you prefer hardcoding those values, I'm fine with changing that. |
Then let's leave it that way |
25689c3
to
68d321b
Compare
e3ad377
to
8bb8994
Compare
4a5dacf
to
3993f24
Compare
Run into another edge case when expanding the fuzzing with parameterized ( If the separator span is empty in Looking at if (separator!.Length == 0)
{
count = 1;
goto ShortCircuit;
}
else
{
return SplitInternal(separator, count, options);
} Edit: Wait, I tested that against the old implementation (before this PR), give me a sec xd |
The current implementation tests for the delimiterLength in every iteration of |
I know that, and that's what I do in the ones that have When delimiter is empty, we need to shorcircuit, return the whole span and exit. The Edit: If The ones that have If you can come up with something, I am all ears. |
One could at least do the comparision once in the constructor and the have a field like |
That is already basically what I do though? DelimiterLength = Delimiter.Length; // cached in constructor
...
if(delimiterIndex == -1 || DelimiterLength == 0) The only difference is that instead of a boolead, I am comparing an int to 0, but considering booleans are basically an int that is compared to 0 to check state, there's no real difference. |
Yes, but in theory your version should trigger an invocation of the comparision operator and thereby the As you can see here, in jit asm a Here are the screenshots top to bottom, on the first is the approach with (DelimiterLength) and the second DelimiterIsEmpty: And yes that is micro optimization, but so is the mere concept of Spans in C#. |
8486faf
to
2d377c0
Compare
Looking at the some of the changes you made recently on dev, you renamed I had also done that in this PR, as I had mentioned previously, and was hoping for you input on that. Personally, instead of The way
So, since contain and include are synonyms, I felt like it was appropriate. As for However, this isn't really that important, so if you say so, I'll update to the names you came up with. |
I see; I agree it isn't that important now. I get the idea of the antonyms; however I felt and do still feel that 'append' is more descriptive as to where the remaining elements are actually included (they're appended to the last element). The downside of this approach is that the antonym is not necessarilly a good fit for the contrary behaviour. There is no need to Update the pull request right now as the differences are growing so complex that I will need to merge it into a helper branch and then resolve many conflicts. The final decision on the name is going to be made before the release of version 1.3.0, because I do not want to introduce a breaking change later on. |
There's 1 more thing missing in the current implementation. All the Span Split Enumerators work and return char[] source = "Lorem ipsum dolor sit amet...";
// Split source by space and capitalize the first letter of every word
foreach (var word in source.AsSpan().Split(' '))
{
if (word.Length == 0) continue;
word[0] = word[0].ToUpper(); // impossible, since word is a ReadOnlySpan<char>
} To fix this, we'd need to duplicate every split enumerable and make it work with normal mutable spans, and update all extension methods and tests. We could do that now, or maybe try to make a source generator first to not have to do all that manually? |
That seems like a job for the to be source generator |
64e12ac
to
bb825aa
Compare
bb825aa
to
591df77
Compare
I also added a check for validity of |
We should decide what are the final names for |
I am pretty set on |
Ok, on it |
I also replaced the custom
InvalidCountExceedingBehaviourException
with the builtinArgumentException
since that is whatstring.Split
throws whenStringSplitOptions
is invalid. If that is unacceptable, the commit can be dropped.I also renamed the
CountExceedingBehaviour
options to, what I think, better explains them. Also reworded the summary comments. Again, if you disagree with that, you may drop that commit.There's 1 more gatcha in
string.Split
: Ifseparators
(delimiters
inSplitAny
in our case) is an empty collection, the function automatically assumes that we wish to split the string with white spaces:Sigh... So there are several ways to handle this:
char.IsWhiteSpace()
. Though I am assuming this is slower than the option above, otherwise why would .NET do it differently.