[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit Tests #9

Closed
wants to merge 52 commits into from
Closed

Unit Tests #9

wants to merge 52 commits into from

Conversation

Guiorgy
Copy link
Contributor
@Guiorgy Guiorgy commented Jan 26, 2024

TODO:

  • ReadOnlySpan Tests
    • Linq Tests
      • todo
    • Split Tests
      • Fuzzing Tests
        • Array Source
          • Single Delimiter
          • Single Delimiter With Count and CountExceedingBehaviour
          • Any Delimiter
          • Any Delimiter With Count and CountExceedingBehaviour
          • Delimiter Sequence
          • Delimiter Sequence With Count and CountExceedingBehaviour
        • String Source
          • Single Delimiter With StringSplitOptions
          • Single Delimiter With Count, StringSplitOptions and CountExceedingBehaviour
          • Any Delimiter With StringSplitOptions
          • Any Delimiter With Count, StringSplitOptions and CountExceedingBehaviour
          • Delimiter Sequence With StringSplitOptions
          • Delimiter Sequence With Count, StringSplitOptions and CountExceedingBehaviour
      • Facts
        • Enumeration Returns the Correct Span Type
        • Empty Source Results in Empty Span
        • Consecutive Delimiters Result in an Empty Span
        • Delimiter at the Start Results in an Empty Span
        • Delimiter at the Start With RemoveEmptyEntries Option Doesn't Result in an Empty Span
        • Delimiter at the End Results in an Empty Span
        • Delimiter at the End With RemoveEmptyEntries Option Doesn't Result in an Empty Span
        • Default CountExceedingBehaviour Option is AppendLastElements
        • Delimiter at the End With Count Equal to Delimiter Count Results in a Span at the End With a Delimiter
        • By Default Count Equal to Delimiter Count Results in Span at the End With Everything After and Including Last Delimiter
        • Count Equal to Delimiter Count with CutLastElements Option Results in Everything After and Including Last Delimiter Being Cut
        • Consecutive Delimiters With RemoveEmptyEntries Option Doesn't Result in an Empty Span
        • Consecutive Delimiters at the End With Count Equal to Delimiter Count With RemoveEmptyEntries Option Doesn't Result in a Span With Delimiter
        • Trim Entries Option Should Trim Last Span
        • White-Space Span With TrimEntries and RemoveEmptyEntries Options Should Return Nothing
        • Consecutive Delimiters at the End With RemoveEmptyEntries Option Doesn't Result in Empty Spans
        • Count Equal to 0 Returns Nothing
        • Negative Count Throws an ArgumentOutOfRangeException
        • Count Equal 1 With RemoveEmptyEntries Option Returns Nothing if Source is Empty
        • Count Equal 1 With RemoveEmptyEntries Option Doesn't Recursively Remove Empty Spans
        • Count Greater Than 1 With RemoveEmptyEntries Option Recursively Removes Empty Spans
        • Count Equal 1 With RemoveEmptyEntries and TrimEntries Options Returns Nothing if Source is White Space
        • Count Equal 1 With RemoveEmptyEntries and TrimEntries Options Doesn't Recursively Remove White Space Spans
        • Count Greater Than 1 With RemoveEmptyEntries and TrimEntries Options Recursively Removes White Space Spans
        • Undefned CountExceedingBehaviour Option Throws an ArgumentException
        • Undefned StringSplitOptions Option Throws an ArgumentException
        • Delimiter Span With Length Zero Should Return Source Span Unchanged
        • Delimiter Span With Length Zero Should Assume Count is Equal One
    • String Tests
      • todo
  • Span Tests
    • Linq Tests
      • todo
    • Split Tests
      • Same as ReadOnlySpan Tests
    • String Tests
      • todo

@dragon7307
Copy link
Member
dragon7307 commented Jan 26, 2024

We probably would first need to agree on a testing framework to use and this is when some creative differences will come into play. So, I personally use XUnit for my tests, since I find it the most intuitive framework. You seem to be using MSTest if I am not mistaken - a testing framework I've never used.

As a quick note, I would also rename the folder Tests to tests for consistency in naming with src.

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 26, 2024

I don't really have much of a preference, MSTest is just what I've defaulted to use. I haven't personally used XUnit, but can't imagine it being much different, so I'll just migrate the project.

Edit: With XUnit I'm getting
Warning NU1701 Package 'xunit.runner.visualstudio 2.5.6' was restored using '.NETFramework,Version=v4.6.1, .NETFramework,Version=v4.6.2, .NETFramework,Version=v4.7, .NETFramework,Version=v4.7.1, .NETFramework,Version=v4.7.2, .NETFramework,Version=v4.8, .NETFramework,Version=v4.8.1' instead of the project target framework 'net5.0'. This package may not be fully compatible with your project.
when including net5.0 in targets. Either we remove that as a target, or get stuck with an older version of XUnit.

I'll also change the name no problem.

More importantly, while writing tests for ReadOnlySpan<char>, I assumed you were trying to make the Split(char, StringSplitOprions, int) methods compatible to string.Split(char, int, StringSplitOprions). If this is the case, then there's a problem. The inmplementation in string, as the docs state, if the string has already been split count - 1 times, but the end of the string has not been reached, then the last string in the returned array will contain this instance's remaining trailing substring, untouched.. Your implementation on the other hand cuts at the count-th delimiter. For example:

string @string = new(['2', 'x', 'r', 'b', '7', '6', 'p', 'w', 'd', 'r', '6', '4', 'g', 't', 'i', '6', '5', 'c', 'm', 'r', 't', '4', '7', 'v', '3', 'y', '6', '5', 'n', 'b', 'n', 't', 'p', '4', 'r', 'w', '6', '1', 'x', 'a', 't', 'i', 'i', 'o', 'o', 'q', '4', '0', 'q', 'i', '6', 'd', 'x', 'i', '9', '6', 'd', 'w', 'e', '2', 'x', 's', 'a', '9', 't', 'g', 'p', 'o', 'k', 'o', 'm', 'e', 'p', '5', 'k', '7', 's', '3', '0', 'c', 's', 'z', 'b', '2', 'j', '5', 'b', 's', 'u', 'h', 'c', '8', 'i', 'w', 'y', 'c', '8', 'u', 'f', 'f']);
char delimiter = 'o';
ReadOnlySpan<char> charSpan = @string;

var expected = string.Join(", ", @string.Split(delimiter, 3, StringSplitOptions.None));
var actual = string.Join(", ", ToSystemEnumerable(charSpan.Split(delimiter, StringSplitOptions.None, 3)).Select(a => string.Join("", a)));

/*
Results:
string: 2xrb76pwdr64gti65cmrt47v3y65nbntp4rw61xatiiooq40qi6dxi96dwe2xsa9tgpokomep5k7s30cszb2j5bsuhc8iwyc8uff
expexted: 2xrb76pwdr64gti65cmrt47v3y65nbntp4rw61xatii, , q40qi6dxi96dwe2xsa9tgpokomep5k7s30cszb2j5bsuhc8iwyc8uff
actual: 2xrb76pwdr64gti65cmrt47v3y65nbntp4rw61xatii, , q40qi6dxi96dwe2xsa9tgp
*/

@dragon7307
Copy link
Member
dragon7307 commented Jan 26, 2024

This is interesting, I was not aware string.Split with count is working that way. This is a design consideration for which I'll need some time to settle on a conclusion. Not quite sure yet, what the solution should be, maybe one could pass an optional boolean parameter to determine whether or not the end should be appended or cut.
I'll try both approaches and will hopefully be back with a conclusion...

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 26, 2024

For now I'll write them assuming compatibility with string.Split, we can change this later.

Edit: Found another inconcistency. If the last character in a string is the delimiter, string.Split returns an empty array, wheres your implementation drops it. Consecutive delimiters result in an empty array/span in both cases though.
Edit3: SplitAny has the same anomaly.

Edit2: I've migrated the project to XUnit. Please tell me if the format is acceptable, or the changes you'd like to see.

Edit4: Also, is there a particular reason your implementations of ReadOnlySpan<char>.Split(delimiter, options, count) has the count argument after options, when string.Split(separator, count, options) has them the other way? Personally, I would've tried to be drop in replacement as much as possible.

Right now if you want to change code that uses string.Split to use spans instead, just addind .AsSpan() is not enough:

// before
var splits = myString.Split(delimiter, maxCount, options);
// after
var splits = myString.AsSpan().Split(delimiter, maxCount, options);

That won't compile. You'd need to switch those arguments:

// before
var splits = myString.Split(delimiter, maxCount, options);
// after
var splits = myString.AsSpan().Split(delimiter, options, maxCount);

Which just seems needless...

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 27, 2024

The reason why empty sequences are dropped when the delimiter is at the end of the source sequence, is because of the assumption that the remaining span is empty only when it's been exhausted:

public bool MoveNext()
{
    ReadOnlySpan<T> span = Span;
    if(span.IsEmpty) // wrong
    {
        return false;
    }
    int index = span.IndexOf(Delimiter);

    if(index == -1 || index >= span.Length)
    {
        Span = ReadOnlySpan<T>.Empty; // 1 way it can become empty
        Current = span;
        return true;
    }
    Current = span[..index];
    Span = span[(index + 1)..]; // second way it can become empty
    return true;
}

We probably have to store an extra flag to indicate the end:

private bool done;
public bool MoveNext()
{
    ReadOnlySpan<T> span = Span;
    if(done)
    {
        return false;
    }
    int index = span.IndexOf(Delimiter);

    if(index == -1 || index >= span.Length)
    {
        done = true; // only here it can become true
        Current = span;
        return true;
    }
    Current = span[..index];
    Span = span[(index + 1)..]; // done will always be false here
    return true;
}

The name can be agreed on.

Edit: I'll open a separate PR for this

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 27, 2024

It's a bit unfortunate that those tests for ReadOnlySpan and Span are basically exact copies of each other. In the future if there's time, might consider making a source generator that takes tests for ReadOnlySpan and just replaces ReadOnlySpan with Span, if you are interested in that :P

@dragon7307
Copy link
Member

So,
In fact I was trying to make the Span and String Methods at least similiar - I have swapped the order of the count and the stringSplitOptions parameters on the dev-branch.
I've fixed the problem with the end of the Span being cut off after count has been reached, by implementing an enum called CountExceedingBehaviour, which is passed as an optional argument to the count methods, and is defaulted to CountExceedingBehaviour.AppendLastElements, which essentially mirrors the string version. Optionally one can get the old behaviour back with the cut, by passing CountExceedingBehaviour.CutLastElements. The fix unfortunately requires concatenating two Spans, what I am currently doing with an intermediate array. This however is not an ideal solution since the enumerators are supposed to be allocation free - so I'll be looking for an optimized version in the future.

A Source Generator really sounds like something to investigate in. That will be the future though.
For now I'll resolve the merge conflicts in your PR...

Also I unfortunately won't be available tomorrow, e.g. in the next 24 hours, but after that I'll be going through the test suite and all the other concerns.

@dragon7307
Copy link
Member

Another curious issue I found, is that when splitting a span consisting of delimiters only, and specifiyng the length of the span as count, it returns count blank spans. However the string implementation returns count - 1 blank lines and one time the delimiter itself.

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 27, 2024

The tests aren't done, I've only implemented ones for the Split and SplitAny methods. I'll also be updating them with the new changes.. I'll tell you when they are ready to be merged.

I noticed that you swapped StringSplitOptions and count, so I've already updated those tests.

From what I can see, as I mentioned in my comment under the commit, the default behaviour is CountExceedingBehaviour countExceedingBehaviour = CountExceedingBehaviour.CutLastElements, so the old behaviour, not the new one.

@dragon7307
Copy link
Member

Right, It is indeed. The behaviour shouldn't be that way though. I'll make the new one the default.

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 28, 2024

About the string split issue you mentioned, the string implementation is correct. Here's an example:

var src = "aaa"; // when split by 'a' we get ['', 'a', '', 'a', '', 'a', ''] 4 empty around 3 'a'-s
src.Split('a'); // ['', '', '', ''] the result of removing 'a'-s from the above array
src.Split('a', 4); // ['', '', '', ''] 
src.Split('a', 3); // ['', '', 'a'] where 'a' is '' + 'a' + '' (3rd empty and after)
src.Split('a', 2); // ['', 'aa''] where 'aa' is '' + 'a' + '' + 'a' + '' (second empty and after)

or another example to better visualize:

var src = "bababab";
src.Split('a'); // ['b', 'b', 'b', 'b']
src.Split('a', 4); // ['b', 'b', 'b', 'b']
src.Split('a', 3); // ['b', 'b', 'bab']
src.Split('a', 2); // ['b', 'babab'']

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 28, 2024

Thinking about it some more, this assumption is also wrong when options is TrimEntries:

ReadOnlySpan<char> span = Span;
int index = span.IndexOf(Delimiter);

if(index == -1)
{
    enumerationDone = true;
    Current = span;
    return true;
}

Since this shortcircuits, it never checks if trimming is necessary, for example:

" a b a ".AsSpan().Split('b', StringSplitOptions.TrimEntries);

results in [['a'], [' ', 'a', ' ']] (last span was not trimmed).
Edit: Sape problem goes if source span is empty with RemoveEmptyEntries option, or source span with only spaces and RemoveEmptyEntries | TrimEntries options.

I'll rewrite the methods and include comments if you are ok with that.

Edit: Here's what I ended up with for SpanSplitStringSplitOptionsEnumerator:

public bool MoveNext()
{
    if(enumerationDone)
    {
        return false;
    }

    while(true) // if RemoveEmptyEntries options flag is set, repeat until a non-empty span is found, or the end is reached
    {
        int delimiterIndex = Span.IndexOf(Delimiter);

        if(delimiterIndex == -1)
        {
            enumerationDone = true;
            Current = Span;

            if(TrimEntries)
            {
                Current = Current.Trim();
            }

            return !(RemoveEmptyEntries && Current.IsEmpty);
        }

        Current = Span[..delimiterIndex];
        Span = Span[(delimiterIndex + DelimiterLength)..];

        if(TrimEntries)
        {
            Current = Current.Trim();
        }

        if(RemoveEmptyEntries && Current.IsEmpty)
        {
            continue;
        }

        return true;
    }
}

Things that I changed:

  • changed from a recursive call (in case of RemoveEmptyEntries) to a loop, which I think is easier to implement, but more importantly should be faster (let's avoid recursion if performance is important)
  • I cached TrimEntries and RemoveEmptyEntries options into boolean readonly variables inside the enumerator struct. This is so calls to the HasFlag method is only performed once during intitialization, instead of on every iteration.
  • replaced if(index == -1 || index >= span.Length) with just if(index == -1), since IndexOf should never return an invalid index, except for -1.

And here's one with StringSplitOptions:

    public bool MoveNext()
    {
        if(EnumerationDone)
        {
            return false;
        }

        while(true) // if RemoveEmptyEntries options flag is set, repeat until a non-empty span is found, or the end is reached
        {
            int delimiterIndex = Span.IndexOf(Delimiter);

            if(delimiterIndex == -1 || CurrentCount == 1)
            {
                EnumerationDone = true;

                if(delimiterIndex != -1 && RemoveEmptyEntries && CountExceedingBehaviour == CountExceedingBehaviour.AppendLastElements) // skip all empty (after trimming if necessary) entries from the left
                {
                    do
                    {
                        ReadOnlySpan<char> beforeDelimiter = Span[..delimiterIndex];

                        if(TrimEntries ? beforeDelimiter.IsWhiteSpace() : beforeDelimiter.IsEmpty)
                        {
                            Span = Span[(delimiterIndex + DelimiterLength)..];
                            delimiterIndex = Span.IndexOf(Delimiter);

                            continue;
                        }

                        break;
                    }
                    while(delimiterIndex != -1);

                    Current = Span;
                }
                else
                {
                    Current = delimiterIndex == -1 || CountExceedingBehaviour == CountExceedingBehaviour.AppendLastElements ? Span : Span[..delimiterIndex];
                }

                if(TrimEntries)
                {
                    Current = Current.Trim();
                }

                return !(RemoveEmptyEntries && Current.IsEmpty);
            }

            Current = Span[..delimiterIndex];
            Span = Span[(delimiterIndex + DelimiterLength)..];

            if(TrimEntries)
            {
                Current = Current.Trim();
            }

            if(RemoveEmptyEntries && Current.IsEmpty)
            {
                continue;
            }

            CurrentCount--;
            return true;
        }
    }
}

The changes include the ones above, plus I moved the check for an invalid CountExceedingBehaviour into the constructor.

PS. string.Split on an invalid StringSplitOptions throws an ArgumentException with a message Value of flags is invalid. (Parameter 'options'). I'm not sure we need a custom exception just for that one thing.

@Guiorgy Guiorgy marked this pull request as draft January 29, 2024 13:09
@Guiorgy
Copy link
Contributor Author
Guiorgy commented Jan 29, 2024

Turns out you can have a Draft PR (didn't know :D) that can't be accidentally merged until ready.

@dragon7307
Copy link
Member

:D

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 3, 2024

I run into another weird difference:

var str = "abcdababcd";

Console.WriteLine(str.Split("ab", 1, StringSplitOptions.RemoveEmptyEntries)); // abcdababcd
Console.WriteLine(str.AsSpan().Split("ab", 1, StringSplitOptions.RemoveEmptyEntries)); // cdababcd
Console.WriteLine(str.Split("ab", 2, StringSplitOptions.RemoveEmptyEntries)); // cd, cd
Console.WriteLine(str.AsSpan().Split("ab", 2, StringSplitOptions.RemoveEmptyEntries)); // cd, cd

What's weird is that when count is more than 1, and RemoveEmptyEntries is set, on the last split it will remove all preceding delimiters:

  1. abcdababcd
  2. '' - cdababcd
    2.1. remove empty: cdababcd
  3. cd - abcd
    3.1. last split, remove preciding delimiters: cd, '' - cd = cd, cd

But when initial count is 1, it ignores that and just returns the string as is, even though I'd expect it to do the same:

  1. abcdababcd
    1.1. last split, remove preciding delimiters: '' - cdababcd = cdababcd

Edit: Looked at the string.Split source again, and as I suspected, when count is 1 they shortcircuit:

if (count <= 1 || Length == 0)
{
    // Per the method's documentation, we'll short-circuit the search for separators.
    // But we still need to post-process the results based on the caller-provided flags.
    return CreateSplitArrayOfThisAsSoleValue(options, count);
}

private string[] CreateSplitArrayOfThisAsSoleValue(StringSplitOptions options, int count) {
    // ...
    if ((options & StringSplitOptions.RemoveEmptyEntries) == 0 || candidate.Length != 0)
    {
        return new string[] { candidate };
    }
}

As you can see, they only chech if the whole string (after trimming if necessary) is empty.

Where's when count is at least 2:

// ...
if (arrIndex == count - 1)
{
    // The next iteration of the loop will provide the final entry into the
    // results array. If needed, skip over all empty entries before that
    // point.
    if ((options & StringSplitOptions.RemoveEmptyEntries) != 0)
    {
        while (++i < numReplaces)
        {
            thisEntry = this.AsSpan(currIndex, sepList[i] - currIndex);
            if ((options & StringSplitOptions.TrimEntries) != 0)
            {
                thisEntry = thisEntry.Trim();
            }
            if (!thisEntry.IsEmpty)
            {
                break; // there's useful data here
            }
            currIndex = sepList[i] + (lengthList.IsEmpty ? defaultLength : lengthList[i]);
        }
    }
    break;
}

thisEntry = this.AsSpan(currIndex); // takes everything after the found non empty split

In the above, when number of splits is count - 1, they continue to split and drop empty entries until they find one that is not empty, and take the rest of the string after that point.

This is inconsistent. -_-

For now, I'll change tests and the fix branch to follow this ¯\_(ツ)_/¯

I submited a question to the .NET team, let's see what they say.

@dragon7307
Copy link
Member
dragon7307 commented Feb 4, 2024

I agree, this behaviour seems oddly inconsistent.
That however raises the question once again, whether it is even a good idea to try to mimick string.Split, or whether it would be better to allow some creative differences for the better.

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 4, 2024

Whether we should try to mimick string.Split depends on the goal, is it to be a drop in replacement to exising (string.Split) methods that are (hopefuly) faster (needs testing, my next goal are benchmarks), or is it something else? If it's the former, then we should try to make the behaviour identical, at least by default. It's fine to add optional features. At least, that's my opinion.

@dragon7307
Copy link
Member
dragon7307 commented Feb 4, 2024

Wait with the benchmarking for now. I have already started writing benchmarks on a local branch, which I will push, and the Span.Split methods are faster by quite a margin and more importantly allocation-free - well mostly, since the behaviour for the AppendLastElements option, currently requires some allocation. I am working on fixing that.

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 21, 2024

I think I'm done with the fuzzing tests for Split extensions.

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 23, 2024

Once I'm done with the unit tests for split extensions, do you want to merge this and #11 right then, or do you prefer to wait until the rest are also done? To be honest, I'm thinking about putting the rest on hold and trying to make the source generators first. I've already done a generator that copied the source of one class into another once, so I'll probably use that as the base.

@dragon7307
Copy link
Member
dragon7307 commented Feb 24, 2024

Yeah, merging them as soon as they are done is my plan.
It may really be good to focus on the source generators first.

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 24, 2024

I am almost done, I think. However, I just realized that I missed unit tests for Sequence Delimiter variants. I'll add them tomorow.

fixed ConsecutiveDelimitersAtTheEndWithCountEqualDelimiterCountWithRemoveEmptyEntriesOptionResultInNoSpanWithDelimiter
added EmptyDelimiterSpanResultsSameAsCountEqualOne
@Guiorgy Guiorgy marked this pull request as ready for review February 25, 2024 10:35
@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 25, 2024

I'm done with Fuzzing and Unit Tests for Split extensions.

A few things to note:

  • Currently (assuming Fixed and optimized split methods #11 is merged), there are 2 tests that fail:
    • When an empty char array is passed to string.Split any whitespace character is used as the delimiter. We currently don't yet do that. The tests could temporarity be dissabled either by commenting it out, or by adding a filter to the dotnet test line in the actions configuration.
    • I also added tests to ensure the extensions returns the right type, i.e. ReadOnlySpan.Split returns ReadOnlySpan and Span.Split returns Span. This is currently not the case. Until the source generators are done we could either keep them as is, or, as I mentioned above, temporarily disable them.
  • The names of CountExceedingBehaviour in this PR, including not only code, but also test names and comments, are still the old ones (AppendLastElements). Before we merge this, we should decide their final naming, I'll then fix them in both PRs to make it easier to merge.

Edit: LOL, just noticed .NET 8 added ReadOnlySpan extensions for Split and SplitAny. However, our advantage is that we support both Span and SpanExtensions, and we return an enumerable of spans, wheres they write results in the specified Span<Range> destination, which fails if there's not enought space in it. Either way, might be worth adding some tests to compare results agains those extensions, intead of string.Split (For ReadOnlySpan only).

kept xunit.runner.visualstudio on version 2.4.5 because later versions drop support for .NET 5
@dragon7307 dragon7307 changed the base branch from dev to review-tests February 27, 2024 18:23
@dragon7307
Copy link
Member

merged into review-tests!!

@dragon7307 dragon7307 closed this Feb 27, 2024
@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 28, 2024

why change the branch, and why squash merge? I tried to make every commit message meaningful, oh well XD

Other than that, looks good, except that, the StringSplitOptions options and int count parameters in Span extensions need to be swapped, for example:

SplitAny(this Span<char> source, ReadOnlySpan<char> delimiters, StringSplitOptions options, int count, ...)

Running a find and replace operation of "StringSplitOptions options, int count` -> "int count, StringSplitOptions options" with "Current Project" scope fixes the problem, everything builds and runs (well, sort of...).

Note: As I've stated in #11 currently there are several bugs in the split implementations (that the PR fixes), including those that result in an infinite loop! More specifically, do not run SplitSequence and SplitStringSequence tests, as the tests will never end, and RAM usage will also keep going up indefinitely eventually spilling over into your swap.

On that note, it might be worth looking at a way to detect infinite loops and memory leaks in the future.

Actually, since we always turn the Span Enumerables into IEnumerables with our extension methods, we could check that the enumeration count is below some threshold, and throw an exception if it goes above that, for example:

public static IEnumerable<IEnumerable<T>> ToSystemEnumerable<T>(this SpanSplitEnumerator<T> spanEnumerator, int maxCount = int.MaxValue) where T : IEquatable<T>
{
    List<T[]> list = [];

    foreach(ReadOnlySpan<T> element in spanEnumerator)
    {
        list.Add(element.ToArray());

        if (list.Count >= maxCount)
        {
            throw new IndexOutOfRangeException($"Enumeration exceeded {maxCount}.");
        }
    }

    return list;
}

I tested it, and it fixed the problem, I'll open a new PR for that.

@dragon7307
Copy link
Member

Well the merge conflicts were quite huge and I didn't want to break anything while refactoring so i created an extra branch off of dev as an additional security layer. I have already swapped the parameters and have also adapted the code style on my local copy. I am just running a visual studio code cleanup as is configured on my device.
I have indeed run into the issue with the infinite loops already and had my PC freeze out.

@dragon7307
Copy link
Member

Furthermore, I am also currently questioning whether we actually need the tests that verify the type of Span returned, as this should not be a unit test, but is part of static analysis.
Also that is a one time fix. So I don't believe this is a scenario for unit tests - after all, the statically enabled warnings exist for a reason

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 28, 2024

already and had my PC freeze out.

oops, sorry 😅

I am also currently questioning whether we actually need the tests that verify the type of Span returned

If you believe they are not needed, you are free to remove them. However, keep in mind that the cast of Span to ReadOnlySpan is implicit, as such, a problem like that could escape a developer, and not everyone even looks at warnings. I included them just in case a future PR contributor made a mistake in copy-pasting code and forgot to change some types, though it's up to you.

@dragon7307
Copy link
Member

I have commented them out for now!!

@dragon7307
Copy link
Member
dragon7307 commented Feb 28, 2024

Also is there a reason options is not used in the following foreach loop??

[Fact]
public void WhiteSpaceCharactersAssumedWhenDelimitersSpanIsEmpty()
{
    foreach(StringSplitOptions options in stringSplitOptions)
    {
        AssertEqual(
            [['a'], ['b'], ['c'], ['d']],
            "a b c d".AsSpan().SplitAny([], StringSplitOptions.None).ToSystemEnumerable(maxCount: 100)
        );
    }
}

@Guiorgy
Copy link
Contributor Author
Guiorgy commented Feb 28, 2024

Nope, that's an oversight 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants