Author Topic: Proper "infinite" string scan function on C  (Read 2100 times)

0 Members and 2 Guests are viewing this topic.

Offline L0aD1nG

  • Peasant
  • *
  • Posts: 83
  • Cookies: 6
  • NeverFear1isHere
    • View Profile
Proper "infinite" string scan function on C
« on: January 24, 2015, 06:15:27 pm »
Hello Evilzone crowd!

I decided to share a function that I've created many months ago, which I tried to customize it before some days but I concluded to leave it as it was. I think its much useful for C programmers, at least for me it solved all my scan issues. Replaced almost every single operation about scanning input from the keyboard. It works properly.

I also thought, giving a sample of usage on main function would be cool so...
Here we go:
Code: (C) [Select]
#include <stdio.h>
#include <stdlib.h>

// Function Declaration
char* scan(unsigned int size);

// Main
int main(void)
{
    char* string = scan();
    printf("\n%s\n", string);
    free(string);
    return 0;
}

// Scan string input
char* scan(unsigned int size)
{
    // Declarations
    char ch;
    int index=0;
    char* buffer = NULL;
 
    // Setting the appropriate buffer
    buffer = malloc(size);
   
    // Loop until line feed
    while ((ch = getchar()) != '\n')
    {
        // Save the char and continue
        buffer[index++] = ch;
        // Check if 'stack' is full
        if( index == size )
        {
            // Update the size and the number
            size += size;
            buffer = realloc(buffer, size);

            //Check overflow
            if (!buffer)
                fprintf(stdout, "Not enough memory space to allocate.");
        }
    }
    // Add the '\0' lately.
    buffer[index++] = '\0';
   
    // Set final size
    if (index < size)   
        buffer = realloc(buffer, index);

    // Return the start of the string.
    return buffer;
}

I would love to get some opinions/thoughts.
Cheers,
L0aD1nG
« Last Edit: February 04, 2015, 02:40:49 pm by L0aD1nG »

Offline ArkPhaze

  • Peasant
  • *
  • Posts: 136
  • Cookies: 20
  • null terminated
    • View Profile
Re: Proper "infinite" string scan function on C
« Reply #1 on: January 29, 2015, 02:09:30 am »
A few things I've noticed. You don't need to cast the return of malloc() in C. You also don't check that the getchar() function even succeeds, or check for EOF, and it's not guaranteed that you will find '\n' which can result in an infinite loop... You also declare the pointer as static, which makes this function unusable in multi-threaded scenarios I believe. Also, why not parametrize the default allocation size? A single size for all seems very silly because reallocations are expensive. You may require 1024 for one buffer and only 3 chars for another...
« Last Edit: January 29, 2015, 02:17:02 am by ArkPhaze »
sig=: ArkPhaze

[ J/ASM/.NET/C/C++ - Software Engineer ]

Offline L0aD1nG

  • Peasant
  • *
  • Posts: 83
  • Cookies: 6
  • NeverFear1isHere
    • View Profile
Re: Proper "infinite" string scan function on C
« Reply #2 on: January 29, 2015, 12:06:10 pm »
First of all thanks a lot for the reply and suggestions! I really appreciate this. 

A few things I've noticed. You don't need to cast the return of malloc() in C.
Yea right, I did this to be more "properly" written. There is not a real reason as the conversion between void* and char* is implicit in C... Do you think it costs much?

  You also don't check that the getchar() function even succeeds, or check for EOF, and it's not guaranteed that you will find '\n' which can result in an infinite loop...
I use it mostly to get input from the user via the keyboard, I just need to capture the '\n' which for sure I will get when he presses the <Enter> right? Now I didn't specify this instead I said that it solved all my scan issues, pardon me. I didn't test this to file scanning, though its not built for file scanning otherwise I should use a FILE* and I think I should even put it as an parameter right? But when its about getting input from the keyboard I don't think its a possibility of infinity loop. What do you think?

  You also declare the pointer as static, which makes this function unusable in multi-threaded scenarios I believe.
Haven't done any multi-threading yet in C, I also wasn't aware of this. Thanks for this information.

Also, why not parametrize the default allocation size? A single size for all seems very silly because reallocations are expensive. You may require 1024 for one buffer and only 3 chars for another...
Now this is the only thing I kinda disagree cause if its about specifying the size each time, it won't be "infinite" right? Plus that the reallocations won't be that many, cause I double the size its time. But yea sure if its about file scanning it would help to start from a bigger size, though I still didn't use it for file scanning. 

After all, I will customize-update the function.
Thank you so much for all that important notes mate!

Offline ArkPhaze

  • Peasant
  • *
  • Posts: 136
  • Cookies: 20
  • null terminated
    • View Profile
Re: Proper "infinite" string scan function on C
« Reply #3 on: February 02, 2015, 01:40:52 am »
[quote author=L0aD1nG link=topic=18444.msg99273#msg99273 date=1422529570
Now this is the only thing I kinda disagree cause if its about specifying the size each time, it won't be "infinite" right? Plus that the reallocations won't be that many, cause I double the size its time. But yea sure if its about file scanning it would help to start from a bigger size, though I still didn't use it for file scanning. 

After all, I will customize-update the function.
Thank you so much for all that important notes mate!
[/quote]

Huh? :S... I'm talking about a default allocation size. Why would that make it non-"infinite" by parameterizing a default allocation size to start with before any reallocation calls?

You can't say that the reallocations won't be many because you don't know what the size of the input will be. Reallocation is expensive too, so if you need a large buffer, and you allocate 16 by default, then you're going to require multiple reallocations in order to reach a sufficient buffer size, but in other parts of your code perhaps you only need a couple bytes. Even to satisfy both cases in the best way and cut the default allocation to a point in the middle of the smallest allocation size you need and the largest, doesn't make any sense, because now you oversize a buffer, and still require reallocation for the larger buffer. I don't see any issue with parameterizing this to solve this problem IMHO. Additionally, heap allocation is slower than stack allocation.

Ideally allocation should be a one time thing. Anything that requires reallocation makes it easier to use in this case, but very inefficient. It would be better to determine the allocation size required before populating an allocated buffer and determining when it needs to be reallocated into a larger storage space.

My comment about malloc() too, in this case does not serve any purpose, but helps avoid issues when malloc() is not properly prototyped (by inclidng <stdlib.h>). The implicit casts are defined by the standard, there's really no benefit to being explicit about them (literally not a single benefit). It can actually be very bad casting the return of functions like malloc() that return void.

edit: That reallocation to fit the buffer size is really not necessary either, and it's actually not proper since you are setting it to the size of the 0-based index, and not the number of elements (including the null terminator). You essentially realloc() to cut the null terminator off... That is wrong. The final reallocation should be a buffer with size index + 1, not index.

Here is a good error checking function: http://stackoverflow.com/questions/4023895/how-to-read-string-entered-by-user-in-c/4023921#4023921
« Last Edit: February 02, 2015, 02:06:46 am by ArkPhaze »
sig=: ArkPhaze

[ J/ASM/.NET/C/C++ - Software Engineer ]

Offline L0aD1nG

  • Peasant
  • *
  • Posts: 83
  • Cookies: 6
  • NeverFear1isHere
    • View Profile
Re: Proper "infinite" string scan function on C
« Reply #4 on: February 02, 2015, 12:08:38 pm »
Everything you said above are pretty nice recommendations and you explained well this time, I totally agree on that all. Now for the char* casting I made it for the "shake of completeness" I was aware that initializing a void* to a char* variable is okay in C, and there is not a problem. Though I will remove this too. And I will apply the parameter. And I will update the function and let you know that I did. :)

Again thanks a lot I appreciate this.


Now in the other hand:
edit: That reallocation to fit the buffer size is really not necessary either, and it's actually not proper since you are setting it to the size of the 0-based index, and not the number of elements (including the null terminator). You essentially realloc() to cut the null terminator off... That is wrong. The final reallocation should be a buffer with size index + 1, not index.  Here is a good error checking function: http://stackoverflow.com/questions/4023895/how-to-read-string-entered-by-user-in-c/4023921#4023921
I think you kinda confused here... the last loop of the while (the time before the one that it is about to get the '\n') has the index already incremented by one, that is why I do after "buffer[index] = '\0';" , then I realloc to the exact size so there would be no empty space for no reason. So the index has already incremented and not used by +1 on the last loop, and there is not a need for reincrementing it.
« Last Edit: February 02, 2015, 12:10:15 pm by L0aD1nG »

Offline ArkPhaze

  • Peasant
  • *
  • Posts: 136
  • Cookies: 20
  • null terminated
    • View Profile
Re: Proper "infinite" string scan function on C
« Reply #5 on: February 04, 2015, 01:20:28 am »
Now in the other hand:I think you kinda confused here... the last loop of the while (the time before the one that it is about to get the '\n') has the index already incremented by one, that is why I do after "buffer[index] = '\0';" , then I realloc to the exact size so there would be no empty space for no reason. So the index has already incremented and not used by +1 on the last loop, and there is not a need for reincrementing it.

Yes.. to the last character BEFORE the null terminator. and you realloc() to cut the null terminator off. Do you even know what you're talking about? Please review your code. Here's some fake demo code to help you understand what you are doing:
Code: [Select]
#include <stdio.h>
#include <stdlib.h>

int stdin_index = 0;
#define fake_getchar() fake_stdin[stdin_index++]

int main(void)
{
  char fake_stdin[] = "1234$";

  int index = 0;
  char ch;
  char buf[234] = { 0 };
  while ((ch = fake_getchar()) != '$')
  {
    buf[index++] = ch; // comparable to 'index' in terms of your code
  }
  printf("buf: %s\n", buf);
  printf("char (%d) @ index %d\n", buf[index], index);
  // realloc(buf, index) == realloc(buf, 4) <--- 4 characters, including a null terminator, and the buffer is 4 characters + 1 null terminator.
  return 0;

  // "1234" is 4 elements, you set the null terminator essentially to index 4, which is the 5th element, then resize to index, and thus the reallocation resized the buffer to 4 elements, and not 5, which only includes "1234" and excludes the null terminator after the character '4'.
}

Quote
that is why I do after "buffer[index] = '\0';" , then I realloc to the exact size so there would be no empty space for no reason.

I have to say you don't know what you're talking about at all... buffer[index] is the (index+1)'th character (the null terminator in this case). You realloc to index, which is exactly 1 less than it SHOULD be, thus you are cutting off the null terminator.

realloc() calls for the NUMBER OF ELEMENTS. An index is 0-based and (last)index+1 is equal to the number of elements. The null terminator at [index] is being set after index characters.

If that doesn't make sense still...
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
  char buffer[] = "ABCDEF";
  int index = 2;
  buffer[index] = 0; // set null terminator in replace of 'C'
  realloc(buffer, index); // resize the allocated bufer to a size of 2 elements, which is AB and not AB\0
  // this realloc() SHOULD realloc for 3 elements, to include AB, and the null terminator, NOT 2.
}
^ You are doing this exact same thing. You set the null terminator to the index, and cut it off with the reallocation using realloc().

You are clearly confused here.. Resizing a buffer to the last index (the null terminator index) will always cut off the null terminator. This mistake resembles realloc(buffer, strlen(buffer)). strlen() does NOT include the null terminator.
« Last Edit: February 04, 2015, 01:33:05 am by ArkPhaze »
sig=: ArkPhaze

[ J/ASM/.NET/C/C++ - Software Engineer ]

Offline L0aD1nG

  • Peasant
  • *
  • Posts: 83
  • Cookies: 6
  • NeverFear1isHere
    • View Profile
Re: Proper "infinite" string scan function on C
« Reply #6 on: February 04, 2015, 02:32:53 pm »
You were right, I am sorry for insisting on this...But the thing is that the "printf("%s", string);" on the return values was always working properly. Thats what made me to be sure that the '\0' is still there. It seems that it was an "undefined behavior" or something...

Nevermind, I made all the fixes. I updated the function... take a look.
Again thanks for this notes Ark, the damn correct prints confused me.