Author Topic: "Find 'nth' prime program" - Also, is this a good way to detect bad input? [C++]  (Read 3797 times)

0 Members and 2 Guests are viewing this topic.

Offline ArkPhaze

  • Peasant
  • *
  • Posts: 136
  • Cookies: 20
  • null terminated
    • View Profile
Some constructive criticism & a few pointers to help improve the code:

First, try not to use std::cout from a function outside of main() unless it is specifically intended for I/O.  I know this doesn't seem to matter much to people these days but it's a design concept that is much preferred, especially for professional-level code.  Another note, for the same design reasoning, is that you should always try to 'return' from your code, even if it's void.  A simple 'return;' statement is sufficient and lets various code-scanning tools(and documentation tools) better handle the function.

Second, your flow logic could be improved.  Since both the true and false forks of your conditional check result in a 'loc' increment, you should keep it outside the condition.  Even better, since the true fork does nothing but the increment, you can reverse the logic and eliminate it altogether.

Code: (c++) [Select]
while (loc < rvPrimes.size()) {
  if (rvPrimes[loc] != 0) {  // OR if (rvPrimes[loc]) since it's a boolean check against value
    pVal = rvPrimes[loc];

    for (int i = 1; (loc + (i * pVal)) < rvPrimes.size(); ++i)
      rvPrimes[loc + (i * pVal)] = 0;
  }

  ++loc;
}

Third, since 'loc' and 'i' are both used for indexes, their value should never be negative.  Thus, you would be best to use 'unsigned int' for these variables.  Further, note that std::vector<T>.size() returns an unsigned value(size_type is unsigned) and thus (int + (int * int)) should return an int and int < unsigned int should technically throw a compiler warning.  The reason for this is that it is possible to have a vector that has more than 2,147,483,647 members(though not necessarily wise).  If this is the case, then (loc + (i * pVal)) could potentially ALWAYS be < rvPrimes.size() and you're stuck with an infinite loop.

Fourth, your inner iterator could potentially be implemented as a register allowing for a tighter loop.  The compiler may do this automatically but not always.  Don't abuse the register keyword, as there are only a few registers available at one time, but rather use it sparingly where you can potentially reap a large benefit.  A single inner loop that runs many times over each element of a list that can measure a million or more in size is a very good reason to use a register.

Finally, rather than leaving values in the vector as 0, it might be best to remove the entry completely to keep a cleaner and smaller vector for speed and memory efficiency.  This may not be important to this function specifically, or even this test-case, but if you're planning to use this function in other code where you want a list of primes to pass around, it might be best to use as little memory as possible.  Further, if you're appending to the vector in a separate thread(using intelligent queuing with proper lock handling; vectors by themselves are not necessarily thread-safe!!) then its size can obviously grow quite large rather quickly.

An example including some of the aforementioned suggestions:
Code: (c++) [Select]
#include <iostream>
#include <vector>

// shortcut macro for uncluttered calculation
#define C(l, i, p) (l + (i * p))

void primeListing(std::vector<int> &rvPrimes) {
  unsigned int loc = 0, sz = rvPrimes.size();
  int pVal;  // could pVal legitimately be negative???

// shortcut macro using local variables for uncluttered calculation
#define D(i) C(loc, i, pVal)

  while (loc < sz) {
    if (rvPrimes[loc]) {  // true if rvPrimes[loc] has non-zero value
      pVal = rvPrimes[loc];

      for (register unsigned int i = 1; D(i) < sz; ++i)
        rvPrimes[D(i)] = 0;
    }

    ++loc;
  }

  // optional; eliminates 0 values, leaving a clean list of only primes
  for (
    std::vector<int>::iterator it = rvPrimes.begin();
    it != rvPrimes.end();
    ++it
  )  // I know this looks weird but some still use 80-column terminals
    if (!*it) rvPrimes.erase(it--);  // post-decrement

  return;
}

int main(void) {
  std::vector<int> primes;

  for (unsigned int i = 2; i < 1000; ++i) primes.push_back(i);

  primeList(primes);

  for (
    std::vector<int>::iterator it = primes.begin();
    it != primes.end();
    ++it
  )  // again, looks weird but it's easy to read and understand in 80 cols
    std::cout << *it << std::endl;

  return 0;
}

Overall, this is a good exercise and a worth-while pursuit.  I would recommend continuing to work on improving this code to see how efficient & elegant you can make it.  If you have questions, just voice them and I'm sure someone will assist.

Being required to have a return statement for a void function is both extraneous, and not necessary. The purpose you point out here is questionable. What code scanning tools are you talking about? IMO to have a (redundant) return before the closing brace of a void function just to have it there is silliness, and any tool that requires you to bloat your code with a bunch of useless 'return;' statements should be avoided. This is 2014, not 1980.

I would say that a macro with minimal verbosity or a least self-documenting name if you MUST use a macro in C++ such as 'C' and 'D' is poor to have in your code too.

I would've avoided pushing back all of the values and causing such re-allocations to take place, just to have a bunch of 0's in a vector that will eventually get removed anyways, that is far from effective and efficient here.

Also, auto-vectorization is enabled by default if using MSVC. There's no need to be using the 'register' keyword and in fact it's even considered deprecated in C++11 (http://en.cppreference.com/w/cpp/language/storage_duration). There's no guarantee that the compiler will listen to you and many cases where it won't if you use it. It's not necessary and should be avoided. In cases where auto-parallelization may not occur, there's several #pragma directives that can be used too as a hint to the compiler that you want parallelization over a loop too, if you know how and when to use it appropriately.
« Last Edit: December 08, 2014, 05:38:33 am by ArkPhaze »
sig=: ArkPhaze

[ J/ASM/.NET/C/C++ - Software Engineer ]

Offline Xires

  • Noob Eater
  • Administrator
  • Knight
  • *
  • Posts: 379
  • Cookies: 149
    • View Profile
    • Feed The Trolls - Xires
When working on code for the US Navy's Seawolf class submarines(for example), using a custom compiler for a different architecture(TRON VLSI), the code screener(which is target-required to be run prior to compilation) spits out an error and halts the compilation process for any function which does not explicitly return.  Return statements for void functions are also useful for certain debugging processes when working with Intel's TBB for multithreaded code.  Typically, a compiler will just insert a 'ret' and that's that, but occasionally another tool is less intelligent and doesn't make the automatic correlation that the 'ret' instruction indicates function termination & return to calling procedure without a corresponding command in the source file.  In such a circumstance, a 'return;' is both harmless and helpful.  In any case, adding a return statement to such functions harms nothing and has potential for benefits whereas not having it could be fine but could also cause some problems, however small, especially in conjunction with other tools; thus, it is best to always use it.

Macros should definitely be more verbose or have a self-documenting name; you are correct.  However, it was used for demonstration and I did not see it as important to give it a more descriptive name.  That is an oversight on my part and unhelpful in instruction.  Therefore, in future, I will try to do better.  Thank you for pointing it out.

Yes, there are better ways to have done it, but I was attempting to not change the foundation of the original code.  Though a heavily optimized structure could have been implemented, it would not have been as easy for a newcomer to understand.  I wanted to demonstrate a few minor modifications that could be easily understood by the OP without changing the OPs original concept of the solution.  The memory saved by eliminating 0'd values in the vector could, potentially, be very large.  In addition, the whole point was that it was less important for the function itself than for whatever might decide to use the resulting vector.  A smaller vector which contained only the 'confirmed' prime numbers and not a bunch of empty values would be easier on memory, permit storage of a larger list of found primes and be easier & faster to parse for other purposes.  The result is better memory & speed efficiency using the same datatype with which the OP is familiar.

I do not personally use MSVC much but I do tend to use multiple compilers.  Different compilers will treat the 'register' keyword differently.  Some may ignore it, some may see it merely as a suggestion and others consider it an explicit demand.  In cases where it is properly used, its use does not make a program worse whilst it has potential to make things better.  If the compiler ignores it, that's fine; no error.  If the compiler considers it a suggestion, it's fine; no problem.  If the compiler considers it an explicit demand, it's fine; benefits to be reaped and no problems.  I did specifically caution that it should be used sparingly as it could mess things up.  Hopefully this would cause a reader(OP or otherwise) to research it better before deciding to take advantage of it.

The use of #pragma tends to be very preprocessor-specific.  Some are also library-specific.  This means that, in many cases, use of a #pragma will cause an error because the #pragma found is specific to a different preprocessor/library and the developer was, at the time, unaware.  I would grant that it would be an exercise in learning & betterment of said developer but in some cases portability is prevented.  So, it becomes relatively simple as a case of logic; use of a #pragma could cause errors whilst avoiding use of a #pragma does not cause errors.  Thus, NOT using it is still generally the best course of action.  It's fine to know compiler/preprocessor-specific tricks but I cannot advise their use for general-purpose, multi-platform code.  This is different from the use of the 'register' keyword because it is a language entity that is handled by the compiler implementation; it can ignore it or use it but its proper use does not cause errors.
-Xires

Offline ArkPhaze

  • Peasant
  • *
  • Posts: 136
  • Cookies: 20
  • null terminated
    • View Profile
The problem there is the compiler which does not see fit to conform to language standards, not anything to do with simply omitting a return (where it is still, and should be, valid to do so, if not preferred). The ONLY way I could see using a return for a void function is in the case of generics for conformity with other return types.

Code: [Select]
template <typename T>
T f()
{
  return T();
}

int main()
{
  f<void>();
}

'RET' is one of the most common instructions out there. If a tool doesn't know what it should do in that case, it's obviously a very poor one; resort to a better tool. Having to add a return to avoid an error which should never occur is like a bandaid over the real problem at hand. With the return there, I don't see how it should change the way the code is compiled, and the tool evaluating the instructions is taking the wrong approach if it fixes something with code analyzing.

You post 'potential benefits' as a buzzword for a fix to some less common compiler which does not relate to the majority of the ones other professional developers use. IMO that compiler should be fixed, and same with the tools you're speaking about. There's no benefits here, this is a 'fix' to an issue that should've never existed in the first place, and in a very specific case with a very specific tool/compiler.

And with that rant over, I can see your point with not changing the foundation of the original code to keep it more of a representation of his original work.

As with the #pragma preprocessor directive, you are right, but it's project or team environment specifics... Most people working on a project will never switch over to another compiler. If you start writing the next Firefox or some other large project, chances are if you're using Visual Studio, you'll stick with it, and the conventions that all other developers must follow will be partially compiler specific. In such cases '#pragma once' may be preferred over the alternative.
« Last Edit: December 13, 2014, 09:27:47 pm by ArkPhaze »
sig=: ArkPhaze

[ J/ASM/.NET/C/C++ - Software Engineer ]

Offline Xires

  • Noob Eater
  • Administrator
  • Knight
  • *
  • Posts: 379
  • Cookies: 149
    • View Profile
    • Feed The Trolls - Xires
Okay, first, the logical process: if performing a task is never bad, but could be good; then it follows that it is beneficial to perform that task.  Likewise, if NOT performing a task is often fine, but sometimes bad; then it follows that it is not necessarily beneficial to perform that task.  It is safe to conclude that where a task follows both of these statements, then it is best to adopt the habit of always performing said task.



Language standards for C & C++ do not dictate implementations in ASM.  They do not state what assembly instructions should be the result of compiling any language feature.  Doing so would negate the concept of portability to other platforms.  Part of C's success is the very fact that it can be ported to compile code run natively on another architecture.  Cross-compilation toolchains are specifically intended for this process.  This is part of what makes C & C++ great.

Not every architecture is the same.  Not every ASM procedure uses 'ret'.  Occasionally coroutines are implemented this way.  Some architectures instead use different instructions that do not return but rather wait or even make a call to a parent.  Even on x86 you cannot expect that a function will always return, particularly if something happens which causes it to abort.  Occasionally this is even an intended execution path.  Also consider that it is possible that an architecture to not, in fact, even have the 'ret' instruction in the first place.

Tools are designed for multiple purposes.  Especially in the case of architecture-specific tools, it is unwise(even ridiculous) to think that they would be poorly designed because they do not conform to the expectations of someone unfamiliar with their use or the reasons that they might generate warning, error, or fault messages.  Using a 'return' statement in a function denoted to return 'void' may matter little to the compiler but it could end up meaning quite a bit to some additional, external tools.  If the use of a tool ends up causing you to adapt to become a better & more consistent programmer by providing more explicit instructions in your source, then I do not see that as a fault in the tool.



Please note that Firefox is designed to be cross-platform and is therefore not dependent upon a compiler that it only available on one platform.  Further, most cross-platform projects target non-Windows systems first and are later ported to Windows.  A 'solution' file provided for VisualStudio is not provided because development took place using VisualStudio or its compiler but because others who are not part of the primary development team may desire to use it for their own development & compilation.  This is a specific case, though, and probably not your intention.

I do not mean to nit-pick but only point out that there are many, many large projects that, due to multiplatform requirements, must remain compiler-neutral.  In fact, most very large projects, especially those that have a diverse development team spread across the world, with different native languages & timezones, cannot rely upon any specific environment configuration and many do not rely upon a single compiler.  This covers pretty much every FOSS project worldwide, which constitutes more code than every proprietary corporate project world wide.  It is, in fact, only the minority proprietary projects which adhere to your ideal of the use of #pragma as a preference.

Please remember that the reliance upon #pragmas is reliance upon something which cannot be standardized.  Because many are compiler-specific, it locks you into use of that compiler.  That is to say; sometimes the choice of compiler for a large project is not because the project started with that compiler but because some feature essential to its proper compilation & functionality is specific to that compiler.  This effectively traps the development team(who will often complain, loudly).  Dependence upon #pragma is a dangerous thing as it is akin to designing in a feature which is reliant upon the existence of a bug.  Such a practice is indeed far more common on Windows, as it is far more reliable like manner, but it is not a practice that should be taught to others.



You are an intelligent person, ArkPhaze, and I have enjoyed reading your many posts on these forums for quite some time.  However, I am beginning to wonder what kind of education is being forced upon you outside of this forum.  Any educational instructor who preaches such things should be beaten with their own text books, repeatedly.  Any authors who write text books with similar information should likewise be beaten with them, also repeatedly.  If indeed this is the kind of things that you are being taught, I should very much like to have a word with your instructors as I take it as a great insult to teach students information which is, at best, misleading and at worst, wholly incorrect and potentially dangerous to their future, careers included.
-Xires

Offline ArkPhaze

  • Peasant
  • *
  • Posts: 136
  • Cookies: 20
  • null terminated
    • View Profile
Okay, first, the logical process: if performing a task is never bad, but could be good; then it follows that it is beneficial to perform that task.  Likewise, if NOT performing a task is often fine, but sometimes bad; then it follows that it is not necessarily beneficial to perform that task.  It is safe to conclude that where a task follows both of these statements, then it is best to adopt the habit of always performing said task.



Language standards for C & C++ do not dictate implementations in ASM.  They do not state what assembly instructions should be the result of compiling any language feature.  Doing so would negate the concept of portability to other platforms.  Part of C's success is the very fact that it can be ported to compile code run natively on another architecture.  Cross-compilation toolchains are specifically intended for this process.  This is part of what makes C & C++ great.

Not every architecture is the same.  Not every ASM procedure uses 'ret'.  Occasionally coroutines are implemented this way.  Some architectures instead use different instructions that do not return but rather wait or even make a call to a parent.  Even on x86 you cannot expect that a function will always return, particularly if something happens which causes it to abort.  Occasionally this is even an intended execution path.  Also consider that it is possible that an architecture to not, in fact, even have the 'ret' instruction in the first place.

Tools are designed for multiple purposes.  Especially in the case of architecture-specific tools, it is unwise(even ridiculous) to think that they would be poorly designed because they do not conform to the expectations of someone unfamiliar with their use or the reasons that they might generate warning, error, or fault messages.  Using a 'return' statement in a function denoted to return 'void' may matter little to the compiler but it could end up meaning quite a bit to some additional, external tools.  If the use of a tool ends up causing you to adapt to become a better & more consistent programmer by providing more explicit instructions in your source, then I do not see that as a fault in the tool.



Please note that Firefox is designed to be cross-platform and is therefore not dependent upon a compiler that it only available on one platform.  Further, most cross-platform projects target non-Windows systems first and are later ported to Windows.  A 'solution' file provided for VisualStudio is not provided because development took place using VisualStudio or its compiler but because others who are not part of the primary development team may desire to use it for their own development & compilation.  This is a specific case, though, and probably not your intention.

I do not mean to nit-pick but only point out that there are many, many large projects that, due to multiplatform requirements, must remain compiler-neutral.  In fact, most very large projects, especially those that have a diverse development team spread across the world, with different native languages & timezones, cannot rely upon any specific environment configuration and many do not rely upon a single compiler.  This covers pretty much every FOSS project worldwide, which constitutes more code than every proprietary corporate project world wide.  It is, in fact, only the minority proprietary projects which adhere to your ideal of the use of #pragma as a preference.

Please remember that the reliance upon #pragmas is reliance upon something which cannot be standardized.  Because many are compiler-specific, it locks you into use of that compiler.  That is to say; sometimes the choice of compiler for a large project is not because the project started with that compiler but because some feature essential to its proper compilation & functionality is specific to that compiler.  This effectively traps the development team(who will often complain, loudly).  Dependence upon #pragma is a dangerous thing as it is akin to designing in a feature which is reliant upon the existence of a bug.  Such a practice is indeed far more common on Windows, as it is far more reliable like manner, but it is not a practice that should be taught to others.



You are an intelligent person, ArkPhaze, and I have enjoyed reading your many posts on these forums for quite some time.  However, I am beginning to wonder what kind of education is being forced upon you outside of this forum.  Any educational instructor who preaches such things should be beaten with their own text books, repeatedly.  Any authors who write text books with similar information should likewise be beaten with them, also repeatedly.  If indeed this is the kind of things that you are being taught, I should very much like to have a word with your instructors as I take it as a great insult to teach students information which is, at best, misleading and at worst, wholly incorrect and potentially dangerous to their future, careers included.

This redundant use of 'return' to me just seems like a waste of line space. I still try to write code with no longer lines than ~80 chars. In addition to horizontal workflow, I think vertical is still an important component to think about.

"Language standards for C & C++ do not dictate implementations in ASM" - No they do not, but the compiler which interprets the C++ code to be compiled still needs to follow the language standards. How they do that is implementation specific, but there's really no other obvious and suitable alternative to ret for a void function.

"Even on x86 you cannot expect that a function will always return, particularly if something happens which causes it to abort." - Yes, but this is talk about runtime. The compiler is never going to know for sure what happens, (it doesn't need to,) and it doesn't go ahead to make these kinds of assumptions. It's still going to explicitly return for you behind the scenes if you use any common compiler that would be used for the majority of large scale projects for the public, whether the path of execution reaches the end of the function or not.

"Also consider that it is possible that an architecture to not, in fact, even have the 'ret' instruction in the first place." - Well then if the compiler in such a case follows language standards, it would be platform specific, but the result of keeping or omitting the 'return' at the end should remain of a consistent behavior. If an architecture doesn't have a 'ret' keyword, then the 'return' keyword specified within the C++ language must still have meaning, although it would be very architecture specific, even if that means just to ignore it. If it didn't have the 'ret' instruction, then what are you hinting at for what a 'return' would do in a void function that the compiler wouldn't similarly do from omitting it?

I wrote a dummy function and viewed the disassembly for both adding a return and omitting it. The result is one and the same:
Code: [Select]
push        ebp 
mov         ebp,esp 
sub         esp,0C0h 
push        ebx 
push        esi 
push        edi 
lea         edi,[ebp-0C0h] 
mov         ecx,30h 
mov         eax,0CCCCCCCCh 
rep stos    dword ptr es:[edi] 
pop         edi 
pop         esi 
pop         ebx 
mov         esp,ebp 
pop         ebp 
ret 

Other compilers tested exhibit the same result.

Textbooks that preach good practices are nice and all, but they get partially thrown out the window when you actually start a career with development/programming, and become less useful. They are a guideline at best, and what you do in the real world isn't always ideal. If you're a Windows developer, you'll probably be much more biased to cross-platform development, so those people don't really care. You're talking about a theoretical programmer vs. a real world programmer here.

Even the fact that you bring up Firefox here, I don't see in their sources that they follow this idea of putting a 'return' in all void functions (if any) to keep neutral. They have other more useful and meaningful strategies to achieve what they want.

In a case where a function wouldn't and shouldn't return:
Code: [Select]
#include <iostream>

[[ noreturn ]] void f() { std::cout << "Hello"; }

int main()
{
  f();
}

C++ has this covered. You would probably use such an attribute to throw an exception.

Code: [Select]
[[ noreturn ]] void f() { throw "Something went wrong."; }
« Last Edit: December 17, 2014, 06:30:16 am by ArkPhaze »
sig=: ArkPhaze

[ J/ASM/.NET/C/C++ - Software Engineer ]

Offline Darkvision

  • EZ's Fluffer
  • VIP
  • Royal Highness
  • *
  • Posts: 755
  • Cookies: 149
  • Its not a bug, It's a Chilopodas.
    • View Profile
arkphaze what xires is trying to say is that it is a best practice. To change areas real quick; It is a best practice to use a anti-static wrist strap when working on the innards of a computer. However it is not necessarily needed if the person is experienced enough to know how to always stay grounded so that a static discharge can not occur. Basically you use the wrist-strap(return) till you are experienced/knowledgeable enough to know that the lack of one can not hurt what you are doing. Is it annoying to use a wrist-strap(return)? sure. But you do it anyway to prevent potential problems that can occur from your lack of knowledge/experience.
The internet: where men are men, women are men, and children are FBI agents.

Ahh, EvilZone.  Where networking certification meets avian fecal matter & all is explained, for better or worse.

<Phage> I used an entrence I never use

Offline ArkPhaze

  • Peasant
  • *
  • Posts: 136
  • Cookies: 20
  • null terminated
    • View Profile
arkphaze what xires is trying to say is that it is a best practice. To change areas real quick; It is a best practice to use a anti-static wrist strap when working on the innards of a computer. However it is not necessarily needed if the person is experienced enough to know how to always stay grounded so that a static discharge can not occur. Basically you use the wrist-strap(return) till you are experienced/knowledgeable enough to know that the lack of one can not hurt what you are doing. Is it annoying to use a wrist-strap(return)? sure. But you do it anyway to prevent potential problems that can occur from your lack of knowledge/experience.

I know what he's trying to say, but there is no 'best practice' reason for always having return statement in a void function, just because some very uncommon compilers and tools (supposedly) require it.

In this case, it seems a bit different than your portrayal of safe vs unsafe in terms of ESD here. Whether you have a return or not, the majority of the common compilers (if not all I've ever worked with) still infer similar behavior of that of adding an explicit return statement. So I'm not sure if he's talking about pre-compilation tools that require the return there because the code is read as text and not instructions, but otherwise, it makes no sense because it's redundancy; there is no difference in the output binary, and for the added hassle and added bulk to the source code written, I don't see a benefit, even if it's just to support some uncommon compiler.

Chances are if you are using such an uncommon compiler, you'll be sticking with it for project specific reasons; why should your code support all other common compilers? From my experience, anything out of the ordinary was written to be compiler specific because generic-ness causes a performance loss in some cases. You can't just turn it around to say that it's good practice because one very uncommon compiler requires you to do as such, just to make the most of it, and/or any other tools you use with it.

People don't write PS3 games that can be compiled to run on an XBox1 just because they have the idea that supporting multiple platforms and/or architectures is a good idea. Aside from business interference with programmer theory, you may never intend on writing it for the XBox1 anyways; why write code that helps you transfer it over to XBox and have a bunch of precompiler code branches that compile different parts of code when you don't need any of it?

edit: I will accept the fact that he has his own opinion about this, and under certain circumstances, it might matter for what he's talking about, but to generalize it, makes it much more of an opinion based matter, rather than something to do with good/bad practice.
« Last Edit: December 20, 2014, 02:02:21 am by ArkPhaze »
sig=: ArkPhaze

[ J/ASM/.NET/C/C++ - Software Engineer ]