[IPOL discuss] Best Practices for Scientific Computing

Pascal Getreuer getreuer at gmail.com
Tue Oct 9 03:12:18 CEST 2012


Hi Miguel,

You are right that much existing C++ code uses low-level constructs
such as pointers and explicit memory management.  However, it is
possible to do nearly everything without them through the strategic
use of STL and other data container classes.

If I understand correctly, the principalComponents() has the effect of
creating an LWImage<float> object from some predefined data stored in
a static array "pca".  Your objection is that the trick

   num_elements = sizeof(array) / sizeof(array*)

is too low level.  The problem is that static arrays themselves are
low level antiquities from C.  In C++, the common advice is to use STL
vector instead.

But then how to initialize a vector with predefined data?  Several
methods are discussed here:

   http://stackoverflow.com/questions/2236197/c-easiest-way-to-initialize-an-stl-vector-with-hardcoded-elements

Another way to go is loading the data from a file---no pointers or
explicit memory management needed:

std::ifstream f("data.txt");

if(!f.good())
   // throw or abort---failed to open file

int npcskept, psize;
f >> npcskept >> psize;

if(bIgnoreContrast)
    --npcskept;

LWImage<float> PCs(npcskept, psize);

for(int i=0; i < psize; i++) {
    if(bIgnoreContrast) {   // skip first column if ignoring contrast
        float dummy;
        f >> dummy;
    }
    for(int j=0; j < npcskept; j++)
        f >> PCs[i][j];
}

Best,
Pascal


On Mon, Oct 8, 2012 at 8:16 AM, Miguel Colom <colom at cmla.ens-cachan.fr> wrote:
> I agree, you can hide the use of pointers, but trying to avoid the low-level
> characteristic of the language doesn't turns it into high-level.
>
> Let's look at some real IPOL C/C++ source code. For example, the
> Quasi-Euclidean Epipolar Rectification article.
>
> If we look, for example, at the MissStereo/src/stereoAC/main.cpp at function
> principalComponents, the code is the following:
>
> /// Matrix mapping patch to principal components
> static LWImage<float> principalComponents() {
> #include "dataStereo/pca_basis.dat"
>     int psize = sizeof(pca)/sizeof(*pca);
>     int npcskept = sizeof(pca[0])/sizeof(*pca[0]);
>     if(bIgnoreContrast)
>         --npcskept;
>     LWImage<float> PCs = alloc_image<float>(npcskept, psize);
>     float* pc = PCs.data;
>     for(int i=0; i < psize; i++)
>         for(int j=0; j<npcskept; j++)
>             *pc++ = pca[i][j+(bIgnoreContrast? 1: 0)];
>     return PCs;
> }
>
> The line "int psize = sizeof(pca)/sizeof(*pca);" includes:
> - Getting the size of a pointer.
> - Dereferencing the pointer and computing its size.
> - Make a division to know the size of the pointer.
>
> I think this is quite low-level!
> The same for "int npcskept = sizeof(pca[0])/sizeof(*pca[0]);"
>
> Moreover, if we look carefully at the function, we see that it creates an
> object  PCs that is returned at the end. But inside a function, this local
> object should be destroyed after the return, and therefore it shouldn't be
> valid anymore outside the function.
>
> It works because the function is declared to be "static", so the variables
> at the heap memory will be still allocated when it finished.
>
> So, finally this function needs:
> - Non-evident pointer arithmetic.
> - Knowledge about how the heap memory works.
> - Knowledge about the static functions and how they're stored in the memory.
> - Dynamic memory allocation.
> - Keep in mind that the allocated memory should be destroyed afterward,
> because of the leak of a garbage collector.
>
> This function is absolutely correct and of course it work perfectly. But
> it's clear that you need some low-level details to understand it or try to
> modify it.
>
> And I think that is very easy to find examples like that in most of the
> C/C++ code.
>
>
> Best,
> Miguel
>
> Quoting Pascal Monasse <monasse at imagine.enpc.fr>:
>
>> No, in many situations you do not have to manipulate pointers in C++. One
>> example:
>>
>> #include <vector>
>> #include <iostream>
>> int main() {
>>         std::vector<int> vec;
>>         for(int i=0; i<10000; i++)
>>                 vec.push_back(i);
>>         for(int i=0; i<vec.size(); i++)
>>                 std::cout << vec[i] << std::endl;
>> }
>>
>> It builds an array of 10000 integers and I do not have to care about
>> allocation, delete, bound-checking...
>>
>> Pascal
>>
>> On Monday, October 08, 2012 12:07:19 PM Miguel Colom wrote:
>>>
>>> Well, nowadays the terms "high-level" and "low-level" are relative and
>>> they're redefined with the characteristics of the new languages.
>>>
>>> Of course, when C was developed, it was considered a very high-level
>>> language, because the alternative was to use assembler code. Even
>>> assembler was considered once high-level compared to coding directly
>>> with ones and zeros using punched cards!
>>>
>>> It's true that it's possible to use compiler flags to check the bounds
>>> of the arrays, but you have still to deal with memory pointers and
>>> memory allocation.
>>>
>>> A bad pointer allocation in a compiled C/C++ can happen easily and
>>> errors in the memory allocation are also common (a double free, a bad
>>> pointer in the free, ...)
>>> Since most operations involve pointers, there's plenty of situations
>>> in which they can lead to bad behavior, sometimes in a silent and
>>> difficult to trace way.
>>>
>>> Of course, there're compiler flags to check array bounds, like Pascal
>>> mentioned,  and also to check the memory assignment/freeing/region
>>> overlapping/etc, such as Valdrind and others.
>>>
>>> But, in my opinion, the very existence of these tools means that the
>>> language is low-level. And even the most simple programs need to use
>>> pointer arithmetic and memory allocation.
>>>
>>> However, I agree that for us in IPOL C/C++ is the best alternative,
>>> for the reason that have been already discussed.
>>>
>>> Best,
>>> Miguel
>>>
>>> Quoting Pascal Monasse <monasse at imagine.enpc.fr>:
>>> > Hi all,
>>> >
>>> > I agree totally with Pascal, it is a good summary of the constraints we
>>> > have in the choice of language.
>>> >
>>> > I also think that C++ permits high-level programming, depending on
>>> > your choice
>>> > of structures. For the most important structure, an array, you can use
>>> > it
>>> > as in C, with explicit memory allocation and deletion, but you can also
>>> > use the vector class of STL, which has an optional bound checking
>>> > (though
>>> > the way to enable it is not in the standard):
>>> > - automatic in debug mode under Windows with Visual C++
>>> > - with compiler flag -D_GLIBCXX_DEBUG with gcc
>>> > These options are not on by default, because you always have to choose:
>>> > 1. bound-checking (slow, safe)
>>> > 2. no bound-checking (fast, dangerous)
>>> > Whatever the language, you cannot have fast+safe. It is true that most
>>> > of
>>> > the time option 2 is chosen, but the programmer *can* choose option 1.
>>> > In
>>> > this sense, C++ can be used as a high level language.
>>> >
>>> > Best,
>>> > Pascal
>>> >
>>> > On Friday, October 05, 2012 08:12:53 PM Pascal Getreuer wrote:
>>> >> Dear all,
>>> >>
>>> >> > Write code in the highest-level language possible.: Are we doing
>>> >> > that?
>>> >>
>>> >> Reading between the lines, Jean-Michel's question highlights a
>>> >> limitation on IPOL.  By requiring C/C++, we prevent authors from
>>> >> choosing the highest-level language for the application (MATLAB,
>>> >> Mathematica, Java, or Python, for example).  We are pretty much making
>>> >> the choice for them, C++ notwithstanding.
>>> >>
>>> >> It would be really awesome if some day IPOL could accept code in more
>>> >> languages.  There are challenges though.
>>> >>
>>> >> Foremost, languages should preferably be well-known, portable, and
>>> >> free.  If a language is not free (MATLAB...) or not portable
>>> >> (Microsoft VBA...), it locks out potential IPOL readers from using it.
>>> >>
>>> >>  And if the language is less well-known, it would be harder to find
>>> >>
>>> >> reviewers.
>>> >>
>>> >> There are also technical considerations.  Languages need to be stable
>>> >> and efficient on the demo servers.  Specific maintenance to individual
>>> >> programs is not possible.  This reminds me of the discussions we have
>>> >> had about libraries :-\.
>>> >>
>>> >> So the answer is "no", we do not live up to the ideal of allowing "the
>>> >> highest-level language possible."  It is still a pretty good balance
>>> >> of the philosophies.  The Software Guidelines advises C/C++ by default
>>> >> and to consider other languages only if they are ISO-standardized---I
>>> >> agree this is the way to go.
>>> >>
>>> >> Best,
>>> >> Pascal
>>> >>
>>> >> On Fri, Oct 5, 2012 at 4:23 PM, Miguel Colom
>>> >> <colom at cmla.ens-cachan.fr>
>>> >
>>> > wrote:
>>> >> > Quoting Pierre Moulon <pmoulon at gmail.com>:
>>> >> >> And for me C++ is already a High-Level language.
>>> >> >> It offer high level programming option (concepts, traits, C++11:
>>> >> >> functors,
>>> >> >> lambda), but if you want you can go to the deep/dark side
>>> >> >> (intrinsic,
>>> >> >> pointer arithmetic...).
>>> >> >
>>> >> > Well, for me it's not a high-level language, because:
>>> >> > - The programmer has to deal with the calls of the memory
>>> >> > allocation/freeing and pointers to actual memory zones are returned.
>>> >> > -
>>> >> > The references to the objects/data are just pointers that contain
>>> >> > actual
>>> >> > memory addresses.
>>> >> >
>>> >> > In my opinion, this is enough to say C/C++ is low-level.
>>> >> >
>>> >> > Think, for example, in the case of declaring an array of 100
>>> >> > positions.
>>> >> > If you try to access position 123, a Python script would say that
>>> >> > you're
>>> >> > trying to access a position beyond the bounds of the array. In C/C++
>>> >> > it
>>> >> > would cause a SegFault exception, in case it is detected by the OS.
>>> >> > If
>>> >> > the buffer overwrite does not produce a segment violation, it
>>> >> > wouldn't
>>> >> > be
>>> >> > either detected.
>>> >> >
>>> >> > The majority of the errors in a C/C++ program are caused by trying
>>> >> > to
>>> >> > access memory not allocated. This gives an idea of the low-level or
>>> >> > how
>>> >> > close the language is to the CPU.
>>> >> >
>>> >> > Indeed, it's not casual that the Linux kernel and device drivers are
>>> >> > written in C, since it's very low-level.
>>> >> >
>>> >> > Best,
>>> >> > Miguel
>>> >> >
>>> >> >
>>> >> > _______________________________________________
>>> >> > discuss mailing list
>>> >> > discuss at list.ipol.im
>>> >> > https://tools.ipol.im/mailman/listinfo/discuss
>>> >>
>>> >> _______________________________________________
>>> >> discuss mailing list
>>> >> discuss at list.ipol.im
>>> >> https://tools.ipol.im/mailman/listinfo/discuss
>>> >
>>> > --
>>> > Pascal Monasse
>>> > IMAGINE, Ecole des Ponts ParisTech
>>> > Tel: 01 64 15 21 76
>>> > _______________________________________________
>>> > discuss mailing list
>>> > discuss at list.ipol.im
>>> > https://tools.ipol.im/mailman/listinfo/discuss
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss at list.ipol.im
>>> https://tools.ipol.im/mailman/listinfo/discuss
>>
>> --
>> Pascal Monasse
>> IMAGINE, Ecole des Ponts ParisTech
>> Tel: 01 64 15 21 76
>> _______________________________________________
>> discuss mailing list
>> discuss at list.ipol.im
>> https://tools.ipol.im/mailman/listinfo/discuss
>>
>
>
> _______________________________________________
> discuss mailing list
> discuss at list.ipol.im
> https://tools.ipol.im/mailman/listinfo/discuss


More information about the discuss mailing list