Hi everybody, I've been experimenting lately with the great FFTW library in C, and I've just encountered some very strange behavior. I have a 1024x1024 image (w=1024, h=1024), on which I want to do the following processing: a) Perform 1-D FFTs on every column of the image array b) Perform 1-D FFTs on every row of the same image array c) more computations using the outputs of the two previous steps ... The image is stored in RAM following a row-major order. I've read the FFTW documentation and I came up with the following two plans that make use of the advanced API: fftw_p = fftw_plan_many_dft_r2c (1, &h, w, fftw_in, NULL, w, 1, fftw_out, NULL, w, 1, FFTW_MEASURE); fftw_p_T = fftw_plan_many_dft_r2c (1, &w, h, fftw_in, NULL, 1, fftw_out_T, NULL, 1, h, FFTW_MEASURE); So fftw_in is real (double), of size w*h and there are two output arrays, fftw_out and fftw_out_T, both of size w*h/2+1. I first create the plans, then I load the image into fftw_in, and then I run the two plans, one after the other: fftw_execute(fftw_p); fftw_execute(fftw_p_T); The strange thing is that the second transform produces wrong results. If I comment out the first transform, then the second one works fine. My original version used a complex2complex transform. That one did work like a charm. So it must be something related to the real2complex version of the plans. I have tried the following: 1. switch to standard malloc instead of fftw_malloc for the input array 2. Use FFTW_ESTIMATE instead of FFTW_MEASURE 3. As ub 2m but also move the creation of the 2nd plan after the execution of the 1st transform 4. As in 3, but also destroy the 1st plan before executing the 2nd one 5. As in 4, but also call fftw_cleanup() in between 6. I checked out the values of w and h before and after executing the 1st transform, and they are always equal to 1024. None of these steps helped me in any way. If I execute the 1st transform, the 2nd one goes always wrong. If I dont, the 2nd one goes fine. Furthermore, the whole idea of planning first, is that I want to reuse the plans for real-time processing of a stream of images, so destroying and cleaning up is not a realistic option anyway. Does anybody have an idea what am I doing wrong? I'm very tempted to start shouting that I've discovered a bug, but simple reasoning dictates that I'm just a newbie in FFTW and I'm doing something incredibly stupid. Just for the record, I'm working on an Intel Core 2 Duo T7200, running 32-bit Debian GNU/Linux, GCC version 4.3.3, and FFTW 3.1.2. Thank you, Dimitris

# FFTW: successive real2complex 1-D FFTs fail

Started by ●July 10, 2009

Reply by ●July 10, 20092009-07-10

> > I have a 1024x1024 image (w=1024, h=1024), on which I want to do the > following processing: > > a) Perform 1-D FFTs on every column of the image array > b) Perform 1-D FFTs on every row of the same image array > c) more computations using the outputs of the two previous steps ... > > The image is stored in RAM following a row-major order. > > I've read the FFTW documentation and I came up with the following two > plans that make use of the advanced API: > > fftw_p = fftw_plan_many_dft_r2c (1, &h, w, fftw_in, NULL, w, 1, fftw_out, > NULL, w, 1, FFTW_MEASURE); > > fftw_p_T = fftw_plan_many_dft_r2c (1, &w, h, fftw_in, NULL, 1, fftw_out_T, > NULL, 1, h, FFTW_MEASURE); > > So fftw_in is real (double), of size w*h and there are two output arrays, > fftw_out and fftw_out_T, both of size w*h/2+1. >Maybe this is related to C memory management? I haven't used this API before but just some thoughts. It seems in fftw_p, h is used as a reference because of '&', same as &w in fftw_p_T. When you first use &h in fftw_h, h is at initialized correctly (because h should be a pointer type). After fftw_p, does h move to the end of a memory chunk before you you use it in fftw_p_T? That would explain why you can use fftw_p_T alone correctly because nothing (the h pointer) gets moved.

Reply by ●July 10, 20092009-07-10

>Hi everybody, > >I've been experimenting lately with the great FFTW library in C, andI've>just encountered some very strange behavior. > >I have a 1024x1024 image (w=1024, h=1024), on which I want to do the >following processing: > >a) Perform 1-D FFTs on every column of the image array >b) Perform 1-D FFTs on every row of the same image array >c) more computations using the outputs of the two previous steps ... > >The image is stored in RAM following a row-major order. > >I've read the FFTW documentation and I came up with the following two >plans that make use of the advanced API: > >fftw_p = fftw_plan_many_dft_r2c (1, &h, w, fftw_in, NULL, w, 1,fftw_out,>NULL, w, 1, FFTW_MEASURE); > >fftw_p_T = fftw_plan_many_dft_r2c (1, &w, h, fftw_in, NULL, 1,fftw_out_T,>NULL, 1, h, FFTW_MEASURE); > >So fftw_in is real (double), of size w*h and there are two outputarrays,>fftw_out and fftw_out_T, both of size w*h/2+1. > >I first create the plans, then I load the image into fftw_in, and then I >run the two plans, one after the other: > >fftw_execute(fftw_p); >fftw_execute(fftw_p_T); > >The strange thing is that the second transform produces wrong results.If>I comment out the first transform, then the second one works fine. > >My original version used a complex2complex transform. That one did work >like a charm. So it must be something related to the real2complexversion>of the plans. > >I have tried the following: > >1. switch to standard malloc instead of fftw_malloc for the input array >2. Use FFTW_ESTIMATE instead of FFTW_MEASURE >3. As ub 2m but also move the creation of the 2nd plan after theexecution>of the 1st transform >4. As in 3, but also destroy the 1st plan before executing the 2nd one >5. As in 4, but also call fftw_cleanup() in between >6. I checked out the values of w and h before and after executing the1st>transform, and they are always equal to 1024. > >None of these steps helped me in any way. If I execute the 1sttransform,>the 2nd one goes always wrong. If I dont, the 2nd one goes fine. > >Furthermore, the whole idea of planning first, is that I want to reusethe>plans for real-time processing of a stream of images, so destroying and >cleaning up is not a realistic option anyway. > >Does anybody have an idea what am I doing wrong? I'm very tempted tostart>shouting that I've discovered a bug, but simple reasoning dictates thatI'm>just a newbie in FFTW and I'm doing something incredibly stupid. > >Just for the record, I'm working on an Intel Core 2 Duo T7200, running >32-bit Debian GNU/Linux, GCC version 4.3.3, and FFTW 3.1.2. > >Thank you, > >Dimitris > > >It looks like your second plan function is missing an argument?

Reply by ●July 10, 20092009-07-10

labis <dlampridis@gmail.com> wrote: < I've been experimenting lately with the great FFTW library in C, and I've < just encountered some very strange behavior. < I have a 1024x1024 image (w=1024, h=1024), on which I want to do the < following processing: < a) Perform 1-D FFTs on every column of the image array < b) Perform 1-D FFTs on every row of the same image array < c) more computations using the outputs of the two previous steps ... Isn't the output of the first transform complex, such that the second has to be a complex transform? < The image is stored in RAM following a row-major order. I do wonder if a cache optimal transpose in between would be faster overall. That would depend on the cache properties of the transforms. -- glen

Reply by ●July 11, 20092009-07-11

On Jul 10, 9:53�am, "labis" <dlampri...@gmail.com> wrote:> fftw_p = fftw_plan_many_dft_r2c (1, &h, w, fftw_in, NULL, w, 1, fftw_out, > NULL, w, 1, FFTW_MEASURE); > > fftw_p_T = fftw_plan_many_dft_r2c (1, &w, h, fftw_in, NULL, 1, fftw_out_T, > NULL, 1, h, FFTW_MEASURE);The second call is missing an argument, I assume that should be "1,h" before fftw_out? I guess from the arguments that your fftw_in array is h-by-w in row-major order (so that the w direction is contiguous). In any case, the stride/dist parameters for the output arrays don't look right. See: http://www.fftw.org/fftw3_doc/Advanced-Real_002ddata-DFTs.html For, the first transform, where you transform the length-h columns of your h-by-w input array, will have an output array that is (h/2+1)-by- w. Thus, the input is stride=w, dist=1, and the output is stride=w, dist=1, and your parameters above are correct. However, the second transform, where you transform the length-w rows, has h-by-w input and h-by-(w/2+1) output. The input will have stride=1, dist=w, and the output should have stride=1, dist=(w/2+1). Passing "1,h" makes no sense at all to me. So, it looks like your parameters are incorrect, and your second transform is likely to stop randomly over memory, with unpredictable results. Regards, Steven G. Johnson

Reply by ●July 11, 20092009-07-11

On Jul 10, 7:11�pm, Verictor <stehu...@gmail.com> wrote:> > I have a 1024x1024 image (w=1024, h=1024), on which I want to do the > > following processing: > > > a) Perform 1-D FFTs on every column of the image array > > b) Perform 1-D FFTs on every row of the same image array > > c) more computations using the outputs of the two previous steps ... > > > The image is stored in RAM following a row-major order. > > > I've read the FFTW documentation and I came up with the following two > > plans that make use of the advanced API: > > > fftw_p = fftw_plan_many_dft_r2c (1, &h, w, fftw_in, NULL, w, 1, fftw_out, > > NULL, w, 1, FFTW_MEASURE); > > > fftw_p_T = fftw_plan_many_dft_r2c (1, &w, h, fftw_in, NULL, 1, fftw_out_T, > > NULL, 1, h, FFTW_MEASURE); > > > So fftw_in is real (double), of size w*h and there are two output arrays, > > fftw_out and fftw_out_T, both of size w*h/2+1. > > Maybe this is related to C memory management? I haven't used this API > before but just some thoughts. >This is indeed one possibility that I've been considering.> It seems in fftw_p, h is used as a reference because of '&', same as > &w in fftw_p_T. When you first use &h in fftw_h, h is at initialized > correctly (because h should be a pointer type). After fftw_p, does h > move to the end of a memory chunk before you you use it in fftw_p_T? > That would explain why you can use fftw_p_T alone correctly because > nothing (the h pointer) gets moved.h and w are declared as integers, that is why I reference them when a pointer is needed. I also thought that the execution of the plan messes up the h and w values, that is why I checked their values before and after the execution of the plans. They are always equal to 1024. One more thing that I forgot to mention is that I have tried duplicating the input, so that each plan uses a different pointer (and hence memory location) for the input array. Thanks you, Dimitris

Reply by ●July 11, 20092009-07-11

On Jul 10, 8:37�pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:> labis <dlampri...@gmail.com> wrote: > > < I've been experimenting lately with the great FFTW library in C, and I've > < just encountered some very strange behavior. > > < I have a 1024x1024 image (w=1024, h=1024), on which I want to do the > < following processing: > > < a) Perform 1-D FFTs on every column of the image array > < b) Perform 1-D FFTs on every row of the same image array > < c) more computations using the outputs of the two previous steps ... > > Isn't the output of the first transform complex, such that the > second has to be a complex transform? >No, I keep the first output, and then I execute a second transform on the original input (the image) again. So both of them have (the same) real input and a complex output.> < The image is stored in RAM following a row-major order. > > I do wonder if a cache optimal transpose in between would be > faster overall. �That would depend on the cache properties of > the transforms. >I'm not familiar with the term, I'll look into it, thanks. Dimitris

Reply by ●July 11, 20092009-07-11

On Jul 10, 11:25�pm, labis <dlampri...@gmail.com> wrote:> On Jul 10, 7:11�pm, Verictor <stehu...@gmail.com> wrote: > > > > > > > > I have a 1024x1024 image (w=1024, h=1024), on which I want to do the > > > following processing: > > > > a) Perform 1-D FFTs on every column of the image array > > > b) Perform 1-D FFTs on every row of the same image array > > > c) more computations using the outputs of the two previous steps ... > > > > The image is stored in RAM following a row-major order. > > > > I've read the FFTW documentation and I came up with the following two > > > plans that make use of the advanced API: > > > > fftw_p = fftw_plan_many_dft_r2c (1, &h, w, fftw_in, NULL, w, 1, fftw_out, > > > NULL, w, 1, FFTW_MEASURE); > > > > fftw_p_T = fftw_plan_many_dft_r2c (1, &w, h, fftw_in, NULL, 1, fftw_out_T, > > > NULL, 1, h, FFTW_MEASURE); > > > > So fftw_in is real (double), of size w*h and there are two output arrays, > > > fftw_out and fftw_out_T, both of size w*h/2+1. > > > Maybe this is related to C memory management? I haven't used this API > > before but just some thoughts. > > This is indeed one possibility that I've been considering. > > > It seems in fftw_p, h is used as a reference because of '&', same as > > &w in fftw_p_T. When you first use &h in fftw_h, h is at initialized > > correctly (because h should be a pointer type). After fftw_p, does h > > move to the end of a memory chunk before you you use it in fftw_p_T? > > That would explain why you can use fftw_p_T alone correctly because > > nothing (the h pointer) gets moved. > > h and w are declared as integers, that is why I reference them when a > pointer is needed. > > I also thought that the execution of the plan messes up the h and w > values, that is why > I checked their values before and after the execution of the plans. > They are always equal to 1024. > > One more thing that I forgot to mention is that I have tried > duplicating the input, so that each plan > uses a different pointer (and hence memory location) for the input > array. > > Thanks you, > Dimitris- Hide quoted text - > > - Show quoted text -Hmm... things get a little bit interesting: according to Steven J, you had the second function call incorrectly paramterized but you said you got correct results if commenting out the first function call. Something must be missing. If I assume you got the second function paramters right, I think the problem is still memory management although you said you had already used different memory locations (pointers). Does the API use free() (I remember you said this is a C library) for a whole 1024 memory? Since you said complex2complex version was correct but not real2complex, how does real data struct differ to complex, I assum this: struct complex { double real_part; double image_part; } complex; for real, is just double real; or struct real { double real_part; double image_part = 0.0; } real; This structs can be found in the source code (as least I think so). You can consequently look at how memory is handled between complex2complex and real2complex. Need to make sure how things work before claiming there are bugs in the source code too. Try to use simpler input sequence to verify that. Well, if you did make mistakes in assigning paramters to the second function but got lucky so that the resules were correct, then that is another story.

Reply by ●July 13, 20092009-07-13

On Jul 11, 7:48�am, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:> For, the first transform, where you transform the length-h columns of > your h-by-w input array, will have an output array that is (h/2+1)-by- > w. �Thus, the input is stride=w, dist=1, and the output is stride=w, > dist=1, and your parameters above are correct. > > However, the second transform, where you transform the length-w rows, > has h-by-w input and h-by-(w/2+1) output. �The input will have > stride=1, dist=w, and the output should have stride=1, dist=(w/2+1). > Passing "1,h" makes no sense at all to me. > > So, it looks like your parameters are incorrect, and your second > transform is likely to stop randomly over memory, with unpredictable > results.Steven, Thank you very much for your feedback, and the time you spent on understanding my problem. You are indeed correct and the odist of the second plan should be w/ 2+1. My mistake was that when I switched from c2c to r2c transforms, I thought that the reduced size (n/2+1) of the output arrays will only affect my fftw_malloc() calls. I never thought about the plans. (or maybe I did, but only for the first one, which happens to use the same as in the c2c case, don't remember anymore). Anyway, a big thanks again, to you and everybody else who took the time to answer! Dimitris