FreeGameDev Planet - Development

Also check out the games planet.

March 28, 2015

corsix-th

GCC target confusion

Not blogging for two years has left me with quite a lot of things that I'd like to get off my chest. I don't plan on making a habit of negative-outlook posts, but blogging is partially for me to let off steam, and partially for writing things things that other people might find interesting, and today probably leans toward the former.

I've tried submitting a two-character diff to clang, and instructions for reproducing a socket-timeout bug in OpenJDK, both to no avail. I've not personally submitted any bugs to gcc, but one of my friends reported a pretty nasty g++ code-generation bug three years ago, again to no avail. I'm not trying to argue that reporting bugs to open source projects is a waste of time, but I am trying to argue that public shaming is perhaps occasionally a viable alternative approach, and today's unlucky victim is gcc.

The first thing I'd like to point out is that asking gcc to create a precompiled header from stdin will result in a segfault:

peter@euphraios:~$ echo | gcc -x c-header - ; echo $?
<stdin>:1:0: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
1

This is somewhat embarrasing, but maybe I'm the only person in the world who wants to create a precompiled header from stdin. Then again, on OSX, where gcc is an alias for clang, this works just fine:

[littlegate /Users/corsix]$ echo | gcc -x c-header - ; echo $?
0

The second thing I'd like to point out is the g++ code-generation bug linked above. Using gcc, the following small program exits with a return code of zero:

peter@euphraios:~$ cat test.cpp
struct Counter {
  Counter operator++ () { ++value; return *this; }
  int value;
};

int main (int argc, char *[]) {
  int a[] = {0}, b[] = {0};
  Counter c = {0};
  (void)((argc > 1) ? a : b)[++c, 0];  
  return c.value;
}

peter@euphraios:~$ g++ test.cpp ; ./a.out ; echo $?
0

This is clearly a code-generation bug; ++c should always happen, therefore ++value should always happen, therefore c.value should evaluate to one rather than zero. Again, when on a platform where g++ is an alias for clang, the same program correctly exits with a return code of one:

[littlegate /Users/corsix]$ g++ test.cpp ; ./a.out ; echo $?
1

I'm really rather surprised that this bug has been sitting around known to the g++ developers for the last three years; I'd have thought that if you're a compiler developer, then code generation bugs are both the most serious and most interesting kind of bugs to fix (at least the V8 developers seem to share this point of view).

The third and final thing I'd like to point out relates to the title of this blog post. Let us begin with a piece of code which tries to call __builtin_ia32_vzeroall. Obviously, as this function isn't defined, calling it results in a compile-time error:

peter@euphraios:~$ cat test.c
void f() { __builtin_ia32_vzeroall(); }
peter@euphraios:~$ g++ -c test.c ; echo $?
test.c: In function ‘void f()’:
test.c:1:36: error: ‘__builtin_ia32_vzeroall’ was not declared in this scope
 void f() { __builtin_ia32_vzeroall(); }
                                    ^
1

My previous statement was a slight lie. As you might have guessed, __builtin_ia32_vzeroall isn't a random name which I pulled out of thin air. In fact, if we pass -mavx, then it becomes a predeclared function, and we can call it without issue:

peter@euphraios:~$ cat test.c
void f() { __builtin_ia32_vzeroall(); }
peter@euphraios:~$ g++ -mavx -c test.c ; echo $?
0

Maybe we don't want to apply -mavx to our entire source file, and instead we just want to apply it to certain functions. One way of doing this is via __attribute__((target)). In the following example, we apply this attribute to f but not to h, and as such, the compiler is fine with the call within f, but raises an error about the call within h:

peter@euphraios:~$ cat test.c
void f() __attribute__((__target__("avx")));
void f() { __builtin_ia32_vzeroall(); }
void h() { __builtin_ia32_vzeroall(); }
peter@euphraios:~$ g++ -c test.c ; echo $?
test.c: In function ‘void h()’:
test.c:3:37: error: ‘__builtin_ia32_vzeroall’ needs isa option -m32
 void h() { __builtin_ia32_vzeroall(); }
                                     ^
1

I'm somewhat unsettled about function attributes affecting which symbols are valid, but hey, let's run with it for now.

If __attribute__((target)) isn't your cup of tea, then #pragma GCC target can be used to the same effect. Again, in the following example, the call within g is perfectly fine, but the call within h is not:

peter@euphraios:~$ cat test.c
#pragma GCC push_options
#pragma GCC target("avx")
void g() { __builtin_ia32_vzeroall(); }
#pragma GCC pop_options
void h() { __builtin_ia32_vzeroall(); }
peter@euphraios:~$ g++ -c test.c ; echo $?
test.c: In function ‘void h()’:
test.c:5:37: error: ‘__builtin_ia32_vzeroall’ needs isa option -m32
 void h() { __builtin_ia32_vzeroall(); }
                                     ^
1

What perplexes me is that if you combine both of these approaches to locally enabling -mavx, then it seems to become enabled globally. In the following example, __attribute__((target)) is applied to f, #pragma GCC target is applied to g, and then by the time we reach h, gcc seems to have suffered target confusion and accepts __builtin_ia32_vzeroall even though it shouldn't be valid:

peter@euphraios:~$ cat test.c
void f() __attribute__((__target__("avx")));
void f() { __builtin_ia32_vzeroall(); }
#pragma GCC push_options
#pragma GCC target("avx")
void g() { __builtin_ia32_vzeroall(); }
#pragma GCC pop_options
void h() { __builtin_ia32_vzeroall(); }
peter@euphraios:~$ g++ -c test.c ; echo $?
0

March 28, 2015 12:00 AM

March 25, 2015

corsix-th

Everything is broken

I'd like to tell you a story. Like all good stories, it starts with a stack trace obtained from a heap dump:

"jenkins.util.Timer [#6]" daemon prio=5 tid=34 RUNNABLE
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.read(SocketInputStream.java:129)
  at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
...
  at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
  at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
...
  at java.io.FilterInputStream.read(FilterInputStream.java:116)
  at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2676)
  at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
...
  at java.io.InputStreamReader.read(InputStreamReader.java:167)
  at java.io.Reader.read(Reader.java:123)
  at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2001)
  at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1980)
  at org.apache.commons.io.IOUtils.copy(IOUtils.java:1957)
  at org.apache.commons.io.IOUtils.toString(IOUtils.java:819)
  at org.kohsuke.github.Requester.parse(Requester.java:384)
...
  at org.kohsuke.github.Requester$1.hasNext(Requester.java:295)
  at org.kohsuke.github.PagedIterator.fetch(PagedIterator.java:44)
  at org.kohsuke.github.PagedIterator.hasNext(PagedIterator.java:32)
  at org.kohsuke.github.PagedIterable.asList(PagedIterable.java:19)
  at org.kohsuke.github.GHRepository.getPullRequests(GHRepository.java:504)
  at org.jenkinsci.plugins.ghprb.GhprbRepository.check(GhprbRepository.java:82)
  at org.jenkinsci.plugins.ghprb.Ghprb.run(Ghprb.java:101)
  at org.jenkinsci.plugins.ghprb.GhprbTrigger.run(GhprbTrigger.java:149)
  at hudson.triggers.Trigger.checkTriggers(Trigger.java:265)
  at hudson.triggers.Trigger$Cron.doRun(Trigger.java:214)
  at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
...
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)

Obviously it is a Java stack trace, with java.net.SocketInputStream.socketRead0 at the top, then some I/O layers, then some GitHub layers, then some ghprb (GitHub pull-request builder) plugin layers, then hudson.triggers.Trigger$Cron.doRun, and finally some Java threading layers. What isn't shown is the C part of the stack frame above socketRead0, but all that's there is a call to recv, and after that it is Linux kernel stuff.

Let's start with Jenkins, the continuous integration server formerly known as Hudson. I don't have much love for it, but it is something that we use and live with. One of its responsibilities is to run jobs periodically, which it does by waking up once per minute, checking if there are any jobs which need doing, and if so, scheduling them. In other words, it reimplements cron. I'm telling you this because it is the background for the story: we observed that our Jenkins instance stopped scheduling periodic jobs, which is a pretty serious problem. The stack trace was taken from our Jenkins instance, and from it, we can deduce why Jenkins broke.

The hudson.triggers.Trigger$Cron.doRun frame on the stack means that we're looking at the code which runs every minute to check whether any jobs need doing. The frames immediately above it are GitHub-related, so clearly our Jenkins instance is talking to our GitHub Enterpise instance in order to establish whether any jobs need doing (because someone somewhere decided that a particular job should be triggered by events on GitHub, and decided to configure it via polling rather than webhooks). The frames above those are what you'd expect to see when Java code reads from a secure socket. Finally, in the piece of stack trace you can't see, the call to recv is blocking.

The brokenness starts with hudson.triggers.Trigger$Cron.doRun. If for some reason doRun doesn't terminate, then Jenkins' cron breaks. Given that doRun can end up calling almost anything, and does so in the same thread without any explicit timeout, this seems like a pretty big problem.

OK, so, why isn't doRun terminating? We can see that the ghbrp plugin is trying to communiate with GitHub over a socket. Let's give the plugin the benefit of the doubt and assume that it configures its sockets with a read-timeout so that reading won't block forever. After working our way through various layers, we end up at socketRead0 - a function which accepts a socket and a timeout value and tries to read from the socket. At this point, we hit our next piece of brokenness. JDK-8049846, summarised as the timeout argument to socketRead0 is sometimes ignored, or alternatively summarised as Java's socket I/O is broken.

OK, so, why is socketRead0 blocking forever even though it was passed a timeout? As noted at JDK-8049846, the implementation of socketRead0 is basically:

if (poll(socket, POLLIN, timeout))
  recv(socket, buffer);

In other words, ask poll to wait (with a timeout) until the socket is readable, and then read from it. No need to set O_NONBLOCK on the socket, because poll is used to check that a read won't block. What could possibly go wrong? Well, let's check the man page for poll:

Bugs

See the discussion of spurious readiness notifications under the BUGS section of select(2).

Let's check the man page for select:

Bugs

...

Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

In other words, Linux's poll is broken: it can report that a socket is ready for reading when, in fact, it isn't.

OK, so, why did poll give us a spurious readiness notification on a socket? The socket in question was talking to our GitHub Enterprise instance, and, uh, GHE seems kind of flakey on the network side of things, so I wouldn't be surprised if it sent bad packets. In other words, GitHub Enterprise is broken (if anyone from GitHub is reading, please advise on how best to diagnose GitHub Enterprise being flaky). Also, while I'm wishing for things which won't happen: if any JDK maintainers are reading, please note that the current classification of JDK-8049846 as P4 (Minor loss of function, or other problem where easy workaround is present) is rather inaccurate: there is no workaround, and when it hits, the entire calling thread ceases to function. Of course, I'd be happier if you fixed the bug rather than fiddling around with priorities, but I'll take what I can get.

To conclude, everything is broken:

  • Jenkins' cron implementation is broken (one bad virtual function call can stop all cron activity).
  • Java's socket I/O is broken (JDK-8049846).
  • Linux's poll is broken (documented in the man pages).
  • GitHub Enterprise is broken (flakey network interface).

When all of these things break at the same time, your continuous integration server stops continuously integrating.

March 25, 2015 12:00 AM

March 24, 2015

corsix-th

It's been a while

It has been twenty five months, almost to the day, since I last wrote anything here. Regular readers, if there are any left, will notice a major visual overhaul and an absence of comments. The comments may well come back, but in the mean time, you can reach me via email at anything at corsix.org.

I've been quiet, but not been dead, for the last two years. Mostly I've been busy earning a wage, but I've also published a few things around the web:

March 24, 2015 12:00 AM

March 22, 2015

Castle Game Engine

New! Support for many texture compression methods on GPU, in particular for GPU compression formats ...

New! Support for many texture compression methods on GPU, in particular for GPU compression formats popular on Android devices. And improvements to handle GPU compressed textures throughout the whole engine. This allows to squeeze much more texture data into a GPU memory on Android/iOS, and in general makes using GPU compressed textures much easier.

- New GPU compressed formats handled, see TGPUCompression enum type: http://michalis.ii.uni.wroc.pl/castle-engine-snapshots/docs/reference/html/CastleImages.html#TGPUCompression . We now cover popular compression formats on both OpenGL and OpenGL ES (Android, iOS). Previously we supported only S3TC.

- API improvements: All of previous S3TC-specific types and functions are extended to handle a much wider range of GPU compression types. And many more functions now accept GPU-compressed images, not just TCastleImage.

To use the compressed textures:

1. If you want to assume that the user graphic card will just be able to handle your GPU compression, then simply use the compressed textures.

They can be packed in DDS format. Be sure to flip them vertically (the engine can only automatically flip S3TC data, for the rest you have to manually flip them). We plan to implemt Khronos KTX format in the future to make this whole flipping mess obsolete.

2. To be prepared for various GPUs, you want to keep various versions of your textures (uncompressed, and maybe compressed with various algorithms). And use CastleImage.LoadImagePreprocess http://michalis.ii.uni.wroc.pl/castle-engine-snapshots/docs/reference/html/CastleImages.html#LoadImagePreprocess to redirect all image loading to use your compressed versions. Like the snippet below:

-------------------------
uses ...,
  CastleURIUtils, CastleGLUtils,
  CastleLog, CastleStringUtils,
  CastleFilesUtils, CastleWarnings;

procedure GPUTextureAlternative(var ImageUrl: string);
begin
  if IsPrefix(ApplicationData('animation/dragon/'), ImageUrl) then
  begin
    if GLFeatures = nil then
      OnWarning(wtMinor, 'GPUCompression', 'Cannot determine whether to use GPU compressed version for ' + ImageUrl + ' because the image is loaded before GPU capabilities are known') else
    if tcPvrtc1_4bpp_RGBA in
      GLFeatures.TextureCompression then
    begin
      ImageUrl :=
        ExtractURIPath(ImageUrl) +
        'compressed/pvrtc1_4bpp_rgba/' +
        ExtractURIName(ImageUrl) + '.dds';
      WritelnLog('GPUCompression',
        'Using compressed alternative ' + ImageUrl);
    end;
  end;
end;

initialization
  LoadImagePreprocess :=@GPUTextureAlternative;
end.
-------------------------

#gameengine #dds #pvrtc #atitc #gpu #texturecompression  

March 22, 2015 07:01 AM