demo file loading & saving with pointers.
category: code [glöplog]
waffle: In most cases a demotool saves the "resources" to a (binary) format-file, not the "demo.exe" itself. I think the latter would cause problems with pointers again, unless you mean a final stage where you store the "demo.bin"-format file into the executable. Then that's fine, but it's not the issue here.
kb_: Is the pointer-code here, and does MapAll map the pointers?
Code:
// copy (this could be better, but who cares about these few megabytes)
sU8 *mem=orig->MapAll();
file->Write(mem,e.PackedSize);
delete orig;
I find the rest of the code a bit hard to understand, but I don't use hashes or compression or anything like that. Do you also have your own Map class? Interesting code though.
What I meant is that you don't need to account for serialization/deserialization too much, as long as you keep to values, arrays and pointers to other serializable objects. If you're good, one partially specialized template function per data type that tags where in your type the pointers are should be enough for the system to a) catch all references when writing (basically a depth-first search that skips everything that's already in the object list) and b) fix the pointers after reading. Which would make for a pretty lightweight serialization system (and you can use it for your own garbage collector, too! win-win scenario! :D)
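For what it's worth, a minimal sketch of that idea could look like the following. Node, NodeRecord, visit_pointers, collect, save and load are all made-up names for illustration, and a plain function template stands in for the per-type "where are my pointers" function. On disk the raw pointers are replaced by indices into the object list, which is one possible encoding, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical serializable type: one value plus pointers to other nodes.
struct Node {
    int32_t value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
};

// The "tag" function: tells the system where the pointers live in a Node.
// You would write one of these per serializable type.
template <class Visitor>
void visit_pointers(Node& n, Visitor&& v) {
    v(n.left);
    v(n.right);
}

// Flat record as it would go to the file: pointers become indices (-1 = null).
struct NodeRecord {
    int32_t value;
    int32_t left;
    int32_t right;
};

// a) Depth-first search that skips everything already in the object list.
void collect(Node* n, std::vector<Node*>& list,
             std::unordered_map<Node*, int32_t>& index) {
    if (!n || index.count(n)) return;
    index[n] = static_cast<int32_t>(list.size());
    list.push_back(n);
    visit_pointers(*n, [&](Node*& p) { collect(p, list, index); });
}

std::vector<NodeRecord> save(Node* root) {
    std::vector<Node*> list;
    std::unordered_map<Node*, int32_t> index;
    collect(root, list, index);
    std::vector<NodeRecord> out;
    for (Node* n : list) {
        auto idx = [&](Node* p) { return p ? index.at(p) : int32_t(-1); };
        out.push_back({n->value, idx(n->left), idx(n->right)});
    }
    return out;
}

// b) Fix the pointers after reading: indices turn back into addresses.
// The root ends up at element 0.
std::vector<Node> load(const std::vector<NodeRecord>& in) {
    std::vector<Node> nodes(in.size());
    auto ptr = [&](int32_t i) {
        return i < 0 ? nullptr : &nodes[static_cast<std::size_t>(i)];
    };
    for (std::size_t i = 0; i < in.size(); ++i) {
        assert(in[i].left < static_cast<int32_t>(in.size()));
        assert(in[i].right < static_cast<int32_t>(in.size()));
        nodes[i].value = in[i].value;
        nodes[i].left = ptr(in[i].left);
        nodes[i].right = ptr(in[i].right);
    }
    return nodes;
}
```

Shared and cyclic references survive the round trip because every object is written exactly once and only referred to by its index afterwards.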
Or you go full YOLO on read and just search&replace over the blob with your pointer table. I mean, what could possibly go wrong?
rudi: In this particular case there is no pointer rewriting, I just showed it as an example of mmap. MapAll() basically returns a pointer that guarantees you'll find the file contents there.
But as you see I just cast that address to the header type, and address+sizeof(header) to the directory entry type. The hashes are because I don't store full file names in the packfile - the directory just stores file name hashes, and it's sorted by hash so you can binary search in it. The whole point of the exercise was that an "open file" operation basically is a zero overhead operation :)
The rest of Masagin is a mixture of manual serialization (I mean, those 2D meshes really don't need long to load), but eg. the music player just passes the pointer to stb_vorbis_open_memory so stb_vorbis thinks it plays from memory when in reality it plays directly from Windows' disk cache :)
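To make the "cast the address and binary search the hashes" part concrete, here is a rough sketch. The real packfile layout isn't shown in this thread, so PackHeader, DirEntry, kMagic, hash_name (FNV-1a, picked arbitrarily), build_pack and open_file are all assumptions. The blob is built in memory here so the example is self-contained; in practice base would be whatever the mapping call returns. Hash collisions are not handled, and the casts assume the blob is suitably aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical on-disk layout: header, then directory sorted by hash, then data.
struct PackHeader { uint32_t magic; uint32_t fileCount; };
struct DirEntry   { uint32_t nameHash; uint32_t offset; uint32_t size; };
constexpr uint32_t kMagic = 0x4B434150u;

// FNV-1a, used only for illustration.
uint32_t hash_name(const std::string& s) {
    uint32_t h = 2166136261u;
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

struct FileView { const uint8_t* data; uint32_t size; };

// "Open file": no copy, just a binary search and a pointer into the blob.
FileView open_file(const uint8_t* base, const std::string& name) {
    auto* hdr = reinterpret_cast<const PackHeader*>(base);
    assert(hdr->magic == kMagic);
    auto* dir = reinterpret_cast<const DirEntry*>(base + sizeof(PackHeader));
    const DirEntry* end = dir + hdr->fileCount;
    const uint32_t h = hash_name(name);
    const DirEntry* it = std::lower_bound(
        dir, end, h,
        [](const DirEntry& e, uint32_t key) { return e.nameHash < key; });
    if (it == end || it->nameHash != h) return {nullptr, 0};
    return {base + it->offset, it->size};
}

// Packer side, only here so the sketch can be tried out.
std::vector<uint8_t> build_pack(
    const std::vector<std::pair<std::string, std::string>>& files) {
    const std::size_t dirBytes = files.size() * sizeof(DirEntry);
    uint32_t offset = static_cast<uint32_t>(sizeof(PackHeader) + dirBytes);
    std::vector<DirEntry> dir;
    std::vector<uint8_t> payload;
    for (const auto& f : files) {
        const uint32_t size = static_cast<uint32_t>(f.second.size());
        dir.push_back({hash_name(f.first), offset, size});
        payload.insert(payload.end(), f.second.begin(), f.second.end());
        offset += size;
    }
    std::sort(dir.begin(), dir.end(), [](const DirEntry& a, const DirEntry& b) {
        return a.nameHash < b.nameHash;
    });
    const PackHeader hdr{kMagic, static_cast<uint32_t>(files.size())};
    std::vector<uint8_t> blob(sizeof(hdr) + dirBytes + payload.size());
    std::memcpy(blob.data(), &hdr, sizeof(hdr));
    if (!dir.empty())
        std::memcpy(blob.data() + sizeof(hdr), dir.data(), dirBytes);
    if (!payload.empty())
        std::memcpy(blob.data() + sizeof(hdr) + dirBytes, payload.data(),
                    payload.size());
    return blob;
}
```

The pointer returned by open_file points straight into the blob, which is the kind of pointer you could hand to something like stb_vorbis_open_memory without loading anything first.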
Have we even established _why_ OP wants to shoot himself in the foot yet? ;)
Is this for some uber-weird intro? Because, if it's really a demo, I'm with plek and t$ here. I'd just put the assets in a nice directory structure and either have a loading script or even some standardized naming convention that is both simple and flexible enough to handle the loading into whatever asset slots you need. OR you go with an external tool that does the packing, serialization, and pointer data gathering for you. :)
kb: Interesting that it's faster on Windows; it certainly is slower on Linux, at least in all cases where I've had to deal with it. Same with iOS.
In addition to the higher overhead from going through the page fault mechanism (which is pretty small, IIRC), you get very quickly into problems of readahead/readaround; it's much harder for the OS to figure out that you want the entire file than if you just issue the request explicitly. (Linux has some very complicated heuristics here; iOS just does 32 kB blocks with no readahead at all, IIRC.) Having a fast SSD reduces some of that overhead, but for rotating rust or cheap flash (e.g. mobile phones, where each request typically costs you 1 ms or so), it's quite icky.
Plus you get into all sorts of other practical issues for advanced usage: Address space is limited on 32-bit, page tables take lots of space if you try to map large amounts of data (in the terabytes; probably not relevant for demos), it's much harder to deal with compressed or authenticated data (although userfaultfd in recent Linux helps a bit), errors are nearly impossible to recover from, you can't use huge pages… I've come to the conclusion that mmap is something that's fine for the dynamic loader doing your executable, but for everything else, I'm done with it. :-)
Sesse: Yeah, it really depends on your use case. For demo/game packfiles I found it very practical, but yeah, if you have huge datasets, you probably want to have better control over memory access anyway.
Btw, I just hacked some 20 lines of C++ (open giant file, mmap/memcpy vs ReadFile) and tested it on my work machine (6-Core Xeon, Samsung 950 Pro SATA) - there's not much of a difference anymore. Both were at about 450MB/sec uncached and 3.5GB/sec cached, so mmap really only has the advantage of possibly one less copy when loading. Interestingly I know that I get 560MB/s out of this machine when using uncached overlapped reads, so there's that. :)
(neat Windows trick that I learned today: If you open a file with FILE_FLAG_NO_BUFFERING and close it again, it evicts the whole thing from cache :) )
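The test program itself isn't posted, so the following is only a guess at the shape of such a comparison, and it uses POSIX open/read/mmap instead of the Win32 ReadFile and file mapping calls from the original test. load_with_read and load_with_mmap are made-up names; timing code is left out, you would wrap each call in whatever timer you like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Variant A: plain read() into a buffer (roughly what ReadFile does on Windows).
std::vector<uint8_t> load_with_read(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return {};
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return {}; }
    std::vector<uint8_t> buf(static_cast<std::size_t>(st.st_size));
    std::size_t done = 0;
    while (done < buf.size()) {
        ssize_t n = read(fd, buf.data() + done, buf.size() - done);
        if (n <= 0) break;  // error or unexpected end of file
        done += static_cast<std::size_t>(n);
    }
    close(fd);
    buf.resize(done);
    return buf;
}

// Variant B: map the file, then memcpy out of the mapping.
// Without the memcpy you would use the mapped pointer directly.
std::vector<uint8_t> load_with_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return {};
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return {}; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return {};
    std::vector<uint8_t> buf(len);
    std::memcpy(buf.data(), p, len);
    munmap(p, len);
    return buf;
}
```

Both variants should return identical bytes; any speed difference depends on the OS, the cache state and the storage, as discussed above.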
I tried the mmap method outlined above a while ago and it works pretty well - as long as i was only referencing files located in the base directory of my visual studio project. As soon as i had files in a subdirectory (like a data folder) i ran into issues with MapViewOfFile failing due to some win7 file permission stuff. Any tips on that?
Sesse:
- did you try if FILE_FLAG_SEQUENTIAL_SCAN yields better performance when reading large continuous blocks?
- are you sure mmapping is only 32-bit? The win32 functions allow 64-bit sizes afaik (at least the *Ex-functions)
@spike: I don't use Windows. And the point was that you _don't_ necessarily read large contiguous blocks.
@kb: BTW, the last project where I explored this thoroughly was routing for Offline Maps at work, working on the compression and data loading parts. I've never had so much benefit from doing 64k stuff, ever. “What do you mean splitting the data into streams will be better for the packer?” “Well, you know, it works for MIDI…” (Also, you know, “ASCII packs quite well”.)
sesse: ok, i misread your post, I thought you were using Linux, iOS _and_ Windows. And the sequential flag stuff was referring to "it's much harder for the OS to figure out that you want the entire file". Which I might also have misinterpreted. Whatever.
Anyone used "__based pointer" ? And did it work okay?
ref: https://msdn.microsoft.com/en-us/library/57a97k4e.aspx
I think the problem is not that I use a linked list, but that I still get a wrong relative address. There may be other parts of my code that are buggy. The code is a bit too complicated to use maps, or I don't know how to implement that yet.
current code for saving is:
UINT address = (UINT)vs->GetAddress();
UINT relativeAddress = address - (UINT)GetProcessHeap();
storetofile(pointer);
and for loading:
readfromfile(relativeAddress);
pointer = relativeAddress + (UINT)GetProcessHeap();
vs->SetAddress(pointer);
I can't be sure whether the relative address I calculate is actually correct with respect to the heap's base address, and that is what the pointers depend on.
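One thing worth checking: GetProcessHeap() returns a heap handle, and as far as I know nothing documents it as the base address of your allocations, so an offset relative to it may not mean anything in the next run. UINT is also only 32 bits, which truncates pointers in a 64-bit build. A common alternative is to put everything you want to save into one contiguous block that you own and store offsets relative to that block. The sketch below assumes exactly that; Arena, Item, kNull, offset_of, at, push_front and sum_list are made-up names, not anything from your code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical arena: one contiguous block that holds everything we save.
struct Arena {
    std::vector<uint8_t> bytes;
    std::size_t used = 0;
    explicit Arena(std::size_t capacity) : bytes(capacity) {}
    void* alloc(std::size_t n) {
        used = (used + 7) & ~static_cast<std::size_t>(7);  // keep 8-byte alignment
        assert(used + n <= bytes.size());
        void* p = bytes.data() + used;
        used += n;
        return p;
    }
};

constexpr uint32_t kNull = 0xFFFFFFFFu;

// Linked list item that stores an offset from the arena base, not an address.
struct Item {
    int32_t value;
    uint32_t next;
};

// Saving direction: address -> offset relative to our own base.
uint32_t offset_of(const uint8_t* base, const void* p) {
    if (!p) return kNull;
    return static_cast<uint32_t>(static_cast<const uint8_t*>(p) - base);
}

// Loading direction: offset -> address relative to whatever the base is now.
const Item* at(const uint8_t* base, uint32_t off) {
    return off == kNull ? nullptr : reinterpret_cast<const Item*>(base + off);
}

// Prepends an item and returns the new head as an offset.
uint32_t push_front(Arena& a, uint32_t head, int32_t value) {
    Item* it = new (a.alloc(sizeof(Item))) Item{value, head};
    return offset_of(a.bytes.data(), it);
}

// Walks the list through offsets only, so it works at any base address.
int32_t sum_list(const uint8_t* base, uint32_t head) {
    int32_t sum = 0;
    for (const Item* it = at(base, head); it; it = at(base, it->next))
        sum += it->value;
    return sum;
}
```

Saving then means writing the first used bytes of the arena plus the head offset, and loading means reading them back into any buffer; no per-pointer fixup is needed because nothing in the block is an absolute address.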
Come to think of it, I guess the problem with saving and loading the pointer is still the same whether it is a __based pointer or a regular pointer. The example code I am working on only stores a couple of pointers and not a linked list of pointers. However, if I only had to store one pointer for one object that contains other objects which refer to pointers in other parts of the code, that would save a lot of work, I guess. But when referring to different objects that are stored in different parts of the heap, I guess the object design has to be tailor-made for that, which mine was not when I started.
Quote:
Sesse: At least under Windows mmaping is way faster than simple loading, at least when I've tried, yes. Consider it - when you use ReadFile, the system has to allocate some cache space, load from disk into cache, then memcpy into your destination buffer. With mmap, that last memcpy (and the associated cache shenanigans) doesn't need to happen.
And here comes the part where you explain how any of this is even remotely relevant to a use case that wasn't explicitly proposed to begin with but, let's be fair, can be guesstimated.