cbitcoin - Bitcoin implementation in C. Currently in development. - page 9.

Haplo

full member

Activity: 168

Merit: 100

As I said, C is not the best language for memory management x.x.

MatthewLM

legendary

Activity: 1190

Merit: 1004

Perhaps using the garbage collector would be nice since I'm now struggling with a memory leak made by CBCreateAddressVT. The virtual function table should get freed from everything I see. :-( Once again I've forgotten to pass a pointer by reference (ie. pointer to a pointer). :-( Easy enough to fix.
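A minimal sketch of how forgetting to pass a pointer by reference can cause a leak of this kind. The names `Table`, `create_table_by_value`, `create_table`, and `free_table` are hypothetical and are not taken from cbitcoin; the actual cause inside CBCreateAddressVT is not shown in this thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a virtual function table; not cbitcoin's type. */
typedef struct { void (*release)(void *self); } Table;

/* BUG: 'out' is only a copy of the caller's pointer. The allocation is
   stored in that copy, so the caller never sees it and it is leaked. */
void create_table_by_value(Table *out) {
    out = malloc(sizeof *out);
    (void)out;
}

/* FIX: take the address of the caller's pointer and write through it.
   Returns 1 on success and 0 if the allocation failed. */
int create_table(Table **out) {
    assert(out != NULL);
    *out = calloc(1, sizeof **out);
    return *out != NULL;
}

/* Freeing follows the same pattern, so the caller's pointer is cleared
   and cannot be used or freed a second time by accident. */
void free_table(Table **table) {
    assert(table != NULL);
    free(*table);
    *table = NULL;
}
```

A caller would write `Table *vt = NULL; create_table(&vt);` and later `free_table(&vt);`, after which `vt` is NULL again.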

No idea how to install libgc on OSX. :(

MatthewLM

legendary

Activity: 1190

Merit: 1004

A tracing GC looks through the stack for pointers to memory, and if that memory is an object the GC is responsible for, it recursively does the same thing until it finds no more reachable objects. The rest of the objects are collected. So all you need to ensure is that when you are done with a reference you point the reference somewhere else. This can indeed be done by pointing to NULL, but that is not always necessary, as you may reuse pointers to deprecated objects by reassigning them to new objects. And as I've said, a reference can go out of scope. There is no need to assign NULL to a pointer that becomes unreachable afterwards.
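The mark-and-collect idea described above can be sketched as a toy collector. This is an illustration under stated assumptions, not how the Boehm collector works internally: a real conservative collector scans the stack itself, whereas this sketch uses an explicit, hypothetical root table (`gc_add_root`) as a stand-in for the stack. All names here are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ROOTS 16

/* Toy heap object with up to two outgoing references. */
typedef struct Obj {
    bool marked;
    struct Obj *child[2];
    struct Obj *next;          /* intrusive list of every allocated object */
} Obj;

static Obj *all_objects = NULL;
static Obj **roots[MAX_ROOTS]; /* addresses of root pointers (stand-in for the stack) */
static int root_count = 0;

Obj *gc_new(void) {
    Obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->next = all_objects;
    all_objects = o;
    return o;
}

void gc_add_root(Obj **slot) {
    assert(root_count < MAX_ROOTS);
    roots[root_count++] = slot;
}

/* Recursively marks an object and everything reachable from it. */
static void mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    mark(o->child[0]);
    mark(o->child[1]);
}

/* Marks everything reachable from the roots, frees the rest,
   and returns the number of objects freed. */
int gc_collect(void) {
    int freed = 0;
    for (int i = 0; i < root_count; i++) mark(*roots[i]);
    Obj **link = &all_objects;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = false;   /* reset for the next collection */
            link = &o->next;
        } else {
            *link = o->next;     /* unlink and free the unreachable object */
            free(o);
            freed++;
        }
    }
    return freed;
}
```

Setting a root pointer to NULL, or reassigning it to another object, is what makes the old object unreachable; nothing is freed until `gc_collect` runs.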

Haplo

full member

Activity: 168

Merit: 100

Quote from: MatthewLM on May 19, 2012, 02:51:36 PM

No, you won't need reference counting to get rid of dangling pointers; you simply replace calls to "release" with a NULL assignment.

Also I guess the GC will also collect garbage when pointers go out of scope. Not sure if that is true but it makes sense.

Maybe tomorrow or the next day I might do a test with ref counting vs tracing GC vs manual malloc/free placement. I might do a test where a loop creates CBAddress objects with random data until it finds an address beginning with particular characters. That would test a mixture of object creation/destruction and algorithm execution which may be a fair test even though the test would never be a proper representation of the final library.

But I'm not doing any more today.

Ah I think I get it now. As long as you always copy pointers when passing values into functions, then you get the same effect as refcounting. I would definitely like to see a test showing that that has lower overhead than dealloc/refcounting o.0. It looks to me like you'd be doing the same work as with refcounting, except without any explicit calls to dealloc.

I don't know about scope, but letting pointers fall out of scope is bad form anyway. :P

MatthewLM

legendary

Activity: 1190

Merit: 1004

No, you won't need reference counting to get rid of dangling pointers; you simply replace calls to "release" with a NULL assignment.

Also I guess the GC will also collect garbage when pointers go out of scope. Not sure if that is true but it makes sense.

Maybe tomorrow or the next day I might do a test with ref counting vs tracing GC vs manual malloc/free placement. I might do a test where a loop creates CBAddress objects with random data until it finds an address beginning with particular characters. That would test a mixture of object creation/destruction and algorithm execution which may be a fair test even though the test would never be a proper representation of the final library.

But I'm not doing any more today.

Haplo

full member

Activity: 168

Merit: 100

Quote from: MatthewLM on May 18, 2012, 12:44:31 AM

Well this is the issue, what can be figured out now and what has to be fixed later. ;-) If reference counting can be rejected early on then it makes coding easier; no need to completely re-implement memory management at a later stage.

I'd prefer to try and get it right to begin with, but of course if I'm wrong to begin with then it will just have to be changed.

Well, reference counting does have one advantage. After reading through that link I sent you I realized that for C you still have to set your pointers to null in order to tell the GC that your pointer is dead. In terms of assuring correctness you'd still need reference counting to avoid dangling pointers x.x.

Weak types = no automatic GC. I'd suggest another language but most languages suck. :P

MatthewLM

legendary

Activity: 1190

Merit: 1004

Well this is the issue, what can be figured out now and what has to be fixed later. ;-) If reference counting can be rejected early on then it makes coding easier; no need to completely re-implement memory management at a later stage.

I'd prefer to try and get it right to begin with, but of course if I'm wrong to begin with then it will just have to be changed.

jgarzik

legendary

Activity: 1596

Merit: 1100

Some of this is putting the cart before the horse.

You cannot know what is the best allocator until you have an idea of the lifetime and usage pattern of the allocated objects...

MatthewLM

legendary

Activity: 1190

Merit: 1004

Thanks for the feedback, it's made me think a bit. I'm thinking garbage collection may not even be required such that I can make the library without reference counting or tracing GC. I could make one version with GC and one not where objects just have to be freed at the right time.

I'll think more about this later... Should be asleep but I seem to have insomnia. :(

Haplo

full member

Activity: 168

Merit: 100

Quote from: MatthewLM on May 17, 2012, 08:24:21 PM

Well what garbage collecting library would you recommend for C (This? http://www.hpl.hp.com/personal/Hans_Boehm/gc/)? Also does the mercury programming language use memory like C does? If it used the stack where memory management doesn't matter, then it could be an OK comparison. Remember only dynamically allocated memory needs to be managed.

Declarative languages are strange compared to imperative languages like C. By default, prolog (and its derivatives like mercury) use what they call a WAM, which basically means everything is allocated to the stack. However, mercury also has the option to use a Boehm collector, and does allocate things to the heap. I don't know everything about how that works in mercury, and unfortunately as awesome a language as it is, it isn't well documented (and I haven't gotten around to digging through the installed examples yet).

Also, declarative languages using a WAM can generally end up with poor memory usage due to the FILO nature of stacks, and thus started using garbage collectors to collect memory on the stack o.0.

I'm not sure if there are collectors available for C besides the Boehm collector. I did, however, find a nifty Linux Journal article on using the Boehm library and getting good mileage from it.

EDIT: And they get better performance out of the Boehm than they do from malloc/dealloc.. which really surprises me. The only consideration is that you'll want to make sure to disable any compiler optimizations that mess with pointers in your build. The Boehm library is also pretty configurable, so you can make it do whatever you need it to do (ie for embedded systems).

MatthewLM

legendary

Activity: 1190

Merit: 1004

Well what garbage collecting library would you recommend for C (This? http://www.hpl.hp.com/personal/Hans_Boehm/gc/)? Also does the mercury programming language use memory like C does? If it used the stack where memory management doesn't matter, then it could be an OK comparison. Remember only dynamically allocated memory needs to be managed.

Haplo

full member

Activity: 168

Merit: 100

Quote from: MatthewLM on May 17, 2012, 06:04:10 PM

Tracing GC would not be good for portability with embedded devices right?

It depends on if your embedded application has actual real-time requirements from bitcoin. I don't see bitcoin having any such requirements since most of what it does is networking and/or churning away at validating the blockchain, updating current balances and so on. Very little of it is actual user-interaction, which might be taken care of by a separate gui layer with different memory management.

If you actually need real-time performance for any reason, you'd have to use either something which has no pauses, ie refcounting with a freelist/standard C alloc/dealloc, or a concurrent GC. Concurrent GCs perform worse overall than stop-the-world, but they have minimal pauses so that things like real-time video chat or a phone's normal calling routines aren't choppy.

I know that they don't use garbage collection in the iOS Objective-C framework, but I don't know what other phone developers do. I know that it's possible to compile a GC for an embedded system, but there may be other reasons against its use. Theoretically it shouldn't break things though, since the OS kernel being preemptible and using a realtime scheduler are the main mechanisms for enabling real-time performance. I would ask someone who actually has some experience with programming for phones. They would know better than I do.

Quote from: MatthewLM on May 17, 2012, 06:04:10 PM

I really can't see how reference counting is a major performance issue at all since the retain/release calls are in-between a lot of other code which is much more intensive.

Memory management is one of the biggest overheads incurred by any and every program. It also performs no useful work in regards to a program's 'business' code. As a result, every single instruction of MM overhead is significant WRT overall performance. Just to give you the gist, experiments with region-based memory management and compile time GC in mercury showed speedups of 100% or more for some applications, and no less than 33% speedup for any, while using 66% less memory (or less) than with standard GC. Of course, that was a declarative language with strong-static typing, modes, determinism, and scope declarations, but I think it should give you a good idea of what to expect with changes to memory management in general.

MatthewLM

legendary

Activity: 1190

Merit: 1004

Tracing GC would not be good for portability with embedded devices, right?

I really can't see how reference counting is a major performance issue at all since the retain/release calls are in-between a lot of other code which is much more intensive.

Haplo

full member

Activity: 168

Merit: 100

Eh, I can understand not wanting to include any extra libraries, but I'm not so sure about GC having any effect on portability (except maybe on card-readers, which is hard to predict).

I see what you're trying to do with removing some unnecessary refcounts, which has been done (probably more effectively) in automatic reference counting systems, but I wouldn't recommend trying to do it manually. Basically, the case you listed (passing control of an object completely to another object with zero effect on count) is rarer than the more common case where an object is passed temporarily to an object which quickly uses and releases it with no effect on refcount (ie the original owner maintains control). If you want to cut out code like that, I'd suggest doing it on an expendable copy of your code later, and just counting everything for now. You're more likely to break stuff otherwise, which defeats the point of manual refcounting.

Btw, refcounting isn't only inefficient for many dead objects because of extra unnecessary counts; tracing is just that much more effective/efficient on dead objects. Age-oriented GC exploits that by having the young generation collected by a tracer and the old generation automatically refcounted. Personally I wouldn't go for all that complexity for a bitcoin library. It's not like you're decoding video for playback or anything, so a stop-the-world, copying tracing collector would be pretty efficient and work just fine, if you decide to go that direction.

MatthewLM

legendary

Activity: 1190

Merit: 1004

Using a tracing garbage collector might increase performance a tiny amount but at the cost of portability. It would add a dependency to the library that I don't want. The reference counting is simple enough in my opinion. Reference counting can be optimised by removing redundant retain/release calls and in fact might make the library easier to use. Here is an example of what I might do:

At the moment, as shown in the testCBAddress code, you make a CBAddress object like this:

Code:

CBString * addstr = CBNewStringByCopyingCString("1D5A1q5d192j5gYuWiP3CSE5fcaaZxe6E9");
CBAddress * add = CBNewAddressFromString(addstr, false, &events, &dep);
CBGetObjectVT(addstr)->release(&addstr);

As you can see the CBString is no longer needed by the calling function, so it is released. This is a redundant release. The CBNewAddressFromString constructor can be modified so that it assumes control over the reference of the calling function. That means the constructor no longer retains the object. This removes the need for a retain and a release call. The code could be written like this:

Code:

CBAddress * add = CBNewAddressFromString(CBNewStringByCopyingCString("1D5A1q5d192j5gYuWiP3CSE5fcaaZxe6E9"), false, &events, &dep);

The CBAddress object makes a reference to the CBString, assuming the calling function no longer needs it. However in the cases where the function will need the CBString, something like this is needed:

Code:

CBString * addstr = CBNewStringByCopyingCString("1D5A1q5d192j5gYuWiP3CSE5fcaaZxe6E9");
CBGetObjectVT(addstr)->retain(addstr);
CBAddress * add = CBNewAddressFromString(addstr, false, &events, &dep);
// Continue using CBString

In fact if caching is enabled by passing true to the second argument in CBNewAddressFromString, then the calling function doesn't need to call retain on the CBString until the CBAddress is released, since the CBAddress would keep the CBString. This level of optimisation is of course more confusing and probably practically worthless. It could also introduce problems with future compatibility. I don't want my library to guarantee that an object will hold another object for its lifetime.

But the method of making the calling function responsible for retaining an object if it needs it after passing it to another object could make some things easier. Perhaps I could make special constructors like CBNewAddressByTakingString which assumes the calling function no longer needs the reference. And these constructors can be made for particular objects where it may be common that another object will no longer be needed after passing it to the object. So you'd have something like this:

Code:

CBAddress * add = CBNewAddressByTakingString(CBNewStringByCopyingCString("1D5A1q5d192j5gYuWiP3CSE5fcaaZxe6E9"), false, &events, &dep);

Maybe calling it CBNewAddressByStealingString would avoid any confusion. I will of course make such things as clear as possible in the documentation.

So question: Should all constructors and methods that take objects not retain them so that the calling function needs to have a retain if necessary, should it only be some methods/constructors (Clearly named to avoid confusion), or none like it is now?
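The difference between a retaining constructor and a "taking" constructor, as discussed in the post above, can be sketched with a tiny reference counter. All names here (`Str`, `Addr`, `addr_new_retaining`, `addr_new_taking`, and so on) are hypothetical and are not cbitcoin's API; this only illustrates the ownership convention under discussion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string; not cbitcoin's CBString. */
typedef struct { int refs; char *text; } Str;

Str *str_new(const char *text) {
    Str *s = malloc(sizeof *s);
    assert(s != NULL);
    s->text = malloc(strlen(text) + 1);
    assert(s->text != NULL);
    strcpy(s->text, text);
    s->refs = 1;                 /* the creator holds the first reference */
    return s;
}

void str_retain(Str *s) { s->refs++; }

/* Drops one reference, frees at zero, and clears the caller's pointer. */
void str_release(Str **s) {
    if (--(*s)->refs == 0) {
        free((*s)->text);
        free(*s);
    }
    *s = NULL;
}

/* Hypothetical address object that keeps a string. */
typedef struct { Str *string; } Addr;

/* Retaining constructor: the caller keeps its own reference and must
   release it separately when done. */
Addr *addr_new_retaining(Str *s) {
    Addr *a = malloc(sizeof *a);
    assert(a != NULL);
    str_retain(s);
    a->string = s;
    return a;
}

/* Taking constructor: the caller's reference is handed over, so there is
   no retain here and no release in the caller. */
Addr *addr_new_taking(Str *s) {
    Addr *a = malloc(sizeof *a);
    assert(a != NULL);
    a->string = s;
    return a;
}

void addr_free(Addr **a) {
    str_release(&(*a)->string);
    free(*a);
    *a = NULL;
}
```

With the taking form, `addr_new_taking(str_new("..."))` needs no temporary variable, but a caller that still wants the string afterwards has to call `str_retain` before handing it over.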

Haplo

full member

Activity: 168

Merit: 100

Quote from: MatthewLM on May 16, 2012, 08:22:11 PM

Quote from: Haplo on May 16, 2012, 07:07:59 PM

I don't see what reference counting has to do with endianness, but generally it's only useful for memory management of long-lived objects. Are you really writing your own memory management code? o.0

In my post I was talking about two separate issues. The endianness is a bit annoying but it's fine and I did decide to use reference counting for strings in the end.

The memory management is a simple reference counter. It is nothing complicated and makes the library much more straight-forward when passing objects everywhere, as no-doubt will be the case with a library like this. You can see what I've done by looking at the CBObject files. Look at the testCBAddress.c file for an example of using a CBAddress object with memory management included.

Ah gotcha. I feel that, though: retain/release is at least 10 times easier than trying to place deallocs correctly (which is sometimes impossible). You could use an existing C garbage collector instead and save yourself the trouble in that case. Refcounting is very inefficient when many objects die, and overall doesn't perform any better than a GC, which is zero effort and no chance of screwing anything up. :P

Honestly, weak-typed imperative languages suck for memory management x.x

MatthewLM

legendary

Activity: 1190

Merit: 1004

Quote from: Haplo on May 16, 2012, 07:07:59 PM

Quote from: Pieter Wuille on May 09, 2012, 08:01:16 PM

If you're going to rewrite C++ in C (by building virtual function tables yourself), why not code it in C++ in the first place? If compatibility is a problem, you can always expose a C interface to the code.

Because C++ has craptastic performance and tramples all over C conventions, both for no reason. -The Linux Kernel Dev Team

"you can write object-oriented code (useful for filesystems etc) in C, _without_ the crap that is C++." That made me smile.

People tell me C++ can be just as efficient as C. Whether or not there is a performance issue, I don't care. I don't like C++, I like C.

Quote from: Haplo on May 16, 2012, 07:07:59 PM

That said, I approve of this project, and will probably donate to that end soonish.

Well that would be very kind, thank you.

Quote from: Haplo on May 16, 2012, 07:07:59 PM

I don't see what reference counting has to do with endianness, but generally it's only useful for memory management of long-lived objects. Are you really writing your own memory management code? o.0

In my post I was talking about two separate issues. The endianness is a bit annoying but it's fine and I did decide to use reference counting for strings in the end.

The memory management is a simple reference counter. It is nothing complicated and makes the library much more straight-forward when passing objects everywhere, as no-doubt will be the case with a library like this. You can see what I've done by looking at the CBObject files. Look at the testCBAddress.c file for an example of using a CBAddress object with memory management included.

Quote from: GroundRod on May 16, 2012, 07:45:48 PM

A project after my own heart, will be checking in from time to time, see if I can make sense of some of the things you're talking about.

Wish I had more time this summer, this would be a fun one for me too...if by chance I get to the point of being helpful, will let you know MatthewLM!

Thanks! PM me anytime or send an email to [email protected].

Next I'm going to implement the script interpreter. A lot of op-codes to do and some complexities here and there but I think it won't be too difficult. I will probably start on Saturday since I'm busy until then with other things.

GroundRod

full member

Activity: 206

Merit: 100

A project after my own heart, will be checking in from time to time, see if I can make sense of some of the things you're talking about.

Wish I had more time this summer, this would be a fun one for me too...if by chance I get to the point of being helpful, will let you know MatthewLM!

Haplo

full member

Activity: 168

Merit: 100

Quote from: Pieter Wuille on May 09, 2012, 08:01:16 PM

If you're going to rewrite C++ in C (by building virtual function tables yourself), why not code it in C++ in the first place? If compatibility is a problem, you can always expose a C interface to the code.

Because C++ has craptastic performance and tramples all over C conventions, both for no reason. -The Linux Kernel Dev Team

Also why gnome decided to start from scratch with gobject, although I can't vouch for gobject's characteristics either.

That said, I approve of this project, and will probably donate to that end soonish.

Quote from: MatthewLM on May 16, 2012, 02:08:16 PM

Thanks. It may be useful to refer to parts that I can't get from bitcoinj such as the script interpreter. ;)

It seems I've made a little problem. I'm storing bitcoin addresses backwards at the moment since my base-58 conversion code works with little-endian data. Probably makes sense to reverse the result of the base-58 conversion so they are stored the right way around instead of reversing hashes and whatever.

etotheipi was right about the endianness issue! :-)

I have to make some interesting decisions. Should I have a reference counted string structure? It might help with the bitcoin address strings.

I don't see what reference counting has to do with endianness, but generally it's only useful for memory management of long-lived objects. Are you really writing your own memory management code? o.0

MatthewLM

legendary

Activity: 1190

Merit: 1004

Well I decided the easiest and most sensible thing to do was indeed make a new structure inheriting CBObject named CBString as a wrapper for strings which does reference counting. Then it is also consistent with everything else.
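A rough sketch of what a reference-counted string "inheriting" from a base object can look like in plain C. The types and functions below (`Object`, `String`, `object_retain`, `object_release`) are hypothetical and are not the real CBObject or CBString definitions; the sketch only assumes the common technique of embedding the base struct as the first member.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base object: a reference count plus a destructor hook. */
typedef struct Object {
    unsigned refs;
    void (*destroy)(struct Object *self);
} Object;

/* "Derived" type: the base must be the first member so that a String*
   can be converted to an Object* and back. */
typedef struct {
    Object base;
    char *chars;
    size_t length;
} String;

static int strings_destroyed = 0; /* counter for demonstration only */

static void string_destroy(Object *self) {
    String *s = (String *)self;
    free(s->chars);
    free(s);
    strings_destroyed++;
}

String *string_new(const char *text) {
    String *s = malloc(sizeof *s);
    assert(s != NULL);
    s->length = strlen(text);
    s->chars = malloc(s->length + 1);
    assert(s->chars != NULL);
    memcpy(s->chars, text, s->length + 1);
    s->base.refs = 1;               /* the creator holds the first reference */
    s->base.destroy = string_destroy;
    return s;
}

/* Generic retain/release that work for any type embedding Object first. */
void object_retain(void *obj) { ((Object *)obj)->refs++; }

void object_release(void *obj) {
    Object *o = obj;
    if (--o->refs == 0) o->destroy(o);
}
```

Because every type shares the same header, one pair of retain/release functions covers strings, addresses, and anything else built the same way, which is the consistency the post refers to.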

Addresses work! I think I'll start working on the script interpreter next.

Topic: cbitcoin - Bitcoin implementation in C. Currently in development. - page 9. (Read 20319 times)