Let's ignore sigops for a moment, as they're usually irrelevant anyway.
Let's also ignore dependencies for a moment (even though, unlike sigops, they actually are significant), and just assume all transactions spend distinct confirmed outputs.
There is a simple greedy solution: order transactions by feerate and take the highest feerate first. If the transaction sizes line up with the maximum block size so that the last transaction you add exactly fills the block, then this trivial algorithm gives the optimal solution.
Now, of course, you'll seldom fill the block exactly-- so how badly can this algorithm do? Well, in the worst case where you just stop as soon as one transaction won't fit, the algorithm wastes at most the worst included feerate times the amount of weight left behind, and the maximum weight left behind is the maximum size of a transaction. In practice a greedy solution will do somewhat better, because instead of terminating you keep inserting the next best transaction that still fits. That isn't necessarily optimal, since you might have been better off going back and skipping a transaction in order to fit several lower feerate transactions, but it will never be worse than just terminating, which leaves at most a max size transaction's worth of weight on the table.
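As a rough illustration, here's what that greedy packing looks like in Python. The transaction tuples and helper names are placeholders I made up for the example; it ignores sigops and dependencies exactly as discussed above, and the 4,000,000 figure is the consensus block weight limit.

```python
# Minimal sketch of greedy block packing by feerate, ignoring sigops and
# dependencies. Transactions are (txid, fee_in_sats, weight) tuples.
MAX_BLOCK_WEIGHT = 4_000_000

def greedy_pack(txs, max_weight=MAX_BLOCK_WEIGHT):
    # Sort by feerate (fee per weight unit), best first.
    ordered = sorted(txs, key=lambda t: t[1] / t[2], reverse=True)
    block, used = [], 0
    for txid, fee, weight in ordered:
        # Instead of stopping at the first transaction that doesn't fit,
        # keep scanning for smaller ones that still do.
        if used + weight <= max_weight:
            block.append(txid)
            used += weight
    return block, used

# Example: three fake transactions (txid, fee, weight).
mempool = [("a", 50_000, 800), ("b", 10_000, 400), ("c", 4_000, 600)]
print(greedy_pack(mempool))
```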
So given that feerates are kinda low at the end of blocks and the maximum transaction size is small relative to the size of a block, the worst case approximation error of this algorithm is not so bad.
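To put a rough number on that: Bitcoin Core's standardness rules cap a single transaction at 400,000 weight units (a tenth of a block), so if the going rate at the tail of the block is, say, 1 sat/vbyte (a feerate I'm just assuming for illustration), the most even the naive stop-at-first-misfit version can leave on the table is around 0.001 BTC:

```python
# Back-of-the-envelope worst case for the naive greedy cutoff, assuming a
# 1 sat/vbyte tail feerate (illustrative) and Bitcoin Core's 400,000
# weight-unit standardness cap on a single transaction.
max_tx_weight = 400_000            # weight units (MAX_STANDARD_TX_WEIGHT)
tail_feerate = 1                   # sat per vbyte, assumed for illustration
wasted_vbytes = max_tx_weight / 4  # 4 weight units per vbyte
print(wasted_vbytes * tail_feerate / 1e8, "BTC")  # ~0.001 BTC
```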
Reality isn't quite so kind, however, because transaction dependencies that we ignored above actually do exist.
So now let's consider those:
It turns out that if you have a set of dependent transactions (a "partial order knapsack"), finding the optimal order to include them appears to take potentially exponential time, though many cases can be solved quickly. (By potentially exponential I mean that the Bitcoin Core devs have an algorithm that produces an optimal ordering, and that algorithm has an exp-time worst case... but AFAIK they don't have a proof that there is no polytime solution to the specific problem of finding the best order.)
If you take the set of transactions, divide them into independent clusters, and find an optimal inclusion order for each cluster, then you can get an optimal order for the whole block by round-robin taking the best feerate part from each independent cluster-- optimal in the same sense the dumb greedy algorithm is optimal: exact if the lengths just happen to line up, with a worst case bounded by what gets left out. But in this case the worst case approximation error is the bottom feerate times the maximum weight of a cluster, which is obviously less tight than the independent case, because while a single transaction is limited to a portion of the block, a cluster could be bigger than a whole block. Usually it isn't, though, so in practice this approach gives pretty good solutions.
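Here's a rough sketch of that merging step in Python. It assumes each cluster has already been linearized into "chunks" with known fees and weights (the hard, potentially exponential part), and just keeps taking whichever cluster's next chunk has the best feerate until one won't fit. The data layout is made up for illustration; this isn't Bitcoin Core's code.

```python
import heapq

# Each cluster is a list of (fee_sats, weight) chunks in its already
# computed best order, with chunk feerates non-increasing within a cluster.
def merge_clusters(clusters, max_weight=4_000_000):
    # Heap keyed on negative feerate of each cluster's next chunk.
    heap = []
    for ci, chunks in enumerate(clusters):
        if chunks:
            fee, weight = chunks[0]
            heapq.heappush(heap, (-fee / weight, ci, 0))
    selected, used = [], 0
    while heap:
        neg_rate, ci, idx = heapq.heappop(heap)
        fee, weight = clusters[ci][idx]
        if used + weight > max_weight:
            # Like the greedy case, a smarter packer might keep looking,
            # but stopping here keeps the worst case bounded by one cluster.
            break
        selected.append((ci, idx))
        used += weight
        if idx + 1 < len(clusters[ci]):
            nfee, nweight = clusters[ci][idx + 1]
            heapq.heappush(heap, (-nfee / nweight, ci, idx + 1))
    return selected, used

# Two toy clusters, each already linearized into chunks of (fee, weight).
clusters = [[(30_000, 1_000), (5_000, 500)], [(20_000, 400)]]
print(merge_clusters(clusters))
```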
The current behavior in Bitcoin Core mostly works like the initial greedy algorithm mentioned above, but there is some special handling (called 'ancestor feerate') intended to get better results when a child transaction effectively pays for its parents. There is a bunch of recent and ongoing development to move to the second approach I described, which is called cluster mempool.
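The ancestor feerate idea is roughly to score each transaction by the combined fee and size of itself plus all its unconfirmed ancestors, so a high-fee child can drag in a low-fee parent. A simplified version of that scoring (an illustration of the idea, not Bitcoin Core's actual implementation) might look like:

```python
# Simplified ancestor-feerate scoring: a transaction's score counts the
# fees and weights of itself plus every unconfirmed ancestor.
def ancestor_score(txid, txs, parents):
    """txs: {txid: (fee, weight)}; parents: {txid: set of parent txids}."""
    seen, stack = set(), [txid]
    total_fee = total_weight = 0
    while stack:
        cur = stack.pop()
        if cur in seen or cur not in txs:   # skip confirmed parents
            continue
        seen.add(cur)
        fee, weight = txs[cur]
        total_fee += fee
        total_weight += weight
        stack.extend(parents.get(cur, ()))
    return total_fee / total_weight

txs = {"parent": (100, 1_000), "child": (10_000, 500)}
parents = {"child": {"parent"}}
print(ancestor_score("child", txs, parents))  # far better than the parent alone
```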
Going back to the OP's question, you absolutely can use an integer linear programming solver to pack blocks, including all the details like the sigops limit. A couple of years back I made a little script that read the mempool over RPC and fed it to the coin-or CBC solver, and in my testing it would yield a proven optimal solution in half a second or so. That sounds kind of fast, but for use in a node the solving really needs to be quite a bit faster, though perhaps most of my time was spent just doing IO.
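A sketch of that kind of formulation, using the PuLP wrapper around CBC. This isn't my original script; the mempool data here is a placeholder, and the limits are the consensus block weight and sigops-cost caps.

```python
# Sketch of block packing as an integer linear program via PuLP's CBC
# backend. Placeholder mempool; a real tool would read it over RPC.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

MAX_BLOCK_WEIGHT = 4_000_000
MAX_BLOCK_SIGOPS_COST = 80_000

# txid -> (fee_sats, weight, sigops_cost); deps are (child, parent) pairs.
txs = {
    "a": (50_000, 800, 4),
    "b": (10_000, 400, 1),
    "c": (4_000, 600, 1),
}
deps = [("c", "a")]   # c spends an output of a

prob = LpProblem("block_template", LpMaximize)
x = {t: LpVariable(f"x_{t}", cat="Binary") for t in txs}

# Objective: maximize total fees.
prob += lpSum(txs[t][0] * x[t] for t in txs)
# Consensus-style limits on total weight and total sigops cost.
prob += lpSum(txs[t][1] * x[t] for t in txs) <= MAX_BLOCK_WEIGHT
prob += lpSum(txs[t][2] * x[t] for t in txs) <= MAX_BLOCK_SIGOPS_COST
# A child can only be included if its parent is.
for child, parent in deps:
    prob += x[child] <= x[parent]

prob.solve()
print([t for t in txs if x[t].value() == 1])
```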
In terms of the gain, my testing showed that sometimes it produced a fair bit of additional fees, like a quarter of a bitcoin. But extrapolating the income difference across multiple blocks is hard and the results can be misleading: e.g. my tool would show block after block that 0.2 extra BTC could be earned. But that wasn't true, because the gain came from some transactions that were left out over and over again; as soon as a single miner picked up that excess it would be gone. The actual amount of additional fees that could be obtained from a smarter solver is quite small (zero when there isn't a big backlog, in fact)... which probably explains why no one was doing it at the time.
So why is Bitcoin Core bothering with the cluster mempool stuff? Because the same mempool logic needs to be used for transaction relay and eviction, so that the network isn't wasting bandwidth relaying stuff that won't get mined while still relaying everything that will. And for that it's extremely important that the solutions be incremental and fast; running a half-second ILP solve for every single transaction that gets relayed won't fly.