Backup of Quora content
In the last post I talked about wanting to delete my Quora account. But first I wanted to make sure I actually backed up anything of interest there.
In this post each heading is a Quora question and the body below it is my answer.
Can I learn the Python language in one year or two?
While you can certainly gain enough proficiency in a year to implement projects, getting to a level of mastery is a much tougher proposition. A large part of the appeal of Python is just how much you don't need to know in order to be productive on many projects. Take, for example, writing a web app in Django: you can get a lot done without needing to know all the details of how it works. And it turns out there are a lot of details; have a look at this article to get an idea of just how much goes on behind the scenes:
How a web page request makes it down to the metal
Mastering the language involves a deeper understanding of the various Python implementations (CPython, PyPy, etc.) and a deep knowledge of the standard library. These things will take more than a year to learn, if only because Python is such a broad language. Everyone has to start somewhere though, so a year of learning will very much get you on the right track towards mastery. I've been working with Python just about every year since 2.5 and I find myself learning new things every year; mastery, as always, is a moving target.
Why is deep learning most commonly done in Python?
I'm not even sure that this is the case, because productionizing Python machine learning code is a massive pain in the ass, but assuming it is, my answer would be as follows:
The popularity stems from the ease of writing the code and getting your machine learning systems to interface with the outside world. Python is very easy to be productive in, and as a result it often ends up being used as a domain specific language for machine learning frameworks. With deep learning you are often specifying some sort of computation graph that then gets handed off to a library to be evaluated. Note that the computational heavy lifting is usually not done in Python but rather in external libraries that use highly optimized compiled code. Take TensorFlow for example: it has a nice Python interface but the computational engine exists completely outside the Python interpreter.
What Python allows you to do is quickly test your hypotheses. Which network topology is best? Which feature engineering decisions are good for this project? You can iterate on proofs of concept fairly quickly.
For tasks where the quality of the code and the ease of deployment are not requirements, for example doing some analysis for a one-off report, the speed of development is a huge win. In these sorts of scenarios Python is popular.
What are multidimensional arrays in C++ actually used for?
They are good for storing dense (non-sparse) multidimensional data such as grids, images, or matrices.
Of course you should profile your code carefully as the optimal data storage choices are very architecture dependent.
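As a rough sketch of what this looks like in practice (the dimensions and data here are made up for illustration), a dense grid such as a small heightmap maps naturally onto a plain 2D array with contiguous, row-major storage:

```cpp
#include <cstddef>
#include <iostream>

int main() {
    // A dense 2D grid: every cell holds a value, so a plain
    // multidimensional array wastes no space on sparse bookkeeping.
    constexpr std::size_t kRows = 4;
    constexpr std::size_t kCols = 6;
    double heightmap[kRows][kCols] = {};  // contiguous, row-major storage

    // Fill the grid with some made-up values.
    for (std::size_t r = 0; r < kRows; ++r) {
        for (std::size_t c = 0; c < kCols; ++c) {
            heightmap[r][c] = static_cast<double>(r * kCols + c);
        }
    }

    std::cout << heightmap[2][3] << "\n";  // prints 15
}
```

Because the storage is contiguous, iterating along a row in the inner loop plays nicely with the cache, which ties back to the profiling point above.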
What's the best software to design a neat resume/cv?
I'm a fan of using LaTeX for writing a resume/CV. This is because I can create a template and have full control over all aspects of the typesetting. That said, these days there might be a number of better options purpose-built for this task that I'm not aware of.
Is using malloc() correct in C++?
If you are writing C++ code then you should almost always choose new instead of malloc. One of the main reasons malloc is around is for compatibility with C.
When you call new you are invoking a constructor; however, you may want to deal with raw memory instead. You may be tempted to use malloc in this case, but C++ has a built-in facility, operator new, that does this in an idiomatic C++ way. Maybe you have some odd use case where you want to use realloc to reuse memory, but I'll assume that you have profiled your code and know what you are doing in that case.
So at the very best using malloc is just not idiomatic C++, but at worst it's indicative of a mindset that's essentially writing C but with a C++ compiler.
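Here's a minimal sketch of the distinction (the Widget type is made up for illustration): new allocates memory and runs the constructor, while operator new hands you raw memory that you can later construct into with placement new.

```cpp
#include <new>      // operator new, placement new
#include <string>

struct Widget {
    std::string name;
};

int main() {
    // new: allocates memory AND runs the constructor.
    Widget* w = new Widget{"constructed"};
    delete w;  // runs the destructor, then releases the memory

    // operator new: raw, uninitialized memory, no constructor call.
    void* raw = ::operator new(sizeof(Widget));
    // Construct an object into that raw memory with placement new.
    Widget* w2 = new (raw) Widget{"placed"};
    w2->~Widget();           // destroy the object explicitly
    ::operator delete(raw);  // release the raw memory
}
```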
Has OOP ever been utilized in Nginx creation?
The source code for the project can be found here:
So you should be able to see for yourself :)
I'd highly recommend learning how to do your own research in these sorts of situations; it will pay off in the longer term.
Is C++ a good first programming language?
The crux of why C++ is not a good first language is that C++ is a multi-paradigm language which aims at supporting many ways of thinking. If you are a beginner, however, you aren't going to be proficient at thinking in terms of any of the paradigms of software construction. From the point of view of a beginner, starting with a single paradigm and then branching outwards once you get more comfortable is likely to yield much better results than trying to learn many paradigms at once.
Some people are tempted to teach restricted subsets of C++ in order to make it easier to understand. By focusing on a single paradigm and using just the C++ language features that pertain to that paradigm you do lessen the learning curve. However, you are then teaching not C++ but some subset of C++. If you want to keep the learning curve manageable, it is likely better from an educational point of view to find a language that was designed around one paradigm and therefore supports it in a much simpler and easier to comprehend fashion.
Additionally, C++ has prioritized the ability to express a large number of concepts with minimal overhead, but this has come at the expense of adding a lot of complexity to the language. There's nothing stopping you from dealing with low level details, but this also means there's nothing stopping you from encountering the bugs that come from dealing with low level details. The language itself is very complex (as seen by how big the specification is) and there are a ton of pitfalls that you need to avoid in order to use it effectively. Getting started with programming is already hard enough without having to also deal with all these extra details.
I think there's a lot of value in learning C++ if at some point you want to learn systems programming. But I wouldn't recommend C++ as a first programming language.
What subset of C++ is most often used?
C++ is a large language with a lot of different parts. Depending on the problem being solved different subsets of the C++ language make sense.
There are two main reasons for choosing a subset of the language:
- Business value related reasons
- Lack of availability of certain language features
Both of these reasons heavily influence the level of technical debt in a project.
Some subsets of the language I have seen:
"C with classes":
Unfortunately this is a fairly common approach that I have seen with less experienced C++ developers. Some people essentially write C code in C++. There might be small changes such as using new instead of malloc and iostreams instead of printf, but aside from that the code is really idiomatic C.
A good sign of this is when people don't use modern C++ data types (things like std::vector and similar) in appropriate circumstances and just have raw arrays everywhere.
This is usually indicative of poor quality C++ as it misses out on many of the good things about C++ without the benefit of having the simplicity of C. I wouldn't consider this a good subset of the language but it is somewhat common.
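As a small illustrative sketch (the data here is invented), the same task in the two styles:

```cpp
#include <cstring>
#include <string>
#include <vector>

int main() {
    // "C with classes" style: fixed-size raw array plus a manual count.
    char names_c[8][32];
    int count = 0;
    std::strncpy(names_c[count++], "alice", sizeof(names_c[0]) - 1);
    names_c[0][sizeof(names_c[0]) - 1] = '\0';  // manual termination bookkeeping

    // Idiomatic C++: the container owns its memory and knows its size.
    std::vector<std::string> names;
    names.push_back("alice");
    names.push_back("bob");

    return static_cast<int>(names.size()) + count;  // 2 + 1
}
```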
Coding standard subsets:
Sometimes you might mandate your own coding standard that restricts which parts of the language are used. Which subset is best will depend on your team and what problems you are trying to solve. The cost/benefit of certain features will depend on your project or business, so it is hard to say anything concrete that would apply to everyone.
Certain language features add complexity and maintenance overhead to a project. Sometimes this will be worth it from an overall productivity point of view and sometimes it won't be. Which features you choose to use will depend on the project and the team you have working on it. A common example of a feature that gets avoided is template metaprogramming. These techniques are difficult to get correct (as they require a fair bit of knowledge) and can be hard to maintain because toolchains have poor support for them (for example the horrendous error messages you get)[^1]. This problem is exacerbated in earlier revisions of the C++ standard that have less support for templates. As a result I've seen many teams avoid this part of the language.
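For a flavour of what this part of the language looks like, here's a classic toy example (not drawn from any particular codebase): a compile-time factorial via recursive template instantiation. Get the base case wrong and the compiler responds with a long chain of instantiation errors rather than a single clear message.

```cpp
#include <iostream>

// Classic template metaprogramming: a value computed at compile time
// through recursive template instantiation.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {  // base case stops the recursion
    static const unsigned long long value = 1;
};

int main() {
    std::cout << Factorial<10>::value << "\n";  // prints 3628800
}
```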
Sometimes you might want to give the more junior developers on the team a more restricted subset of the language than the senior developers. A common breakdown of responsibilities is to have the memory management code developed by a core group of people (usually the most senior) and then have easier to use interfaces available to everyone else. By encapsulating the nastiest memory management you can let the team have its benefits without everyone needing to know all the details.
Embedded systems:
Embedded systems cover a lot of ground, but for the purposes of this example I'm referring to resource-constrained environments. In this environment some language features are undesirable and some are just unavailable. Scott Meyers has some good material on using C++ in an embedded environment here: http://www.aristeia.com/c++-in-embedded.html
Avoiding undesirable features on the target platform:
Usually you won't have good access to dynamic memory management in such an environment. Things such as new and malloc in their standard library implementations may either not exist at all or might introduce an unacceptable amount of memory fragmentation or management overhead for your platform. You might want to create your own memory management scheme using something like placement new but the cost to your project might be too high.
You might decide because of this to use a subset of the language that works well with the target platform.
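A rough sketch of what that can look like (the pool size and Sensor type are invented for illustration): construct objects with placement new into a statically reserved buffer, so there's no heap allocation and no fragmentation.

```cpp
#include <cstddef>
#include <new>  // placement new

struct Sensor {
    int id;
    int last_reading;
};

// Statically reserved storage: no heap, no fragmentation.
constexpr std::size_t kMaxSensors = 4;
alignas(Sensor) static unsigned char sensor_pool[kMaxSensors * sizeof(Sensor)];
static std::size_t next_slot = 0;

Sensor* make_sensor(int id) {
    if (next_slot >= kMaxSensors) return nullptr;  // pool exhausted
    void* slot = sensor_pool + next_slot * sizeof(Sensor);
    ++next_slot;
    // Placement new: run the constructor in memory we already own.
    return new (slot) Sensor{id, 0};
}

int main() {
    Sensor* s = make_sensor(1);
    return s ? s->id : -1;
}
```

Whether something like this is worth the maintenance cost is exactly the kind of project-specific judgement call discussed above.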
Unavailable features:
Sometimes some language features might be outright missing as your compiler or platform or both don't have support for certain parts of the language.
Frequently you might not have a good means of dealing with exception handling, and even if you do have it available you might not have the resources to make exception handling a good choice for your project. Additionally, other language features might be missing; for example, when I was working on a project with the AVR-GCC implementation of C++ there was no support for pure virtual functions unless you implemented part of the runtime yourself. You might want to write your own __cxa_pure_virtual handler or you might not. Once features aren't supported by the compiler toolchain or platform you are targeting it becomes tempting to exclude those features from your coding standards.
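For reference, the handler itself is tiny; something along these lines is typical on a bare-metal toolchain that doesn't ship one (the exact behaviour you want on failure is platform-specific):

```cpp
// Minimal sketch of a user-provided handler for a freestanding target.
extern "C" void __cxa_pure_virtual() {
    // Called if a pure virtual function is somehow invoked
    // (for example from a constructor). On a small target there's
    // often nothing sensible to do but halt.
    while (true) {
    }
}
```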
So to summarize, most of the time the best subset of the language will depend on your project and your team. No one subset of the language is best in all cases; make an informed choice based on your requirements and project spec.
What would Python be your first choice for?
Any time you are thinking of designing a program you are looking at various tradeoffs, and choices of language are no exception. One particular tradeoff is the amount of time you need to spend up front to get everything working versus the amount of time you could potentially save later on from a more time-consuming setup.
For example, let's say that you are trying to write a utility to decompress some files, rename them, and re-compress them, and this has to be cross-platform and work on, say, Linux and Windows. For the sake of the example say we have the choice of C++ or Python to do this.
If we were to use a language such as C++ for this we would need to:
- Set up our build system which might be as simple as a small shell script to call our compiler or perhaps something like make or SCons if we also want to build tests and run them.
- Then we need to track down libraries for the file compression, perhaps we have to build these from source.
- Then for the cross-platform support we might look into Boost (Boost Filesystem) or similar so that we don't spend hours dealing with various cross-platform issues and edge cases. If we have to compile Boost from source then we can pretty much write off the next few hours.
If we were using Python then we would need to:
- Write the code! All the functionality we need is in the standard library.
In this case we probably spent many times longer in the setup phase for C++ than we did programming the logic of our program. So in this particular case reducing the setup time required is a big win. Python often lets us be productive for this reason and allows many small utilities to be written in a much reduced amount of time. In an industry where saving time is extremely valuable this can be a huge ROI win. In the time it took for Boost to compile from source I might have already written the entire utility in Python and tested it out on some files.
Python tends to do very well when the main goal is to do one-off type things; in larger projects this benefit is reduced somewhat as the time spent on setup is a much smaller percentage of the overall development time. Python of course isn't limited to just one-off applications but it is often my first choice for these.
Now let's say I was creating some software that many thousands or millions of people used in their jobs every day (like Git or an email client); then I'd really strongly hesitate to use a language like Python. The performance of Python relative to other languages is lower and there's a very real cost to your users if your software requires more RAM or is very slow. If you have a million users every day and the Python code takes, say, 3 seconds longer to load, then 3,000,000 seconds are wasted, or 34.72 days of time, across all your users. These sorts of things add up very quickly. Compare this to some software where you are making a one-off report: an extra few seconds of start-up time doesn't matter at all, and getting the code written faster is best.
Python tends to be my first choice when the main requirement is a short development time.
What are the uses of non deterministic algorithms?
They can be good for quickly getting bounds information that can be used in other algorithms.
For example, if you can prune the search space using information about the bounds of a portion of the space, you have the potential to reduce the effort required to search it.
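As a sketch of the idea (the knapsack-style problem and numbers here are made up), a few randomized greedy passes give a quick incumbent value, and the exact search then prunes any branch whose optimistic bound can't beat that incumbent:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Toy problem: pick items to maximize total value within a weight cap.
struct Item { int value; int weight; };

// Non-deterministic step: random greedy passes give a quick lower bound
// (an incumbent solution) on the best achievable value.
int random_greedy_bound(const std::vector<Item>& items, int cap, std::mt19937& rng) {
    int best = 0;
    for (int pass = 0; pass < 16; ++pass) {
        std::vector<Item> shuffled = items;
        std::shuffle(shuffled.begin(), shuffled.end(), rng);
        int value = 0, weight = 0;
        for (const Item& it : shuffled) {
            if (weight + it.weight <= cap) { weight += it.weight; value += it.value; }
        }
        best = std::max(best, value);
    }
    return best;
}

// Exact search that uses the incumbent to prune hopeless branches.
void search(const std::vector<Item>& items, std::size_t i, int cap, int value, int& best) {
    best = std::max(best, value);
    if (i == items.size()) return;
    // Optimistic bound: take every remaining item's value, ignoring weight.
    int optimistic = value;
    for (std::size_t j = i; j < items.size(); ++j) optimistic += items[j].value;
    if (optimistic <= best) return;  // prune: this branch can't beat the incumbent
    if (items[i].weight <= cap)
        search(items, i + 1, cap - items[i].weight, value + items[i].value, best);
    search(items, i + 1, cap, value, best);
}

int main() {
    std::vector<Item> items = {{10, 5}, {40, 4}, {30, 6}, {50, 3}, {25, 5}};
    int cap = 10;
    std::mt19937 rng(std::random_device{}());
    int best = random_greedy_bound(items, cap, rng);  // seed the search with a bound
    search(items, 0, cap, 0, best);
    std::cout << best << "\n";  // optimal value for this toy instance (90)
}
```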
I really want to program embedded systems, and I am studying and practicing C programming so hard, but I hate C programming really, what do I do?
You need to identify why you hate C then find ways to overcome that hate. If you want to be employable in the embedded systems realm (in 2017) then you really need to know either C or languages that are very similar to C.
Why do we have coding whales (developers that do a gazillion things)?
I'm going to hazard a guess that questions like this come from an underlying assumption that productivity is linear in the skill of the developer. In development, productivity is highly non-linear. It's not like factory work where the task might be simple and the difficulty is in doing it fast. There are certain tasks where less skilled people simply have no chance of completing the task at all. In software development there are a lot of situations where there's a minimum amount of skill required to get any solution, let alone a well engineered one.
So if you have a development task that is a crucial part of the business then you need to get someone who can do it. Three people who can't complete the task at all aren't going to provide the business as much value as one person on a three times higher salary who can. So in many of these situations the only way to go is to hire the people who can get the job done, and depending on the market they might be expensive.
Having worked on some highly specialized products I've seen countless examples where high quality talent was simply required or the product couldn't be made. These people don't come cheaply either. However in some projects the coding involved is mundane and requires a much lower skill level; in those cases hiring more people at a lower average skill level for less pay is a viable approach.
Question from the comments:
Could you kindly give me an example of a project that would require a $120,000+ person to solve? And if you could kindly indulge me in how a $60,000-$70,000 brain with two $30,000 brains in assistance would not be able to solve this?
I was working on a product with a team and we were aiming to solve some hard problems in combinatorial optimization. From looking at the literature we couldn't find anything out there that would solve our particular problem in any reasonable amount of time given the resources we had, so we had to do research and come up with our own algorithms. Then we started running into substantial problems implementing our algorithms with the equipment we had (not enough RAM, missing library support for our languages, limitations of the systems we were using, etc). Solving these problems was difficult and at times required a high level of expertise. Now, another group that was working on a very similar problem took a different approach: they just threw millions of dollars at getting more computing power and used a slightly less sophisticated approach. Would it have been cheaper for them to hire more people? Because I don't know their business model I really can't say. Was it cheaper to hire me and some other elite staff and save millions of dollars of OpEx on server costs? Possibly, yes.
In the last example it was possible to spend money elsewhere instead of on staff, but this isn't always possible. There are a lot of things where you just have to know how to do something in order to do it; no amount of money will help you get something done if you don't know how to do what you need to do. Whenever a certain level of seniority is required to make any progress at all, hiring people below that skill bar is not going to actually save you any money. It will just cause a lot of frustration as those staff flounder around.
There are a bunch of other examples in finance or any other business situation that are accurately modeled by a zero-sum (or more precisely a constant-sum) game. In these situations if your product is not as good as your competitors' you simply don't make as much money. Depending on what's at stake, hiring more skilled developers can generate the business a ton of value.
Another similar situation is winner-take-all markets. If you are in one of these markets and are making some app then you'll want to win the market. Because of the non-linearity of rewards it can make a lot of sense to invest in more skilled staff.
Then there's another common class of problems that's entirely different and isn't driven by technical difficulty. Often you have some specifications that aren't cutting edge and you need to solve a bunch of tasks that, while important to the business, are mundane from a technical point of view. In a situation like this, having one skilled team lead come up with an overall plan and architecture and a bunch of people doing the smaller, less skill-intensive tasks is a perfectly valid approach.
[^1]: I wrote this answer in 2015 based on my experience with older compilers; modern C++ standards compiled with modern C++ compilers do a much better job these days.