1. "don't" can count as one or two words. It depends on whether your corpus query system treats the apostrophe as a word separator, and on whether the string was split into "do" and "n't" when the corpus was annotated.
2. "as well as" can count as one or three words, depending on whether multi-word prepositions have been annotated as single items in the corpus.
3. "give up" would normally be considered two words (a verb and a particle), though you can decide to treat it as a single word, i.e. a phrasal verb. Phrasal verbs are not normally annotated as single units in corpora, because it is very difficult to distinguish phrasal verb uses from verb + preposition uses automatically.
4. "cardiovascular system" will normally be treated as two separate words, though you can decide to annotate your corpus so that the phrase is treated as a single unit.
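The three treatments of a contraction described in point 1 can be sketched in a few lines of Python. This is only an illustration, not the behaviour of any particular corpus query system: the function names are hypothetical, and the rule for splitting off "n't" is a simplifying assumption.

```python
import re


def whitespace_tokens(text):
    """Split on white space only, so a contraction stays one token."""
    return text.split()


def apostrophe_tokens(text):
    """Treat the apostrophe as a word separator, just like white space."""
    return [t for t in re.split(r"[\s']+", text) if t]


def clitic_tokens(text):
    """Split the negative clitic "n't" off as a token of its own.

    Assumption for this sketch: any token ending in "n't" is a contraction.
    """
    tokens = []
    for token in text.split():
        match = re.fullmatch(r"(.+)(n't)", token, flags=re.IGNORECASE)
        if match:
            tokens.extend(match.groups())
        else:
            tokens.append(token)
    return tokens


for tokenize in (whitespace_tokens, apostrophe_tokens, clitic_tokens):
    result = tokenize("don't")
    print(tokenize.__name__, len(result), result)
```

Under these assumptions "don't" comes out as one token with the first function and as two tokens with the other two, while "give up" is two tokens in all three cases.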
Summing up:
Using white space as the word separator works in most cases, but not always:
- In corpora stored as raw text, contractions are usually treated as single words, while multiword expressions are always treated as sequences of several words;
- In annotated corpora, the word count can be different, depending on the decisions made when the corpus was annotated.
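The contrast between raw and annotated corpora can be made concrete with a small sketch. It assumes the annotation is simply a list of multiword units to be joined into single tokens. The function `merge_multiword_units` and the underscore convention are hypothetical, chosen for illustration rather than taken from any specific annotation scheme.

```python
def merge_multiword_units(tokens, units):
    """Join listed multiword units into single tokens (longest match first)."""
    # Turn each unit into a tuple of words; skip empty entries.
    candidates = sorted(
        (tuple(u.split()) for u in units if u.split()),
        key=len,
        reverse=True,
    )
    merged = []
    i = 0
    while i < len(tokens):
        for unit in candidates:
            if tuple(tokens[i:i + len(unit)]) == unit:
                merged.append("_".join(unit))
                i += len(unit)
                break
        else:
            # No unit starts at this position: keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


# Raw text: white space is the only word separator.
raw = "the cardiovascular system as well as the lungs".split()

# Annotated: two phrases have been marked as single units.
annotated = merge_multiword_units(raw, ["as well as", "cardiovascular system"])

print(len(raw), raw)
print(len(annotated), annotated)
```

The same string yields eight words when split on white space and five once the two phrases are treated as units, which is why word counts from differently annotated corpora are not directly comparable.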