In this case, the Twitter profiles of the authors are available, but these consist of freeform text rather than fixed information fields.
Moreover, it is unknown to what degree the information that is present is true.
Gender recognition has also already been applied to tweets. A 2010 study examined various traits of authors from India tweeting in English, combining character N-grams and sociolinguistic features like manner of laughing, honorifics, and smiley use.
With lexical N-grams, its authors reached an accuracy of 67.7%, which the combination with the sociolinguistic features increased to 72.33%. A 2011 study attempted to recognize gender in tweets from a whole set of languages, using word and character N-grams as features for machine learning with Support Vector Machines (SVM), Naive Bayes and Balanced Winnow2.
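As a rough illustration of the word and character N-gram features mentioned above, the following Python sketch counts both kinds of N-gram in a single tweet. The function names, the lowercasing, and the whitespace tokenization are assumptions made for this example only and are not taken from the cited studies; counts like these would typically be turned into feature vectors for a learner such as an SVM.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count all overlapping character n-grams in text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count all overlapping word n-grams in text (hypothetical helper).

    Assumption: tokens are lowercased and split on whitespace, which is
    far cruder than a real tweet tokenizer.
    """
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    tweet = "haha so true haha"
    print(char_ngrams(tweet, 3))
    print(word_ngrams(tweet, 1))
```

Texts shorter than `n` simply yield an empty counter, so the helpers can be applied to very short tweets without special handling.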
The resource would become even more useful if we could deduce complete and correct metadata from the various available information sources, such as the provided metadata, user relations, profile photos, and the text of the tweets.
The authors do not report the set of slang words, but the non-dictionary words appear to be more related to style than to content, showing that purely linguistic behaviour can contribute information for gender recognition as well.
In this paper we restrict ourselves to gender recognition, and it is also this aspect that we will discuss further in this section.
A group that is very active in studying gender recognition (among other traits) on the basis of text is the one around Moshe Koppel. In their 2002 work, they report gender recognition on formal written texts taken from the British National Corpus (and also give a good overview of previous work), reaching about 80% correct attributions using function words and parts of speech.
For our experiment, we selected 600 authors for whom we were able to determine with a high degree of certainty a) that they were human individuals and b) what gender they were.