Tom Flynn asked in #Trailhead
I feel like I must be missing something with the code provided for this method. Near the beginning of the method, the instructions tell you to create a window of size 'n'.

Later on, you pop the center word out of the window so I believe the length of the window would now be n-1.

At the end of the CBOW section there is the following if statement:

if len(window) < n:
    continue

It seems to me that len(window) is always n-1 at this point. As a result, you always hit the continue and never add anything to the examples list, so the CBOW examples list ends up empty.
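A minimal sketch of the behavior described above. The five-word window is made up for illustration and is not taken from the exercise data:

```python
# Hypothetical window of n = 5 word indices (not from the exercise data).
n = 5
window = [10, 11, 12, 13, 14]

# Popping the center word shortens the window list itself.
center_word = window.pop(len(window) // 2)

print(center_word)      # 12
print(len(window))      # 4, i.e. n - 1
print(len(window) < n)  # True, so a `continue` guarded by this test always runs
```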

What am I missing???

Tom

 
3 answers
  1. Mar 30, 2019, 05:26
    Hi Tom,

    Greetings to you!

    Use this code, which checks the window length before the center word is removed:

    import random

    def construct_examples(numericalized_sentences, vocabulary, num_examples=int(1e6), n=5, sg=True, k=0):
      examples = []
      while True:
        # TODO: select a random sentence index using random.randint and get that
        # sentence. Be careful to avoid indexing errors.
        # random.randint is inclusive on both ends, hence the -1.
        sentence_idx = random.randint(0, len(numericalized_sentences) - 1)
        sentence = numericalized_sentences[sentence_idx]
        if not sentence:  # an empty sentence has no valid window start
          continue

        # TODO: Select a random window index using random.randint
        # and obtain that window of size n. Be careful to avoid indexing errors.
        window_idx = random.randint(0, len(sentence) - 1)
        window = sentence[window_idx:window_idx + n]  # end is start + n, not n

        if len(window) <= n // 2:
          continue
        # CBOW needs a full window. Check while the window is still intact.
        if not sg and len(window) < n:
          continue

        # TODO: Get the center word and the context words
        # Slice around the center index so that window is not mutated and a
        # repeated word elsewhere in the window is not removed by mistake.
        center_idx = len(window) // 2
        center_word = window[center_idx]
        context_words = window[:center_idx] + window[center_idx + 1:]

        # TODO: Create examples using the guidelines above
        if sg:  # if Skip-Gram
          context_word = random.choice(context_words)
          example = [center_word, context_word]
        else:  # if CBOW
          example = [context_words, center_word]

        if k > 0:  # if doing negative sampling
          samples = [random.randint(0, len(vocabulary.index_to_word) - 1)
                     for _ in range(k)]
          example.append(samples)

        examples.append(example)
        if len(examples) >= num_examples:
          break

      return examples
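One way to keep the CBOW length check from always firing is to run it before the center word is taken out, and to build the context from slices so the window itself is left unchanged. A minimal sketch, assuming a hypothetical helper named `split_window` and made-up word indices, neither of which is part of the exercise code:

```python
def split_window(sentence, start, n):
    """Hypothetical helper: return (context_words, center_word), or None
    if fewer than n words are available starting at `start`."""
    window = sentence[start:start + n]
    # Check the length while the window still holds all of its words.
    if len(window) < n:
        return None
    center_idx = n // 2
    center_word = window[center_idx]
    # Slicing around the center leaves `window` unmodified.
    context_words = window[:center_idx] + window[center_idx + 1:]
    return context_words, center_word

sentence = [3, 7, 1, 9, 4, 8]  # made-up word indices
print(split_window(sentence, 0, 5))  # ([3, 7, 9, 4], 1)
print(split_window(sentence, 3, 5))  # None, only 3 words remain
```

With the check placed first, a full window yields a CBOW example with n-1 context words, and a short window is skipped.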

    I hope you find the above solution helpful. If it does, please mark as Best Answer to help others too.

    Thanks and Regards,

    Deepali Kulshrestha